Inlined defers in Go

Go’s defer keyword allows us to schedule a function to run before a function returns. Multiple functions can be deferred from a function. defer is often used to cleanup resources, finish function-scoped tasks, and similar. Deferring functions are great for maintability. By deferring, for example, we reduce the risk of forgetting to close the file in the rest of the program:

func main() {
    f, err := os.Open("hello.txt")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // The rest of the program...
}

Deferring helps us by delaying the execution of the Close method while allowing us to type it when we have the right context. This is how deferred functions also help the readability of the source code.

How defer works

Defer handles multiple functions by stacking them hence running them in LIFO order. The more deferred functions you have, the larger the stack will be.

func main() {
	for i := 0; i < 5; i++ {
		defer fmt.Printf("%v ", i)
	}
}

The above program will output “4 3 2 1 0 ” because the last deferred function will be the first one to be executed.

When a function is deferred, the variables accessed by it are stored as its arguments. For each deferred function, compiler generates a runtime.deferproc call at the call site and call into runtime.deferreturn at the return point of the function.

0: func run() {
1:    defer foo()
2:    defer bar()
3:
4:    fmt.Println("hello")
5: }

The compiler will generate code similar to below for the program above:

runtime.deferproc(foo) // generated for line 1
runtime.deferproc(bar) // generated for line 2

// Other code...

runtime.deferreturn() // generated for line 5

Defer performance

Defer used to require two expensive runtime calls explained above. This made deferring functions to be significantly more expensive than non-deferred functions. For example, consider to lock and unlock a sync.Mutex deferred and not-deferred.

var mu sync.Mutex
mu.Lock()

defer mu.Unlock()

The program above will work 1.7x slower than the non-deferred version. Even though it only takes ~25-30 nanoseconds to lock and unlock a mutex by deferring, it makes a difference in large scale use or in cases where a function call need to be completed under 10 nanoseconds.

BenchmarkMutexNotDeferred-8   	125341258	         9.55 ns/op	       0 B/op	       0 allocs/op
BenchmarkMutexDeferred-8      	45980846	        26.6 ns/op	       0 B/op	       0 allocs/op

This overhead is why Go developers started to avoid defers in certain cases to improve performance. Unfortunately this situation make Go developers compromise readability.

Inlining deferred functions

In the last few versions of Go, there have been gradual improvements to defer’s performance. But with Go 1.14, some common cases will see a highly significant performance improvement. The compiler will generate code to inline some of the deferred functions at return points. With this improvement, calling into some deferred functions will be only as expensive as making a regular function call.

0: func run() {
1:    defer foo()
2:    defer bar()
3:
4:    fmt.Println("hello")
5: }

With the new improvements, above code will generate:

// Other code...

bar() // generated for line 5
foo() // generated for line 5

It is possible to do this improvement only in static cases. For example, in a loop where the execution is determined by the input size dynamically, the compiler doesn’t have the chance to generate code to inline all the deferred functions. But in simple cases (e.g. deferring at the top of the function or in conditional blocks if they are not in loops), it is possible to inline the deferred functions. With 1.14, easy cases will be inlined and runtime coordination will be only required if the compiler cannot generate code.

I already tried the Go 1.14beta with the mutex locking/unlocking example above. Deferred and non-deferred versions perform very similarly now:

BenchmarkMutexNotDeferred-8   	123710856	         9.64 ns/op	       0 B/op	       0 allocs/op
BenchmarkMutexDeferred-8      	104815354	        11.5 ns/op	       0 B/op	       0 allocs/op

Go 1.14 is a good time to reevaluate deferring if you avoided defers for performance gain. If you are looking for more about this improvement, see the Low-cost defers through inline code proposal and GoTime’s recent episode on defer with Dan Scales.


Disclaimer: This article is not peer-reviewed but thanks to Dan Scales for answering my questions while I was investigating this improvement.