Go: Work-Stealing in Go Scheduler
ℹ️ This article is based on Go 1.13.
Creating goroutines in Go is easy and fast. However, Go can run at most one goroutine per core at a time, so it needs a way to park the other goroutines and keep the load well balanced across the processors.
Goroutine queues
Go manages the awaiting goroutines at two levels: local queues and a global queue. Each processor has its own local queue, while the global queue is unique and shared by all processors:
Each local queue has a maximum capacity of 256 goroutines; once it is full, new incoming goroutines are pushed to the global queue. Here is an example with a program that spawns thousands of goroutines:
package main

import "sync"

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 2000; i++ {
		wg.Add(1)
		go func() {
			a := 0
			for i := 0; i < 1e6; i++ {
				a += 1
			}
			wg.Done()
		}()
	}
	wg.Wait()
}
Here are the scheduler traces when running with two processors:
The traces show the number of goroutines in the global queue with the runqueue attribute, and the local queues (P0 and P1 respectively) in the brackets [3 256]. When a local queue is full and reaches 256 awaiting goroutines, the next ones stack up in the global queue, as we can see with the runqueue attribute growing.
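Such traces can be produced with the GODEBUG environment variable; assuming the program above is saved as main.go, a command along these lines enables the periodic scheduler summary:

```shell
# schedtrace=1000 prints one scheduler summary line per second;
# scheddetail=1 would add per-P/M/G detail. The output resembles:
#   SCHED 1009ms: gomaxprocs=2 idleprocs=0 threads=4 ... runqueue=1 [3 256]
GOMAXPROCS=2 GODEBUG=schedtrace=1000 go run main.go
```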
Goroutines do not land in the global queue only when the local queue is full; they are also pushed to it when Go injects a list of goroutines into the scheduler, e.g. from the network poller, or for goroutines that were asleep during the garbage collection.
Here is the diagram of the previous example:
However, we could wonder why the local queue of P0 was not empty in the previous…