More than thirteen years have passed since Herb Sutter declared that the free lunch is over and concurrency is upon us, and yet it's hard to claim that most mainstream languages have made a strong shift towards concurrent modes of programming. We have to admit that concurrency is just hard, and the struggles of some of the world's leading programming languages bear witness to this challenge.

Unfortunately, most languages didn't yet move past the threads vs. asynchronous dichotomy. You either use threads, or a single-threaded event loop decorated with a bunch of bells and whistles to make code more palatable. Mixing threads with event loops is possible, but so complicated that few programmers can afford the mental burden for their applications.

Threads aren't a bad thing in languages that have good library support for them, and their scalability is much better than it used to be a decade ago, but for very high levels of concurrency (~100,000 threads and above) they are still inadequate. On the other hand, event-driven programming models are usually single-threaded and don't make good use of the underlying HW. More offensively, they significantly complicate the programming model. I've enjoyed Bob Nystrom's What Color is Your Function for explaining how annoying the model of "non-blocking only here, please" is. The core idea is that in the asynchronous model we have to mentally note the blocking nature of every function, and this affects where we can call it from.

Python took the plunge with asyncio, which is so complicated that many Python luminaries admit they don't understand it, and of course it suffers from the "function color" problem as well, where any blocking call can ruin your day. C++ seems to be going in a similar direction with the coroutines proposal for C++20, but C++ has much less ability to hide magic from users than Python, so I predict it will end up with a glorious soup of templates that even fewer will be able to understand. The fundamental issue here is that both Python and C++ try to solve this problem on a library level, when it really needs a language runtime solution.

What Go does right

As you've probably guessed from this article's title, this brings us to Go. I'm happy to go on record claiming that Go is the mainstream language that gets this really right. And it does so by relying on two key principles in its core design:

  1. Seamless light-weight preemptive concurrency across cores
  2. CSP and sharing by communicating

These two principles are implemented very well in Go, and in unison make concurrent programming in it the best experience, by far, compared to other popular programming languages today. The main reason for this is that they are both implemented in the language runtime, rather than being delegated to libraries.

You can think of goroutines as threads, it's a fairly good mental model. They are truly cheap threads - because the Go runtime implements launching them and switching between them without deferring to the OS kernel. In a recent post I've measured goroutine switching time to be ~170 ns on my machine, 10x faster than thread switching time.

But it's not just the switching time; goroutines also have small stacks that can grow at run-time (something thread stacks cannot do), which is also carefully tuned to be able to run millions of goroutines simultaneously.

There's no magic here; consider this claim - if threads in C++ or JS or Python were extremely lightweight and fast, we wouldn't need async models. Well, this is the case with Go. As Bob Nystrom says in his post - Go has eliminated the distinction between synchronous and asynchronous code.

That's not all, however. The second principle is critical too. The main objections to threads are not just about performance, there's also correctness issues and complexity. Programming with threads is hard - it's hard to synchronize access to data structures without causing deadlocks; it's hard to reason about multiple threads accessing the same data, it's hard to choose the right locking granularity, etc.

And this is where Go's sharing by communicating principle comes in. In idiomatic Go programs you won't see a lot of mutexes, condition variables and critical areas protecting shared data. In fact, you probably won't see much locking at all. This is because Go encourages programmers to use channels instead, and channels are built into the language, with awesome features like select, and so on. Proper use of channels removes the need for more explicit locking, is easier to write correctly, tune for performance, and debug.

Moreover, building these capabilities into the runtime, Go can implement great tools like the race detector, which make concurrency bugs easier to smoke out. It all just fits together so nicely! Obviously many challenges of concurrent programming remain in Go - these are the essential complexities of the problem that no language is likely to remove; Go does a great job at removing the incidental complexities, though.

For these reasons, I believe Ryan Dahl - the creator of Node.js - is absolutely right when he says:

[...] if you’re building a server, I can’t imagine using anything other than Go. [...] I think Node is not the best system to build a massive server web. I would use Go for that. And honestly, that’s the reason why I left Node. It was the realization that: oh, actually, this is not the best server-side system ever.

Different languages are good for different things, which is why programmers should have several sufficiently different languages in their arsenal. If concurrency is central to your application, Go is the language to use.