Making the Switch from Node.js to Golang

Written by: Alexandra Grant

"This article was originally published on Medium by Alexandra Grant, and with their permission, we are sharing it here for Codeship readers."

I’ve dabbled in JavaScript since college, made a few web pages here and there, and while JS was always an enjoyable break from C or Java, I regarded it as a fairly limited language, imbued with the special purpose of serving up animations and pretty little things to make users go “ooh” and “aah”. It was the first language I taught anyone who wanted to learn how to code because it was simple enough to pick up and would quickly deliver tangible results to the developer. Smash it together with some HTML and CSS and you have a web page. Beginner programmers love that stuff.

Then something happened two years ago. At that time, I was in a researchy position working mostly on server-side code and app prototypes for Android. It wasn’t long before Node.js popped up on my radar. Backend JavaScript? Who would take that seriously? At best, it seemed like a new attempt to make server-side development easier at the cost of performance, scalability, etc. Maybe it’s just my ingrained developer skepticism, but there’s always been that alarm that goes off in my brain when I read about something being fast and easy and production-level.

Then came the research, the testimonials, the tutorials, the side-projects and 6 months later I realized I had been doing nothing but Node since I first read about it. It was just too easy, especially since I was in the business of prototyping new ideas every couple months. But Node wasn’t just for prototypes and pet projects. Even big boy companies like Netflix had parts of their stack running Node. Suddenly, the world was full of nails and I had found my hammer.

Fast forward another couple months and I’m at my current job as a backend developer for Digg. When I joined, back in April of 2015, the stack at Digg was primarily Python with the exception of two services written in, wait for it, Node. I was even more thrilled to be assigned the task of reworking one of the services which had been causing issues in our pipeline.

Our troublesome Node service had a fairly straightforward purpose. Digg uses Amazon S3 for storage which is peachy, except S3 has no support for batch GET operations. Rather than putting all the onus on our Python web server to request up to 100+ keys at a time from S3, the decision was made to take advantage of Node’s easy async code patterns and great concurrency handling. And so Octo, the S3 content fetching service, was born.

Node Octo performed well except for when it didn’t. Once a day it needed to handle a traffic spike where requests per minute jumped from 50 to 200+. Also keep in mind that for each request, Octo typically fetches somewhere between 10–100 keys from S3. That’s potentially 20,000 S3 GETs a minute. The logs showed that our service slowed down substantially during these spikes, but the trouble was it didn’t always recover. As such, we were stuck bouncing our EC2 instances every couple of weeks after Octo would seize up and fall flat on its face.

The requests to the service also pass along a strict timeout value. After the clock hits X number of milliseconds since receiving the request, Octo is supposed to return to the client whatever it has successfully fetched from S3 and move on. However, even with a max timeout of 1200ms, in Octo’s worst moments we had request handling times spiking up to 10 seconds.

The code was heavily asynchronous and we were caching S3 key values aggressively. Octo was also running across 2 medium EC2 instances which we bumped up to 4.

I reworked the code three times, digging deeper than ever into Node optimizations, gotchas, and tricks for squeezing every last bit of performance out of it. I reviewed benchmarks for popular Node web server frameworks, like Express and Hapi, versus Node’s built-in HTTP module. I removed any third-party modules that, while nice to have, slowed down code execution. The result was three one-off iterations, all suffering from the same issue. No matter how hard I tried, I couldn’t get Octo to time out properly, and I couldn’t reduce the slowdown during request spikes.

A theory eventually emerged, and it had to do with the way Node’s event loop works. If you don’t know about the event loop, here’s some insight from NodeSource:

Node’s “event loop” is central to being able to handle high throughput scenarios. It is a magical place filled with unicorns and rainbows, and is the reason Node can essentially be “single threaded” while still allowing an arbitrary number of operations to be handled in the background.

[caption id="attachment_3385" align="aligncenter" width="800"]

Not-So Magic Event Loop Blocking (X-Axis: Time in milliseconds)[/caption]

You can see when all the unicorns and rainbows went to hell and back again as we bounced the service.

With event loop blocking as the biggest culprit on my list, it was just a matter of figuring out why it was getting so backed up in the first place.

Most developers have heard about Node’s non-blocking I/O model; it’s great because it means all requests are handled asynchronously without blocking execution or incurring the overhead of threads and processes, and as the developer you can be blissfully unaware of what’s happening in the backend. However, it’s always important to keep in mind that Node is single-threaded, which means none of your code runs in parallel. I/O may not block the server, but your code certainly does. If I call sleep for 5 seconds, my server will be unresponsive during that time.
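Jumping ahead a little, this is also where Go’s model differs most from Node’s: Go’s standard net/http server runs each request handler in its own goroutine, so a handler that blocks for five seconds only stalls that one request. Here’s a minimal illustration of that stock library behavior (a sketch, not anything from Octo’s code):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// net/http serves each request in its own goroutine, so this
	// 5-second sleep only delays requests to /slow; requests to
	// /fast keep being served concurrently.
	http.HandleFunc("/slow", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(5 * time.Second)
		fmt.Fprintln(w, "done sleeping")
	})

	http.HandleFunc("/fast", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "immediate response")
	})

	http.ListenAndServe(":8080", nil)
}
```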

[caption id="attachment_3385" align="aligncenter" width="800"]

(https://blog.codeship.com/wp-content/uploads/2016/05/Visualizing-the-Event-Loop — StrongLoop.png) Visualizing the Event Loop — StrongLoop

[/caption]

And the non-blocking code? As requests are processed and events are triggered, messages are queued along with their respective callback functions. To explain further, here’s an excerpt from a particularly insightful blog post from Carbon Five:

In a loop, the queue is polled for the next message (each poll referred to as a “tick”) and when a message is encountered, the callback for that message is executed. The calling of this callback function serves as the initial frame in the call stack, and due to JavaScript being single-threaded, further message polling and processing is halted pending the return of all calls on the stack. Subsequent (synchronous) function calls add new call frames to the stack...

Our Node service may have handled incoming requests like a champ if all it needed to do was return immediately available data. But instead it was waiting on a ton of nested callbacks all dependent on responses from S3 (which can be god awful slow at times). Consequently, when any request timeouts happened, the event and its associated callback were put on an already overloaded message queue. While the timeout event might occur at 1 second, the callback wasn’t getting processed until all other messages currently on the queue, and their corresponding callback code, were finished executing (potentially seconds later). I can only imagine the state of our stack during the request spikes. In fact, I didn’t need to imagine it. A little bit of CPU profiling gave us a pretty vivid picture. Sorry for all the scrolling.

[caption id="attachment_3385" align="aligncenter" width="800"]

The flames of failure [/caption]

As a quick intro to flame graphs: the y-axis represents the number of frames on the stack, where each function is the parent of the function above it. The x-axis corresponds to the sample population rather than the passage of time. The width of a box shows its total time on-CPU; greater width may indicate a slower function, or it may simply mean that the function is called more often. You can see in Octo’s flame graph the huge spikes in our stack depth. More detailed info on profiling and flame graphs can be found here.

In light of these realizations, it was time to entertain the idea that maybe Node.js wasn’t the perfect candidate for the job. My CTO and I sat down and had a chat about our options. We certainly didn’t want to continue bouncing Octo every other week, and we were both very interested in a promising case study that had cropped up on the internet: “Handling 1 Million Requests per Minute with Go.”

If the title wasn’t tantalizing enough, the topic was on creating a service for making PUT requests to S3 (wow, other people have these problems too?). It wasn’t the first time we had talked about using Golang somewhere in our stack and now we had a perfect test subject.

Two weeks later, after my initial crash course introduction to Golang, we had a brand new Octo service up and running. I modeled it closely after the inspiring solution outlined in Malwarebytes’ Golang article; the service has a worker pool and a delegator which passes off incoming jobs to idle workers. Each worker runs in its own goroutine and returns to the pool once the job is done. Simple and effective. The immediate results were pretty spectacular.
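The Malwarebytes article walks through the full implementation, so I won’t repeat it here, but the skeleton of the pattern looks roughly like the following. This is a simplified, channel-based sketch with made-up types, not Octo’s actual code (the original article registers each worker’s own job channel with a dispatcher, which amounts to the same idea):

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a placeholder for a unit of work, e.g., a batch of S3 keys to fetch.
type Job struct {
	Keys []string
}

// worker pulls jobs off the shared channel, does the work, and goes back to
// waiting. Each worker runs in its own goroutine.
func worker(id int, jobs <-chan Job, wg *sync.WaitGroup) {
	defer wg.Done()
	for job := range jobs {
		// The real service would fetch job.Keys from S3 here.
		fmt.Printf("worker %d handling %d keys\n", id, len(job.Keys))
	}
}

func main() {
	jobs := make(chan Job, 100) // buffered queue that the delegator feeds
	var wg sync.WaitGroup

	// Spin up a fixed pool of workers.
	const numWorkers = 4
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go worker(i, jobs, &wg)
	}

	// The "delegator": incoming requests become jobs, and whichever worker
	// is idle picks the next one up from the channel.
	for i := 0; i < 10; i++ {
		jobs <- Job{Keys: []string{"key-a", "key-b"}}
	}

	close(jobs)
	wg.Wait()
}
```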

[caption id="attachment_3385" align="aligncenter" width="800"]

A nice simmer [/caption]

Our average response time from the service was almost cut in half, our timeouts (in the scenario that S3 was slow to respond) were happening on time, and our traffic spikes had minimal effects on the service.
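For what it’s worth, the “return whatever you’ve fetched when the clock runs out” requirement maps naturally onto Go’s context package and channels. Here’s a rough sketch of the idea, with a stubbed-out fetch standing in for the real S3 client (again, not Octo’s actual code):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// fetchKey stands in for an S3 GET; the real service would call the AWS SDK.
func fetchKey(ctx context.Context, key string) (string, error) {
	select {
	case <-time.After(100 * time.Millisecond): // simulated S3 latency
		return "value-for-" + key, nil
	case <-ctx.Done(): // deadline hit before S3 answered
		return "", ctx.Err()
	}
}

// fetchBatch fans out one goroutine per key and returns whatever was
// successfully fetched before the deadline expired.
func fetchBatch(keys []string, deadline time.Duration) map[string]string {
	ctx, cancel := context.WithTimeout(context.Background(), deadline)
	defer cancel()

	type result struct {
		key, val string
		err      error
	}
	results := make(chan result, len(keys)) // buffered so no goroutine leaks

	for _, k := range keys {
		go func(k string) {
			v, err := fetchKey(ctx, k)
			results <- result{k, v, err}
		}(k)
	}

	fetched := make(map[string]string)
	for range keys {
		if r := <-results; r.err == nil {
			fetched[r.key] = r.val
		}
	}
	return fetched
}

func main() {
	got := fetchBatch([]string{"a", "b", "c"}, 1200*time.Millisecond)
	fmt.Printf("returned %d of 3 keys before the deadline\n", len(got))
}
```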

[caption id="attachment_3385" align="aligncenter" width="800"]

Blue = Node.js Octo | Green = Golang Octo [/caption]

With our Golang upgrade, we are easily able to handle 200 requests per minute and 1.5 million S3 item fetches per day. And those 4 load-balanced instances we were running Octo on initially? We’re now doing it with 2.

Since our transition to Golang, we haven’t looked back. While the majority of our stack is (and probably always will be) in Python, we’ve begun the process of modularizing our code base and spinning up microservices to handle specific roles in our system. Alongside Octo, we now have three other Golang services in production, which power our realtime message system and serve up important metadata for our content. We’re also very proud of the newest addition to our Golang codebase, DiggBot.

This is not to say that Golang is a silver bullet for all our problems. We’re careful to consider the needs of each of our services. As a company, we make the effort to stay on top of new and emerging technologies and to always ask ourselves, can we be doing this better? It’s a constantly evolving process and one that takes careful research and planning.

I’m proud to say that this story has a happy ending as our Octo service has been up and running for a couple months with great success (a few bug fixes aside). For now, Digg is going the way of the Gopher.
