Qureet.com : Find Customers On Twitter


The Go Beartrap Lying in the Shadows

The Go Beartrap Lying in the Shadows

 

Avoid an almost-invisible Golang bug as shown in the examples below.
 

Go is an awesome language. I’ve used it to build a production web server and a CMS. These power several live websites. I’ve also got built several cron binaries. These form the backend of my Twitter Lead-Generation service, which helps businesses and contractors with their online marketing. Go’s compiler is just so powerful. It takes the best of C++, Java and Python … and then it really tightens everything up.

Here’s an example of the versatility of Go’s compiler. Even the simplest snippet of code – easily typed in the haste of the moment – can be flagged as an error:

package main

import "fmt"

func main() {

	i := 0
	for n := 0; n < 20; n++ {
    		i := n * n
    		fmt.Println(i)
  	}
}

[ Note: the examples in the post were all run using go1.3 on the linux/amd64 platform ]

This code won’t compile. Do you see why? The compiler complains as follows:

line 7: i declared and not used

Ah, right. Variable “i” was declared at the outermost scope of the function. Then it was (inadvertently) shadowed inside the for-loop by a new inner variable – also called “i”. This is so easy to do, and yet so easy to miss.

That the compiler actually flags this as a Build Error is simply AWESOME. This one single compiler rule catches SO many bugs … before they even have a chance to occur at runtime. Beautiful stuff. See for yourself by running this code on the Go Playground.

When gotchas like this find their way into your code-base, they’re almost certain to cause a bug further down the line. These bugs are hidden and are nearly impossible to track down without very careful debugging.

“But … how can a bug like this find its way into your code if, it causes a compiler error??!” I hear you say!

Excellent question. The problem is that the Go compiler – wonderful as it is – can’t catch all instances of such typos. So, how might such an insidious bug remain undetected by the Go compiler … and therefore manifest in some real-world code? Let’s explore a more authentic example – one which actually found its way into my live Go stream processing service.

An Invisible Shadow

Suppose I want to use Go’s excellent net package to net.Dial(…) out a connection to a remote server. The address I’m given might be a standard http:// url, or it might be a secure https:// url. In the case of a secure url, I need to use tls.Dial(…) instead of net.Dial(…). (Note that tls.Dial(…) is located in the crypto/tls package.)

How might this look in in real life?

package main

import "fmt"
import "crypto/tls"
import "net"
import "time"

func main() {
	server := "www.google.com"
	err := DoDial(server, true/*=isSecure*/)
	if err != nil {
		fmt.Println(err.Error())
	} else {
		fmt.Printf("Connection to %s opened successfully.\n", server)
	}
}

func DoDial(server string, isSecure bool) error {

	var conn net.Conn
	var err error

	if isSecure {
		conn, err = tls.Dial("tcp", server + ":443", &tls.Config{})
	} else {
		conn, err = net.Dial("tcp", server)
	}

	if err != nil {
		return err
	}

	timeoutNs := 30 * 1e9
	conn.SetReadDeadline(time.Now().Add(time.Duration(timeoutNs)))

	// Do more with conn here, eg writing a request string etc.

	return nil
}

Ok – so far, so good. There’s no problem here. The Go compiler doesn’t complain, and the code does what it says on the tin. But look at the tls.Dial call. It needs a &tls.Config parameter. Above, I’m just constructing an empty one. But what if I wanted to construct a real tls.Config with some certificates inside it? Suppose I do this in another function called GetTlsConfig(). Now, the code might look like this:

package main

import "net"
import "crypto/tls"
import "time"

func main() {
	server := "www.google.com"
	err := DoDial(server, true/*=isSecure*/)
	if err != nil {
		fmt.Println(err.Error())
	} else {
		fmt.Printf("Connection to %s opened successfully.\n", server)
	}
}

func DoDial(server string, isSecure bool) error {

	var conn net.Conn
	var err error

	if isSecure {
                config, err := GetTLSConfig()
		if err != nil {
			return err
		}
		conn, err = tls.Dial("tcp", server + ":443", &config)
	} else {
		conn, err = net.Dial("tcp", server)
	}

	if err != nil {
		return err
	}

	timeoutNs := 30 * 1e9
	conn.SetReadDeadline(time.Now().Add(time.Duration(timeoutNs)))

	// Do more with conn here, eg writing a request string etc.

	return nil
}

func GetTLSConfig() (tls.Config, error) {

        var config tls.Config
        var err error

        // Construction of config goes here - omitted as not relevant here.

        return config, err 
}

Still looking good? Perhaps. But now, suppose that – in the tls.Dial call above – I forget to include the “:443″ port suffix in the server string. Yes, I managed to do this when I first wrote this code in my dev environment. It turns out this is a problem. But the way it manifests at runtime is almost completely inscrutable.

We get runtime a panic which is cryptic at best. Try it for yourself: compile and run the above code snippet in your Go environment, but without the “:443″ suffix on the tls.Dial line. See if you can figure out what’s going on.

I’m checking for non-nil errors everywhere … right? So why is a panic happening? And how did this nil inner member arise? This seems all wrong.

The Hidden Danger With Short Variable Declarations

It turns out that the error is in the call to getTLSConfig():

config, err := GetTLSConfig()

This line has the same shadowed-variable issue as that shadowed “i” variable in the first example (back at the top of this blog post). The inner “err” variable masks the outer “err”. But crucially: because the err is then used later in the same inner block (in the tls.Dial call), the compiler fails to flag this as an error.

Disaster. This is a serious bug that went uncaught. What’s more, because of the stack trace yielded when the runtime panic occurs, its very unclear what is actually going on. After much routing around – and with some excellent support from the indomitable Dave Cheney, here’s the fix I eventually deduced:

var config tls.Config
config, err = GetTLSConfig()

All is now well with the world. The missing “:443″ port suffix now triggers an error in the tls.Dial call. This err value is *retained* once the inner-scoped code of the “if isSecure” conditional exits. The err value was getting lost before, due to the shadowed “err” variable. So now, when the error is checked immediately after the “if isSecure” block, err is found to be non-nil. So the function bails immediately, and the error’s message is then printed out by the caller in our main() function.

Take it to the Vet?

Wow. This was a nasty one. Spotting this insidious gotcha might seem like a no-brainer to you, when seen clearly in the light of day as above. But in the heat of a Bootstrapped Startup, you’re dealing with user support, bug-fixes, feature-requests, marketing, business development and all the rest … So sometimes bugs like this will slip through the net. Ouch!

It’s looks as though “go vet” would flag problems such as the one above with a warning. But only if the -shadow flag is supplied when the tool is invoked. Of course it is prudent to always run your code through “go vet” before release. However, I still thought this gotcha was worth putting out there. Especially as the tool’s -shadow flag is experimental and is currently off by default [as of October 2014].

If this blog post can help someone out there avoid falling into this same bear-trap – well, then my work for the day is done. What do you think? Are there other ways to protect from such bugs when using Go? I’d love to hear about them!

(Kind thanks go to Ralph Corderoy for reading a draft of this post.)



Leave a Reply

Subscribe: rss | email | twitter | Google+