Go Quirks

I've recently started working with Go full time. It's a fun language to use, and it comes with a rich standard library. Shipping a production-ready HTTP server in the standard library is no small feat. It's not free of issues and quirks, though.

In this post, I'll discuss some of the issues and quirks I've encountered in my journey with Go. I've deliberately chosen not to talk about often-raised issues such as the lack of generics and the err != nil error handling pattern, because they've been discussed at length and are being addressed right now by the Go team for Go 2.

Zero initialization

Go permits variables and struct fields not to be explicitly initialized with a value, in which case it gives them a zero value. I believe this is quite dangerous: it can be a subtle source of bugs and unexpected behavior.

I first encountered issues related to this when one of our microservices started running out of file descriptors and throwing spurious errors because of it. This is the code that caused the issue:

client := &http.Client{
    Transport: &http.Transport{
        TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    },
}

At first glance this looks pretty innocuous, but it actually contains an error that causes TCP sockets to be leaked.

What happens is that we create a new http.Client where the transport's timeouts are all left unspecified. Since they are unspecified, Go initializes them to their zero values. This is the issue.

In the docs for http.Transport,

// IdleConnTimeout is the maximum amount of time an idle
// (keep-alive) connection will remain idle before closing
// itself.
// Zero means no limit.
IdleConnTimeout time.Duration // Go 1.7

you can see that a zero value means that the timeout is infinite, so the connections are never closed.

Over time the sockets accumulate and you end up running out of file descriptors. The amount of time it takes to manifest depends on how much activity your service gets and the ulimit setting for file descriptors.

The fix for this problem is simple: provide non-zero timeouts when initializing an http.Transport. This Stack Overflow answer demonstrates how to copy the default values from the http library.

Still, this is an easy trap to fall into, and to my knowledge there is no linter to help with this kind of issue at the moment.

There are other perverse side effects to this. For instance, unexported fields will always be initialized to their zero values, since they cannot be initialized from outside the package.

Here's an example package:

package utils

type Collection struct {
    items map[string]string
}

func (c *Collection) Set(key, val string) {
    c.items[key] = val
}

Here's an example use of this package:

package main

func main() {
    col := utils.Collection{}
    col.Set("name", "val") // panic: assignment to nil map
}

The solution to this is not elegant: it's defensive programming. Before accessing the map, the package author must check that it has been initialized:

func (c *Collection) Set(key, val string) {
    if c.items == nil {
        c.items = make(map[string]string)
    }
    c.items[key] = val
}

This can get hairy quickly if the struct has multiple fields.

A solution to this is to provide a constructor function for the type, such as utils.NewCollection(), which always initializes the fields. But even with this constructor, nothing prevents the user from initializing their struct with utils.Collection{} and causing a heap of issues down the line.
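A minimal sketch of that constructor approach, reusing the Collection type from the example above:

```go
package main

import "fmt"

type Collection struct {
	items map[string]string
}

// NewCollection guarantees the internal map is initialized,
// so methods never hit a nil map.
func NewCollection() *Collection {
	return &Collection{items: make(map[string]string)}
}

func (c *Collection) Set(key, val string) {
	c.items[key] = val
}

func main() {
	col := NewCollection()
	col.Set("name", "val") // safe: items was allocated by the constructor
	fmt.Println(col.items["name"]) // val
}
```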

Overzealous linting

I believe that the compiler is too strict about unused variables. It's not rare for me to comment out a function call and end up having to modify multiple lines above it. I'll demonstrate this with an example.

Here, I have an API client on which I can send requests and receive responses.

client, err := NewClient()
if err != nil {
    return err
}
defer client.Close()

resp, err := client.GetSomething()
if err != nil {
    return err
}

process(resp)

Now let's say that I want to debug my code and that I comment out the call to the process function.

client, err := NewClient()
if err != nil {
    return err
}
defer client.Close()

resp, err := client.GetSomething()
if err != nil {
    return err
}

//process(resp)

Now the compiler complains that "resp declared and not used". Ok, I'll use _ instead of resp.

client, err := NewClient()
if err != nil {
    return err
}
defer client.Close()

_, err := client.GetSomething()

// process(resp)

Now the compiler complains that there are "no new variables on left side of :=". Ah! Right! err is declared earlier. I'll use = instead of :=.

client, err := NewClient()
if err != nil {
    return err
}
defer client.Close()

_, err = client.GetSomething()

// process(resp)

Finally it compiles! But I had to change my code twice just to comment out a line. It's not rare that I have to do more edits than this before my program compiles.

I wish the compiler had a development mode where unused variables are simply warnings and do not prevent compilation, so that the edit-compile-debug cycle is not as excruciating as it currently is.
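One workaround I know of is a blank assignment: it costs an extra line, but it survives commenting and uncommenting the call. In this sketch, getSomething is a hypothetical stand-in for any fallible call.

```go
package main

import "fmt"

// getSomething stands in for any call returning a value and an error.
func getSomething() (string, error) { return "data", nil }

func run() error {
	resp, err := getSomething()
	if err != nil {
		return err
	}
	_ = resp // blank assignment keeps the compiler quiet while the line below is commented
	// process(resp)
	return nil
}

func main() {
	fmt.Println(run()) // <nil>
}
```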

Returning errors

There's been a lot of talk about error management in Go. I personally don't mind the if err != nil { return err } pattern. It can be improved upon, and there have been proposals to improve it in Go 2.

What bugs me the most is tuple-style returns. When a function can produce an error, you must still provide a valid dummy value for the other return. For instance, if your function returns (int, error), then on error you must return 0, err, i.e. still provide a value for the int you'd normally return when everything goes right.

I believe this is fundamentally wrong. First, I shouldn't have to figure out some dummy value to return when there's an error. This leads to the overuse of pointers, because it's much easier and much cleaner to return nil, err than to return an empty struct with zero values and the error, such as return User{}, err.

Second, having to provide a valid dummy value makes it easy to forget to handle the error on the calling side, and to move on assuming the dummy value is the right one.

// The fact that err is declared and used here means
// there are no warnings about it being unused below.
err := hello()
if err != nil {
    return err
}
x, err := strconv.ParseInt("not a number", 10, 32)
// Forget to check err, no warning
doSomething(x)

This kind of error is even harder to find than if we had just returned nil, because with nil we'd hopefully get a nil pointer panic somewhere down the line.

I believe languages with support for sum types, such as Rust, Haskell, or OCaml, solve this more elegantly. When an error occurs, there is no need to provide a value for the non-error result.

enum Result<T, E> {
    Ok(T),
    Err(E),
}

A result is either Ok(T) or Err(E), never both.

fn connect(port: u32) -> Result<Socket, Error> {
    if port > 65535 {
        // note that I don't have to provide a value for Socket
        return Err(Error::InvalidPort);
    }
    // ...
}

The nil slice and JSON

The recommended way of creating a slice in Go is using a var declaration such as var vals []int. This statement creates a nil slice, meaning that there's no array backing this slice: it's just a nil pointer. The append function supports appending to a nil slice, which is why you can use the pattern vals = append(vals, x). The len function also supports nil slices, returning 0 when a slice is nil. In practice this works well in most cases but it can lead to weird behavior.
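A small program showing these nil slice semantics:

```go
package main

import "fmt"

func main() {
	var vals []int                      // nil slice: no backing array yet
	fmt.Println(vals == nil, len(vals)) // true 0

	vals = append(vals, 42)             // append allocates on first use
	fmt.Println(vals == nil, len(vals)) // false 1
}
```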

For instance, imagine that we are building a JSON API. We query things from a database and convert them to objects so they can be serialized as JSON. Here's what the service layer could look like:

package models

import "database/sql"

type Customer struct {
    Name  string `json:"name"`
    Email string `json:"email"`
}

func GetCustomers(db *sql.DB) ([]*Customer, error) {
    rows, err := db.Query("SELECT name, email FROM customers")
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var customers []*Customer
    for rows.Next() {
        c := &Customer{}
        if err := rows.Scan(&c.Name, &c.Email); err != nil {
            return nil, err
        }
        customers = append(customers, c)
    }

    return customers, rows.Err()
}

This is fairly straightforward. Here's what an HTTP controller using this service could look like:

package controllers

import "net/http"
import "encoding/json"
import "github.com/me/myapp/models"

func GetCustomers(resp http.ResponseWriter, req *http.Request) {
    ...
    customers, err := models.GetCustomers(db)
    if err != nil {
        ...
    }
    resp.WriteHeader(200)
    if err := json.NewEncoder(resp).Encode(customers); err != nil {
        ...
    }
}

This is all basic stuff, but it actually contains a quirk that might trigger bugs in consumers of this API. When there are no customers in the database, the SQL query returns no rows, so the loop that appends to the customers slice never executes, and customers is returned as nil.

When the JSON encoder sees a nil slice, it will write null to the response, instead of writing [] which should be the case when there are no results. This is bound to create problems for consumers of the API that expect an empty list when there are no items.

The solution is simple: either use a slice literal, customers := []*Customer{}, or a call to make, such as customers := make([]*Customer, 0). Note that some Go linters will warn you against using an empty slice literal and suggest var customers []*Customer instead, which does not have the same semantics.

There are also other places where this can cause trouble. To the len function, an empty map and a nil map are the same: both have 0 elements. But to other functions, such as reflect.DeepEqual, they are not the same. Given how len behaves, you might expect a function that checks whether two maps are equal to say they are. But reflect.DeepEqual disagrees, probably a by-product of using reflection to compare two objects, which is not a very good idea, but the only option available for now in Go.

Go modules and Gitlab

While relying on Git repositories to download modules might seem like a good idea at first, Go modules fall apart as soon as a more complicated use case shows up. My team has had a heap of issues getting Go modules to work with our private Gitlab instance. Two major issues stand out.

The first issue is that Gitlab permits users to have recursive groups of projects. For instance, you can have a git repository located at gitlab.whatever.com/group/tools/tool-1. This is not something that Go modules support out of the box: they will try to download gitlab.whatever.com/group/tools.git, assuming the website uses a GitHub-like pattern where there can only ever be two levels of nesting. We've had to use a replace directive inside our go.mod file to point Go modules to the right place.

There is another way of solving this issue, using an HTML <meta> tag which points to the right git repository, but this requires the Git platform to support it. It doesn't feel like a good design decision to require Git platforms to add this special case for Go modules. Not only does it require upstream changes in the Git platforms, it also requires the deployed software to be upgraded to a recent version, which is not always a fast process in enterprise deployments.
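For reference, this is the go-import meta tag mechanism documented under go help importpath. A hypothetical tag for the repository above, served on the page the go tool fetches, could look like this:

```html
<!-- Served at https://gitlab.whatever.com/group/tools/tool-1?go-get=1 -->
<!-- Format of content: "import-prefix vcs repo-root" -->
<meta name="go-import"
      content="gitlab.whatever.com/group/tools/tool-1 git https://gitlab.whatever.com/group/tools/tool-1.git">
```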

The second issue is that, since our Gitlab instance is private and Go tries to download git repositories over HTTPS, we get 401 errors when we try to download Go modules without any authentication. Using our Gitlab password to authenticate is not a practical option, especially if CI/CD is involved. The solution we have found is to force git to use SSH when HTTPS requests are made, with this piece of .gitconfig:

[url "git@gitlab.whatever.com:"]
	insteadOf = https://gitlab.whatever.com

This solution works well in practice, but it is not obvious when you first encounter the problem. It also assumes that your SSH public key is registered in Gitlab and that your private key is not encrypted with a password. It might be possible to use an encrypted private key if you register your password in a keyring agent, like GNOME Keyring or KDE Wallet, and git integrates with it, but I haven't tried doing this so I can't say whether it works.

Date formatting API

Go's date formatting API is quite surprising, to say the least. Instead of using the commonly used strftime %Y-%m-%d format or the yyyy-mm-dd format, Go uses placeholder numbers and words which have a special meaning. If you want to format a date with the yyyy-mm-dd format in Go, you must use the "2006-01-02" format string: 2006 is the placeholder for the year, 01 for the month, and 02 for the day. The word Jan stands for the month in its three-letter abbreviated form (Jan, Feb, Mar, etc.).

I find this unnecessarily painful. It's hard to remember without looking at the documentation, it's extremely confusing, and it breaks from the strftime convention, which has been in use for decades, for no good reason.

I also find that the official documentation for the time package does a terrible job of explaining this. It barely mentions how it's supposed to work, and you have to rely on third-party sources to get a concise and clear explanation.

Untyped constants

Take a look at this piece of code.

sess, err := mongo.Connect("mongodb://...")
if err != nil {
    return err
}

defer mongo.Disconnect(sess)

ctx, cancel := context.WithTimeout(context.Background(), 15)
defer cancel()

if err := sess.Ping(ctx, nil); err != nil {
    return err
}

This looks pretty innocuous. We connect to a MongoDB database, defer the disconnection until the function exits, create a context with a 15-second timeout, and run a ping command using that context to health-check the database. This should work, right? Except it doesn't. It returns a context deadline exceeded error every time.

Why? Because the context we created does not have a 15-second timeout. It has a 15-nanosecond timeout. That might as well be no timeout at all: it's instant failure.

The context.WithTimeout function accepts a context.Context and a time.Duration. time.Duration is a newtype defined as type Duration int64. We were able to pass a bare 15 to this function because of Go's untyped constants: a constant does not have a type until one is assigned to it. So the 15 is not an int literal or an int constant. When we pass it where a time.Duration is expected, it becomes typed as time.Duration.

This all means that there is no type error or lint telling us that we didn't give this function a proper time.Duration. Normally you pass this function time.Second * x to time out after x seconds. The multiplication with time.Second, which is of type time.Duration, performs the conversion and makes this usage type safe. But an untyped constant is just as valid as a real time.Duration, which is what creates this footgun.

Conclusion

Go is a fun and very productive language. It aims for simplicity and, for the most part, it delivers. However, simplicity should not take priority over correctness. If you choose simplicity over correctness, you end up cutting corners and delivering broken solutions.

I think the Go modules interaction with Gitlab is a good example of this. Instead of doing what every other language does, which is to have package registries, Go has decided to use a "simple" solution and just fetch things from git servers. Except that it breaks spectacularly when authentication to the git server is required. And it also breaks when the git server does not follow the same naming/grouping conventions as GitHub. In the end you waste a day reading Stack Overflow questions trying to fix the "simple" package system.

I've been following the proposals for Go 2 and I'm happy to see that the Go team is taking their time with it. They are gathering a lot of community feedback, which is great; users often have very interesting input to give. Go 2 is a great opportunity to fix some of the quirks mentioned in this post, or at the very least to empower users to create their own data structures and types that address some of them.

I know for sure that I'll be writing a bunch of utilities and data structures to make my programs safer and more ergonomic when Go 2 arrives.