
One common kind of data stored in a configuration file is options. In this post, I'll talk about some nuances we have to be aware of when storing options in JSON and unmarshaling them to Go.

Specifically, the most important difference between options and any other data is that options are often, well... optional. Our program can have a large number of possible configuration options, but we may want to configure any particular invocation with only a subset - leaving all the others at their default values.

Basics - partial unmarshaling, omitempty, and unknown fields

Let's start with the basics. Consider the following struct that represents the options for an imaginary program:

type Options struct {
  Id      string `json:"id,omitempty"`
  Verbose bool   `json:"verbose,omitempty"`
  Level   int    `json:"level,omitempty"`
  Power   int    `json:"power,omitempty"`
}

This struct has 4 options, but in real programs there may be dozens.

Suppose we want to specify these options in a JSON configuration file. A full listing of options may look something like:

{
  "id": "foobar",
  "verbose": false,
  "level": 10,
  "power": 221
}

If all options are always specified in our configuration files, there's not much to talk about. Just call json.Unmarshal and all is well.

In reality, things are rarely so simple. We may want to handle a number of special cases:

  1. The JSON configuration can be missing some fields, and we'll want our Go struct to have default values for those.
  2. The JSON configuration can have extra fields which our struct doesn't have. Depending on the scenario, we may want to either ignore these or report an error.

For (1), Go's json package will assign values only to fields found in the JSON; other fields will just keep their Go zero values. For example, if the JSON didn't have the level field at all, the Options struct unmarshaled from it would have 0 for Level. If this behavior is undesirable, check out the next section.
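To make this concrete, here is a minimal sketch of decoding a JSON document that lacks "level" and "power". The decode helper is a name I made up for this illustration; it does nothing more than call json.Unmarshal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Options repeats the struct from above so this sketch is self-contained.
type Options struct {
	Id      string `json:"id,omitempty"`
	Verbose bool   `json:"verbose,omitempty"`
	Level   int    `json:"level,omitempty"`
	Power   int    `json:"power,omitempty"`
}

// decode is a hypothetical helper (not part of the post's code); it only
// wraps json.Unmarshal so the result is easy to inspect.
func decode(data []byte) (Options, error) {
	var opts Options
	err := json.Unmarshal(data, &opts)
	return opts, err
}

func main() {
	// "level" and "power" are absent from this input.
	opts, err := decode([]byte(`{"id": "foobar", "verbose": true}`))
	if err != nil {
		fmt.Println("Unmarshal error:", err)
		return
	}
	// Fields missing from the JSON keep their Go zero values.
	fmt.Printf("%+v\n", opts) // {Id:foobar Verbose:true Level:0 Power:0}
}
```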

For (2), the json package is very permissive by default and will ignore unknown fields. That is, suppose the input JSON is:

{
  "id": "foobar",
  "bug": 42
}

json.Unmarshal will happily parse this into Options, setting Id to "foobar", Level and Power to 0 and Verbose to false. It will ignore the bug key.

This behavior is what you want in some cases, but not in others. Luckily, the json package makes it configurable: a json.Decoder has a DisallowUnknownFields method that turns unknown keys into decoding errors:

dec := json.NewDecoder(bytes.NewReader(jsonText))
dec.DisallowUnknownFields()

var opts Options
if err := dec.Decode(&opts); err != nil {
  fmt.Println("Decode error:", err)
}

Now parsing the aforementioned JSON snippet will result in an error.
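If strict decoding is needed in more than one place, it may be convenient to wrap it in a function. The following is only a sketch; decodeStrict is a hypothetical name, and everything it calls comes from the standard bytes and encoding/json packages.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Options repeats the struct from above so this sketch is self-contained.
type Options struct {
	Id      string `json:"id,omitempty"`
	Verbose bool   `json:"verbose,omitempty"`
	Level   int    `json:"level,omitempty"`
	Power   int    `json:"power,omitempty"`
}

// decodeStrict is a hypothetical helper: it decodes data into Options and
// returns an error if the JSON contains a key with no matching struct field.
func decodeStrict(data []byte) (Options, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.DisallowUnknownFields()

	var opts Options
	err := dec.Decode(&opts)
	return opts, err
}

func main() {
	// "bug" has no counterpart in Options, so this reports an error.
	_, err := decodeStrict([]byte(`{"id": "foobar", "bug": 42}`))
	fmt.Println("with unknown key:", err)

	// Only known keys: decoding succeeds and err is nil.
	opts, err := decodeStrict([]byte(`{"id": "foobar"}`))
	fmt.Println("known keys only:", opts.Id, err)
}
```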

Finally, you may have noticed that our Options struct has the omitempty tag specified for all fields. This means that fields with empty values (false, 0, an empty string, a nil pointer or interface, or an empty slice, map or array) will not be emitted to JSON. For example:

opts := Options{
  Id:    "baz",
  Level: 0,
}
out, _ := json.MarshalIndent(opts, "", "  ")
fmt.Println(string(out))

Will print out:

{
  "id": "baz"
}

Because all the other fields have their zero values. If you want to always emit all the fields instead, don't specify omitempty.
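To see the difference side by side, here is a small sketch that marshals the same data through two structs. OptionsAlways and toJSON are hypothetical names introduced just for this comparison; OptionsAlways is Options with the omitempty tags removed.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Options repeats the struct from above, with omitempty on every field.
type Options struct {
	Id      string `json:"id,omitempty"`
	Verbose bool   `json:"verbose,omitempty"`
	Level   int    `json:"level,omitempty"`
	Power   int    `json:"power,omitempty"`
}

// OptionsAlways is a hypothetical variant of Options without omitempty.
type OptionsAlways struct {
	Id      string `json:"id"`
	Verbose bool   `json:"verbose"`
	Level   int    `json:"level"`
	Power   int    `json:"power"`
}

// toJSON is a hypothetical helper that marshals v to compact JSON.
func toJSON(v interface{}) string {
	out, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(out)
}

func main() {
	// Empty fields are dropped when omitempty is present...
	fmt.Println(toJSON(Options{Id: "baz"})) // {"id":"baz"}
	// ...and always emitted when it is not.
	fmt.Println(toJSON(OptionsAlways{Id: "baz"})) // {"id":"baz","verbose":false,"level":0,"power":0}
}
```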

Setting default values

In the example above we've seen that missing fields in the JSON representation will be decoded to zero values in Go. This is fine if your options' default values are also their zero values, but this isn't always the case. What if the default value of Power should be 10, not 0? That is, when the JSON doesn't have a "power" field, you want to set Power to 10, but instead Unmarshal sets it to zero.

You may think - this is easy to solve! I'll just know to set Power to its default 10 whenever it's unmarshaled as 0 from the JSON! Hold on, though. What happens if the JSON really had "power" specified as 0?
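Here is a sketch of that naive approach, just to show where it goes wrong. parseNaive is a hypothetical name for the "fix it up afterwards" idea; it is not something I'd recommend using.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Options repeats the struct from above so this sketch is self-contained.
type Options struct {
	Id      string `json:"id,omitempty"`
	Verbose bool   `json:"verbose,omitempty"`
	Level   int    `json:"level,omitempty"`
	Power   int    `json:"power,omitempty"`
}

// parseNaive (hypothetical) unmarshals first and patches Power afterwards.
func parseNaive(data []byte) (Options, error) {
	var opts Options
	if err := json.Unmarshal(data, &opts); err != nil {
		return Options{}, err
	}
	// After unmarshaling, a missing "power" and an explicit "power": 0
	// look exactly the same, so both get overwritten here.
	if opts.Power == 0 {
		opts.Power = 10
	}
	return opts, nil
}

func main() {
	inputs := []string{
		`{"id": "foobar"}`,             // power not specified
		`{"id": "foobar", "power": 0}`, // power explicitly set to 0
	}
	for _, input := range inputs {
		opts, err := parseNaive([]byte(input))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Both inputs report Power=10; the explicit 0 is lost.
		fmt.Printf("%s -> Power=%d\n", input, opts.Power)
	}
}
```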

The way to solve it is in reverse, actually. We'll set the default values first, and then let json.Unmarshal override fields as needed:

func parseOptions(jsn []byte) Options {
  opts := Options{
    Verbose: false,
    Level:   0,
    Power:   10,
  }
  if err := json.Unmarshal(jsn, &opts); err != nil {
    log.Fatal(err)
  }
  return opts
}

Now instead of calling json.Unmarshal directly for Options, we'll have to call parseOptions.

Alternatively, we can cleverly hide this logic in a custom UnmarshalJSON method for Options:

func (o *Options) UnmarshalJSON(text []byte) error {
  type options Options
  opts := options{
    Power: 10,
  }
  if err := json.Unmarshal(text, &opts); err != nil {
    return err
  }
  *o = Options(opts)
  return nil
}

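With this method, any call to json.Unmarshal for the Options type will populate the default of Power correctly. Note the use of the local options type. It is not an alias but a newly defined type with the same fields as Options and none of its methods, so the inner json.Unmarshal call doesn't find UnmarshalJSON on it; this is what prevents infinite recursion.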
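As a quick check, the sketch below runs the same UnmarshalJSON method against three inputs: one without "power", one with an explicit 0, and one with another value. The decode helper is a hypothetical wrapper around json.Unmarshal added for this demonstration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Options repeats the struct from above so this sketch is self-contained.
type Options struct {
	Id      string `json:"id,omitempty"`
	Verbose bool   `json:"verbose,omitempty"`
	Level   int    `json:"level,omitempty"`
	Power   int    `json:"power,omitempty"`
}

// UnmarshalJSON sets defaults first, then lets the JSON override them.
func (o *Options) UnmarshalJSON(text []byte) error {
	type options Options // same fields, no methods: no recursion
	opts := options{
		Power: 10,
	}
	if err := json.Unmarshal(text, &opts); err != nil {
		return err
	}
	*o = Options(opts)
	return nil
}

// decode is a hypothetical helper; json.Unmarshal picks up the custom
// UnmarshalJSON method automatically.
func decode(input string) (Options, error) {
	var opts Options
	err := json.Unmarshal([]byte(input), &opts)
	return opts, err
}

func main() {
	inputs := []string{
		`{"id": "foobar"}`,             // Power=10 (default)
		`{"id": "foobar", "power": 0}`, // Power=0 (explicit zero survives)
		`{"id": "foobar", "power": 7}`, // Power=7
	}
	for _, input := range inputs {
		opts, err := decode(input)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> Power=%d\n", input, opts.Power)
	}
}
```

One thing to keep in mind with this method: since it ends with *o = Options(opts), whatever was already stored in the destination struct is replaced rather than merged.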

This approach is simple and clean, but it has some downsides. First, it strongly ties the default values of fields with the parsing logic. It's conceivable that we want to let user code down the line set its defaults; right now, the defaults have to be set before unmarshaling.

The second downside is that it only works in simple cases. If our Options struct has a slice or map of other structs, we can't populate defaults this way. Consider:

type Region struct {
  Name  string `json:"name,omitempty"`
  Power int    `json:"power,omitempty"`
}

type Options struct {
  Id      string `json:"id,omitempty"`
  Verbose bool   `json:"verbose,omitempty"`
  Level   int    `json:"level,omitempty"`
  Power   int    `json:"power,omitempty"`

  Regions []Region `json:"regions,omitempty"`
}

If we want to populate defaults for the Power of each Region, we can't just do it on the level of Options. We have to write a custom unmarshal method for Region. This is difficult to scale for arbitrarily nested structs - spreading our default logic across multiple UnmarshalJSON methods is sub-optimal.
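For illustration, here is roughly what the per-Region method could look like. This is a sketch under assumptions: the default of 10 for a Region's Power is invented for the example, Options is trimmed to two fields, and decode is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Region struct {
	Name  string `json:"name,omitempty"`
	Power int    `json:"power,omitempty"`
}

// UnmarshalJSON gives every Region a default Power before decoding.
// The value 10 is an assumption made for this sketch.
func (r *Region) UnmarshalJSON(text []byte) error {
	type region Region // same fields, no methods: no recursion
	tmp := region{Power: 10}
	if err := json.Unmarshal(text, &tmp); err != nil {
		return err
	}
	*r = Region(tmp)
	return nil
}

// Options is trimmed to the fields relevant here.
type Options struct {
	Id      string   `json:"id,omitempty"`
	Regions []Region `json:"regions,omitempty"`
}

// decode is a hypothetical helper wrapping json.Unmarshal.
func decode(input string) (Options, error) {
	var opts Options
	err := json.Unmarshal([]byte(input), &opts)
	return opts, err
}

func main() {
	opts, err := decode(`{
	  "id": "foobar",
	  "regions": [{"name": "north"}, {"name": "south", "power": 0}]
	}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The json package calls Region's UnmarshalJSON for each slice element.
	for _, r := range opts.Regions {
		fmt.Printf("%s: Power=%d\n", r.Name, r.Power)
	}
	// north: Power=10
	// south: Power=0
}
```

It works, but the default for Region now lives in a different method than any defaults for Options, which is exactly the scattering described above.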

An alternative is to use a completely different approach, by pushing default logic to the users. We can accomplish this with pointer fields.

Default values with pointer fields

We can define our Options as:

type Options struct {
  Id      *string `json:"id,omitempty"`
  Verbose *bool   `json:"verbose,omitempty"`
  Level   *int    `json:"level,omitempty"`
  Power   *int    `json:"power,omitempty"`
}

It's very similar to the original definition, except that all the fields are now pointers. Suppose we have the following JSON text:

{
  "id": "foobar",
  "verbose": false,
  "level": 10
}

Note that all fields except "power" are specified. We can Unmarshal this as usual:

var opts Options
if err := json.Unmarshal(jsonText, &opts); err != nil {
  log.Fatal(err)
}

But now we can actually distinguish between fields that were not specified at all (these will get unmarshaled to a nil pointer) and fields that were specified with zero values (these will get unmarshaled to valid pointers to values with zero values). For example, we can write the following parsing wrapper to unmarshal Options and set default values as needed:

func parseOptions(jsn []byte) Options {
  var opts Options
  if err := json.Unmarshal(jsn, &opts); err != nil {
    log.Fatal(err)
  }

  if opts.Power == nil {
    var v int = 10
    opts.Power = &v
  }

  return opts
}

Note how we set opts.Power; this is one of the inconveniences of working with pointers, because there is no syntax in Go to take the address of literals of built-in types like int. This isn't too much trouble, though, as some simple helper functions can make our life more pleasant:

func Bool(v bool) *bool       { return &v }
func Int(v int) *int          { return &v }
func String(v string) *string { return &v }
// etc...

With these in hand, we could have simply written opts.Power = Int(10).

The most useful trait of this approach is that it doesn't force us to assign default values at the point where the JSON is parsed. We can pass Options into user code and let that deal with defaults when nil fields are encountered.
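As a sketch of what "letting user code deal with defaults" might look like, the example below resolves Power only at the point of use. IntOr and decode are hypothetical helpers introduced for this illustration, and the fallback of 10 is chosen by the caller rather than by the parsing code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Options repeats the pointer-based struct from above.
type Options struct {
	Id      *string `json:"id,omitempty"`
	Verbose *bool   `json:"verbose,omitempty"`
	Level   *int    `json:"level,omitempty"`
	Power   *int    `json:"power,omitempty"`
}

// IntOr is a hypothetical helper: it returns *p, or def when p is nil.
func IntOr(p *int, def int) int {
	if p == nil {
		return def
	}
	return *p
}

// decode is a hypothetical helper wrapping json.Unmarshal; it sets no defaults.
func decode(input string) (Options, error) {
	var opts Options
	err := json.Unmarshal([]byte(input), &opts)
	return opts, err
}

func main() {
	inputs := []string{
		`{"id": "foobar", "level": 10}`, // power not specified -> nil pointer
		`{"id": "foobar", "power": 0}`,  // power explicitly 0 -> pointer to 0
	}
	for _, input := range inputs {
		opts, err := decode(input)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// The caller decides what an unspecified Power means.
		fmt.Printf("power set: %t, effective power: %d\n",
			opts.Power != nil, IntOr(opts.Power, 10))
	}
	// power set: false, effective power: 10
	// power set: true, effective power: 0
}
```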

So are pointers the magic solution to our "distinguish unspecified values from zero values" problem? Sort of. Pointers are certainly a viable solution that should work well. The official protobuf package uses the same approach for proto2 Protocol Buffers, where the generated Go structs use pointer fields so that an optional field that was never set can be told apart from one set to its zero value. So this method has absolutely been battle tested!

That said, it's not perfect. First of all, even though Go is really good at hiding the extra syntactic burden of dealing with pointers most of the time, in some cases a bit still leaks through (like taking the address of a built-in literal, as shown above). Another potential issue is performance. Pointers often mean heap allocation and may cause performance issues in some scenarios, though when talking about option structs this is unlikely to be a problem.