
Some details about byte slices and rune slices in Go

Go101 edited this page Oct 24, 2022 · 7 revisions

(Update: since Go 1.19, the second interpretation mentioned below is officially confirmed as the definition of byte slice.)


The official Go documentation mentions the terms byte slice and rune slice in many places but never defines them clearly, which causes some inconsistencies between the standard Go compiler (gc) and gccgo.

There are two interpretations for byte slice:

  1. a byte slice is a slice whose underlying type is []byte.
  2. a byte slice is a slice whose element type has byte as its underlying type.
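
The difference between the two readings can be sketched with reflection. This is only an illustration: the helper names underlyingIsByteSlice and elemUnderlyingIsByte and the type Bytes are hypothetical and were introduced for this sketch, while MyByte mirrors the type used in the programs later on this page.

```go
package main

import (
	"fmt"
	"reflect"
)

type MyByte byte  // underlying type is byte
type Bytes []byte // hypothetical named type; underlying type is []byte

// underlyingIsByteSlice reports whether v is a slice whose underlying
// type is []byte, i.e. its element type is exactly byte
// (the first interpretation).
func underlyingIsByteSlice(v interface{}) bool {
	t := reflect.TypeOf(v)
	return t != nil && t.Kind() == reflect.Slice &&
		t.Elem() == reflect.TypeOf(byte(0))
}

// elemUnderlyingIsByte reports whether v is a slice whose element type
// has byte as its underlying type (the second interpretation).
func elemUnderlyingIsByte(v interface{}) bool {
	t := reflect.TypeOf(v)
	return t != nil && t.Kind() == reflect.Slice &&
		t.Elem().Kind() == reflect.Uint8
}

func main() {
	fmt.Println(underlyingIsByteSlice([]byte{}), elemUnderlyingIsByte([]byte{}))     // true true
	fmt.Println(underlyingIsByteSlice(Bytes{}), elemUnderlyingIsByte(Bytes{}))       // true true
	fmt.Println(underlyingIsByteSlice([]MyByte{}), elemUnderlyingIsByte([]MyByte{})) // false true
}
```

Only []MyByte separates the two readings: it is a byte slice under the second interpretation but not under the first.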

Most of the time, the imprecise definitions don't affect Go programming, but sometimes they do.

It also looks like the official Go documentation calls a byte slice a slice of bytes, and a rune slice a slice of runes. Both terms, byte slice and slice of bytes, appear in the Go specification as well as in the docs of the builtin standard package. From the contexts, it looks like these documents treat the two terms as the same thing. I think slice of bytes is more likely to be read as a slice whose element type has byte as its underlying type (the second interpretation mentioned above).

The Go specification mentions:

As a special case, append also accepts a first argument assignable to
type []byte with a second argument of string type followed by .... 
This form appends the bytes of the string. 

...

As a special case, copy also accepts a destination argument assignable to
type []byte with a source argument of a string type.
This form copies the bytes from the string into the byte slice.

    copy(dst []byte, src string) int

...

Conversions to and from a string type:
1. ...
2. Converting a slice of bytes to a string type yields a string
   whose successive bytes are the elements of the slice. 
3. Converting a slice of runes to a string type yields a string that is
   the concatenation of the individual rune values converted to strings. 
4. Converting a value of a string type to a slice of bytes type yields
   a slice whose successive elements are the bytes of the string.
5. Converting a value of a string type to a slice of runes type yields
   a slice containing the individual Unicode code points of the string.

...

A non-constant value x can be converted to type T in any of these cases: 
- ...
- x is an integer or a slice of bytes or runes and T is a string type.
- x is a string and T is a slice of bytes or runes. 

The documentation of the builtin standard package mentions:


The copy built-in function copies elements from a source slice
into a destination slice. (As a special case, it also will copy
bytes from a string to a slice of bytes.) 

....

As a special case, it is legal to append a string to a byte slice,
like this:

    slice = append([]byte("hello "), "world"...)

It looks like gccgo mainly adopts the second interpretation (a byte slice is a slice whose element type has byte as its underlying type), while gc mainly adopts the first interpretation (a byte slice is a slice whose underlying type is []byte). The following program compiles with gccgo but fails to compile with gc.

package main

type MyByte byte
type MyRune rune

func main() {
	var rs []rune
	var myrs []MyRune
	var bs []byte
	var mybs []MyByte
	var str = "abc"
	
	// These lines compile okay for both gc and gccgo.
	copy(bs, str)
	bs = append(bs, str...)
	rs = []rune(str)
	str = string(rs)
	bs = []byte(str)
	str = string(bs)
	
	// These two lines also compile okay for both gc and gccgo.
	myrs = []MyRune(str)
	mybs = []MyByte(str)
	
	// But these two lines only compile okay for gccgo.
	str = string(myrs)
	str = string(mybs)
	
	// These two lines also only compile okay for gccgo.
	copy(mybs, str)
	mybs = append(mybs, str...)
}

The reflection mechanism adopts the first interpretation (a byte slice is a slice whose underlying type is []byte).

package main

import "reflect"

type MyByte byte
type MyRune rune

func main() {
	var myrs []MyRune
	var mybs []MyByte
	var str = "abc"
	
	typMyRuneSlice := reflect.TypeOf(myrs)
	typMyBytesSlice := reflect.TypeOf(mybs)
	typString := reflect.TypeOf(str)
	
	println(typMyRuneSlice.ConvertibleTo(typString))  // false
	println(typString.ConvertibleTo(typMyRuneSlice))  // false
	println(typMyBytesSlice.ConvertibleTo(typString)) // false
	println(typString.ConvertibleTo(typMyBytesSlice)) // false
}
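
For contrast, here is a minimal sketch of what reflection reports for slices whose underlying type is []byte. The helper convertibleToString and the named type Bytes are hypothetical and were introduced only for this illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

type MyByte byte
type Bytes []byte // hypothetical named type whose underlying type is []byte

// convertibleToString reports whether the reflect package considers
// values of v's type convertible to string.
func convertibleToString(v interface{}) bool {
	return reflect.TypeOf(v).ConvertibleTo(reflect.TypeOf(""))
}

func main() {
	fmt.Println(convertibleToString([]byte(nil)))   // true
	fmt.Println(convertibleToString(Bytes(nil)))    // true
	fmt.Println(convertibleToString([]MyByte(nil))) // false
}
```

The named slice type Bytes is accepted because its element type is byte itself, whereas []MyByte is rejected, which is consistent with the first interpretation.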

So it looks like both gc and gccgo violate the restrictions of the Go type system, with gccgo violating more of them than gc (assuming that the reflection results are correct). Personally, though, I don't think the violations are harmful. I hope gc can become as permissive as gccgo, so that the violations can be viewed as unintended sugar.

In fact, the reflect package also violates some restrictions of the Go type system. The Go type system forbids converting a []MyByte value to []byte, but with the help of the Bytes() method of reflect.Value, such conversions are possible.

package main

import "fmt"
import "reflect"

type MyByte byte

func main() {
	var mybs = []MyByte{'a', 'b', 'c'}
	var bs []byte
	
	// bs = []byte(mybs) // this line fails to compile
	
	v := reflect.ValueOf(mybs)
	bs = v.Bytes() // okay. Violating Go type system.
	bs[1], bs[2] = 'r', 't'
	fmt.Printf("%s \n", mybs) // art
}

Again, personally, I think this violation is not harmful either. We can view it as an unintended sugar.
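
For comparison, the way to get a []byte from a []MyByte without going around the type system is to convert element by element. The sketch below uses a hypothetical helper toBytes, which is not part of any library. Unlike v.Bytes(), it returns a copy, so writes to the result do not affect the original slice.

```go
package main

import "fmt"

type MyByte byte

// toBytes is a hypothetical helper that copies a []MyByte into a new
// []byte by converting each element, which the type system allows.
func toBytes(mybs []MyByte) []byte {
	bs := make([]byte, len(mybs))
	for i, b := range mybs {
		bs[i] = byte(b)
	}
	return bs
}

func main() {
	mybs := []MyByte{'a', 'b', 'c'}
	bs := toBytes(mybs)
	bs[1], bs[2] = 'r', 't'
	fmt.Println(string(bs))            // art
	fmt.Println(string(toBytes(mybs))) // abc (the original is unchanged)
}
```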
