Some notes on the structure of Go binaries (primarily for ELF)
I'll start with the background. I keep around a bunch of third party
programs written in Go, and one of the things that I do periodically
is rebuild them, possibly because I've updated some of them to
their latest versions. When doing this,
it's useful to have a way to report the package that a Go binary was
built from, ideally a fast way. I have traditionally used binstale
for this, but it's not
fast. Recently I tried out gobin,
which is fast and looked like it had great promise, except that I
discovered it didn't report on all of my binaries. My attempts to
fix that resulted in various adventures but only partial success.
All of the following is mostly for ELF
format binaries, which is the binary format used on most Unixes
(except macOS). Much of the general information applies to other
binary formats that Go supports, but the specifics will be different.
For a general introduction to ELF, you can see eg here.
Also, all of the following assumes that you haven't stripped the
Go binaries, for example by building with '-w' or '-s'.
All Go programs have a .note.go.buildid ELF section that has the
build ID.
If you read the ELF sections of a binary and it doesn't have that,
you can give up; either this isn't a Go binary or something deeply
weird is going on.
Programs built as Go modules contain an
embedded chunk of information about the modules used in building
them, including the main program; this can be printed with 'go
version -m <program>'. There is no official interface to extract
this information from other binaries (inside a program you can use
runtime/debug.ReadBuildInfo()), but it's
currently stored in the binary's data section as a chunk of plain
text. See version.go
for how Go itself finds and extracts this information, which is
probably going to be reasonably stable (so that newer versions of
Go can still run 'go version -m <program>' against programs built
with older versions of Go). If you can extract this information
from a binary, it's authoritative, and it should always be present
even if the binary has been stripped.
If you don't have module information (or don't want to copy
version.go's code in order to extract it), the only approach I know
to determine the package a binary was built from is to determine
the full file path of the source code where main() is, and then
reverse engineer that to create a package name (and possibly a
module version). The general approach is:
- extract Go debug data from the binary and use debug/gosym to create a
  LineTable and a Table.
- look up the main.main function in the table to get its starting
  address, and then use Table.PCToLine() to get the file name for that
  starting address.
- convert the file name into a package name.
Binaries built from $GOPATH will have file names of the form
$GOPATH/src/example.org/fred/cmd/barney/main.go. If you take the
directory name of this and take off the $GOPATH/src part, you
have the package name this was built from. This includes module-aware
builds done in $GOPATH. Binaries built directly from modules with
'go get example.org/fred/cmd/barney@latest' will have a file path
of the form $GOPATH/pkg/mod/example.org/fred@v.../cmd/barney/main.go.
To convert this to a module name, you have to take off '$GOPATH/pkg/mod/'
and move the version to the end if it's not already there. For
binaries built outside some $GOPATH, with either module-aware
builds or plain builds, you are unfortunately on your own; there
is no general way to turn their file names into package names.
(There are a number of hacks if the source is present on your local
system; for example, you can try to find out what module or VCS
repository it's part of if there's a go.mod or VCS control directory
somewhere in its directory tree.)
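The two $GOPATH cases above come down to a little path surgery. Here is a minimal sketch, in which the function name packageFromPath, the example GOPATH of /home/u/go, and the version v1.2.3 are all made up for illustration. It does not try to undo the special escaping the module cache applies to some path characters, and it reports failure for anything outside the given GOPATH.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// packageFromPath (a hypothetical helper) converts the source file path of
// main() into a package name, given the GOPATH the binary was built under.
// The second result is false when the file is not under gopath.
func packageFromPath(gopath, file string) (string, bool) {
	dir := path.Dir(filepath.ToSlash(file))
	root := strings.TrimSuffix(filepath.ToSlash(gopath), "/")

	// Module cache: $GOPATH/pkg/mod/<module>@<version>/<subdirectory>
	if prefix := root + "/pkg/mod/"; strings.HasPrefix(dir, prefix) {
		rest := strings.TrimPrefix(dir, prefix)
		at := strings.Index(rest, "@")
		if at < 0 {
			return "", false
		}
		module := rest[:at]
		version, sub, _ := strings.Cut(rest[at+1:], "/")
		pkg := module
		if sub != "" {
			pkg += "/" + sub
		}
		// Move the version to the end of the package path.
		return pkg + "@" + version, true
	}

	// Classic layout: $GOPATH/src/<package>
	if prefix := root + "/src/"; strings.HasPrefix(dir, prefix) {
		return strings.TrimPrefix(dir, prefix), true
	}
	return "", false
}

func main() {
	for _, f := range []string{
		"/home/u/go/src/example.org/fred/cmd/barney/main.go",
		"/home/u/go/pkg/mod/example.org/fred@v1.2.3/cmd/barney/main.go",
		"/tmp/scratch/main.go",
	} {
		pkg, ok := packageFromPath("/home/u/go", f)
		fmt.Printf("%q %v\n", pkg, ok)
	}
}
```

In practice you may not know which GOPATH a binary was built under, so a real tool would probably have to look for a '/src/' or '/pkg/mod/' component in the path instead of trusting a single root.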
However, to do this you must first extract the Go debug data from
your ELF binary. For ordinary unstripped Go binaries, this debugging
information is in the .gopclntab and .gosymtab ELF sections of
the binary, and can be read out with debug/elf's File.Section()
and Section.Data(). Unfortunately,
Go binaries that use cgo do not have these Go ELF sections. As
mentioned in Building a better Go linker:
For “cgo” binaries, which may make arbitrary use of C libraries, the Go linker links all of the Go code into a single native object file and then invokes the system linker to produce the final binary.
This linkage obliterates .gopclntab and .gosymtab as separate
ELF sections. I believe that their data is still there in the final
binary, but I don't know how to extract it. The Go debugger Delve doesn't even try; instead, it
uses the general DWARF .debug_line section (or its compressed version), which seems
to be more complicated to deal with. Delve has its DWARF code as
sub-packages, so perhaps you could reuse them to read and process the
DWARF debug line information to do the same thing (as far as I know
the file name information is present there too).
Since I have and use several third party cgo-based programs, this
is where I gave up. My hacked branch of the which package can deal
with most things short of "cgo" binaries, but unfortunately that's
not enough to make it useful for me.
(Since I spent some time working through all of this, I want to write it down before I forget it.)
PS: I suspect that this situation will never improve for non-module builds, since the Go developers want everyone to move away from them. For Go module builds, there may someday be a relatively official and supported API for extracting module information from existing binaries, either in the official Go packages or in one of the golang.org/x/ additional packages.