proposal: arena: new package providing memory arenas #51317

Open
danscales opened this issue Feb 22, 2022 · 388 comments

@danscales
Contributor

danscales commented Feb 22, 2022

Note, 2023-01-17. This proposal is on hold indefinitely due to serious API concerns. The GOEXPERIMENT=arena code may be changed incompatibly or removed at any time, and we do not recommend its use in production.


Proposal: arena: new package providing memory arenas

Author(s): Dan Scales (with input from many others)

Last updated: 2022-2-22

Discussion at https://golang.org/issue/51317

Abstract

We propose implementing memory arenas for Go. An arena is a way to allocate a set of memory objects all from a contiguous region of memory, with the advantage that the allocation of the objects from the arena is typically more efficient than general memory allocation, and more importantly, the objects in the arena can all be freed at once with minimal memory management or garbage collection overhead. Arenas are not typically implemented for garbage-collected languages, because their operation for explicitly freeing the memory of the arena is not safe and so does not fit with the garbage collection semantics. However, our proposed implementation uses dynamic checks to ensure that an arena free operation is safe. The implementation guarantees that, if an arena free operation is unsafe, the program will be terminated before any incorrect behavior happens. We have implemented arenas at Google, and have shown savings of up to 15% in CPU and memory usage for a number of large applications, mainly due to reduction in garbage collection CPU time and heap memory usage.

Background

Go is a garbage-collected language. Application code does not ever explicitly free allocated objects. The Go runtime automatically runs a garbage-collection algorithm that frees allocated objects some time after they become unreachable by the application code. The automatic memory management simplifies the writing of Go applications and ensures memory safety.

However, large Go applications spend a significant amount of CPU time doing garbage collection. In addition, the average heap size is often significantly larger than necessary, in order to reduce the frequency at which the garbage collector needs to run.

Non-garbage-collected languages also have significant memory allocation and de-allocation overhead. In order to deal with complex applications where objects have widely varying lifetimes, non-garbage-collected languages must have a general-purpose heap allocator. Because of the differing sizes and lifetimes of the objects being allocated, such an allocator must have fairly complex code for finding memory for a new object and dealing with memory fragmentation.

One approach to reducing the allocation overhead for non-garbage-collected languages is region-based memory management, also known as arenas. The idea is that applications sometimes follow a pattern where a code segment allocates a large number of objects, manipulates those objects for a while, but then is completely done with those objects, and so frees all (or almost all) of the objects at roughly the same time. The code segment may be allocating all the objects to compute a result or provide a service, but has no need for any of the objects (except possibly a few result objects) when the computation is done.

In such cases, region-based memory allocation using an arena is useful. The idea is to allocate a large region of memory called an arena at the beginning of the code segment. The arena is typically a contiguous region, but may be extensible in large chunk sizes. Then all the objects can be allocated very efficiently from the arena. Typically, the objects are just allocated consecutively in the arena. Then at the end of the code segment, all of the allocated objects can be freed with very low overhead by just freeing the arena. Any result object that is intended to be longer-lived and last past the end of the code segment should not be allocated from the arena or should be fully copied before the arena is freed.

Arenas have been found to be useful for a number of common programming patterns, and when applicable, can reduce memory management overhead in non-garbage collected languages. For instance, for a server serving memory-heavy requests, each request is likely independent, so most or all of the objects allocated while serving a particular request can be freed when the request has been fulfilled. Therefore, all the objects allocated during the request can be allocated in an arena, and then freed all at once at the completion of the request.

In a related vein, arenas have been useful for protocol buffer processing, especially when unmarshalling the wire format into the in-memory protocol message object. Unmarshalling a message's wire format to memory can create many large objects, strings, arrays, etc., because of the complexity of messages and the frequent nesting of sub-messages inside other messages. A program may often unmarshal one or more messages, make use of the in-memory objects for a period of time, and then be done with those objects. In this case, all of the objects created while unmarshalling the message(s) can be allocated from an arena and freed all at once. The C++ protocol buffer documentation provides an example of using arenas. Arenas may similarly be useful for other kinds of protocol processing, such as decoding JSON.

We would like to get some of the benefits of arenas in the Go language. In the next section, we propose a design of arenas that fits with the Go language and allows for significant performance benefits, while still ensuring memory safety.

Note that there are many applications where arenas will not be useful, including applications that don't do allocation of large amounts of data, and applications whose allocated objects have widely varying lifetimes that don't fit the arena allocation pattern. Arenas are intended as a targeted optimization for situations where object lifetimes are very clear.

Proposal

We propose the addition of a new arena package to the Go standard library. The arena package will allow the allocation of any number of arenas. Objects of arbitrary type can be allocated from the memory of the arena, and an arena automatically grows in size as needed. When all objects in an arena are no longer in use, the arena can be explicitly freed to reclaim its memory efficiently without general garbage collection. We require that the implementation provide safety checks, such that, if an arena free operation is unsafe, the program will be terminated before any incorrect behavior happens.

For maximum flexibility, we would like the API to be able to allocate objects and slices of any type, including types that can be generated at run-time via reflection.

We propose the following API:

package arena

type Arena struct {
	// contains filtered or unexported fields
}

// New allocates a new arena.
func New() *Arena

// Free frees the arena (and all objects allocated from the arena) so that
// memory backing the arena can be reused fairly quickly without garbage
// collection overhead.  Applications must not call any method on this
// arena after it has been freed.
func (a *Arena) Free()

// New allocates an object from arena a.  If the concrete type of objPtr is
// a pointer to a pointer to type T (**T), New allocates an object of type
// T and stores a pointer to the object in *objPtr.  The object must not
// be accessed after arena a is freed.
func (a *Arena) New(objPtr interface{})

// NewSlice allocates a slice from arena a.  If the concrete type of slicePtr
// is *[]T, NewSlice creates a slice of element type T with the specified
// capacity whose backing store is from the arena a and stores it in
// *slicePtr. The length of the slice is set to the capacity.  The slice must
// not be accessed after arena a is freed.
func (a *Arena) NewSlice(slicePtr interface{}, cap int)

The application can create an arbitrary number of arenas using arena.New, each with a different lifetime. An object with a specified type can be allocated in a particular arena using a.New, where a is an arena. Similarly, a slice with a specified element type and capacity can be allocated from an arena using a.NewSlice. Because the object and slice pointers are passed via an empty interface, any type can be allocated. This includes types that are generated at run-time via the reflect library, since a reflect.Value can be converted easily to an empty interface.

The application explicitly frees an arena and all the objects allocated from the arena using a.Free. After this call, the application should not access the arena again or dereference a pointer to any object allocated from this arena. The implementation is required to cause a run-time error and terminate the Go program if the application accesses any object whose memory has already been freed. The associated error message should indicate that the termination is due to access to an object in a freed arena. In addition, the implementation must cause a panic or terminate the Go program if a.New or a.NewSlice is called after a.Free is called. a.New and a.NewSlice should also cause a panic if they are called with an argument that is not of the correct form (**T for a.New and *[]T for a.NewSlice).

Here is some sample code as an example of arena usage:

import (
	"arena"
	…
)

type T struct {
	val int
}

func main() {
	a := arena.New()
	var ptrT *T
	a.New(&ptrT)
	ptrT.val = 1

	var sliceT []T
	a.NewSlice(&sliceT, 100)
	sliceT[99].val = 4

	a.Free()
}
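
As a slightly larger sketch of the per-request pattern described in the Background section, the following hypothetical server handler uses the proposed API; the Request type and the readRequest, serve, and writeResult helpers are placeholders, not part of this proposal.

// Hypothetical per-request arena usage (sketch only).
func handleConn(conn net.Conn) {
	a := arena.New()
	defer a.Free() // all request-scoped allocations are released together

	var req *Request
	a.New(&req)               // req points to arena-backed memory
	readRequest(conn, a, req) // decode the request into arena-backed objects

	result := serve(req)
	writeResult(conn, result)

	// Anything that must outlive this request (e.g. an entry in a global
	// cache) must be copied out of the arena before Free runs.
}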

There may be an implementation-defined limit, such that if the object or slice requested by calls to a.New or a.NewSlice is too large, the object cannot be allocated from the arena. In this case, the object or slice is allocated from the heap. If there is such an implementation-defined limit, we may want to have a way to expose the limit. We’ve listed it as one of the possible metrics mentioned in the “Open Issues” section. An alternate API would be to not allocate the object or slice if it is too large and instead leave the pointer arguments unchanged. This alternate API seems like it would be more likely to lead to programming mistakes, where the pointer arguments are not properly checked before being accessed or copied elsewhere.

For optimization purposes, the implementation is allowed to delay actually freeing an arena or its contents. If this optimization is used, the application is allowed to proceed normally if an object is accessed after the arena containing it is freed, as long as the memory of the object is still available and correct (i.e. there is no chance for incorrect behavior). In this case, the improper usage of arena.Free will not be detected, but the application will run correctly, and the improper usage may be detected during a different run.

The above four functions are the basic API, and may be sufficient for most cases. There are two other API calls related to strings that are fairly useful. Strings in Go are special, because they are similar to slices, but are read-only and must be initialized with their content as they are created. Therefore, the NewSlice call cannot be used for creating strings. NewString below allocates a string in the arena, initializes it with the contents of a byte slice, and returns the string header.

// NewString allocates a new string in arena a which is a copy of b, and
// returns the new string.
func (a *Arena) NewString(b []byte) string

In addition, a common mistake with using arenas in Go is to use a string that was allocated from an arena in some global data structure, such as a cache, which can lead to a run-time exception when the string is accessed after its arena is freed. This mistake is understandable, because strings are immutable and so are often considered separately from memory allocation. To deal with a string whose allocation method is unknown, HeapString makes a copy of the string using heap memory only if the passed-in string (more precisely, its backing array of bytes) is allocated from an arena. If the string is already allocated from the heap, then it is returned unchanged. Therefore, the returned string is always usable for data structures that might outlast the current arenas.

// HeapString returns a copy of the input string, and the returned copy
// is allocated from the heap, not from any arena. If s is already allocated
// from the heap, then the implementation may return exactly s.  This function
// is useful in some situations where the application code is unsure if s
// is allocated from an arena.
func HeapString(s string) string

Of course, this issue of mistakenly using an object from an arena in a global data structure may happen for other types besides strings, but strings are a very common case for being shared across data structures.
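
To make the intended usage concrete, here is a minimal hedged sketch combining NewString and HeapString; the cache map, the key, and the record function are illustrative only.

var cache = map[string]string{} // long-lived, outlives any arena

func record(a *arena.Arena, payload []byte) {
	s := a.NewString(payload) // string whose bytes are backed by arena memory

	// ... use s freely while the arena is alive ...

	// Copy to the heap before storing into a structure that outlives the arena;
	// if s is already heap-allocated, HeapString may return it unchanged.
	cache["last"] = arena.HeapString(s)
}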

We describe an efficient implementation of this API (with safety checks) in the "Implementation" section. Note that the above arena API may be implemented without actually implementing arenas, but instead just using the standard Go memory allocation primitives. We may implement the API this way for compatibility on some architectures for which a true arena implementation (including safety checks) cannot be implemented efficiently.

Rationale

There are a number of possible alternatives to the above API. We discuss a few alternatives, partly as a way to justify our above choice of API.

Removing Arena Free

One simple adjustment to the above API would be to eliminate the arena Free operation. In this case, an arena would be freed automatically only by the garbage collector, once there were no longer any pointers to the arena itself or to any objects contained inside the arena. The big problem with not having a Free operation is that arenas derive most of their performance benefit from more prompt reuse of memory. Though the allocation of objects in the arena would be slightly faster, memory usage would likely greatly increase, because these large arena objects could not be collected until the next garbage collection after they were no longer in use. This would be especially problematic, since the arenas are large chunks of memory that are often only partially full, hence increasing fragmentation. We did prototype this approach where arenas are not explicitly freed, and were not able to get a noticeable performance benefit for real applications. An explicit Free operation allows the memory of an arena to be reused almost immediately. In addition, if an application is able to use arenas for almost all of its allocations, then garbage collection may be mostly unneeded and therefore may be delayed for quite a long time.

APIs that directly return the allocated objects/slices

An alternate API with similar functionality, but different feel, would replace (*Arena).New and (*Arena).NewSlice with the following:

// New allocates an object of the given type from the arena and returns a
// pointer to that object.
func (a *Arena) New(typ reflect.Type) interface{}

// NewSlice allocates a slice of the given element type and capacity from the
// arena and returns the slice as an interface. The length of the slice is
// set to the capacity.
func (a *Arena) NewSlice(typ reflect.Type, cap int) interface{}

An example of usage would be:

a := arena.New()
floatPtr := a.New(reflect.TypeOf(float64(0))).(*float64)
byteSlice := a.NewSlice(reflect.TypeOf(byte(0)), 100).([]byte)

This API potentially seems simpler, since it returns the allocated object or slice directly, rather than requiring that a pointer be passed in to indicate where the result should be stored. This allows convenient use of Go’s idiomatic short variable declaration, but does require type assertions to convert the return value to the correct type. This alternate API specifies the types to be allocated using reflect.Type, rather than by passing in an interface value that contains a pointer to the required allocation type. For applications and libraries that already work on many different types and use reflection, specifying the type using reflect.Type may be convenient. However, for many applications, it may seem more convenient to just pass in a pointer to the type that is required.

There is an efficiency distinction between the two choices in the NewSlice call. In the NewSlice API described in the "Proposal" section, the slice header is already allocated in the caller, and only the backing element array of the slice needs to be allocated. This may be all that is needed in many cases, and hence is more efficient. In the alternate API in this section, the NewSlice call must also allocate the slice header in order to return it in the interface, which causes an extra heap or arena allocation that is often not needed.

Another alternative for a.New is to pass in a pointer to type T and return a pointer to type T (both as empty interfaces):

// New, given that the concrete type of objPtr is a pointer to type T,
// allocates an object of type T from the arena a, and returns a pointer to the
// object.
func (a *Arena) New(objPtr interface{}) interface{}

An example use of this API call would be: intPtr := a.New((*int)(nil)).(*int). Although this also allows the use of short variable declarations and doesn’t require the use of reflection, the rest of the usage is fairly clunky.

Simple API using type parameterization (generics)

We could have an optional addition to the API that uses type parameterization to express the type to be allocated in a concise and direct way. For example, we could have generic NewOf and NewSliceOf functions:

// NewOf returns a pointer to an object of type T that is allocated from
// arena a.
func arena.NewOf[T any](a *Arena) *T
// NewSliceOf returns a slice with element type T and capacity cap
// allocated from arena a
func arena.NewSliceOf[T any](a *Arena, cap int) []T

Then we could allocate objects from the arena via code such as:

intPtr := arena.NewOf[int](a)

We don’t think these generic variants of the API can completely replace the suggested methods above, for two reasons. First, the NewOf function can only allocate objects whose type is specified at compile-time. So, it cannot satisfy our goal to support allocation of objects whose type is computed at run-time (typically via the reflect library). Second, generics in Go are just arriving in Go 1.18, so we don’t want to force users to make use of generics before they are ready.
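
For illustration, a short hedged sketch combining both generic helpers from this alternative, assuming the signatures given above:

a := arena.New()
p := arena.NewOf[int](a) // *int allocated from the arena
*p = 42

// Assuming NewSliceOf, like NewSlice above, sets the length to the capacity.
buf := arena.NewSliceOf[byte](a, 1024)
for i := range buf {
	buf[i] = 0xff
}

a.Free() // neither p nor buf may be used after this point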

Compatibility

Since this API is new, there is no issue with Go compatibility.

Implementation

In order to fit with the Go language, we require that the semantics of arenas in Go be fully safe. However, our proposed API has an explicit arena free operation, which could be used incorrectly. The application may free an arena A while pointers to objects allocated from A are still available, and then sometime later attempt to access an object allocated from A.

Therefore, we require that any implementation of arenas must prevent improper accesses without causing any incorrect behavior or data corruption. Our current implementation of the API gives a memory fault (and terminates the Go program) if an object is ever accessed that has already been freed because of an arena free operation.

Our current implementation performs well and provides memory allocation and GC overhead savings on the Linux amd64 64-bit architecture for a number of large applications. It is not clear if a similar approach can work for 32-bit architectures, where the address space is much more limited.

The basic ideas for the implementation are as follows:

  • Each arena A uses a distinct range in the 64-bit virtual address space.
  • A.Free unmaps the virtual address range for arena A.
  • The physical pages for the arena can then be reused by the operating system for other arenas.
  • If a pointer to an object in arena A still exists and is dereferenced, it will get a memory access fault, which will cause the Go program to terminate. Because the implementation knows the address ranges of arenas, it can give an arena-specific error message during the termination.

So, we are ensuring safety by always using a new range of addresses for each arena, in order that we can always detect an improper access to an object that was allocated in a now-freed arena.

The actual implementation is slightly different from the ideas above, because arenas grow dynamically if needed. In our implementation, each arena starts as a large-size "chunk", and grows incrementally as needed by the addition of another chunk of the same size. The size of all chunks is chosen specifically to be 64 MB (megabytes) for the current Go runtime on 64-bit architectures, in order to make it possible to recycle heap meta-data efficiently with no memory leaks and to avoid fragmentation.

The address ranges of these chunks do not need to be contiguous. Therefore, when we said above that each arena A uses a distinct range of addresses, we really meant that each chunk uses a distinct range of addresses.

Each chunk and all the objects that it contains fully participate in GC mark/sweep until the chunk is freed. In particular, as long as a chunk is part of an arena that has not been freed, it is reachable, and the garbage collector will follow all pointers for each object contained in the chunk. Pointers that refer to other objects contained in the chunk will be handled very efficiently, while pointers to objects outside the chunk will be followed and marked normally.

The implementation calls SetFinalizer(A, f) on each arena A as it is allocated, where f calls A.Free. This ensures that an arena and the objects allocated from it will eventually be freed if there are no remaining references to the arena. The intent though is that every arena should be explicitly freed before its pointer is dropped.
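
A hedged sketch of that finalizer arrangement, written as if the arena constructor were ordinary Go code (the real implementation lives in the runtime and may differ; newArenaInternal is a placeholder):

func New() *Arena {
	a := newArenaInternal() // placeholder: sets up the arena's first chunk
	runtime.SetFinalizer(a, func(a *Arena) {
		// Safety net only: arenas are intended to be freed explicitly.
		a.Free()
	})
	return a
}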

Because unmapping memory is relatively expensive, the implementation may continue to use a chunk for consecutively allocated/freed arenas until it is nearly full. When an arena is freed, all of its chunks that are filled up are immediately freed and unmapped. However, the remaining part of the current unfilled chunk may be used for the next arena that is allocated. This batching improves performance significantly.

Because of the large 64-bit address space, our prototype implementation has not required reusing the virtual addresses for any arena chunks, even for quite large and long-running applications. However, the virtual addresses of most chunks can eventually be reused, since there will almost always be no more reachable pointers to anywhere in the chunk. Since the garbage collector sees all reachable pointers, it can determine when an address range can be reused.

The implementation described above demonstrates that it is possible to implement the Arena API for 64-bit architectures with full safety, while still providing performance benefits. Many other implementations are possible, and some may be tuned for other types of usage. In particular, because of the 64 MB chunk size, the above implementation may not be useful for applications that need to create a large number of arenas that are live at the same time (possibly because of many concurrent threads). It is probably most appropriate that there should only be a few to 10's of arenas in use at any one time. Also, it is not intended that arenas be shared across goroutines. Each arena has a lock to protect against simultaneous allocations by multiple goroutines, but it would be very inefficient to actually use the same arena for multiple goroutines. Of course, that would rarely make sense anyway, since the lifetimes of objects allocated in different goroutines are likely to be quite different.

Open issues

Another possibility in the design space is to implement the API described in the "Proposal" section, but without the safety checks, or with an option to disable the safety checks. The idea here is that the performance savings from the use of arenas can be increased by doing an implementation that doesn't have safety guarantees. As compared to the implementation described above, we can avoid the mapping and unmapping overhead, and reuse the memory of an arena much more quickly (and without OS involvement) when it is freed. We have done a prototype of such an implementation, which we call "unsafe arenas". We have seen an additional 5-10% improvement in performance in some cases when using unsafe arenas rather than our safe arena implementation. However, we feel very strongly that arenas in Go need to be safe. We do not want the use of arenas to lead to memory bugs that may be very hard to detect and debug, and may silently lead to data corruption. We think that it is better to continue to optimize the implementation of safe arenas, rather than trying to support unsafe arenas.

It would be useful to have some run-time metrics associated with arenas. The desired metrics will depend somewhat on the final API, so we have not yet tried to decide the exact metrics that will cover the application needs. However, here are some metrics which might be useful:

  • the number of arenas created and freed
  • the number of current arenas and the maximum number of arenas that have been active at one time
  • the total number of bytes allocated via arenas, and the average number of bytes allocated per arena
  • the (constant) limit on the largest-size object or slice that can be allocated from an arena

Another open issue is whether arenas can be used for allocating the elements of a map. This is possible, but it is not clear what a good API would be. Also, there might be unusual cases if the arena used for the main map object is different from the arena used to allocate new elements of the map. With generics arriving in Go 1.18, generic maps (or hash tables) can now be implemented in user libraries. So, there could be a user-defined generic map implementation that allows optionally specifying an arena for use in allocating new elements. This might be the best solution, since that would allow for greater flexibility than adjusting the semantics of the built-in map type.

Protobuf unmarshalling overheads

As noted above, arenas are often quite useful for reducing the allocation and GC overhead associated with the objects that are created as a protobuf message is being unmarshaled. We have prototyped changes to the protobuf package which allow for providing an arena as the allocation option for objects created during unmarshalling. This arrangement makes it quite easy to use arenas to reduce the allocation and GC overhead in applications that make heavy use of protobufs (especially unmarshalling of large protobufs). If the arena proposal is accepted and implemented in Go, then it would make sense to extend the protobuf package to provide such an arena allocation option.

@gopherbot gopherbot added this to the Proposal milestone Feb 22, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Feb 22, 2022
@hherman1

What is the reason to add this to the standard library as opposed to building a third party package?

@clausecker

Your specification says:

The application explicitly frees an arena and all the objects allocated from the arena using a.Free. After this call, the application should not access the arena again or dereference a pointer to any object allocated from this arena. The implementation is required to cause a run-time error and terminate the Go program if the application accesses any object whose memory has already been freed.

If I read it correctly, this means that it is permitted to keep pointers into a released arena (i.e. stray pointers) as long as you do not explicitly dereference them. However, this means that the compiler must now be careful not to dereference any pointer it knows not to be nil unless user code explicitly does so, lest it be a pointer into a released arena. This seems like it would significantly reduce the potential for optimisation, as otherwise the compiler is allowed to perform such accesses, knowing that each reachable object can also be dereferenced safely.

If the rules were tightened to say that you have to erase all pointers into an arena before calling Free, not only would these optimisations be enabled, but there would also be a way to reclaim the address space occupied by the arena: the garbage collector could be programmed to check if any pointers into the arena address space remain and if there aren't any, it could allow the address space to be reclaimed. If it finds a stray pointer, it could likewise abort the program in much the same manner as when you dereference a stray pointer.

Another benefit is that it's less likely to have tricky edge cases where stray pointers could remain in some data structures (e.g. as the result of some string manipulation or append operations, which may only sometimes return a pointer to one of their arguments), causing them to be dereferenced later only under specific, hard-to-reconstruct circumstances. By prohibiting the presence of any stray pointers after a call to Free, this kind of error would be much easier to find.

With this issue addressed, I'm very interested in this proposal. It will be very useful for complex temporary data structures that need to be built step by step and deallocated all at once.

@quenbyako

@hherman1 As far as I understand this proposal, there are a number of difficulties, for example with appending to slices: the code could be more readable if you could just use append, without calling arena-specific methods.

Even so, the idea of manually handling memory freeing sounds great to me, because there are a lot of specific cases where you don't want to rely on the garbage collector.

@tarndt

tarndt commented Feb 22, 2022

I'd be curious to understand the problems that an arena solves that can't be solved by either using sync.Pool, allocating a slice of a given type (for example a binary tree allocating a slice of nodes), or a combination of both techniques. It seems to me that Go provides idiomatic ways of addressing at least the specific protobuf use-case mentioned here.

In addition to the above, doesn't the Go allocator already have size classes? If the proposal does proceed, I think we need to see a prototype that outperforms the runtime's allocator by enough of a margin to justify the ecosystem complexity.

@komuw
Contributor

komuw commented Feb 22, 2022

We don’t think these generic variants of the API can completely replace ....

First, the NewOf function can only allocate objects whose type is specified at compile-time. So, it cannot satisfy our goal to support allocation of objects whose type is computed at run-time (typically via the reflect library).

What's the main use case for wanting to allocate types created via reflect in arenas? Marshalling and unmarshalling?
Are those use cases compelling enough? Honest question.

Second, generics in Go are just arriving in Go 1.18, so we don’t want to force users to make use of generics before they are ready.

Is there a hurry in adding arenas? It can always wait one or two release cycles before being added so that people are ready with generics.
In other words, if, hypothetically speaking, the arena-generics design were the better API, we shouldn't shelve it just because generics aren't ready yet.

@frioux

frioux commented Feb 22, 2022

This proposal sounds really good to me from a user's perspective, but one detail causes concern: wouldn't something like all code eventually end up needing arena support? For example, imagine I have a web service where I want to allocate an arena for each web request. This sounds like a great use case for this, but I'd need encoding/json to have an Arena mode, and maybe text/template/html/template to have Arena modes so their working sets can use the Arena. Probably same for various clients (SQL, http, etc.) Is that what you see as the path forward or am I missing something?

@ianlancetaylor
Contributor

@hherman1

What is the reason to add this to the standard library as opposed to building a third party package?

It's hard to do this safely in a third party package. Consider a struct allocated in the arena that contains pointers to memory allocated outside the arena. The GC must be aware of those pointers, or it may incorrectly free ordinary-memory objects that are still referenced by arena objects. A third party package would have to somehow make the garbage collector aware of those pointers, including as the pointer values change, which is either hard or inefficient.
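
A minimal sketch of the situation being described, using the proposed API (the Node type and example function are hypothetical):

type Node struct {
	buf []byte // may point at ordinary heap memory
}

func example(a *arena.Arena) {
	var n *Node
	a.New(&n)                // n lives in arena memory
	n.buf = make([]byte, 64) // heap allocation now referenced from arena memory;
	// the GC must see this pointer, or it could free buf while n still uses it
}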

@ianlancetaylor
Contributor

@clausecker

However, this means that the compiler must now be careful not to dereference any pointer it knows not to be nil unless user code explicitly does so

That is already true. Go code can use pointers that point to memory that was allocated by C, or that was allocated by syscall.Mmap. The compiler already can't casually dereference a pointer.

the garbage collector could be programmed to check if any pointers into the arena address space remain and if there aren't any, it could allow the address space to be reclaimed

I believe that we can already do that. If the garbage collector sees a pointer to a freed arena, it can crash the program. If we have two complete GC cycles after an arena is freed, we can know for sure that there are no remaining pointers into that address space, and we can reclaim the addresses. However, I don't know if the current patches implement that.

@ianlancetaylor
Contributor

@tarndt

I'd be curious to understand the problems that an arena solves that can't be solved by either using sync.Pool, allocating a slice of a given type (for example a binary tree allocating a slice of nodes), or a combination of both techniques. It seems to me that Go provides idiomatic ways of addressing at least the specific protobuf use-case mentioned here.

As you note, sync.Pool only permits allocating a specific type. It's reasonable to use if all allocations are the same type. An arena is when most allocations are different types. That is the case for protobuf allocations: each protobuf is implemented as a different Go struct. It's also the case for many uses of, for example, JSON.

In addition to the above, doesn't the Go allocator already have size classes? If the proposal does proceed, I think we need to see a prototype that outperforms the runtime's allocator by enough of a margin to justify the ecosystem complexity.

I'm not sure how size classes are related.

As @danscales mentions, there is already an implementation, which is in use internally at Google. It does overall outperform the runtime allocator for cases like RPC servers that transmit data as protobufs.

@Merovius
Contributor

First, I feel this should live in the runtime package, or at least in runtime/arena, as it seems fairly runtime specific. That is my main concern with this: other implementations might not support it, and packages using it would then not be usable on those implementations.

Also, a question for clarification:

It is probably most appropriate that there should only be a few to 10's of arenas in use at any one time.

One of the main usecases mentioned is protobuf decoding. ISTM that, if every call to proto.Unmarshal creates an arena, used until that message is done with, we would very quickly outpace 10's of arenas by orders of magnitude on a loaded gRPC server.

@quenbyako

As far as I understand this proposal, there are a number of difficulties, for example with appending to slices: the code could be more readable if you could just use append, without calling arena-specific methods.

AIUI the proposal would not interoperate with append, i.e. you'd indeed have to use arena-specific methods.

@ianlancetaylor
Contributor

@komuw

What's the main use case for wanting to allocate types created via reflect in arenas? Marshalling and unmarshalling?
Are those use cases compelling enough? Honest question.

Yes, marshaling and unmarshaling. These cases are compelling for, for example, network RPC servers that must serialize and unserialize data for every request.

@adonovan
Member

One implementation caveat: a 64-bit address space isn't quite the inexhaustible vastness it first seems because many environments impose tighter restrictions. (I recently used a 4GB mmap as an optimization but found OpenBSD's default ulimit is 1.5GB and some users seem to impose their own limits.)

@Merovius
Contributor

Which reminds me:

It is not clear if a similar approach can work for 32-bit architectures, where the address space is much more limited.

Does this mean this package would use build-tags to limit itself to 64-bit architectures? And would programs using this then not run on 32-bit platforms, unless they specifically code around that? Also, this seems all the more reason to put this into runtime, to make clear that it's platform dependent.

@thepudds
Contributor

thepudds commented Feb 22, 2022

Hi @Merovius, my interpretation was that the API could be implemented everywhere, but it would not increase efficiency everywhere:

Note that the above arena API may be implemented without actually implementing arenas, but instead just using the standard Go memory allocation primitives. We may implement the API this way for compatibility on some architectures for which a true arena implementation (including safety checks) cannot be implemented efficiently.

@Merovius
Contributor

@thepudds Ah, thank you, I missed that; it alleviates my concerns around portability. Though I'd find it a bit confusing to provide the API without actual arenas. But, fair enough.

@thepudds
Contributor

@Merovius raises an interesting question, though, around what the overhead of the non-arena implementation of the arena api would be…

@tarndt

tarndt commented Feb 23, 2022

@ianlancetaylor

As you note, sync.Pool only permits allocating a specific type. It's reasonable to use if all allocations are the same type. An arena is when most allocations are different types. That is the case for protobuf allocations: each protobuf is implemented as a different Go struct. It's also the case for many uses of, for example, JSON.

Assuming the problem with protobufs is the actual objects and not some internal buffer, here is how I would naively use sync.Pool.

Today you might see code like this:

book := &pb.AddressBook{}

Rather I would suggest the generated AddressBook code includes a constructor function:

book := NewAddressBook()

The protoc generated code in its package would look something like this:

var addressBookPool sync.Pool = sync.Pool{
	New: func() interface{} {
		return new(AddressBook)
	},
}

func NewAddressBook() *AddressBook {
	return addressBookPool.Get().(*AddressBook)
}

func (ab *AddressBook) Close() error {
	*ab = AddressBook{}
	addressBookPool.Put(ab)
	return nil
}

I assume that if arena existed, similar changes to add a constructor that uses it would be needed as well? If not, that burden would fall to the object creator, which is in fact how I use sync.Pool with protobufs today (when needed).

Now maybe more is needed, say:

func (ab *AddressBook) Close() error {
	ab.closeOnce.Do(func() {
		*ab = AddressBook{}
		addressBookPool.Put(ab)
	})
	return nil
}

Or maybe we need to also generate and use pools/constructors for the *structs referenced in AddressBook (or do what some packages like gogoproto optionally do and not use pointers for objects nested in proto structs). Lots of nuance that is beside the point of this discussion.

But I would like to understand why the arena approach is better. I'd also be curious to bench an arena prototype against sync.Pool and an approach like the above.

@danscales
Contributor Author

If the rules were tightened to say that you have to erase all pointers into an arena before calling Free, not only would these optimisations be enabled, but there would also be a way to reclaim the address space occupied by the arena: the garbage collector could be programmed to check if any pointers into the arena address space remain and if there aren't any, it could allow the address space to be reclaimed. If it finds a stray pointer, it could likewise abort the program in much the same manner as when you dereference a stray pointer.

In addition to what Ian already said about the compiler not dereferencing stray pointers:

One reason that we would like to be able to deal with pointers to objects in the arena staying around after it is freed (as long as they are not used) is that it may be quite tricky and disruptive to have to remove them from the stack. A pointer to an object in the arena could have been passed around as an arg and still be on the stack as an argument or local variable, but will not be used again. It would be good to ensure that the programmer does not have to nil out these pointers on the stack just to satisfy arenas, when they will naturally go away later.

@danscales
Contributor Author

We don’t think these generic variants of the API can completely replace ....

Second, generics in Go are just arriving in Go 1.18, so we don’t want to force users to make use of generics before they are ready.

Is there a hurry in adding arenas? It can always wait one or two release cycles before being added so that people are ready with generics. In other words, if, hypothetically speaking, the arena-generics design were the better API, we shouldn't shelve it just because generics aren't ready yet.

Yes, I agree. There's no rush to figure out and add arenas, so it may be worth waiting to use the generics API if there's consensus on it being a better API. The first point does remain - which is that we would prefer to have some API that allows for allocating a type that is created/calculated at runtime. However, the generics API could become the main API, and the one that allows for dynamic types could be the secondary API.

@danscales
Contributor Author

This proposal sounds really good to me from a user's perspective, but one detail causes concern: wouldn't something like all code eventually end up needing arena support? For example, imagine I have a web service where I want to allocate an arena for each web request. This sounds like a great use case for this, but I'd need encoding/json to have an Arena mode, and maybe text/template/html/template to have Arena modes so their working sets can use the Arena. Probably same for various clients (SQL, http, etc.) Is that what you see as the path forward or am I missing something?

Yes, that's a good point. If we have a package that is allocating lots of memory (often by doing decoding of wire format) and we want to use arenas with it, then we will have to plumb in a way to send down an optional allocator function that it should use if enabled (in our case, the allocator function would use an arena). That's what we have prototyped for the protobuf unmarshaling library calls (see last section). I don't see any easy way to make packages use arenas, but open to suggestions.

@thepudds
Contributor

Hi @danscales

If we have a package that is allocating lots of memory (often by doing decoding of wire format) and we want to use arenas with it, then we will have to plumb in a way to send down an optional allocator function that it should use if enabled (in our case, the allocator function would use an arena).

Is there a rough estimate of how many new APIs this might translate to in the standard library, for example?

@danscales
Contributor Author

Also, a question for clarification:

It is probably most appropriate that there should only be a few to 10's of arenas in use at any one time.

One of the main usecases mentioned is protobuf decoding. ISTM that, if every call to proto.Unmarshal creates an arena, used until that message is done with, we would very quickly outpace 10's of arenas by orders of magnitude on a loaded gRPC server.

Yes, that is true. It is possible there is another implementation that could deal with that many arenas simultaneously. But I will note that if you have 100s of requests in flight at one time, then they may not be allocating that much new data while serving each individual request, so arenas may not really be required in that case. Arenas are really more useful/appropriate for unmarshalling large data objects for which you may do a lot of processing. sync.Pool may be more appropriate if the allocated data per request is small (since there may also be only a limited number of known object types).

@CannibalVox

Yes, I agree. There's no rush to figure out and add arenas, so it may be worth waiting to use the generics API if there's consensus on it being a better API. The first point does remain - which is that we would prefer to have some API that allows for allocating a type that is created/calculated at runtime. However, the generics API could become the main API, and the one that allows for dynamic types could be the secondary API.

Is there any idea of (A) what is the net cost of doing this with reflection and (B) will it be possible to not do it with reflection when using the generics API?

Yes, that is true. It is possible there is another implementation that could deal with that many arenas simultaneously. But I will note that if you have 100s of requests in flight at one time, then they may not be allocating that much new data while serving each individual request, so arenas may not really be required in that case. Arenas are really more useful/appropriate for unmarshalling large data objects for which you may do a lot of processing. sync.Pool may be more appropriate if the allocated data per request is small (since there may also be only a limited number of known object types).

What message sizes/throughputs was the prototype tested at? Many small requests/responses are certainly the norm in my background, but I know Google was looking hard at (I believe?) a Vitess blog post that mainly dealt with very large data replication messages. Certainly, if proto performance gets worse than the status quo at high throughput levels, many people wouldn't be thrilled with that.

@CannibalVox

CannibalVox commented Feb 23, 2022

This proposal sounds really good to me from a user's perspective, but one detail causes concern: wouldn't something like all code eventually end up needing arena support? For example, imagine I have a web service where I want to allocate an arena for each web request. This sounds like a great use case for this, but I'd need encoding/json to have an Arena mode, and maybe text/template/html/template to have Arena modes so their working sets can use the Arena. Probably same for various clients (SQL, http, etc.) Is that what you see as the path forward or am I missing something?

Yes, that's a good point. If we have a package that is allocating lots of memory (often by doing decoding of wire format) and we want to use arenas with it, then we will have to plumb in a way to send down an optional allocator function that it should use if enabled (in our case, the allocator function would use an arena). That's what we have prototyped for the protobuf unmarshaling library calls (see last section). I don't see any easy way to make packages use arenas, but open to suggestions.

Add memory allocator to context? It seems really silly but doing a second round of "context-aware" updates to add context in more places seems better to me than doing a second round of "context-aware" updates that adds an entirely new type of object that ends up needing to be sent everywhere.

@evanphx
Contributor

evanphx commented Feb 23, 2022

Hi all,

One quick thought: could you utilize the write barrier that is currently used by the compiler/GC to detect writes of pointers to arena-allocated memory into locations outside the arena? Then, on .Free, copy the values that are still referenced out to the main heap?

This would let you effectively use an arena as a large scratch area, with the GC assisting in figuring out which values are not just scratch.

@schmichael
Contributor

Is the code for the experimental protobuf implementation published? The scope of code that needs changing in order to take advantage of per-request arenas seems huge, but I'm eager to see a real world example in hopes I'm wrong!

@ianlancetaylor
Contributor

@tarndt A protobuf message can include many different messages (as they are called) arranged in a complex memory layout. This is all relatively seamless for the program but it means that unmarshaling a protobuf isn't just a matter of allocating a single object type, it can involve dozens of different types. So you would need dozens of different sync.Pool structures. And in order to use the pools effectively you would need a complicated piece of code to release all the data back to the appropriate pools, which is much more complicated, and slower, than releasing an arena.

Frankly an arena sounds a lot simpler. And let's not forget that an arena lets you notify the runtime exactly when the memory is no longer needed. A pool will stick around in a much less predictable fashion.

@seebs
Contributor

seebs commented Jul 17, 2023

I think it's impossible to do nested data structures as actual pointers in a thing to be mmapped, but there's a fair bit of prior art for doing nested data structures by storing "pointers" as memory-address offsets within a block. I think capnproto does something like that.

@kolinfluence

When is this going to be standard and not experimental?

@gophun

gophun commented Aug 13, 2023

When is this going to be standard and not experimental?

@kolinfluence
Likely never: "This proposal is on hold indefinitely due to serious API concerns."

@joaoeinride

Could this API or a similar one be added as an experimental library with the documented limitations, until a safer design is found?

@gophun

gophun commented Aug 13, 2023

@joaoeinride It has been available via GOEXPERIMENT=arena since Go 1.20, but please note, it

may be changed incompatibly or removed at any time, and we do not recommend its use in production.

@joaoeinride

Thanks for the prompt reply. I am aware but was curious if it could be released like "https://pkg.go.dev/golang.org/x/exp/arenas"

@Merovius
Contributor

@joaoeinride I believe we specifically don't want to do that, because it means people would use it and would have to leak it into their API to do so - which is specifically what we want to avoid. Also, FWIW, it is implemented in the runtime, so it's somewhat awkward to put into x/exp.

If you are only using it in internal software, you can add the GOEXPERIMENT to your CI system and get the same effect. If you want to use it in open source code and find this setup too painful, that's basically WAI.

At least, that's my understanding.

@gophun

gophun commented Aug 13, 2023

Also, there was some recent drama, because several people ignored the warnings and used x/exp/slices in production, and then a function signature changed: #61374 (comment)
(I personally don't want to hear any complaints in case the arena experiment is removed - you all have been warned)

@hherman1

Is there a version of this proposal where arenas aren't added to the standard library, but instead some important unsafe function is added which enables devs to implement something like arenas? That would probably satisfy the initial use case and push some of the API experimentation work onto the community.


@go101

go101 commented Aug 19, 2023

@hherman1 No, such unsafe functionality is not provided. Personally, I indeed hope the "unsafe" package could provide alloc and free functions to let users manage memory themselves, so that they don't need to use cgo to do this.

@someview

Is there some plan to add this feature to a release version?

@ianlancetaylor
Contributor

@someview At present there are no plans for putting this into a Go release.

@gptlang

gptlang commented Sep 15, 2023

Personally, I indeed hope the "unsafe" package could provide alloc and free functions to let users manage memory themselves, so that they don't need to use cgo to do this.

Are there any current proposals for something similar? There are a few hot paths where this could be very useful

@ianlancetaylor
Contributor

It is exceptionally unlikely that we would ever add functionality for freeing memory.

The discussion is also off-topic on this proposal, so please discuss in some other forum. Thanks.

@qiulaidongfeng
Contributor

I recently studied this issue. Based on my understanding of the proposal, and of what can be done outside the standard library, I created https://gitee.com/qiulaidongfeng/arena.
Its features:

  • It can be used from multiple goroutines at the same time
  • It can be used on versions earlier than Go 1.20 (>= Go 1.18)
  • Use-after-free cannot happen

Performance is also good, at 32.18 ns/op on a 16-thread benchmark.

goos: windows
goarch: amd64
pkg: gitee.com/qiulaidongfeng/arena
cpu: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
                │      n      │
                │   sec/op    │
Alloc_int_P1-16   11.41n ± 1%
Alloc_int-16      32.18n ± 3%
geomean           19.17n

                │      n       │
                │     B/s      │
Alloc_int_P1-16   751.9Mi ± 1%
Alloc_int-16      266.7Mi ± 3%
geomean           447.8Mi

                │     n      │
                │    B/op    │
Alloc_int_P1-16   8.000 ± 0%
Alloc_int-16      49.00 ± 8%
geomean           19.80

                │      n       │
                │  allocs/op   │
Alloc_int_P1-16   0.000 ± 0%
Alloc_int-16      0.000 ± 0%
geomean

Although third-party implementations of arenas perform well, they depend on ideas such as #62483 (comment) that will not become a reality, and there is no guarantee that the existing implementation details will not change.
An arena addition to the standard library is therefore worthwhile.

This proposal is on hold indefinitely due to serious API concerns

If the concern is use-after-free, see #63671; adding implicit alloc and free is probably not a good idea.
Even if use-after-free can occur with the standard library arena, it can be avoided by the structured allocation I describe in #63671 (comment).
The assumption is that before a certain point in time, call it A1, the program has not started to use the memory; at a certain point, call it A2, the memory is used; and at a certain point, call it A3, the program no longer uses the memory.
So if you call NewArena before A2 and Free after A3, use-after-free does not happen.
Because using an arena requires explicitly calling New, this also avoids the problem mentioned in #63671 (comment): that it is not convenient to conclude when freeing the memory is safe.

If #18802 is completed later, there may be a way to make the arena perform equally well on a single goroutine and across multiple goroutines. Since it is not possible to merge two arenas, this would need the approach described in #18802 (comment), using the dynamic pool to manage indices into the static pool, but I believe performance would be better with multiple cores because they would not be competing for atomic operations on the same piece of memory.

I have noticed that scenarios where multiple goroutines use one arena may be rare, but they do exist.
For example, when implementing a new programming language in Go, different nodes of the syntax tree can often be freed at the same time.

@qiulaidongfeng
Contributor

qiulaidongfeng commented Jan 22, 2024

As I review the discussion of this proposal, it appears that the arguments preventing its acceptance center on two points.

  1. This introduces the possibility of use-after-free in languages with GC.

There are many solutions, as follows:

  1. Such as https://gitee.com/qiulaidongfeng/arena, Arena.Free semantics is to write code mark here Arena allocated memory life cycle is over. True free exists after the GC determines that there are no references.
  2. As with go:linkname, you must import unsafe to use it. Its security is similar to that of the C languages malloc and free.
  3. Check memory ownership at compile time, regardless of the cost in compilation speed, as Rust does.
  4. By default, after an Arena.Free call, free the memory only once the GC confirms there are no references; free the memory immediately only if the user opts in. I think such a scheme could even be implemented in go1.23.
  5. Restrict the use of arena in the API so that the API cannot pass pointers outward.
    For example:
// Compiles under this restriction, because only a value (not a pointer
// into the arena) is returned.
func NewInt() int {
	a := arena.NewArena()
	b := arena.New[int](a)
	v := *b
	a.Free()
	return v
}

// Does not compile under this restriction, because a pointer into the
// arena is returned.
func NewIntPtr() *int {
	a := arena.NewArena()
	b := arena.New[int](a)
	return b
}
  2. arena's API is highly contagious (viral).

See #63671: implicitly modifying some heap allocation behavior is not a good idea; the most important objection is in #63671 (comment).

If an arena has to be used explicitly, that does not necessarily mean it is viral.
See what I said about structured allocation in #51317 (comment).
One way to implement this idea is to put the arena in a field of a struct and use the arena only inside the struct's methods (without handing any memory allocated from the arena to the outside world).
The caller is then explicitly told that using the struct and its methods is undefined behavior once a particular method of the struct, such as Close, has been called.
In this way, the user does not need to know about the arena, and if someone uses the struct and its methods after Close has been called, that code is simply wrong, regardless of whether there is an arena.
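One possible shape of this pattern, sketched against the experimental arena API; the Index type and its methods are hypothetical, and no memory allocated from the arena is handed to callers:

import "arena"

// Index hides an arena behind its methods.
type Index struct {
	a     *arena.Arena
	names [][]byte // backed by arena memory, never returned to callers
}

func NewIndex() *Index {
	return &Index{a: arena.NewArena()}
}

func (ix *Index) Add(name string) {
	b := arena.MakeSlice[byte](ix.a, len(name), len(name))
	copy(b, name)
	ix.names = append(ix.names, b)
}

// Has returns only a bool, so no arena memory escapes to the caller.
func (ix *Index) Has(name string) bool {
	for _, b := range ix.names {
		if string(b) == name {
			return true
		}
	}
	return false
}

// Close frees the arena; any use of ix after Close is a caller bug,
// arena or no arena.
func (ix *Index) Close() { ix.a.Free() }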

@mknyszek
Contributor

There are many solutions, as follows:

  1. As in https://gitee.com/qiulaidongfeng/arena: the semantics of Arena.Free is only to mark in the code that the lifetime of the memory allocated from this arena is over; the memory is actually freed after the GC determines that there are no references.

The resource wins of arenas come almost entirely from releasing memory early so it doesn't count against the GC, and the frequency of GC cycles goes down. If you have to wait until the GC determines there are no references, this is not much better than just regular heap allocation.

Actually, I'm not sure I really understand what https://gitee.com/qiulaidongfeng/arena does to help improve performance. I see it doesn't free memory in a way that allows for immediate reuse. It appears to group memory by type in 8 MiB blocks and bump-pointer allocates into the blocks. The existing memory allocator in the runtime already allocates in size classes, in P-local buffers, without any map lookups or synchronization on the fast path (except for the publication barrier on weak memory architectures). I don't see how this would be faster. Perhaps writing out heap metadata for pointer-ful allocations in bulk earlier is helpful? The gains seem marginal at best though.

(Bump-pointer allocation is an interesting direction to explore, but I suspect it's unlikely to be worth it if additional synchronization is required. The allocator's fast path is already usually pretty fast.)
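For concreteness, here is a minimal, stand-alone sketch (not the runtime's or any library's actual code) of a bump-pointer allocator that is safe for concurrent use; the atomic Add on every allocation is the extra synchronization referred to above:

import "sync/atomic"

// bumpBlock hands out space from a fixed block by advancing an offset.
type bumpBlock struct {
	buf []byte
	off atomic.Int64
}

// alloc reserves size bytes and returns their starting offset in buf,
// or -1 if the block does not have enough space left (simplified: a
// full block is simply abandoned).
func (b *bumpBlock) alloc(size int64) int64 {
	end := b.off.Add(size)
	if end > int64(len(b.buf)) {
		return -1
	}
	return end - size
}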

(Note: Be careful when comparing with microbenchmarks. By allocating 8 MiB up-front, the minimum total heap size you're going to get is 16 MiB. If you compare this to a microbenchmark that heap-allocates the same type with no ballast or anything, the heap will be, at most, 4 MiB in total size. That means it's not a good comparison because the latter is using half the memory and is going through twice as many GC cycles.)

  2. As with go:linkname, require importing unsafe to use it. Its safety would be similar to that of C's malloc and free.

Compromising memory safety really feels like a very last resort, and I think we're very far from exhausting all our options.

  3. Check memory ownership at compile time, regardless of the cost in compilation speed, as Rust does.

This is an interesting direction, but has a similar kind of virality as const.

  4. By default, after an Arena.Free call, free the memory only once the GC confirms there are no references; free the memory immediately only if the user opts in. I think such a scheme could even be implemented in go1.23.

I do not know what you mean by this. A direct memory free without any protections means use-after-free bugs resulting in memory corruption. The implementation currently in the runtime ensures that memory corruption never occurs (only a possible immediate crash) on use-after-free.

  5. Restrict the use of arena in the API so that the API cannot pass pointers outward.

If your point is to bind arenas to the stack frame in which the arena is created, this is basically what the compiler's escape analysis pass already does automatically. The general idea of emitting a compilation error if a value escapes its stack frame is interesting, but the weird part is that the compilation error depends on how sophisticated the compiler's escape analysis is. Encoding that in the spec is going to be very limiting and changing the analysis algorithm will be hard. I'm no expert on language design, but my impression is that the language really has to be designed from the ground up for this sort of thing, like Rust. (Even then, Rust has been steadily expanding in sophistication to allow more flexibility, IIUC.)

If an arena has to be used explicitly, that does not necessarily mean it is viral.

Using an arena explicitly at all means that it is going to be infectious to some degree. Lots of packages cannot make use of arena allocation without changing their API to accept an arena. A common example of this is serialization packages like encoding/json. Another example is most tree data structures. math/big could accept an arena. Changing all of these packages' APIs to accept arenas as an option does not seem like the best path forward.
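To make that virality concrete, here is a hypothetical pair of functions (not a real encoding/json API, just an illustration using the experimental arena package): once the innermost allocation wants to come from the arena, every function above it on the call path has to grow an arena parameter or option:

import (
	"arena"
	"bytes"
)

type Record struct {
	Fields [][]byte
}

// splitFields copies each field into arena-backed memory, so it needs
// the arena as a parameter...
func splitFields(a *arena.Arena, line []byte) [][]byte {
	var out [][]byte
	for _, f := range bytes.Fields(line) {
		b := arena.MakeSlice[byte](a, len(f), len(f))
		copy(b, f)
		out = append(out, b)
	}
	return out
}

// ...and therefore so does every caller, all the way up the API.
func decodeRecord(a *arena.Arena, line []byte) Record {
	return Record{Fields: splitFields(a, line)}
}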

One way to implement this idea is to put the arena in a field of a struct and use the arena only inside the struct's methods

I do not see how this helps in general. Sure, you can pass around a struct as part of an API that hides the arena, but that's not really that different from just passing around the arena.

@qiulaidongfeng
Contributor

https://gitee.com/qiulaidongfeng/arena mainly exists to show that if an arena is implemented as a third-party library, the question becomes how to interact with the GC.
One way is to use syscall.Mmap, but then the problem is making sure the GC knows there are Go pointers in memory outside the Go heap, and any solution to that depends on the GC implementation.
An alternative that does not rely on the GC implementation is to use runtime.mallocgc, but without an API like unsafe.Free it has to wait for the GC to free the memory.
So implementing arena in the standard library is a better choice.

  4. By default, after an Arena.Free call, free the memory only once the GC confirms there are no references; free the memory immediately only if the user opts in. I think such a scheme could even be implemented in go1.23.

I do not know what you mean by this. A direct memory free without any protections means use-after-free bugs resulting in memory corruption. The implementation currently in the runtime ensures that memory corruption never occurs (only a possible immediate crash) on use-after-free.

What I mean is that the semantics of Arena.Free would be:
By default, the memory is actually freed only after the GC determines that no pointer references the memory allocated from the arena. The purpose of this is to ensure that arenas can be used without any risk of use-after-free.
With an opt-in such as GODEBUG=arenafreenow=1, the arena would free memory immediately, accepting the risk of use-after-free in exchange for higher performance.
Here "free memory" simply means that memory allocated from the arena via the returned pointers must not be dereferenced again; the runtime can do anything with that memory, including what the existing implementation does.

  5. Restrict the use of arena in the API so that the API cannot pass pointers outward.

If your point is to bind arenas to the stack frame in which the arena is created, this is basically what the compiler's escape analysis pass already does automatically. The general idea of emitting a compilation error if a value escapes its stack frame is interesting, but the weird part is that the compilation error depends on how sophisticated the compiler's escape analysis is. Encoding that in the spec is going to be very limiting and changing the analysis algorithm will be hard. I'm no expert on language design, but my impression is that the language really has to be designed from the ground up for this sort of thing, like Rust. (Even then, Rust has been steadily expanding in sophistication to allow more flexibility, IIUC.)

What I mean specifically is: for functions that use an arena, no pointer created inside the function may survive the function's return; for structs that use an arena in their methods, no pointer obtained in any way through the struct may survive after the struct object is reclaimed by the GC.
Under this rule, although such functions cannot, for example, write through arguments of type **int32, the arena is transparent to callers. As long as Arena.Free is called before the function returns, or is called from a finalizer before the struct is collected, there can be no use-after-free, because the rest of the program holds no pointer to memory allocated from the arena.
This is backwards compatible with code that does not use arenas.
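A minimal sketch of the finalizer variant, using the experimental arena API together with runtime.SetFinalizer; the Parser type and its method are hypothetical:

import (
	"arena"
	"runtime"
)

// Parser hides an arena behind its API; callers never see arena pointers.
type Parser struct {
	a *arena.Arena
}

func NewParser() *Parser {
	p := &Parser{a: arena.NewArena()}
	// Free the arena once the Parser itself is unreachable.
	runtime.SetFinalizer(p, func(p *Parser) { p.a.Free() })
	return p
}

// CountBytes allocates scratch space in the arena but returns only a
// value, so no pointer into the arena survives the call.
func (p *Parser) CountBytes(src string) int {
	buf := arena.MakeSlice[byte](p.a, len(src), len(src))
	return copy(buf, src)
}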

One way to implement this idea is to put the arena in a field of a struct and use the arena only inside the struct's methods

I do not see how this helps in general. Sure, you can pass around a struct as part of an API that hides the arena, but that's not really that different from just passing around the arena.

Tie the lifecycle of the arena to the struct. That way, as long as no pointer obtained from the struct can still be accessed after the struct is reclaimed by the GC, there is no use-after-free, and the caller does not need to know about the arena.

@qiulaidongfeng
Contributor

qiulaidongfeng commented Jan 23, 2024

Actually, I'm not sure I really understand what https://gitee.com/qiulaidongfeng/arena does to help improve performance. I see it doesn't free memory in a way that allows for immediate reuse. It appears to group memory by type in 8 MiB blocks and bump-pointer allocates into the blocks. The existing memory allocator in the runtime already allocates in size classes, in P-local buffers, without any map lookups or synchronization on the fast path (except for the publication barrier on weak memory architectures). I don't see how this would be faster. Perhaps writing out heap metadata for pointer-ful allocations in bulk earlier is helpful? The gains seem marginal at best though.
(Bump-pointer allocation is an interesting direction to explore, but I suspect it's unlikely to be worth it if additional synchronization is required. The allocator's fast path is already usually pretty fast.)

In https://gitee.com/qiulaidongfeng/arena it is possible to free memory before the GC determines that there are no references: as long as the user accepts the risk of use-after-free, the 8 MiB blocks can be reused through sync.Pool (see the sketch after the list below).
From the caller's point of view the memory really is freed, which gives it better Free performance than waiting for the GC.
Much of its current overhead comes from the atomic operations used for synchronization.
Just by dropping support for multiple goroutines, or by using the check-out/check-in model shown in #18802 (but with a static size and without needing sync.ShardedValue's merge method) to eliminate synchronization such as atomic operations, I believe its Alloc performance can exceed that of calling new. (On my computer, new(int) takes a bit over 7 ns, while this arena takes a bit over 10 ns.)
Its potential for further performance improvement comes mainly from using //go:linkname runtime_mallocgc runtime.mallocgc, which allows customizing the behavior of Alloc and Free.
Ways to improve performance that are not yet implemented:

  1. The methods mentioned above for improving Alloc and Free performance.
  2. Reducing the overhead of zeroing memory by setting the needzero parameter of runtime.mallocgc to false when zero values are not required.
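A minimal sketch of the block-reuse idea via sync.Pool, independent of any particular arena implementation; the 8 MiB size and the names are illustrative:

import "sync"

const blockSize = 8 << 20 // 8 MiB, matching the block size discussed above

// blockPool recycles large blocks so a freed arena's memory can be reused
// by the next arena without waiting for a GC cycle.
var blockPool = sync.Pool{
	New: func() any { return make([]byte, blockSize) },
}

func getBlock() []byte  { return blockPool.Get().([]byte) }
func putBlock(b []byte) { blockPool.Put(b) }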

If an arena has to be used explicitly, that does not necessarily mean it is viral.

Using an arena explicitly at all means that it is going to be infectious to some degree. Lots of packages cannot make use of arena allocation without changing their API to accept an arena. A common example of this is serialization packages like encoding/json. Another example is most tree data structures. math/big could accept an arena. Changing all of these packages' APIs to accept arenas as an option does not seem like the best path forward.

What I mean is that requiring explicit use of an arena does not mean that the caller must know about the arena.
Read that sentence together with my other comments: by hiding the use of the arena inside a function, or inside the methods of a struct, so that memory allocated from the arena cannot be accessed outside that function or struct and its methods, the arena becomes transparent to the caller.

@qiulaidongfeng
Contributor

qiulaidongfeng commented Jan 25, 2024

After these two commits (https://gitee.com/qiulaidongfeng/arena/compare/v0.4.0...v0.5.0), note this benchmark:

set GOEXPERIMENT=arenas
go test -bench=^*  -benchtime=2s
UseAfterFree(false)
goos: windows
goarch: amd64
pkg: gitee.com/qiulaidongfeng/arena
cpu: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics
BenchmarkAlloc_int_P1-16                273490933                9.637 ns/op     933.90 MB/s           8 B/op          0 allocs/op
BenchmarkAlloc_int-16                   62029608                35.58 ns/op      252.93 MB/s          69 B/op          0 allocs/op
BenchmarkAlloc_int_new-16               281095550                8.432 ns/op    1067.37 MB/s           8 B/op          1 allocs/op
BenchmarkAlloc_intAndFree-16                5997            402418 ns/op           0.02 MB/s     8389042 B/op          7 allocs/op
BenchmarkAlloc_int_stdArena-16          210657051               11.27 ns/op      798.29 MB/s           7 B/op          0 allocs/op
BenchmarkAlloc_intAndFree_arena-16       1825330              1279 ns/op           7.04 MB/s         856 B/op          2 allocs/op
PASS
UseAfterFree(true)
BenchmarkAlloc_int_P1-16                205623828               11.61 ns/op      775.45 MB/s           8 B/op          0 allocs/op
BenchmarkAlloc_int-16                   58220772                36.24 ns/op      248.35 MB/s          50 B/op          0 allocs/op
BenchmarkAlloc_int_new-16               291649136                8.087 ns/op    1112.94 MB/s           8 B/op          1 allocs/op
BenchmarkAlloc_intAndFree-16             2008635              1169 ns/op           7.70 MB/s         511 B/op          6 allocs/op
BenchmarkAlloc_int_stdArena-16          214101681               11.28 ns/op      797.78 MB/s           7 B/op          0 allocs/op
BenchmarkAlloc_intAndFree_arena-16       1847961              1279 ns/op           7.04 MB/s         855 B/op          2 allocs/op
PASS
ok      gitee.com/qiulaidongfeng/arena  40.705s

It seems that my arena implemented outside the standard library supports multiple goroutines, supports choosing whether use-after-free can occur, and can be further optimized once #18802 is completed. In the scenarios that allocate an int, and that allocate 100 ints and then Free, it is comparable to the standard library arena's performance (with UseAfterFree(true)).

It seems that the biggest advantage of the standard library's experimental arena is that it does not let use-after-free go unnoticed.

I think the problem that many packages, such as encoding/json, cannot use arena allocation without changing their API is basically unsolved as long as the arena must be used explicitly.

Changing all of these packages' APIs to accept arenas as an option does not seem like the best path forward.

But there seems to be no better way.
After all, it's unlikely that Go will ever have an interface like the following to decouple where allocated memory comes from:

type Alloc[T any] interface {
	Alloc(*T)
}

The user would then use gls (goroutine-local storage) or something similar to pick up such an allocator for heap allocations.
That would avoid changing any APIs, but the work becomes code rewriting that only the compiler could do.
And as shown in #63671 (comment), any implicit change to heap allocation can be opposed for the same reason: "you would have to deeply inspect its code (and the code of its transitive dependencies) to draw any conclusion about when it is safe to free the memory. And you would have to repeat this exhaustive exercise any time you upgrade anything."
