How Go Taught Its Compiler to Skip the Heap

Go’s memory management has always presented a clean abstraction at the surface: declare a variable, and the compiler and runtime decide where it lives. The Go FAQ says you do not need to know whether something is on the stack or the heap to write correct code. That is true. But you do need to know if you want to write fast code, because the two locations have very different costs.

Stack allocation is essentially a stack pointer decrement at function entry. There is no lock, no size class lookup, no GC metadata to record, no write barrier. When the function returns, the frame is gone. Heap allocation goes through runtime.mallocgc, which finds a free slot in a size-segregated span, records type information for the garbage collector, and may have to fetch a new span under a lock. After that, the GC has to scan, mark, and eventually sweep the object, at a CPU cost proportional to the allocation rate. The Go GC guide is precise about this: in steady state, total GC CPU cost scales with allocation rate divided by GOGC. Reducing your allocation rate is therefore often one of the highest-leverage optimizations available in a Go program.

The compiler has always tried to keep variables on the stack when possible. The mechanism is escape analysis, implemented in cmd/compile/internal/escape/escape.go. It builds a weighted directed graph over all values in a compilation unit. Edges represent assignments; weights encode indirection levels (address-of, copy, dereference). The analysis solves for which variables are reachable from a synthetic heapLoc node through a low-indirection path. Any such variable escapes to the heap. The two invariants the compiler enforces are: no pointer to a stack object may be stored on the heap, and no pointer to a stack object may outlive the function’s stack frame. When the compiler cannot prove that both invariants hold for a variable, the variable moves to the heap.

You can inspect escape analysis decisions with:

go build -gcflags="-m=2" ./...

The -m=2 level shows the reasoning chain rather than just the conclusion. You will see lines like moved to heap: x or tasks does not escape, along with the assignment edges that drove the decision.
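As a small illustration of the second invariant, the sketch below contrasts a function that returns a copy with one that returns the address of a local. The type and function names are hypothetical, and the //go:noinline directives are only there so that inlining does not hide the difference. Building it with the flag above, one would expect a moved to heap diagnostic for p in byPointer and none in byValue, though the exact wording can vary by toolchain version.

```go
package main

import "fmt"

// point is a hypothetical type used only for this illustration.
type point struct{ x, y int }

// byValue returns a copy of p, so no pointer to p outlives the frame
// and p can stay on the stack.
//
//go:noinline
func byValue() point {
	p := point{x: 1, y: 2}
	return p
}

// byPointer returns the address of p. The pointer outlives the frame,
// so the compiler has to place p on the heap.
//
//go:noinline
func byPointer() *point {
	p := point{x: 1, y: 2}
	return &p
}

func main() {
	v := byValue()
	q := byPointer()
	fmt.Println(v.x+v.y, q.x+q.y)
}
```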

The Slice Problem

Slices are one of the most common sources of heap allocation in Go. The slice header (pointer, length, capacity) is typically small enough to live on the stack. The backing array is the expensive part. Its size is often determined at runtime, and runtime-determined sizes create a problem for stack allocation: although a goroutine’s stack can grow, each function’s stack frame has a fixed layout. There is no alloca or variable-length array mechanism as in C. The size of every slot in a stack frame must be known at compile time.

This created a practical limitation. If you wrote:

tasks := make([]task, 0, 10)  // constant capacity

…the compiler could allocate the 10-element backing array on the stack (assuming the slice did not escape), because it knew the size at compile time. If you wrote:

tasks := make([]task, 0, lengthGuess)  // variable capacity

…the backing array went to the heap unconditionally, because the compiler could not reserve a frame slot of unknown size.

Go 1.25 closed this gap by emitting a small fixed-size stack buffer, currently 32 bytes, and using it when the actual runtime capacity fits within it. The 32-byte size is a pragmatic compromise: large enough to hold a few elements of common types, small enough to add negligible cost to every function frame that uses it. If lengthGuess would require more space than 32 bytes, the allocation falls back to the heap unchanged. The optimization fires silently for the common case, which is small initial estimates on functions that build modest-sized slices.
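To get a feel for what "a few elements" means, the following sketch computes how many elements of a given type fit in a 32-byte buffer. The constant stackBufBytes and the helper elemsThatFit are hypothetical names introduced here; the 32-byte figure is the one quoted above, not a value exported by the compiler or runtime, and the real decision also depends on the slice not escaping.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stackBufBytes mirrors the buffer size quoted in the text.
// It is an assumption for illustration, not a runtime constant.
const stackBufBytes = 32

// elemsThatFit reports how many values of type T fit in the buffer.
func elemsThatFit[T any]() int {
	var zero T
	size := int(unsafe.Sizeof(zero))
	if size == 0 {
		return 0 // zero-size types need no backing storage at all
	}
	return stackBufBytes / size
}

func main() {
	fmt.Println("byte:    ", elemsThatFit[byte]())
	fmt.Println("int64:   ", elemsThatFit[int64]())
	fmt.Println("[2]int64:", elemsThatFit[[2]int64]())
	fmt.Println("[5]int64:", elemsThatFit[[5]int64]())
}
```

Under this arithmetic a capacity guess of 4 for an 8-byte element type fits, while any single element larger than 32 bytes never does.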

What Append Was Doing

The variable-capacity case was one problem. A different one was the pattern of building a slice incrementally with append:

var tasks []task
for t := range c {
    tasks = append(tasks, t)
}
processAll(tasks)

With no initial capacity, the first append allocates a backing array of size 1. The second hits capacity and allocates size 2, copying the first element. The third allocates size 4. This doubling sequence means that for a slice that ends up with five elements, there were three heap allocations (sizes 1, 2, and 4) before the final size-8 array settled. All the intermediate arrays became garbage immediately.
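The growth sequence can be observed directly by recording the capacity after each append. This is a minimal sketch with a hypothetical helper, growthCaps; the capacities it prints depend on the element type, the allocator's size classes, and the Go version, so treat the 1, 2, 4, 8 sequence above as the idealized picture rather than a guarantee.

```go
package main

import "fmt"

// growthCaps appends n ints one at a time and records the slice's
// capacity after every append, exposing each reallocation step.
func growthCaps(n int) []int {
	var s []int
	caps := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
		caps = append(caps, cap(s))
	}
	return caps
}

func main() {
	// Each change in the printed capacity marks a new backing array.
	fmt.Println(growthCaps(5))
}
```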

Go 1.26 handles this by routing the early appends through a stack-allocated buffer. When the compiler can prove that the slice’s backing array does not escape the function, it allocates a small stack buffer and uses it for the initial growth sequence. Only when the buffer fills does heap allocation begin. For functions where the slice stays small, the result is zero heap allocations through the entire loop body.
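One way to check this on your own toolchain is testing.AllocsPerRun, which can be called from an ordinary program. The sketch below uses a hypothetical function, sumIDs, that builds a small slice with append and consumes it locally. The number it prints depends on the Go version used to build it, so no expected output is shown.

```go
package main

import (
	"fmt"
	"testing"
)

// task is a hypothetical element type for this illustration.
type task struct{ id int }

// sumIDs collects tasks with append and consumes them locally,
// so the backing array never leaves the function.
func sumIDs(n int) int {
	var tasks []task
	for i := 0; i < n; i++ {
		tasks = append(tasks, task{id: i})
	}
	total := 0
	for _, t := range tasks {
		total += t.id
	}
	return total
}

// sink keeps the compiler from discarding the call as dead code.
var sink int

func main() {
	allocs := testing.AllocsPerRun(1000, func() { sink = sumIDs(3) })
	fmt.Printf("heap allocations per call: %v\n", allocs)
}
```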

The move2heap Transformation

The more architecturally interesting case is when the slice does escape, because the function returns it:

func extract(c chan task) []task {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    return tasks
}

Here the backing array must eventually live on the heap: the caller keeps using the slice after the function’s stack frame is gone, so a pointer into that frame would dangle. The old behavior was a sequence of heap allocations during the growth phase, ending with whatever capacity the last doubling left behind.

Go 1.26 handles this with an implicit compiler-inserted call to runtime.move2heap at the return point. The function is conceptually simple: if the slice’s backing array is already on the heap because it overflowed the stack buffer during the loop, move2heap is an identity function. If the data is still in the stack buffer, move2heap allocates a heap copy of exactly the final slice length, copies the data once, and returns the heap-backed slice.
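The behavior described above can be modeled by hand, purely to show the semantics. In this sketch the local array buf stands in for the compiler's stack buffer and the final branch stands in for the move2heap step; extractSketch, feed, and bufLen are hypothetical names. It is not how the compiler implements the transformation, and hand-written code like this is unlikely to get the same benefit, since escape analysis would probably move buf to the heap once a slice derived from it can be returned.

```go
package main

import "fmt"

// task is a hypothetical element type for this illustration.
type task struct{ id int }

// bufLen is an assumed buffer size in elements, chosen for the example.
const bufLen = 4

// extractSketch models the described behavior in plain Go.
func extractSketch(c <-chan task) []task {
	var buf [bufLen]task // stands in for the compiler's stack buffer
	tasks := buf[:0]
	for t := range c {
		tasks = append(tasks, t) // reallocates once buf is full
	}
	// Stand-in for the move2heap step at the return point.
	if len(tasks) > 0 && &tasks[0] != &buf[0] {
		return tasks // append already moved the data off the buffer
	}
	out := make([]task, len(tasks)) // one copy, exactly the final length
	copy(out, tasks)
	return out
}

// feed returns a closed channel preloaded with n tasks.
func feed(n int) <-chan task {
	c := make(chan task, n)
	for i := 0; i < n; i++ {
		c <- task{id: i}
	}
	close(c)
	return c
}

func main() {
	small := extractSketch(feed(3))
	large := extractSketch(feed(10))
	fmt.Println(len(small), cap(small), len(large))
}
```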

The practical effect is that a function like extract now produces exactly one heap allocation when the result fits in the stack buffer, sized precisely to the number of elements collected, with no intermediate garbage from the growth sequence. The allocation table from the Go team’s blog post on these changes makes the progression concrete:

| Pattern | Go 1.24 | Go 1.25 | Go 1.26 |
| --- | --- | --- | --- |
| `make([]T, 0, constant)` | 0 allocs | 0 allocs | 0 allocs |
| `make([]T, 0, variable)`, small | 1+ allocs | 0 allocs | 0 allocs |
| `append` loop, non-escaping, small | 3-4 allocs | 3-4 allocs | 0 allocs |
| `append` loop, escaping, small | 3-4 allocs | 3-4 allocs | 1 alloc, exact size |

What Still Defeats the Optimizer

These optimizations are automatic. No code changes are required to benefit from them in Go 1.25 and 1.26. But several patterns still reliably produce heap allocations that escape analysis cannot eliminate.

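Passing values through interface types is the most common one. Storing a non-pointer value in an interface{} or any generally requires a heap allocation for the value when the interface escapes, although the compiler and runtime avoid it in special cases such as constants, zero-size values, and small integers. The fmt package’s variadic signatures accept ...any, so every argument is converted to an interface: fmt.Println(n) with a runtime-computed int typically boxes it, while a constant such as fmt.Println(42) can be served from read-only data. High-throughput logging code tends to use typed structured-logging APIs to avoid this. Note that slog.Info("count", "n", n) also takes ...any and boxes n just as log.Printf("count: %d", n) does; the meaningful allocation boundary is between those calls and the typed form, logger.LogAttrs(ctx, slog.LevelInfo, "count", slog.Int("n", n)), which stores the integer directly in a slog.Attr.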
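For logging specifically, log/slog offers a typed path that does not route integer arguments through any. The sketch below, which needs Go 1.21 or later, logs a count with LogAttrs and slog.Int; renderCount is a hypothetical helper that writes to an in-memory buffer and drops the time attribute so the output is stable. It illustrates the API shape only and makes no claim about the total allocation count of a given handler.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"log/slog"
)

// renderCount logs n through the typed attribute API and returns
// the text the handler produced.
func renderCount(n int) string {
	var buf bytes.Buffer
	h := slog.NewTextHandler(&buf, &slog.HandlerOptions{
		// Drop the top-level time attribute to keep output deterministic.
		ReplaceAttr: func(groups []string, a slog.Attr) slog.Attr {
			if len(groups) == 0 && a.Key == slog.TimeKey {
				return slog.Attr{}
			}
			return a
		},
	})
	logger := slog.New(h)
	// slog.Int stores the integer inside the Attr; no ...any argument
	// list is built for it.
	logger.LogAttrs(context.Background(), slog.LevelInfo, "count", slog.Int("n", n))
	return buf.String()
}

func main() {
	fmt.Print(renderCount(42))
}
```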

Closures capture a variable by reference when the variable has its address taken or is reassigned after capture. If the closure itself escapes, as it does when it is returned, stored, or started as a goroutine, those variables move to the heap. The threshold for capture-by-value is narrow (the variable must never be reassigned and must be at most 128 bytes), so large struct types captured in goroutine closures routinely end up on the heap.
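The two capture modes look identical in source, which is why they are easy to miss. In the sketch below (hypothetical functions newCounter and newGreeter), the first closure reassigns its captured variable and is returned to the caller, so the variable is expected to live on the heap; the second only reads a small captured value, which makes it a candidate for capture by value. What the compiler actually decides can be confirmed with -gcflags="-m=2".

```go
package main

import "fmt"

// newCounter returns a closure that modifies n after capture.
// n must be shared between calls, so it is captured by reference,
// and because the closure is returned, n is expected to escape.
func newCounter() func() int {
	n := 0
	return func() int {
		n++
		return n
	}
}

// newGreeter returns a closure that only reads name. The variable is
// small and never reassigned, so it is eligible for capture by value.
func newGreeter(name string) func() string {
	return func() string {
		return "hello, " + name
	}
}

func main() {
	next := newCounter()
	fmt.Println(next(), next())
	fmt.Println(newGreeter("gopher")())
}
```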

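Maps are harder to keep off the heap. A small map that provably does not escape can often be stack-allocated, but any map that is returned, stored, or grown past its initial size allocates on the heap. Go 1.24 improved map throughput with a Swiss table implementation, but it does not remove those allocations. Minimize map creation in hot paths.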
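One common way to minimize map creation in a hot path is to keep a single map and reset it between uses. The following is a minimal sketch under that assumption, using the clear builtin from Go 1.21; wordCounter is a hypothetical type, and the returned map is only valid until the next call to count.

```go
package main

import "fmt"

// wordCounter reuses one map across calls instead of allocating
// a fresh map every time.
type wordCounter struct {
	counts map[string]int
}

// count tallies words into the reused map. The result aliases internal
// state and is overwritten by the next call.
func (w *wordCounter) count(words []string) map[string]int {
	if w.counts == nil {
		w.counts = make(map[string]int)
	}
	clear(w.counts) // removes all entries but keeps the map itself
	for _, s := range words {
		w.counts[s]++
	}
	return w.counts
}

func main() {
	var wc wordCounter
	fmt.Println(wc.count([]string{"a", "b", "a"}))
	fmt.Println(wc.count([]string{"c"}))
}
```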

Profile-guided optimization (PGO), stable since Go 1.21, interacts usefully with escape analysis. PGO enables aggressive inlining of hot call sites beyond the default inline budget. When a callee is inlined, escape analysis runs on the merged code and can prove properties that were invisible at the call boundary: a value that appeared to escape through an opaque function call turns out not to escape once the body is visible. PGO therefore compounds with these stack allocation improvements. The Go team’s PGO blog post walks through such a case, where PGO-driven inlining let escape analysis remove a heap allocation from a hot path.

Auditing Your Own Code

For any allocation-sensitive path, -gcflags="-m=2" remains the most direct tool. Look for moved to heap on variables you expected to be cheap, then trace the assignment chain the compiler reports. Often a single interface conversion or a missed inline is responsible for a cascade of unintended escapes. The Go heap profiler (go tool pprof -alloc_space) shows the allocation sites in your actual program. As a rough rule of thumb, when runtime.mallocgc consumes more than about 15% of CPU time in a profile, reducing the allocation rate is the first place to look.

The work across Go 1.25 and 1.26 closed gaps that affected code patterns virtually every Go developer writes daily. The append-to-collect-and-return idiom is ubiquitous. The fact that it now costs one allocation instead of three or four, without any API change or manual optimization, is the kind of improvement that compounds quietly across large codebases.