
How Go's Compiler Learned to Move Slices Off the Heap

Source: go

The Go team published a post in late February 2026 titled Allocating on the Stack, walking through what the compiler has been doing to slice allocations across the 1.25 and 1.26 releases. Coming back to it now, a few weeks later, the details are worth sitting with. These aren’t headline features, but they represent a meaningful shift in what Go’s compiler can prove automatically, without touching any user code.

Why Heap Allocation Has a Real Cost

Every time Go allocates memory from the heap, the runtime does work. It finds the right size class (there are 67 of them, up to 32 KB), consults the per-P mcache, falls back to the mcentral or mheap if the cache is cold, and eventually the garbage collector reclaims that memory. Stack allocation sidesteps all of that. Memory comes from the goroutine’s existing stack frame at no real cost, and it disappears when the function returns.

Go’s FAQ captures the philosophy: from a correctness standpoint, you don’t need to know whether a variable is heap- or stack-allocated. But from a performance standpoint, the distinction matters, and the compiler uses escape analysis to make that determination.

The escape analysis is observable directly:

go build -gcflags='-m' ./...

This prints per-variable decisions like x escapes to heap or x does not escape. Each “does not escape” is a value that stays on the stack. The analysis is conservative: anything it cannot prove safe goes to the heap. The question, and the work the Go team has been doing, is how many values the compiler can prove safe to keep on the stack.
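
As a minimal sketch of what those diagnostics refer to, the hypothetical file below has one value that never leaves its function and one whose address is published through a package-level pointer. The names are invented for illustration, and the exact wording of the compiler’s messages varies between versions.

```go
package main

import "fmt"

// global is a hypothetical package-level sink. Storing a pointer here
// forces the pointed-to value to outlive the function that created it.
var global *int

// keepLocal only uses x inside its own frame, so escape analysis is
// expected to leave x on the stack.
func keepLocal() int {
	x := 41
	p := &x // the pointer never leaves this function
	*p++
	return x
}

// leak publishes the address of x, so x has to live on the heap.
// With -gcflags='-m' the compiler typically reports "moved to heap: x".
func leak() {
	x := 42
	global = &x
}

func main() {
	fmt.Println(keepLocal()) // 42
	leak()
	fmt.Println(*global) // 42
}
```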

The Slice Growth Problem

A slice is a struct with three fields: a pointer to a backing array, a length, and a capacity. The struct itself is small and often stack-allocated. The backing array is where heap allocations happen.
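
The three-field layout can be checked directly. The sketch below is illustrative only; headerWords is a hypothetical helper, and the backing array here is an ordinary local array, not anything the runtime allocated.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords reports the size of a slice header in machine words.
func headerWords() int {
	var s []int
	return int(unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0)))
}

func main() {
	backing := [4]int{10, 20, 30, 40}
	s := backing[1:3] // a view of elements 1 and 2

	fmt.Println(headerWords())  // 3: pointer, length, capacity
	fmt.Println(len(s), cap(s)) // 2 3

	// The pointer field refers into the backing array, so a write
	// through the slice is visible in the array.
	s[0] = 99
	fmt.Println(backing[1])          // 99
	fmt.Println(&s[0] == &backing[1]) // true
}
```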

Consider the standard pattern:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Before the recent improvements, each append that exceeded capacity triggered a new heap allocation. Go roughly doubles the capacity of a small slice when it grows, so the allocations look like: backing array of size 1, then 2 (size-1 array becomes garbage), then 4, then 8. The early allocations are short-lived garbage. Experienced Go developers learned to avoid this by pre-allocating:
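
The growth sequence is easy to observe. The sketch below uses a hypothetical helper, growthSteps, that records every capacity a slice of int passes through. The exact values depend on the Go version and the element size, so none are asserted here.

```go
package main

import "fmt"

// growthSteps appends n ints one at a time and records each distinct
// capacity along the way. Every change in capacity means append had to
// switch to a new, larger backing array.
func growthSteps(n int) []int {
	var s []int
	var caps []int
	last := 0 // capacity of the nil slice
	for i := 0; i < n; i++ {
		s = append(s, i)
		if cap(s) != last {
			last = cap(s)
			caps = append(caps, last)
		}
	}
	return caps
}

func main() {
	// A handful of steps covers 1000 appends; the small early steps
	// are the short-lived arrays discussed above.
	fmt.Println(growthSteps(1000))
}
```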

func process2(c chan task) {
    tasks := make([]task, 0, 10)
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

This works, but it requires knowing a reasonable bound. It also spreads allocation reasoning throughout the codebase.

What Go 1.25 Changed

Go 1.25 extended stack allocation of slice backing arrays from constant-size make calls to variable-size ones.

The constant-size case is the baseline, and the compiler has handled it for some time. When the capacity argument is a compile-time constant and the slice’s backing array doesn’t escape the function, the compiler allocates the backing array directly in the stack frame. The process2 example above produces zero heap allocations as long as no more than 10 tasks arrive; an eleventh append outgrows the stack array and falls back to the heap. Nothing else is required: no annotations, no code rewrites.

The case that is new in 1.25 is variable-size make:

func process3(c chan task, lengthGuess int) {
    tasks := make([]task, 0, lengthGuess)
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

Here the compiler doesn’t know the capacity at compile time. Go 1.25 handles this by allocating a 32-byte stack buffer and using it if the requested capacity fits within those 32 bytes. If lengthGuess is small enough, the backing array stays on the stack entirely. For larger requests, it falls through to heap allocation as before. This is a conditional alloca-style pattern, without Go actually having alloca.
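
Because the limit is expressed in bytes, the number of elements that fit depends on the element type. The sketch below works through that arithmetic. fitsStackBuf and stackBufBytes are hypothetical names, the 32-byte figure is the one quoted above, and the compiler may apply further conditions that this check does not model.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stackBufBytes mirrors the 32-byte figure from the text. It is a
// compiler implementation detail, not a language guarantee.
const stackBufBytes = 32

// fitsStackBuf reports whether a backing array of n elements of type T
// occupies at most stackBufBytes bytes.
func fitsStackBuf[T any](n int) bool {
	if n < 0 {
		return false
	}
	var zero T
	size := unsafe.Sizeof(zero)
	if size == 0 {
		return true // zero-size elements need no storage
	}
	// Compare counts rather than multiplying, to avoid overflow.
	return uintptr(n) <= stackBufBytes/size
}

func main() {
	fmt.Println(fitsStackBuf[int64](4))    // true:  4 * 8  = 32 bytes
	fmt.Println(fitsStackBuf[int64](5))    // false: 5 * 8  = 40 bytes
	fmt.Println(fitsStackBuf[[16]byte](2)) // true:  2 * 16 = 32 bytes
	fmt.Println(fitsStackBuf[byte](33))    // false: 33 bytes
}
```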

What Go 1.26 Added

The 1.26 change is more interesting because it handles the original pattern without any make call at all:

func process(c chan task) {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    processAll(tasks)
}

When the compiler can prove that tasks doesn’t escape process (meaning processAll doesn’t store a reference to it somewhere that outlives the call), it intercepts the first append. Instead of allocating from the heap, it uses a small pre-allocated stack buffer. Subsequent appends fill that buffer. Only when the buffer overflows does append fall back to the heap, and from that point the slice grows the way it always has.

The exponential doubling at small sizes is gone. The allocations at sizes 1, 2, and 4 that used to be garbage are eliminated.
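
A hand-written idiom that approximates the same effect is to start the slice on a fixed-size local array. This is only a sketch of the idea, not the code the compiler generates; sumIDs, the task type, and the buffer length of 4 are all assumptions made for illustration.

```go
package main

import "fmt"

type task struct{ id int }

// sumIDs drains c into a slice that starts out backed by a local array.
func sumIDs(c <-chan task) int {
	var buf [4]task  // small buffer in the frame, if it doesn't escape
	tasks := buf[:0] // len 0, cap 4, backed by buf
	for t := range c {
		// While len < 4, append writes into buf. After that it moves
		// the data to a heap array and keeps growing there.
		tasks = append(tasks, t)
	}
	total := 0
	for _, t := range tasks {
		total += t.id
	}
	return total
}

func main() {
	c := make(chan task, 6)
	for i := 1; i <= 6; i++ {
		c <- task{id: i}
	}
	close(c)
	fmt.Println(sumIDs(c)) // 21
}
```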

The escaping case is where the compiler transformation gets genuinely clever:

func extract(c chan task) []task {
    var tasks []task
    for t := range c {
        tasks = append(tasks, t)
    }
    return tasks
}

Here the slice must end up on the heap because it’s being returned. The compiler handles this by inserting a call to an internal runtime.move2heap() function. During the loop, the intermediate appends use the stack buffer for as long as the data fits. At return time, if the data is still in that buffer, move2heap allocates exactly the right number of bytes on the heap and copies the final contents over. For a slice that stays small, that is one heap allocation, sized precisely for the data.

Before this change, the equivalent manual optimization was: build the slice with appends, count the elements, allocate a final backing array of the exact size, copy. The compiler now performs this transformation automatically. What used to be a pattern that senior Go developers applied by hand in hot paths is now the default behavior.
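
That manual pattern looks roughly like the sketch below. extractManual, the task type, and the scratch buffer length of 8 are hypothetical; the final make and copy is the hand-written stand-in for what the text attributes to move2heap.

```go
package main

import "fmt"

type task struct{ id int }

// extractManual grows a scratch slice first, then copies the result
// once into an exactly sized slice that is safe to return.
func extractManual(c <-chan task) []task {
	var buf [8]task
	scratch := buf[:0] // scratch never leaves this function
	for t := range c {
		scratch = append(scratch, t)
	}
	out := make([]task, len(scratch)) // one allocation, cap == len
	copy(out, scratch)
	return out
}

func main() {
	c := make(chan task, 5)
	for i := 1; i <= 5; i++ {
		c <- task{id: i}
	}
	close(c)
	out := extractManual(c)
	fmt.Println(len(out), cap(out)) // 5 5
}
```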

The Inliner’s Role

These optimizations depend entirely on escape analysis, and escape analysis depends on inlining in ways that aren’t obvious at first.

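The compiler only keeps tasks on the stack if it can prove that processAll(tasks) doesn’t let the backing array escape. For a direct call, escape analysis can rely on a summary of how processAll treats its parameters, even when the call isn’t inlined. When the callee is opaque, as with a call through an interface or a function value, the compiler can’t see what happens inside it and must conservatively assume the backing array escapes. Inlining helps because it removes the call boundary altogether: the compiler can trace the value through the inlined body, including any pointers the callee returns, and may prove it safe to stack-allocate.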

This is why profile-guided optimization (PGO), previewed in Go 1.20 and generally available since Go 1.21, matters for allocation counts. PGO lets the compiler inline hot functions past the normal inlining budget of 80 nodes. Those inlining decisions cascade into escape analysis decisions, which cascade into allocation decisions. Since Go 1.21, -pgo=auto is the default, so go build picks up a default.pgo profile in the main package’s directory automatically. A program built with a representative profile may see allocation improvements on hot paths without any code changes, just because more functions got inlined.

go build -gcflags='-m -m' ./...

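A single -m already prints both inlining decisions and escape decisions. Doubling it adds the reasoning behind them, such as why a function could not be inlined or the path by which a value escapes. When you see a value escaping to the heap, checking the inlining output often shows which un-inlined function call is the cause.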
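
The effect of inlining on allocation counts can be measured in-process. The sketch below compares two identical constructors, one of which is kept out of the inliner with the //go:noinline directive. The point type and function names are hypothetical, and the count for the inlinable version assumes a normal optimized build.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// newPoint is small enough that the compiler would normally inline it,
// letting escape analysis see that the result stays local to the caller.
func newPoint(x, y int) *point { return &point{x, y} }

// newPointNoInline is identical, but the directive keeps the call
// opaque, so the returned pointer has to refer to heap memory.
//
//go:noinline
func newPointNoInline(x, y int) *point { return &point{x, y} }

// sink keeps the results observable so the work is not optimized away.
var sink int

// measure returns the average allocations per call for each variant.
func measure() (inlined, notInlined float64) {
	inlined = testing.AllocsPerRun(1000, func() {
		p := newPoint(1, 2)
		sink += p.x + p.y
	})
	notInlined = testing.AllocsPerRun(1000, func() {
		p := newPointNoInline(1, 2)
		sink += p.x + p.y
	})
	return inlined, notInlined
}

func main() {
	in, out := measure()
	fmt.Println("inlined:", in, "not inlined:", out)
}
```

Building the same file with -gcflags='-m' should show the corresponding inlining and escape messages for each constructor.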

How Other Languages Handle This

Rust handles slice growth at the type level. Vec<T> always allocates from the heap; you choose stack storage by choosing a fixed-size array [T; N] or reaching for a crate like smallvec, which keeps small vectors on the stack and spills to the heap when they exceed a threshold. There’s no escape analysis and no compiler transformation. The programmer decides explicitly, and the type system tracks it. It’s predictable but requires more decisions at the point of writing code.

The JVM does something conceptually similar through escape analysis in the JIT. HotSpot can eliminate heap allocations for short-lived objects through scalar replacement, where an object that doesn’t escape a scope gets decomposed into its individual fields on the stack. The difference is timing: JVM EA runs at runtime with profiling data, after the program has been running long enough to gather information. Go’s escape analysis runs at compile time, which means consistent behavior from the first request without warmup.

Java’s ArrayList<T> always allocates from the heap, similar to Go’s var tasks []task before these improvements. JVM EA focuses on eliminating entire short-lived objects, not specifically on the intermediate-allocation pattern that occurs inside a growing collection. The move2heap transformation Go 1.26 performs, building in a stack buffer and finalizing to a single heap allocation, doesn’t have a clean JVM equivalent.

Practical Implications

For most Go code, these improvements are passive. Upgrade to Go 1.25 or 1.26, run go test -bench=. -benchmem, and you may see reduced allocs/op without touching a line of code.
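
For a quick check outside a test file, testing.Benchmark reports the same allocs/op figure from an ordinary program. The sketch below is one way to do that; collectAndSum and its workload are hypothetical stand-ins for an append-in-a-loop function, and the number printed depends on the Go version used to build it.

```go
package main

import (
	"fmt"
	"testing"
)

// collectAndSum builds a slice with append and then consumes it
// locally, so the slice never escapes the function.
func collectAndSum(n int) int {
	var xs []int
	for i := 0; i < n; i++ {
		xs = append(xs, i)
	}
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

// sink keeps the benchmark loop from being optimized away.
var sink int

// allocsPerOp runs collectAndSum(n) under the benchmark harness and
// returns the measured heap allocations per call.
func allocsPerOp(n int) int64 {
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink += collectAndSum(n)
		}
	})
	return res.AllocsPerOp()
}

func main() {
	fmt.Println("allocs/op:", allocsPerOp(8))
}
```

Running the same program under two toolchains gives a rough before-and-after comparison for a single function.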

For hand-optimized code that pre-allocates slices to avoid growth overhead, the manual optimization is still valid. The compiler won’t fight it. In cases where you have a reliable upper bound and can pass it to make as a constant, the constant-size stack allocation path gives you the best possible outcome: zero heap allocations, provided the slice doesn’t escape.

The cases that benefit most from the 1.26 changes are functions that build slices from channels or iterators where the element count isn’t known in advance. That pattern appears constantly in real Go programs, which is probably why the team focused there.

If you need to verify that a specific allocation is being optimized, escape analysis flags are the tool:

go build -gcflags='-m -m' ./...

And if an optimization causes a regression (edge cases where the stack buffer adds overhead), the behavior can be disabled:

go build -gcflags=all=-d=variablemakehash=n ./...

The broader pattern across these two releases is the compiler taking on decisions that previously required human judgment. The allocation profile of idiomatic Go code is improving without asking programmers to write differently. That’s a reasonable goal for a language that positioned implicit memory management as a feature from the start, and the escape analysis improvements in 1.25 and 1.26 are a concrete step toward closing the gap between what’s idiomatic and what’s efficient.
