Stack First, Heap Later: How Go 1.25 and 1.26 Changed the Allocation Default
Source: go
The Go Blog’s retrospective on allocation optimizations, published in February 2026, describes specific changes to how the compiler handles slices in Go 1.25 and 1.26. The headline is “fewer heap allocations,” but the more interesting story is how escape analysis has always been correct yet conservative, and what it took to narrow the gap between what the compiler could prove and what actually happened at runtime.
Why Heap Allocation in Go Is Expensive
Go’s allocator is modeled on tcmalloc, Google’s thread-caching malloc. The path from a slice allocation to actual memory involves three layers: a per-P mcache for each logical processor (P), a per-size-class mcentral shared by all Ps, and a global mheap that manages arenas obtained from the OS.
For small objects up to 32 KB, the runtime rounds the request up to one of roughly 70 size classes, checks the mcache free list for that class, and returns the next free slot. Under low contention, this costs 20 to 40 nanoseconds. That cost includes mandatory zero-initialization (Go guarantees all memory starts zeroed), the size-class lookup, and a free-list pointer update. Stack allocation, by contrast, is folded into the function’s stack-pointer adjustment: 0.5 to 2 nanoseconds.
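The fast path above can be modeled in a few lines. This is a toy sketch, not the runtime’s code: toyCache, sizeClass, and the abbreviated classSizes table are hypothetical names, and the real allocator’s spans, tiny allocator, and refill logic are omitted.

```go
package main

import (
	"fmt"
	"sort"
)

// classSizes is an abbreviated, illustrative table of the smallest Go size
// classes (in bytes); the real runtime has roughly 70 classes.
var classSizes = []int{8, 16, 24, 32, 48, 64, 80, 96, 112, 128}

// sizeClass returns the index of the smallest class that can hold size,
// or -1 if the request is larger than this toy table covers.
func sizeClass(size int) int {
	i := sort.SearchInts(classSizes, size)
	if i == len(classSizes) {
		return -1
	}
	return i
}

// toyCache is a hypothetical stand-in for a per-P mcache:
// one free list of preallocated slots per size class.
type toyCache struct {
	free [][][]byte // free[class] is a stack of free slots
}

func newToyCache(slotsPerClass int) *toyCache {
	c := &toyCache{free: make([][][]byte, len(classSizes))}
	for i, sz := range classSizes {
		for j := 0; j < slotsPerClass; j++ {
			c.free[i] = append(c.free[i], make([]byte, sz))
		}
	}
	return c
}

// alloc mirrors the fast path: size-class lookup, free-list pop, zeroing.
// It reports false when the list is empty, the point at which the real
// runtime would refill the cache from mcentral.
func (c *toyCache) alloc(size int) ([]byte, bool) {
	cl := sizeClass(size)
	if cl < 0 || len(c.free[cl]) == 0 {
		return nil, false
	}
	n := len(c.free[cl]) - 1
	slot := c.free[cl][n]
	c.free[cl] = c.free[cl][:n]
	clear(slot) // Go guarantees newly allocated memory is zeroed
	return slot[:size], true
}

func main() {
	c := newToyCache(1)
	b, ok := c.alloc(20)
	fmt.Println(len(b), cap(b), ok) // a 20-byte request lands in the 24-byte class
	_, ok = c.alloc(20)
	fmt.Println(ok) // the 24-byte free list is now empty
}
```

The rounding is why a 20-byte request reports a capacity of 24: the slot belongs to its class, not to the request.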
When a P’s mcache runs out of free slots for a given size class, the runtime refills it from mcentral, which is shared by all Ps and therefore requires synchronization. Under contention, this slow path costs 100 to 300 nanoseconds. Objects larger than 32 KB bypass the per-P cache entirely and are allocated directly from mheap, which may require taking the global heap lock.
Beyond raw allocation cost, pointer stores into heap memory are compiled with a write-barrier check. While the concurrent tri-color collector is marking, the barrier is active and adds 1 to 3 nanoseconds per store. Live heap objects must also be scanned during the mark phase, and every heap allocation brings the next GC cycle closer. Stack allocations carry neither overhead: they are reclaimed automatically when the function returns, with no GC involvement.
The difference between stack and heap allocation in Go is substantial. In tight loops and high-throughput paths, it amounts to roughly 1 nanosecond versus 40 nanoseconds per object, a ratio that compounds quickly in allocation-heavy code.
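Rather than taking these figures on faith, you can count heap allocations with testing.AllocsPerRun from the standard library. In this sketch, heapAllocate, stackAllocate, and allocsPerCall are illustrative names; a standard build should typically report one allocation per call for the escaping version and zero for the local one.

```go
package main

import (
	"fmt"
	"testing"
)

// sink is a package-level variable; storing a pointer in it forces the
// pointee to escape, so the compiler must heap-allocate it.
var sink *[4]int64

func heapAllocate() {
	p := new([4]int64) // escapes via sink: heap-allocated
	p[0] = 1
	sink = p
}

func stackAllocate() int64 {
	var a [4]int64 // never leaves the function: stays on the stack
	a[0] = 1
	return a[0] + a[3]
}

// allocsPerCall reports the average number of heap allocations per call of f.
func allocsPerCall(f func()) float64 {
	return testing.AllocsPerRun(1000, f)
}

func main() {
	fmt.Println("heap:", allocsPerCall(heapAllocate))
	fmt.Println("stack:", allocsPerCall(func() { _ = stackAllocate() }))
}
```

AllocsPerRun only counts allocations, not time; pair it with a benchmark if you want to see the nanosecond difference as well.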
How Escape Analysis Decides Where Values Live
The Go compiler performs escape analysis by constructing a constraint graph over every variable and function parameter. It then checks whether any reference can flow to a location that outlives the current function: a global or another heap object, a returned pointer, an interface conversion, a channel send, or an argument to a call the compiler cannot see through, such as a call through a function value or interface method.
If a flow path exists from a variable to a sink, the variable escapes to heap. If not, it stays on the stack. You can observe the compiler’s decisions with:
go build -gcflags="-m" ./...
Passing -m twice (-gcflags="-m -m") also prints the flow chain, showing which assignment or return caused each escape. The analysis is conservative by design. It runs once at compile time, produces permanent decisions, and has no runtime correction path. When the compiler cannot prove a variable is stack-bounded, it defaults to heap.
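Three small cases are worth trying with -gcflags=-m. The function names below are hypothetical, and the exact diagnostic wording varies between releases; the comments describe what the compiler typically reports.

```go
package main

import "fmt"

// global is a package-level variable; anything it points to must outlive
// every call, so it acts as a heap sink for escape analysis.
var global *int

// staysLocal: &x never leaves the frame, so -m should not report x as
// moved to heap, and x lives on the stack.
func staysLocal() int {
	x := 42
	p := &x
	return *p
}

// storesGlobally: &x flows to a global, so -m typically reports
// "moved to heap: x".
func storesGlobally() {
	x := 42
	global = &x
}

// sendsOnChannel: a pointer sent on a channel may be received by another
// goroutine, so x is also typically reported as moved to heap.
func sendsOnChannel(c chan *int) {
	x := 7
	c <- &x
}

func main() {
	storesGlobally()
	c := make(chan *int, 1)
	sendsOnChannel(c)
	fmt.Println(staysLocal(), *global, *<-c)
}
```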
That conservatism has a direct cost. Building a slice with a runtime-variable capacity, or growing a slice through successive appends, historically fell into the conservative bucket: the compiler placed the backing store on the heap even when the slice would have stayed small at runtime.
The Slice Allocation Cascade
Consider the most basic accumulation pattern:
func process(c chan task) {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}
Before Go 1.26, the first several appends each allocated a new backing array: capacity 1, then 2, then 4, then 8. Each intermediate array became garbage immediately after the next resize. The slice eventually reached a useful size, but only after producing several short-lived heap objects.
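The cascade is easy to observe. This sketch uses a hypothetical helper, backingArrays, that counts how many distinct backing arrays a slice passes through by comparing &s[0] after each append. The exact count for a nil start depends on element size and Go version (from Go 1.26 the first appends may be served from a stack buffer), but a slice pre-sized to its final length should always use a single backing array.

```go
package main

import "fmt"

// backingArrays appends n values to s and counts how many distinct backing
// arrays s uses along the way, a rough proxy for growth allocations.
func backingArrays(s []int64, n int) int {
	count := 0
	var prev *int64
	for i := 0; i < n; i++ {
		s = append(s, int64(i))
		// A changed address for element 0 means append moved to a new array.
		if first := &s[0]; first != prev {
			count++
			prev = first
		}
	}
	return count
}

func main() {
	const n = 100
	fmt.Println("nil start:", backingArrays(nil, n))
	fmt.Println("pre-sized:", backingArrays(make([]int64, 0, n), n))
}
```

Holding the previous element pointer in prev keeps the old array alive, so a new array can never reuse its address and the comparison stays reliable.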
A related problem appeared with variable-capacity creation:
func process3(c chan task, lengthGuess int) {
	tasks := make([]task, 0, lengthGuess)
	for t := range c {
		tasks = append(tasks, t)
	}
	processAll(tasks)
}
Before Go 1.25, the backing store was unconditionally heap-allocated whenever lengthGuess was a runtime variable. The compiler could not prove at compile time that the capacity would fit on the stack. Both patterns appear throughout real Go programs, which is why the Go team targeted them across two consecutive releases.
Go 1.25: Speculative Stack Buffers
Go 1.25 introduced a speculative 32-byte backing store for variable-capacity slice creation. When the compiler sees make([]T, 0, n) with a runtime-variable n, it emits a small stack-allocated buffer alongside the slice header. At runtime, if n * sizeof(T) fits within those 32 bytes, the backing store is served from the stack. If not, the runtime falls through to a heap allocation as before.
The tradeoff is deliberate. The 32-byte buffer is wasted stack space when the slice grows beyond it, but stack space is cheap and the buffer is reclaimed with the stack frame. For the common case of small or empty initial slices, this eliminates the heap allocation entirely.
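The Go 1.25 decision can be imitated in user code to make the logic concrete. In this sketch, sumOfFirst and bufLen are hypothetical names; the compiler’s actual transformation is internal and applies to make([]T, 0, n) automatically, with no source changes.

```go
package main

import "fmt"

// bufLen makes the local buffer 32 bytes of int64, matching the buffer
// size the article gives for Go 1.25 (hypothetical constant).
const bufLen = 32 / 8

// sumOfFirst builds a slice of 0..n-1 and sums it, imitating the decision
// the compiler makes for make([]T, 0, n): use a small local buffer when n
// fits, otherwise fall back to a heap allocation. It also reports which
// path was taken. n must be non-negative.
func sumOfFirst(n int) (total int64, usedBuffer bool) {
	var buf [bufLen]int64
	var s []int64
	if n <= len(buf) {
		s = buf[:0:n] // fits: backed by the local array
		usedBuffer = true
	} else {
		s = make([]int64, 0, n) // too big: allocate as before
	}
	for i := 0; i < n; i++ {
		s = append(s, int64(i))
	}
	for _, v := range s {
		total += v
	}
	return total, usedBuffer
}

func main() {
	fmt.Println(sumOfFirst(3))  // 3 true
	fmt.Println(sumOfFirst(10)) // 45 false
}
```

As in the compiler’s version, the 32-byte array is reserved on every call whether or not it is used, in exchange for skipping the heap when n is small.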
Go 1.26: move2heap and Deferred Allocation
Go 1.26 extends the idea in two directions. The first covers append-driven slices with no initial capacity hint, collapsing the allocation cascade by providing a stack-backed initial buffer for the first append.
The second optimization covers slices that do escape. Consider:
func extract(c chan task) []task {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	return tasks
}
The slice is returned, so it escapes. Before 1.26, the compiler marked the backing store as heap-allocated from the start, and every intermediate resize produced garbage at heap allocation cost. In 1.26, the compiler rewrites the function internally to:
func extract(c chan task) []task {
	var tasks []task
	for t := range c {
		tasks = append(tasks, t)
	}
	tasks = runtime.move2heap(tasks)
	return tasks
}
move2heap checks whether the backing array already lives on the heap. If the slice grew past the stack buffer during the loop, it does, and move2heap returns it unchanged. If the slice stayed small, move2heap copies it to a correctly-sized heap allocation before the return.
The effect is that, for slices that never outgrow the stack buffer, the only heap allocation is the final copy, sized exactly for the result rather than for any intermediate state. Slices that do outgrow the buffer still follow append’s normal growth, but they skip the earliest small allocations. The Go Blog describes this as deferred allocation, and the framing is accurate: the decision about whether to heap-allocate is pushed to the last responsible moment rather than made pessimistically at slice creation.
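The move2heap decision can be modeled in user code to make it concrete. moveToHeap and collect are hypothetical names, not the runtime’s API. One caveat: the compiler can do this safely because it controls both sides of the transformation. In a user-level version like this one, escape analysis will most likely still move buf to the heap, since the returned slice may alias it, so treat the sketch as a model of the decision rather than a way to get the savings by hand.

```go
package main

import "fmt"

// moveToHeap imitates the behavior the article attributes to
// runtime.move2heap: if s still uses the local buffer, copy it into an
// exactly-sized allocation; otherwise return it unchanged.
func moveToHeap(s []int64, buf *[4]int64) []int64 {
	if len(s) == 0 {
		return nil // nothing appended: match the nil result of the original
	}
	if &s[0] != &buf[0] {
		return s // grew past the buffer; already on the heap
	}
	out := make([]int64, len(s)) // single, exactly-sized allocation
	copy(out, s)
	return out
}

// collect drains c into a slice seeded with a 32-byte local buffer,
// then finalizes it with moveToHeap before returning.
func collect(c <-chan int64) []int64 {
	var buf [4]int64
	s := buf[:0]
	for v := range c {
		s = append(s, v)
	}
	return moveToHeap(s, &buf)
}

func main() {
	c := make(chan int64, 10)
	for i := int64(1); i <= 3; i++ {
		c <- i
	}
	close(c)
	out := collect(c)
	fmt.Println(out, len(out), cap(out)) // [1 2 3] 3 3
}
```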
Inlining as a Multiplier
These optimizations compose with inlining. When the compiler inlines a function, it applies escape analysis in the caller’s context. A function returning *T appears to require a heap allocation from the outside, but if inlined into a caller that uses the result locally, the allocation can remain on the stack.
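A small illustration of inlining changing an escape decision follows. point, newPoint, and sumPoints are hypothetical names; whether newPoint is actually inlined depends on the compiler’s inlining budget, which -gcflags=-m reports.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// newPoint returns a pointer, so in isolation its &point{...} must be
// heap-allocated: the caller could keep it indefinitely.
func newPoint(x, y int) *point {
	return &point{x, y}
}

// sumPoints uses each *point only locally. If newPoint is inlined here,
// escape analysis runs in this context and the point can stay on the stack.
func sumPoints(n int) int {
	total := 0
	for i := 0; i < n; i++ {
		p := newPoint(i, i)
		total += p.x + p.y
	}
	return total
}

func main() {
	fmt.Println(sumPoints(4))
	// With default optimizations this typically prints 0; building with
	// -gcflags=-l (inlining disabled) should make it report allocations.
	fmt.Println(testing.AllocsPerRun(100, func() { _ = sumPoints(4) }))
}
```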
The PGO improvements in Go 1.22 through 1.24 raised the inlining budget for hot functions and transitively enabled more stack allocations. The 1.25 and 1.26 slice optimizations address the complementary cases: slice-building patterns that cross a function boundary or return their results to an external caller, where inlining cannot help.
How Other Languages Handle This
Java’s HotSpot JIT performs escape analysis at runtime, after sufficient profiling. When an object is proven not to escape a compiled method, the JIT can scalar-replace it, decomposing its fields into locals held in registers or stack slots and eliminating the allocation entirely. In practice, Java’s escape analysis is fragile: it depends on the JIT having inlined the full call chain, any call outside the inlined scope defeats it, and there is no static way to inspect these decisions. The analysis can also produce different results between runs.
Rust’s approach is structurally different. Values are stack-allocated by default; Box<T>, Rc<T>, and Arc<T> are explicit heap allocations. The borrow checker encodes lifetime relationships at compile time, preventing references from outliving their owners. Rust never moves a value to the heap behind the programmer’s back because an analysis was inconclusive; allocation intent is expressed explicitly. The tradeoff is that Rust requires reasoning about lifetimes at the call site, something Go deliberately offloads to the compiler.
C++’s RVO and NRVO (Return Value Optimization and Named RVO) address a narrower problem: eliminating redundant copies when a function returns a value. C++17 mandates copy elision for specific return patterns. These are copy-elimination optimizations, not escape analysis. C++ does not silently promote stack allocations to heap the way Go can, because the programmer controls heap access explicitly through new and smart pointers.
What Changes in Practice
The Go Blog notes that existing code benefits without modification. The targeted patterns (building slices from channels, collecting results in loops, returning accumulated data) appear throughout real Go programs. For developers who have been pre-sizing slices defensively to reduce allocation pressure, these changes lower the cost of getting the estimate wrong.
For code that builds and returns a slice, move2heap replaces the early intermediate allocations with at most one correctly-sized copy at return time. Heap allocation count drops, less garbage is produced during the function body, and GC work decreases accordingly.
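To check the effect on your own toolchain, testing.AllocsPerRun again works as a quick probe. build, sink, and allocsForBuild are hypothetical names. The counts in the comments follow from append’s growth rule for 8-byte elements and from the article’s description of move2heap; treat them as expectations rather than guarantees, and compare by running the same program under different Go releases.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps each result reachable, so build's slice must escape.
var sink []int64

// build appends n values to an initially nil slice and returns it,
// the build-and-return pattern that move2heap targets.
func build(n int) []int64 {
	var s []int64
	for i := 0; i < n; i++ {
		s = append(s, int64(i))
	}
	return s
}

// allocsForBuild reports the average heap allocations per call of build(n).
func allocsForBuild(n int) float64 {
	return testing.AllocsPerRun(1000, func() { sink = build(n) })
}

func main() {
	// For n=3, append's growth rule gives capacities 1, 2, 4 (three
	// allocations) without a stack buffer; with Go 1.26's buffer the
	// article's description implies a single exactly-sized copy.
	fmt.Println("allocations per call:", allocsForBuild(3))
}
```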
The approach is conservative in the productive sense: the optimizations are transparent, the fallback path is unchanged, and the behavior is inspectable with existing -gcflags tooling. That transparency distinguishes Go’s escape analysis from Java’s JIT-time equivalent. You can read the compiler’s reasoning directly, adjust your code in response, and verify the result without running a profiler. The 1.25 and 1.26 optimizations narrow the gap between what the compiler could prove and what programmers knew to be true at runtime, and they do so without requiring any changes to existing code.