Bill Hall’s memory allocation strategies series is one of the better pedagogical resources for systems programmers who want to understand what lives below malloc. The six-part walkthrough builds from a linear allocator through stack allocators, pool allocators, and free list variants, with complete C implementations at each step.
What the series doesn’t linger on is the conceptual shift underneath those implementations. The mechanical question is how each allocator works. The more useful question is what property of your program’s data each allocator exploits. Every custom allocation strategy is a claim about the lifetime topology of your objects, and once you see it that way, the right allocator for a given problem becomes considerably easier to identify.
What malloc Actually Costs
General-purpose allocators like ptmalloc2 (glibc’s default), jemalloc, and tcmalloc are impressive engineering. They handle arbitrary allocation sizes, arbitrary free ordering, and thread safety across arbitrary access patterns. That generality has a price.
To support arbitrary free ordering, an allocator must track metadata for every live allocation. The classic approach embeds a header before each allocated block. Those headers consume real cache lines, and on workloads that allocate millions of small objects, they interleave with your data throughout memory, degrading spatial locality. Fragmentation compounds this over time: after hours of mixed alloc/free traffic, the heap develops holes, and a large allocation may fail even when total free bytes exceed the request.
Daan Leijen’s 2019 mimalloc paper quantifies this concretely. mimalloc beats ptmalloc2 and jemalloc on most benchmarks by sharding free lists per page rather than globally per size class, measuring 7% faster than tcmalloc on redis and 14% faster than jemalloc on cfrac. But the improvement is bounded by the fundamental cost of supporting arbitrary deallocation: no general-purpose allocator approaches the speed of a bump allocator, which sidesteps that cost by not supporting individual frees at all.
The Bump Allocator’s Contract
A linear allocator works by advancing a pointer into a backing buffer on each allocation:
void *linear_alloc(Linear_Allocator *a, size_t size, size_t align) {
    // Align the current position forward to the requested boundary.
    uintptr_t curr_ptr = (uintptr_t)a->buf + a->curr_offset;
    uintptr_t offset = align_forward(curr_ptr, align);
    offset -= (uintptr_t)a->buf; // back to an offset relative to buf

    // Out of backing memory: no growth, no fallback.
    if (offset + size > a->buf_len) return NULL;

    a->prev_offset = offset;
    a->curr_offset = offset + size;
    return &a->buf[offset];
}
Alignment is a bitwise round-up; keeping align a power of two makes the mask a single AND. Allocation is a few arithmetic operations and a bounds check. There is no free function for individual objects; you reset the entire arena with a single store.
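The align_forward helper the allocator relies on is a few lines; a minimal sketch, assuming align has already been validated as a power of two:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Round ptr up to the next multiple of align. Because align is a
 * power of two, (align - 1) is an all-ones mask below the alignment
 * bit, so the round-up is an add followed by a single AND. */
uintptr_t align_forward(uintptr_t ptr, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (ptr + (uintptr_t)(align - 1)) & ~(uintptr_t)(align - 1);
}
```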
This is not a limitation to work around. It is a contract to satisfy in your program’s design. If you can use a bump allocator, it means all objects allocated from that arena share a common lifetime: they become valid together and invalid together. The allocator encodes that relationship in its interface.
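The reset that ends every object's lifetime at once is correspondingly trivial; a sketch, with the Linear_Allocator fields assumed to match the linear_alloc listing above:

```c
#include <stddef.h>

/* Field names assumed from the linear_alloc listing above. */
typedef struct Linear_Allocator {
    unsigned char *buf;
    size_t         buf_len;
    size_t         prev_offset;
    size_t         curr_offset;
} Linear_Allocator;

/* Invalidate every allocation at once. No traversal, no per-object
 * bookkeeping: the buffer is simply reusable from offset zero. */
void linear_free_all(Linear_Allocator *a) {
    a->curr_offset = 0;
    a->prev_offset = 0;
}
```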
Pool Allocators and Homogeneous Size
A pool allocator supports individual frees, but only for objects of a single fixed size. During initialization it carves a backing buffer into N equal chunks and links them into a free list by storing the next pointer inside the free space of each unallocated chunk:
void pool_init(Pool_Allocator *p, void *buf, size_t buf_len,
               size_t chunk_size, size_t chunk_align) {
    // Align the start of the buffer, then carve it into equal chunks.
    uintptr_t start = align_forward((uintptr_t)buf, chunk_align);
    size_t count = (buf_len - (start - (uintptr_t)buf)) / chunk_size;

    // Thread every chunk onto the free list. The link lives inside
    // the chunk itself, so free chunks cost no extra memory.
    for (size_t i = 0; i < count; i++) {
        Pool_Free_Node *node = (void *)(start + i * chunk_size);
        node->next = p->head;
        p->head = node;
    }
}
Allocation pops from the free list head; free pushes back. Both are O(1) with no fragmentation, because every block is identical. The contract here is object homogeneity: all allocations serve the same type, or at least the same size. The Linux kernel’s SLUB allocator is built on exactly this premise; each kmem_cache is a pool for one object type, which is why kernel object allocation is substantially cheaper than userspace malloc for common kernel structures.
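Both operations are a couple of pointer moves; a minimal sketch, reusing the intrusive free-list linkage from pool_init (the struct definitions here are assumptions matching the shape of the series' code):

```c
#include <stddef.h>

/* Assumed definitions matching the pool_init listing above. */
typedef struct Pool_Free_Node {
    struct Pool_Free_Node *next;
} Pool_Free_Node;

typedef struct Pool_Allocator {
    Pool_Free_Node *head;
} Pool_Allocator;

/* O(1) allocate: pop the first free chunk, NULL when exhausted. */
void *pool_alloc(Pool_Allocator *p) {
    Pool_Free_Node *node = p->head;
    if (node == NULL) return NULL;
    p->head = node->next;
    return node;
}

/* O(1) free: push the chunk back onto the free list. */
void pool_free(Pool_Allocator *p, void *ptr) {
    Pool_Free_Node *node = ptr;
    node->next = p->head;
    p->head = node;
}
```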
Pool allocators become essential on any hot path that allocates objects with independent lifetimes but uniform size: particle systems, network packet buffers, ECS component stores, database page caches. All of those cases have heterogeneous lifetimes but homogeneous sizes, which is precisely the trade-off the pool handles.
How Production Systems Apply This
The most instructive example of these patterns in a production C codebase is PostgreSQL’s MemoryContext. Every query, function invocation, and transaction owns a MemoryContext. All allocations within that context go to its arena via palloc(). When the context is destroyed, the entire memory tree is freed in one traversal of the context hierarchy.
This design has essentially zero fragmentation in practice. A query allocates hundreds of nodes for its plan tree, builds intermediate results, and runs to completion. When it finishes, MemoryContextDelete() frees everything regardless of how individual objects interleaved during execution. Most objects never need an explicit pfree() call because the context’s lifetime is the object’s lifetime.
LLVM’s BumpPtrAllocator serves the same role for IR nodes within a compilation unit. The entire LLVMContext owns arenas, and at compilation end the whole context is destroyed. Clang’s ASTContext works the same way; LLVM’s own documentation notes that BumpPtrAllocator is 5-10x faster than equivalent malloc calls for AST node allocation, because the entire compilation unit’s allocations share a single lifetime and no individual tracking is required.
Game engines use per-frame linear allocators for the same reason. The per-frame scratch memory in Unreal, Unity, and similar engines allocates from an arena that resets at the end of each frame. Particle updates, animation interpolation, render command lists, collision broadphase scratch buffers: most per-frame working data fits this pattern, and a frame arena reset is a single store instruction. This eliminates a substantial fraction of dynamic allocation traffic across a typical frame.
The Language-Level Consequences
These patterns are decades old, but language designers have increasingly taken them seriously in their standard libraries and type systems.
C++17 standardized std::pmr::memory_resource along with concrete implementations: monotonic_buffer_resource (a bump allocator) and unsynchronized_pool_resource (a pool allocator). The polymorphic_allocator<T> lets you supply any memory_resource to standard containers:
#include <memory_resource>
#include <vector>

char buffer[65536];
std::pmr::monotonic_buffer_resource arena(buffer, sizeof(buffer));
std::pmr::vector<int> v(&arena);
// v's internal storage comes from the stack buffer; individual
// deallocations are no-ops, and everything is released together
// when arena is destroyed
Zig makes the commitment explicit at the language level. There is no implicit global allocator; every allocating function takes an explicit std.mem.Allocator parameter, which is a vtable of alloc/resize/free function pointers. The standard library ships ArenaAllocator, FixedBufferAllocator, MemoryPool(T), and GeneralPurposeAllocator with leak detection in debug builds. Allocation strategy is not an afterthought because the language does not permit it to be one.
Rust’s ecosystem settled on a practical middle path. The bumpalo crate provides a bump allocator that integrates with the ownership model:
use bumpalo::Bump;
let bump = Bump::new();
let x = bump.alloc(MyStruct { field: 42 });
let v = bumpalo::vec![in &bump; 1, 2, 3];
// everything freed when `bump` drops
The arena’s lifetime bounds all references into it at the type level. The borrow checker enforces the “all objects share a lifetime” contract at compile time. You cannot hold a reference past the arena’s reset because the type system won’t allow it; the contract that was implicit in C becomes structural in Rust.
Odin, Bill Hall’s language, threads allocators through the implicit context parameter. Any procedure can swap the active allocator for a scope, and all allocating operations within that scope use the new allocator without explicit threading through every call site. The programmer commits to a lifetime structure; the allocator mechanism enforces it transparently.
The Pattern Underneath the Patterns
The gingerbill series reads as a progression from simple to complex, but it is really a taxonomy of lifetime structures. The linear allocator encodes “all objects live until reset.” The stack allocator encodes “objects live until the scope that created them exits.” The pool allocator encodes “objects of this type have independent lifetimes but uniform size.” The free list allocator encodes “I cannot commit to any lifetime structure; handle the general case.”
General-purpose allocators are what you use when you haven’t analyzed your data’s lifetime structure carefully enough, or when you genuinely cannot know it at design time. Neither is inherently wrong, but both are expensive in proportion to the structure left unexploited.
The interesting work in memory allocation is not in the allocator implementations themselves; those are mostly solved problems. It is in the program design that makes the cheaper strategies applicable. That design work doesn’t fit in a tutorial series, but the series is a necessary prerequisite. Without understanding what each allocator buys you, you cannot make the structural decisions that let you use it.