Splitting the Heap: Why Meilisearch's Three-Allocator Strategy Is Architecturally Sound
Source: lobsters
The investigation Clément Renault published on Meilisearch’s allocator choices is framed as a comparison exercise: jemalloc versus bumpalo versus mimalloc, each with its own failure mode. The implicit conclusion is that you eventually pick one winner. The actual conclusion is that Meilisearch settled on all three, with each covering a distinct allocation pattern. That result fits a service whose allocation profile genuinely differs from one phase of operation to the next, and Rust’s allocation architecture supports exactly this kind of decomposition.
To see why, it helps to look at how Rust structures its allocation model and where bumpalo sits relative to the other two.
Two Levels of Allocation in Rust
Rust has two separate interfaces for custom allocation. The first is GlobalAlloc, a stable trait that backs the #[global_allocator] attribute. Every Box, Vec, String, HashMap, and any other heap allocation that does not name its own allocator routes through this one implementation.
# Cargo.toml
[dependencies]
tikv-jemallocator = "0.5"

// src/main.rs
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;
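This single declaration changes the allocator for every heap operation in the binary. jemalloc and mimalloc both operate at this layer, and because a binary can declare only one #[global_allocator], any given build uses one or the other.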
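To make the "everything routes through one implementation" point concrete, here is a minimal sketch using only the standard library. The `Counting` wrapper, the `ALLOC_CALLS` counter, and the `allocations_during` helper are hypothetical names invented for this illustration; they are not part of jemalloc, mimalloc, or Meilisearch.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts calls.
pub struct Counting;

/// Number of allocation requests seen by the global allocator so far.
pub static ALLOC_CALLS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOC_CALLS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// One declaration, and every Box, Vec, and String in the binary goes through it.
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Runs `f` and reports how many allocation requests happened meanwhile
/// (process-wide, so other threads are counted too).
pub fn allocations_during<R, F: FnOnce() -> R>(f: F) -> (R, usize) {
    let before = ALLOC_CALLS.load(Ordering::Relaxed);
    let out = f();
    let after = ALLOC_CALLS.load(Ordering::Relaxed);
    (out, after - before)
}
```

Calling `allocations_during(|| vec![1u8; 64])` reports at least one request even though the closure never mentions an allocator, which is the property that lets a single line swap in jemalloc or mimalloc.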
The second interface is Allocator, an unstable trait available only on nightly, which attaches a custom allocator to individual collections as a generic parameter, as in Vec<T, A: Allocator> and Box<T, A: Allocator>. Individual data structures choose their own allocator, independently of the global default.
// Requires nightly with #![feature(allocator_api)] and bumpalo's "allocator_api" Cargo feature.
let arena = bumpalo::Bump::new();
let mut v: Vec<u32, &bumpalo::Bump> = Vec::new_in(&arena);
v.extend([1, 2, 3]);
// v's storage lives in the arena and is reclaimed when the arena drops
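bumpalo implements the nightly Allocator trait behind its allocator_api Cargo feature. On stable Rust, it ships its own collection types (behind the collections feature) that take the arena directly instead of going through the trait: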
use bumpalo::collections::Vec as BumpVec;
let arena = bumpalo::Bump::new();
let mut v = BumpVec::new_in(&arena);
v.push(42u32);
The practical effect is identical either way: allocations for that collection go into the arena, bypassing whatever jemalloc or mimalloc is handling globally.
Rust removed jemalloc as its default global allocator in version 1.32 (January 2019); the #[global_allocator] attribute that makes opting back in possible had been stabilized earlier, in 1.28 (August 2018). The change was motivated by portability and binary size. jemalloc requires C compilation support that is not uniformly available across all targets, particularly embedded and bare-metal targets. The system allocator is always available; jemalloc is not. For server-side applications on Linux, opting back in is typically worthwhile.
Why These Two Layers Exist
The GlobalAlloc layer and the Allocator layer are not redundant designs competing for the same role; they serve categorically different purposes.
GlobalAlloc is about choosing the best general-purpose allocator for the heap as a whole. When a long-running service allocates and frees many varied structures across many threads, the allocator’s fragmentation characteristics, memory return behavior, and thread-local cache design all affect production behavior. jemalloc targets workloads with varied allocation sizes and long process lifetimes in two ways. Its multi-arena design gives each arena independent free lists and distributes threads across arenas to reduce lock contention. Its size-class bucketing packs same-sized allocations into contiguous slabs.
Allocator is about opting out of the general-purpose heap entirely for specific workloads. An arena allocator like bumpalo operates differently: every allocation is a pointer bump within a contiguous chunk, so fragmentation within the arena’s lifetime is not a concern. There are no free lists, no per-object metadata, no size-class lookup. When the arena drops, its chunks are released together, without visiting individual objects. An arena allocator serves workloads where a batch lifetime model is the correct abstraction; for those workloads it is faster and simpler than any general-purpose design.
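The arena mechanics described above fit in a few lines. What follows is a toy sketch, not bumpalo’s implementation: `ToyBump` is a hypothetical type that hands out offsets into one fixed buffer, with alignment computed relative to the start of the buffer. A real arena returns pointers, aligns absolute addresses, and grows by adding chunks.

```rust
/// Toy bump arena: hands out byte ranges from one fixed buffer.
/// Hypothetical type for illustration; not how bumpalo is implemented.
pub struct ToyBump {
    buf: Vec<u8>,
    next: usize, // offset of the first unused byte
}

impl ToyBump {
    pub fn with_capacity(cap: usize) -> Self {
        ToyBump { buf: vec![0; cap], next: 0 }
    }

    /// Reserves `size` bytes at an offset that is a multiple of `align`
    /// (a power of two). Returns the start offset, or None if the buffer is full.
    pub fn alloc(&mut self, size: usize, align: usize) -> Option<usize> {
        assert!(align.is_power_of_two());
        // Round the cursor up to the requested alignment.
        let start = self.next.checked_add(align - 1)? & !(align - 1);
        let end = start.checked_add(size)?;
        if end > self.buf.len() {
            return None;
        }
        // The whole "allocation" is this one cursor update: no free list,
        // no per-object header, no size-class lookup.
        self.next = end;
        Some(start)
    }

    /// Frees everything at once by rewinding the cursor.
    pub fn reset(&mut self) {
        self.next = 0;
    }

    /// Bytes consumed so far, including alignment padding.
    pub fn used(&self) -> usize {
        self.next
    }
}
```

There is deliberately no way to free a single object here. That missing operation is what the batch lifetime model trades away in exchange for the constant-time allocation path.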
How Meilisearch’s Workload Uses Both Layers
Meilisearch’s allocation profile splits naturally by phase.
During indexing, the system processes document batches and builds intermediate structures: inverted index entries, field norms, prefix trees for prefix search, sort buffers for LMDB writes. These allocations are varied in size, short-lived within the batch, and numerous. This is a workload that generates fragmentation: many allocations of irregular sizes, freed in an order unrelated to allocation order, leaving the general heap with a mix of live and freed regions that cannot be coalesced. Switching to jemalloc brought Meilisearch’s fragmentation ratio from approximately 3:1 to 1.3:1 (resident set to allocated bytes). jemalloc’s slab design keeps same-sized allocations adjacent, and its size-class spacing limits internal fragmentation to roughly 20% per allocation for all but the smallest size classes.
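The 20% bound on internal fragmentation follows from simple arithmetic once you assume four evenly spaced size classes per doubling, which is how jemalloc documents its spacing above the smallest classes. The sketch below is a simplified model under that assumption; `round_up_to_class` and `internal_waste` are hypothetical helpers, and the finer spacing jemalloc uses for small sizes is not modeled.

```rust
/// Rounds `size` up to its class under a simplified model:
/// four evenly spaced classes per doubling (e.g. 320, 384, 448, 512).
/// Assumption: only sizes above 128 bytes are modeled.
pub fn round_up_to_class(size: usize) -> usize {
    assert!(size > 128, "model only covers sizes above 128 bytes");
    // Largest power of two strictly below `size`: the base of its group.
    let base = 1usize << (usize::BITS - 1 - (size - 1).leading_zeros());
    let step = base / 4;
    // Round the distance above `base` up to a whole number of steps.
    let steps = (size - base + step - 1) / step;
    base + steps * step
}

/// Fraction of the allocated class that the request does not use.
pub fn internal_waste(size: usize) -> f64 {
    let class = round_up_to_class(size);
    (class - size) as f64 / class as f64
}
```

The worst case is a request one byte past a power of two: 257 bytes lands in the 320-byte class, wasting 63 of 320 bytes, just under 20%. Every other request in the group wastes a smaller fraction.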
During search, the constraint is different. Each query triggers parsing, facet traversal, candidate scoring, and ranking. The allocations are smaller and more uniform, but they are on the latency-critical path. Overhead from interacting with a general-purpose allocator adds up across the call stack.
bumpalo addresses the search path. Query parsing can allocate all its intermediate structures into a per-request Bump arena. When the request completes, the arena drops and all of that memory frees in one call, with no per-object deallocation cost.
// Sketch: tokenize_with_arena, parse_query, and execute are not defined here.
fn handle_search(index: &Index, query: &str) -> SearchResults {
    let arena = bumpalo::Bump::new();
    let tokens = tokenize_with_arena(&arena, query);
    let parsed = parse_query(&arena, tokens);
    // The results must own their data: nothing borrowed from `arena` may escape.
    execute(index, parsed)
    // arena drops here; all scratch allocations freed in one operation
}
The per-object scratch allocations in this function never reach jemalloc: bumpalo requests its backing chunks from the global allocator and then serves each allocation by bumping a pointer inside them. The two allocators run alongside each other: jemalloc handles the long-lived heap and bumpalo handles short-lived scratch work. The critical discipline is keeping the arena lifetime tightly scoped to the operation. A Bump that outlives the function that needs it, without being reset, turns the performance advantage into a memory accumulation bug. Heap profilers that sample by call site will eventually surface the pattern, but only if you correlate growth rate with query volume rather than elapsed time.
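The accumulation failure mode is easy to reproduce in miniature. The sketch below uses a hypothetical `Scratch` type, an append-only byte buffer standing in for an arena, and compares resetting it per request with letting it live across requests. With bumpalo the analogous choices are dropping the Bump per request or calling its `reset` method.

```rust
/// Hypothetical stand-in for an arena: an append-only byte buffer.
pub struct Scratch {
    bytes: Vec<u8>,
}

impl Scratch {
    pub fn new() -> Self {
        Scratch { bytes: Vec::new() }
    }

    /// Append-only "allocation": copies `data` in and returns where it landed.
    pub fn alloc(&mut self, data: &[u8]) -> std::ops::Range<usize> {
        let start = self.bytes.len();
        self.bytes.extend_from_slice(data);
        start..self.bytes.len()
    }

    /// Discards everything staged so far.
    pub fn reset(&mut self) {
        self.bytes.clear();
    }

    pub fn used(&self) -> usize {
        self.bytes.len()
    }
}

/// Stages each whitespace-separated token in `scratch`; returns the token count.
pub fn handle_query(scratch: &mut Scratch, query: &str) -> usize {
    let mut count = 0;
    for token in query.split_whitespace() {
        scratch.alloc(token.as_bytes());
        count += 1;
    }
    count
}

/// Runs `n` identical queries and returns the scratch bytes in use afterwards.
pub fn run(n: usize, reset_between: bool) -> usize {
    let mut scratch = Scratch::new();
    for _ in 0..n {
        if reset_between {
            scratch.reset();
        }
        handle_query(&mut scratch, "rust allocator");
    }
    scratch.used()
}
```

With a reset per request, usage stays at the size of one query’s scratch data. Without it, usage grows linearly with the number of queries served, which is why the leak tracks query volume and not elapsed time.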
The C++ Parallel
C++ arrived at the same decomposition through std::pmr, standardized in C++17. std::pmr::memory_resource is a polymorphic base class for allocators, and std::pmr::monotonic_buffer_resource provides the arena equivalent of bumpalo’s Bump:
#include <memory_resource>
#include <vector>

std::pmr::monotonic_buffer_resource pool;
std::pmr::vector<int> v{&pool};
v.push_back(1);
// pool's destructor releases all of its memory at once
std::pmr::polymorphic_allocator lets any standard container accept a memory_resource* at construction time. The key design difference: C++17 PMR is runtime-polymorphic via virtual dispatch through memory_resource, while Rust’s Allocator is a generic parameter resolved at compile time. Rust’s approach eliminates virtual dispatch overhead at the cost of more monomorphization. For hot query paths, that tradeoff favors Rust’s approach.
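The dispatch difference can be shown in Rust itself, since the language supports both styles. In this sketch, `ByteSource` is a hypothetical minimal interface standing in for Allocator or memory_resource, and `FixedBump` is a toy implementation; neither is a real library type.

```rust
/// Hypothetical minimal allocator interface for illustration.
pub trait ByteSource {
    /// Reserves `n` bytes and returns their start offset, or None if exhausted.
    fn take(&mut self, n: usize) -> Option<usize>;
}

/// Toy fixed-capacity bump source.
pub struct FixedBump {
    next: usize,
    cap: usize,
}

impl FixedBump {
    pub fn new(cap: usize) -> Self {
        FixedBump { next: 0, cap }
    }
}

impl ByteSource for FixedBump {
    fn take(&mut self, n: usize) -> Option<usize> {
        let end = self.next.checked_add(n)?;
        if end > self.cap {
            return None;
        }
        let start = self.next;
        self.next = end;
        Some(start)
    }
}

/// Generic-parameter style: one copy of this function is compiled per
/// source type, and each `take` call is resolved at compile time.
pub fn reserve_static<S: ByteSource>(src: &mut S, sizes: &[usize]) -> usize {
    let mut granted = 0;
    for &n in sizes {
        if src.take(n).is_some() {
            granted += 1;
        }
    }
    granted
}

/// PMR style: a single compiled function that reaches `take` through a vtable.
pub fn reserve_dyn(src: &mut dyn ByteSource, sizes: &[usize]) -> usize {
    let mut granted = 0;
    for &n in sizes {
        if src.take(n).is_some() {
            granted += 1;
        }
    }
    granted
}
```

Both functions compute the same result. The difference is in what the compiler emits: the generic version can inline `take` into the loop but is duplicated for every source type, while the `dyn` version exists once and pays an indirect call per allocation.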
The C++ ecosystem has been slower to adopt pmr than its design deserves. Most library code does not accept std::pmr::polymorphic_allocator because doing so changes the type signature in ways that break callers expecting the default. Rust has the same fundamental problem with Allocator being generic: library authors must opt in explicitly, and doing so forces all downstream users to either instantiate both allocator variants or commit to one.
What Stabilizing the Allocator Trait Would Change
The allocator_api feature gate has existed on nightly since 2017, and the trait has carried its current name, Allocator, since late 2020. It remains unstabilized as of mid-2025, and the friction is real. Stable Rust code that wants to express “allocate this into an arena” must either use bumpalo’s own collection types, depend on nightly, or fall back on workarounds such as the allocator-api2 crate, which mirrors the unstable API on stable Rust.
Stabilization would let ecosystem crates expose arena-compatible APIs in their public interfaces without forking standard collection types. A query parser crate could return Vec<Token<'a>, &'a Bump> and work correctly on stable Rust. Right now that requires a nightly dependency, which cascades through every downstream user.
The open questions around stabilization center on fallible allocation. The Allocator trait supports allocators that can return Err from allocate, but most existing code assumes allocation succeeds or panics. Reconciling those assumptions requires either propagating Result through collection operations or accepting that most callers will panic on allocation failure. The pragmatic resolution is probably the latter, acknowledging that infallible allocation is the common case and the trait’s Err path exists for specialized embedded or no_std contexts.
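Stable Rust already exposes one fallible allocation path, `Vec::try_reserve`, which gives a feel for what propagating Result through collection code looks like. The `copy_fallible` function below is a hypothetical helper written for this illustration.

```rust
use std::collections::TryReserveError;

/// Copies `src` into a new Vec, reporting allocation failure as an error
/// instead of aborting or panicking.
pub fn copy_fallible(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut out = Vec::new();
    // The only step that can fail: reserve the full capacity up front.
    out.try_reserve(src.len())?;
    // Capacity is already reserved, so this does not need to allocate.
    out.extend_from_slice(src);
    Ok(out)
}
```

Every caller now has to handle or forward the error, which is the ergonomic cost the stabilization discussion weighs against the needs of environments where allocation failure is a routine condition.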
The Practical Decomposition
Meilisearch’s three-allocator result reflects a genuine decomposition of the service’s allocation needs, and Rust’s allocation architecture supports exactly that decomposition. At the global level, choosing jemalloc provides fragmentation control for the long-lived heap and diagnostic tooling via the mallctl API for production incidents. At the hot-path level, bumpalo eliminates allocator overhead for per-request scratch work where the batch lifetime model holds.
The two decisions are independent. Changing the global allocator does not change how bumpalo arenas behave. Adding bumpalo to a service already using jemalloc requires no jemalloc tuning changes. bumpalo obtains its backing chunks from whichever global allocator is installed, but everything allocated inside those chunks is managed by the arena alone, so each allocator serves its own workload class.
For any Rust service with a clear distinction between long-lived heap state and short-lived per-operation scratch work, using both layers of Rust’s allocation model produces better outcomes than finding the single best general-purpose allocator. The GlobalAlloc and Allocator interfaces exist because these two workload profiles are common enough to need distinct solutions, and the Meilisearch investigation is a concrete demonstration of what that looks like in a production system.