Custom Arenas and Extent Hooks: The jemalloc API Behind Meta's NUMA Work
Source: hackernews
Meta’s renewed investment in jemalloc lists NUMA-aware arena assignment as one of its four focus areas. Reading that, you might expect the implementation to require patching the allocator’s core mmap paths or introducing NUMA policy throughout the internals. That is not the approach. jemalloc 5.x ships an extensibility layer called extent hooks that lets external code take full control of how the allocator obtains and releases backing memory. The NUMA work lives in this layer, and understanding how extent hooks work explains both what Meta is building and what the architecture makes possible beyond the typical tuning-knob usage most developers know.
Why Arenas Don’t Fully Solve the NUMA Problem
jemalloc distributes threads across arenas to reduce lock contention. By default it creates 4 * ncpus arenas; threads are assigned round-robin. Each arena manages its own free lists, slab state, and extent tracking independently, so threads on different arenas never contend with each other. This is the mechanism that made jemalloc scale where glibc’s ptmalloc2 did not.
The arena model provides isolation but says nothing about physical placement. On a two-socket server, a thread running on socket 0 may be assigned an arena whose backing memory, sourced via anonymous mmap, was first-touched by a thread on socket 1. Linux’s default first-touch NUMA policy means those pages land on socket 1’s DRAM. Every subsequent access by the socket-0 thread crosses the interconnect, costing 30-60 nanoseconds per cache miss compared to local access. On a memory-intensive service running millions of small allocations per second, this compounds into a measurable latency increase and reduced memory bandwidth.
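First-touch placement is easy to observe directly. The sketch below (Linux-specific; it uses the raw move_pages syscall so no libnuma is needed) maps an anonymous page, touches it, and asks the kernel which node backs it. On a multi-socket machine, pinning the touching thread to different sockets changes the answer.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

// Ask the kernel which NUMA node currently backs the page at addr,
// via the move_pages syscall in query-only mode (nodes == NULL).
static int node_of_page(void *addr) {
    void *pages[1] = { addr };
    int status[1] = { -1 };
    if (syscall(SYS_move_pages, 0, 1UL, pages, NULL, status, 0) != 0)
        return -1;
    return status[0]; // NUMA node number, or negative errno per page
}

// Map a page, first-touch it from this thread, and report its node.
static int first_touch_node(void) {
    char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    p[0] = 1; // first touch: the page lands on this thread's node
    return node_of_page(p);
}
```

Running `first_touch_node()` under `numactl --cpunodebind=1` on a two-socket box reports node 1; the page follows the thread that touched it, not the thread that mapped it.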
The fix is to bind each arena’s backing memory to the NUMA node local to the socket its threads run on. The arena structure is the right isolation unit; you need a hook into how the arena obtains extents from the OS. That hook is extent_hooks_t.
The Extent Hooks API
In jemalloc 5.0, the allocator’s OS-level memory management was rewritten from a fixed 2 MB “chunk” model to a variable-size extent model. An extent is an aligned region of virtual address space that jemalloc has obtained from the OS and is managing internally. The allocator carves extents into slabs for small objects, uses them directly for large allocations, and cycles them through dirty and muzzy states before returning them to the OS.
The extent hooks interface is a struct of nine function pointers covering every operation the allocator performs on extents:
typedef struct extent_hooks_s {
    extent_alloc_t    *alloc;        // obtain a new extent from the OS
    extent_dalloc_t   *dalloc;       // return an extent to the OS
    extent_destroy_t  *destroy;      // called on arena teardown
    extent_commit_t   *commit;       // make pages within an extent usable
    extent_decommit_t *decommit;     // decommit pages (keep VA, release RAM)
    extent_purge_t    *purge_lazy;   // MADV_FREE equivalent
    extent_purge_t    *purge_forced; // MADV_DONTNEED equivalent
    extent_split_t    *split;        // split one extent into two
    extent_merge_t    *merge;        // merge two adjacent extents into one
} extent_hooks_t;
You can override any subset, but a NULL entry does not fall back to jemalloc’s defaults: for the optional hooks, NULL tells jemalloc the operation is unsupported, so unused pages are never purged, extents are never split or merged, and memory that dalloc declines to release is retained and reused rather than returned to the OS. Implementations that want default behavior for operations they don’t override typically read the arena’s original extent_hooks_t via mallctl first and delegate to it. The split and merge hooks matter primarily if your backing allocator can’t coalesce arbitrary adjacent regions; for straightforward NUMA-aware mmap, overriding only alloc and dalloc and forgoing purging is a workable starting point.
Wiring Up a Custom Arena
The workflow: create a new arena via mallctl, install hooks, then route allocations through it explicitly using mallocx.
#include <jemalloc/jemalloc.h>
#include <stdio.h> // snprintf
// 1. Create a new arena; get its index back
unsigned arena_ind;
size_t sz = sizeof(arena_ind);
je_mallctl("arenas.create", &arena_ind, &sz, NULL, 0);
// 2. Install custom extent hooks on this arena
char ctl_name[64];
snprintf(ctl_name, sizeof(ctl_name), "arena.%u.extent_hooks", arena_ind);
extent_hooks_t *hooks_ptr = &numa_hooks;
je_mallctl(ctl_name, NULL, NULL, &hooks_ptr, sizeof(hooks_ptr));
// 3. Allocate from this arena
void *p = je_mallocx(4096, MALLOCX_ARENA(arena_ind));
je_free(p);
The MALLOCX_ARENA(n) flag routes through arena n instead of the thread’s default arena. The tcache still applies for small allocations; MALLOCX_TCACHE_NONE bypasses it when you want allocations to pass directly through the arena’s slab machinery without caching.
A NUMA-Aware Alloc Hook
The alloc hook receives the requested size and alignment and must return a suitably aligned pointer, or NULL on failure. For NUMA locality, the implementation maps memory and binds pages to the target node before returning:
#include <jemalloc/jemalloc.h> // extent_hooks_t
#include <sys/mman.h>
#include <numaif.h>  // mbind, MPOL_BIND; link with -lnuma
#include <stdint.h>  // uintptr_t
#include <stdbool.h>
#define MAX_ARENAS 4096 // sized generously; indexed by arena_ind
// Populated at startup based on CPU topology
static int arena_to_numa_node[MAX_ARENAS];
static void *numa_extent_alloc(extent_hooks_t *hooks, void *new_addr,
                               size_t size, size_t alignment,
                               bool *zero, bool *commit,
                               unsigned arena_ind) {
    int node = arena_to_numa_node[arena_ind];
    // Over-allocate to satisfy arbitrary alignment, then trim the excess.
    size_t map_size = size + alignment;
    void *raw = mmap(new_addr, map_size,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED) return NULL;
    uintptr_t p = (uintptr_t)raw;
    uintptr_t aligned = (p + alignment - 1) & ~((uintptr_t)alignment - 1);
    size_t leading = aligned - p;
    size_t trailing = map_size - size - leading;
    if (leading > 0) munmap(raw, leading);
    if (trailing > 0) munmap((void *)(aligned + size), trailing);
    void *ptr = (void *)aligned;
    // new_addr, when non-NULL, is a hard requirement: the hook must
    // return exactly that address or fail.
    if (new_addr != NULL && ptr != new_addr) {
        munmap(ptr, size);
        return NULL;
    }
    // Bind pages to the target NUMA node before first touch.
    // maxnode counts bits in the mask; node + 2 sidesteps the kernel's
    // historical off-by-one in maxnode handling.
    unsigned long nodemask = 1UL << node;
    if (mbind(ptr, size, MPOL_BIND, &nodemask, node + 2, 0) != 0) {
        munmap(ptr, size);
        return NULL;
    }
    *zero = true;   // MAP_ANONYMOUS returns zeroed pages
    *commit = true; // pages are immediately committed
    return ptr;
}
static bool numa_extent_dalloc(extent_hooks_t *hooks, void *addr,
                               size_t size, bool committed,
                               unsigned arena_ind) {
    // false signals success in jemalloc's hook convention
    return munmap(addr, size) != 0;
}
static extent_hooks_t numa_hooks = {
    .alloc = numa_extent_alloc,
    .dalloc = numa_extent_dalloc,
    // remaining hooks left NULL: jemalloc skips those operations for
    // this arena (no purging, splitting, or merging) rather than
    // falling back to the defaults
};
At startup, create per-NUMA-node arena sets, populate arena_to_numa_node, and assign threads to arenas based on CPU affinity using sched_getaffinity or pthread_getaffinity_np to determine which socket a thread runs on. The slab allocator and tcache machinery run on top of this unchanged; all that changes is where the backing pages come from.
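The thread-assignment step can be sketched in a few lines. Here `cpu_to_node` and `node_arena` are illustrative tables assumed to be filled at startup (e.g. from numa_node_of_cpu() or sysfs); real code would also account for threads migrating after the lookup.

```c
#define _GNU_SOURCE
#include <sched.h> // sched_getcpu

#define MAX_CPUS  1024
#define MAX_NODES 8

// Illustrative topology tables, populated once at startup.
static int      cpu_to_node[MAX_CPUS];  // CPU index -> NUMA node
static unsigned node_arena[MAX_NODES];  // NUMA node -> arena index

// Choose the arena whose backing memory is local to the CPU the
// calling thread is running on right now.
static unsigned arena_for_current_thread(void) {
    int cpu = sched_getcpu();
    int node = (cpu >= 0 && cpu < MAX_CPUS) ? cpu_to_node[cpu] : 0;
    return node_arena[node];
}
```

Allocations then route via je_mallocx(size, MALLOCX_ARENA(arena_for_current_thread())). Since sched_getcpu() is a vDSO call on Linux, per-allocation overhead is small, though caching the result in a thread-local is common.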
This is structurally close to what Meta’s NUMA work will ship. Their implementation handles topology discovery via numa_node_of_cpu(), threads migrating between nodes, and arena-count calibration across varying core counts, packaged as a coherent, documented feature rather than a hand-rolled hook. The extensibility point already exists in jemalloc 5.x; the announced work is a polished, production-tested implementation of the common NUMA case on top of it.
Other Uses of the Same Interface
Extent hooks are general enough to support a few other patterns worth knowing.
Persistent memory. DAX-capable filesystems like ext4-dax expose storage as byte-addressable memory. An alloc hook that uses mmap with MAP_SHARED against a DAX file returns extents backed by persistent storage. jemalloc’s arena and slab machinery then manages objects within that region normally, with all the size-class and fragmentation benefits intact. Several experimental persistent-heap allocators have been prototyped this way, using jemalloc as the substrate rather than reimplementing slab management.
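The core of such a hook reduces to mmap with MAP_SHARED over a file descriptor. The sketch below shows that mapping step against an ordinary temp file standing in for a DAX-mounted one; the hook plumbing and alignment handling from the NUMA example carry over unchanged.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

// Map `size` bytes of the file at offset `off` as shared, writable
// memory. Against a DAX file, stores reach persistent media; against
// a normal file, the page cache absorbs them. Returns NULL on error.
static void *map_file_extent(int fd, off_t off, size_t size) {
    if (ftruncate(fd, off + size) != 0) return NULL; // ensure backing exists
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, off);
    return (p == MAP_FAILED) ? NULL : p;
}

// Demo: create a temp file standing in for the DAX file and write
// through a mapped "extent".
static int demo_write_readback(void) {
    char path[] = "/tmp/extent_demo_XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path); // a real persistent heap would keep the file
    char *ext = map_file_extent(fd, 0, 1 << 20);
    if (ext == NULL) return -1;
    memcpy(ext, "persisted", 10); // writes are backed by the file
    return (ext[0] == 'p') ? 0 : -1;
}
```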
Memory quotas. A wrapper around the default alloc behavior that tracks cumulative bytes returned and fails with NULL once a budget is exceeded implements per-arena memory limits without touching allocator internals. Combined with mallctl per-arena statistics, this gives enforceable budgets at arena granularity inside a process that hosts multiple allocation domains.
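The accounting side fits in a few lines; the names here are illustrative, and a real hook would call through to the saved default alloc after the charge succeeds and credit the quota back from dalloc.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static _Atomic size_t quota_used;
static size_t quota_limit = (size_t)64 << 20; // 64 MiB budget, illustrative

// Called at the top of a custom alloc hook: reserve `size` bytes
// against the budget, or fail so the hook can return NULL.
static bool quota_charge(size_t size) {
    size_t prev = atomic_fetch_add(&quota_used, size);
    if (prev + size > quota_limit) {
        atomic_fetch_sub(&quota_used, size); // roll back the reservation
        return false;
    }
    return true;
}

// Called from the dalloc hook after the extent is returned to the OS.
static void quota_credit(size_t size) {
    atomic_fetch_sub(&quota_used, size);
}
```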
Address space isolation. JIT compilation and some sandboxing schemes require heap memory placed within specific virtual address ranges. An alloc hook using mmap with MAP_FIXED_NOREPLACE in a controlled range achieves this while leaving internal layout to jemalloc. The allocator manages the region; you constrain where that region lives.
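A sketch of the placement-constrained mapping (Linux 4.17+ for MAP_FIXED_NOREPLACE; the target range in real use would come from the sandbox or JIT layout, not probing):

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

// Try to map `size` bytes exactly at `target`. MAP_FIXED_NOREPLACE
// fails with EEXIST instead of silently clobbering an existing
// mapping, which is what makes it safe for carving heap extents out
// of a designated region.
static void *map_at(void *target, size_t size) {
    void *p = mmap(target, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    return (p == MAP_FAILED) ? NULL : p;
}
```

An alloc hook built on this hands jemalloc extents only within the chosen range and returns NULL once the range is exhausted; jemalloc manages layout within what it is given.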
What Meta’s Upstream Commitment Actually Delivers
Meta is committing to upstream-first development: the NUMA arena work goes into the public jemalloc repository rather than a private fork. The downstream effects are concrete.
FreeBSD ships jemalloc as its system allocator and tracks upstream closely. Redis and Valkey explicitly recommend jemalloc for fragmentation reasons; NUMA locality is useful for Redis on multi-socket servers, and it comes from the upstream package once the feature lands. RocksDB users benefit along the same path, including the many organizations running TiKV, CockroachDB, and cloud storage systems built on RocksDB.
For Rust, tikv-jemallocator bundles jemalloc and exposes it as a global allocator; tikv-jemalloc-ctl provides idiomatic mallctl access. These crates pick up upstream improvements on version bumps. The MALLOCX_ARENA flag and the extent hooks struct are already accessible through the raw tikv-jemalloc-ctl interface; NUMA-specific controls will appear there once the upstream API stabilizes.
The extent hooks interface is one of the least-documented parts of jemalloc relative to its power. Most users know about MALLOC_CONF, decay timers, and jeprof, because those get written about. The extensibility layer that makes the NUMA work possible barely appears in tutorials. Meta’s announced improvements include documentation alongside the features, which may do as much for the ecosystem as the NUMA support itself: there are teams running jemalloc in NUMA-sensitive or memory-constrained environments today who don’t know this API exists.