Daniel Lemire wrote about this pattern back in December 2025, and it is worth revisiting because the problem it solves comes up more often than people admit.
The setup is familiar in systems work. You have a function with a fixed signature, imposed by a library or a stable API contract, and inside that function something expensive happens on every call. The specific case Lemire examines involves map lookups by index: the interface forces you through a map each time, and you cannot change that interface. Yet on most calls the keys and values have not changed.
The fix is thread_local storage. Since C++11, you can declare a variable with thread-local storage duration, giving each thread its own independent copy. No contention, no locking, no coordination overhead.
```cpp
#include <unordered_map>

// slowMapAccess is the unavoidable expensive call behind the fixed interface.
int slowMapAccess(int index);

int expensiveLookup(int index) {
    // One cache per thread: no locks, no contention.
    thread_local std::unordered_map<int, int> cache;
    auto it = cache.find(index);
    if (it != cache.end()) return it->second;
    int result = slowMapAccess(index);
    cache.emplace(index, result);
    return result;
}
```
The thread-local cache absorbs repeated lookups within a thread’s lifetime, and because each thread owns its cache independently, synchronization primitives are not needed at all. For multi-threaded workloads hitting the same function frequently, the reduction in overhead can be substantial.
What makes this useful beyond the specific case is how broadly it applies. Legacy codebases often have exactly these characteristics: interfaces frozen by backwards compatibility, internal implementations that made sense at lower call frequencies, and no realistic path to a full redesign. Thread-local caching is a targeted intervention that leaves the interface and the surrounding architecture untouched.
There are real limits to account for. The cache grows without bound unless you add eviction logic, which brings complexity of its own. Thread-local storage also has initialization overhead per thread, so frequently spawning short-lived threads will eat into the gains. And if the underlying data changes, you need an invalidation strategy or the cache becomes a liability.
For the common case of read-heavy, stable data accessed through a fixed interface, the pattern holds up well. It belongs in the same category as RAII and other C++ idioms that look simple on first read but handle a whole class of problems that would otherwise require significant structural changes.