The BPF Verifier's Blind Spot: How State Pruning Breaks Spinlock Safety
Source: lobsters
The BPF verifier is one of the most sophisticated static analysis engines in the Linux kernel. It walks every possible execution path of a BPF program, tracking register types, value ranges, reference counts, and memory access patterns, all to guarantee that user-supplied code cannot corrupt the kernel. For most people who interact with eBPF, the verifier is a black box that either accepts or rejects their program. But its internal architecture has a recurring weakness: every new feature that introduces stateful resources requires an explicit update to the verifier’s state comparison logic, and when that update is missed, the verifier can be tricked into blessing an unsafe program.
The spinlock pruning bug documented here is a clean example of this pattern. Understanding it properly requires understanding how the verifier avoids exponential path explosion in the first place.
State Pruning: The Verifier’s Core Optimization
The verifier explores all execution paths through a BPF program. For a program with branches, loops, and helper calls, the number of paths grows exponentially without some form of caching. The verifier’s solution is state pruning, also called the “safe state” cache.
At each instruction in the program, the verifier maintains a list of previously-verified states, stored in an explored_states hash table keyed by instruction index. A state encodes the full context of the verifier at that point: the type and value range of every register, the contents of every stack slot, and the status of any open references. When the verifier reaches an instruction for the second time (via a different path), it checks whether the current state is subsumed by any previously-cached safe state via the states_equal() function. If yes, it prunes the current path, trusting that the cached result covers it.
The critical invariant is that states_equal() must compare every component of state that affects safety. If any component is missing from the comparison, then a cached safe state (where that component was benign) can incorrectly subsume a new, unsafe state (where that component is dangerous).
This is exactly what happened with spinlocks.
How bpf_spin_lock Works
Linux 5.1 introduced bpf_spin_lock as a way to protect map values from concurrent modification. The lock is a 32-bit value embedded inside a BPF map value (not on the stack, not as a global), and user programs acquire and release it via two helper functions:
void bpf_spin_lock(struct bpf_spin_lock *lock);
void bpf_spin_unlock(struct bpf_spin_lock *lock);
The verifier enforces a strict set of rules: no nested locks, no bpf_tail_call while holding a lock, no map lookups that could acquire a second lock, and every lock acquisition must be paired with a release on every exit path. These constraints are enforced by tracking lock state in struct bpf_verifier_state:
struct bpf_active_lock {
struct bpf_reg_state *ptr; /* the map value holding the lock */
u32 id; /* register identity */
};
struct bpf_verifier_state {
/* ... */
struct bpf_active_lock active_lock;
/* ... */
};
When the verifier processes a bpf_spin_lock() call, it checks that active_lock.ptr is NULL (no nested locks) and then sets it to the pointer being locked. When it processes bpf_spin_unlock(), it clears the field and invalidates all registers that pointed into the now-unlocked map value, preventing use-after-unlock access.
The Pruning Bug
The bug emerged from the interaction between lock state tracking and states_equal(). In the flawed implementation, states_equal() compared register types, stack slots, and reference state, but it did not compare active_lock. This omission created the following scenario:
- Path A acquires the lock, performs some work, releases the lock, and reaches instruction N with
active_lock.ptr == NULL. The verifier caches this state as safe at instruction N. - Path B acquires the lock, performs different work, and reaches instruction N while still holding the lock (
active_lock.ptr != NULL). - The verifier compares Path B’s current state against the cached state from Path A. Because
active_lockis not part of the comparison, the states appear equal. The verifier prunes Path B, declaring it safe based on the Path A cache. - Path B exits the program without ever calling
bpf_spin_unlock(). The verifier does not catch this.
The result is a BPF program that the verifier accepts but that exits while holding a spinlock. The kernel’s BPF spinlock implementation disables preemption around the lock (using preempt_disable() on non-SMP or architecture spinlock primitives on SMP), so a program that exits without releasing the lock leaves a CPU either with a permanently held spinlock or with preemption disabled, depending on architecture. Any subsequent attempt to acquire that lock from another CPU or program spins indefinitely. This produces a soft lockup or a system hang.
The fix is straightforward: add active_lock to the states_equal() comparison.
/* states are not equivalent if lock status differs */
if (!!cur->active_lock.ptr != !!old->active_lock.ptr)
return false;
if (cur->active_lock.ptr &&
cur->active_lock.id != old->active_lock.id)
return false;
With this check in place, a state where the lock is held cannot be subsumed by a state where it is not, and the verifier correctly identifies Path B as requiring further analysis.
The Recurring Architecture Problem
What makes this bug class interesting beyond the specific spinlock case is that it has recurred multiple times in BPF verifier history, each time a new stateful feature was added.
Reference tracking was added to ensure that file descriptors opened within a BPF program (sockets, iterators) are closed on all exit paths. When it was first introduced, states_equal() did not compare reference state correctly, allowing programs that leaked references to pass verification. The fix followed the same pattern: add reference state to the comparison.
Dynamic pointers (bpf_dynptr), introduced in Linux 5.19, had similar issues. The verifier’s state for a bpf_dynptr includes whether the pointer has been initialized, whether it is associated with a particular buffer, and whether it has been consumed. Early implementations did not fully capture dynptr state in states_equal(), leading to pruning bugs where uninitialized dynptrs were treated as equivalent to initialized ones.
Open-coded iterators, added in Linux 6.4, brought another round of the same problem. An iterator has a notion of being “open” or “exhausted”, and pruning across those states produces programs that access exhausted iterators as if they were still valid.
In every case, the root cause is the same: bpf_verifier_state acquired a new field representing a new resource type, but states_equal() was not updated to compare it. The verifier’s safety guarantee only holds when the comparison is complete.
What Makes This Hard to Catch
The difficulty is not conceptual. The fix, once identified, is a few lines of comparison logic. The difficulty is that BPF programs which trigger the bug must be specifically crafted to exercise the right combination of paths, and the bug only manifests at verification time in a way that produces an acceptance when rejection was warranted. Normal BPF selftests tend to test that correct programs pass and obviously broken programs fail, rather than testing that programs which are subtly broken via path interaction are caught.
The kernel’s BPF selftest suite, located under tools/testing/selftests/bpf/, includes test_spin_lock.c and associated failure cases, but coverage of the pruning interaction specifically requires tests that construct the exact path structure that triggers incorrect pruning. These tests are typically added after the bug is found, not before.
This points to a broader gap in verifier testing methodology. The verifier is essentially a theorem prover for a small domain, and theorem provers benefit from property-based or model-checked test generation far more than from handwritten cases. The BPF community has explored using fuzzing (via bpf_fuzzer and Syzkaller) to find verifier bugs, and Syzkaller has found several verifier correctness issues over the years. But the combinatorial space of path interactions, particularly around pruning, remains difficult to cover systematically.
The Safety Model and Its Limits
The BPF verifier’s approach to safety is fundamentally different from the approach taken by, say, the Rust borrow checker. The borrow checker proves safety through a type system applied compositionally to every expression; its guarantees are structural and grow with the language’s type system. The BPF verifier proves safety through path enumeration with abstract interpretation; its guarantees are procedural and depend on the completeness of the abstract domain.
The distinction matters here. When Rust adds a new ownership concept, the type checker either accepts or rejects it at the type level, and the safety proof is built into the type rules. When BPF adds a new resource concept (a spinlock, a dynptr, an iterator), the safety proof depends on the verifier’s abstract state correctly modeling that resource, and the states_equal() function correctly comparing it during pruning. There is no structural guarantee that a new feature will be correctly integrated; it requires explicit, manual attention at every point in the verifier where state is compared or transferred.
The spinlock pruning bug and its relatives are not random implementation mistakes. They are the predictable output of a verification architecture where safety depends on completeness of a data structure that grows incrementally. Each new BPF feature is an opportunity for an omission. The fixes are correct and the verifier today is substantially more robust than it was at 5.1, but the underlying tension remains: a verifier that proves safety via abstract interpretation must maintain an ever-expanding abstract domain, and every gap in that domain is a potential safety hole.
For anyone building systems on top of BPF, particularly systems that use bpf_spin_lock to protect shared map state, the practical takeaway is to run recent kernels. The fixes for spinlock pruning have been backported to stable trees, but the specific version depends on your distribution. The selftests in the kernel tree are the best way to confirm that your kernel has the correct behavior for the lock/unlock path combinations your programs rely on.