State Pruning and the Spinlock Safety Gap in the BPF Verifier

The Linux kernel’s eBPF verifier is one of the more impressive feats of in-kernel static analysis in any production codebase. It proves, at load time, that a BPF program will terminate, won’t access memory out of bounds, and won’t violate a set of safety invariants specific to whatever kernel context the program runs in. When it works, it works remarkably well. When it doesn’t, the bugs that slip through tend to be structurally interesting rather than just careless.

The tale of fixing eBPF spinlock issues in the Linux kernel is exactly that kind of bug. It sits at the intersection of two hard problems: the correctness of a state-pruning algorithm operating over an enormous search space, and the temporal nature of lock-held state, which is exactly the kind of property that interval-domain abstract interpretation handles poorly.

What `bpf_spin_lock` Is and Isn’t

bpf_spin_lock was introduced in Linux 4.19 (commit d83525ca62cf) by Alexei Starovoitov. From the BPF program’s perspective, the struct is deliberately minimal:

struct bpf_spin_lock {
    __u32 val;
};

The kernel enforces that this struct is embedded in a BPF map value (not a per-CPU map, not a ringbuf), aligned to a 4-byte boundary, and declared using BTF metadata at map creation time. Underneath, the runtime helper in kernel/bpf/helpers.c delegates to a queued spinlock implementation:

notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock *, lock)
{
    ___bpf_spin_lock(lock);
    return 0;
}

The notrace attribute is not incidental. Excluding these functions from ftrace and kprobes prevents re-entrant tracing while a lock is held, which would be a straightforward deadlock vector. The kernel is being deliberate about where instrumentation is allowed.

What BPF programs cannot do with these locks is just as important as what they can do. The verifier enforces a strict set of prohibitions: no nested bpf_spin_lock calls, no bpf_tail_call while holding a lock (tail calls replace the program stack, so the unlock would never execute), no returning from a BPF-to-BPF subprog while locked, no sleepable helpers in sleepable program contexts, and no program exit with a lock still held. These rules live primarily in kernel/bpf/verifier.c, enforced through process_spin_lock() and check_helper_call().

The Verifier’s State Problem

The verifier simulates every possible execution path through a BPF program. For programs with branches and loops, the naive approach is exponential in the number of paths. The verifier manages this with state pruning: it caches explored states at each instruction, and when the current state is “dominated by” (more general than) a previously-explored state, it prunes, treating the current path as already proven safe.

The equivalence check lives in states_equal() (or equiv_states(), depending on the kernel version). Two verifier states are considered equivalent if their tracked register types, value ranges, liveness information, and other metadata are compatible. This is where spinlock bugs tend to appear.

The field tracking lock state in struct bpf_verifier_state started as a simple boolean:

bool active_spin_lock;  /* a lock is held on this path */

In later kernel versions it was refined to a pointer that identifies the specific locked object:

struct bpf_reg_state *active_spin_lock;  /* pointer to the locked register */

The bug class is consistent across its various incarnations: states_equal() fails to compare the lock-held state correctly, allowing the verifier to prune a path with a lock held as equivalent to a path without one (or vice versa). At that point, the verifier has accepted a program with an execution path it never actually analyzed. The runtime behavior depends on which path runs: a lock that’s never released causes a kernel deadlock, and a forbidden helper called while a lock is held violates the safety invariants the verifier was supposed to guarantee.

Why This Is Structurally Hard

The difficulty isn’t that kernel developers are careless about locks. It’s that state pruning creates an optimization pressure that cuts directly against the kind of conservative reasoning lock safety requires.

Effective pruning requires that the equivalence predicate be as coarse as possible. The more things two states have to agree on before they’re considered equivalent, the fewer states get pruned, and the more exponential the verification becomes. Every additional field added to the equivalence check potentially multiplies verification time for programs near the complexity limit.

Lock-held state is particularly awkward here because it’s a temporal property. Whether a lock is held at a given instruction depends on the entire execution history up to that point, not just the current register values. The verifier’s abstract domain, which tracks value ranges and register types using something close to interval arithmetic, isn’t naturally suited to temporal properties. It can track that a register contains a pointer to a map value, but tracking that a specific lock within that map value was acquired three instructions ago requires encoding execution history into the abstract state, which is exactly what the active_spin_lock pointer is trying to do.

The transition from a boolean to a pointer-identity comparison in states_equal() illustrates the iterative nature of the fix. A boolean tells you whether any lock is held; a pointer tells you which specific lock. Two states holding different locks on the same map are not equivalent and must not be pruned as if they were. The verifier needs to know the identity of the locked object, not just its existence.

This connects to a broader theme in BPF verifier development. The academic work “PREVAIL: Polynomial-Runtime EBPF Verifier using an Abstract Interpretation Layer” (published around 2019) examined exactly this class of issue: properties that require temporal reasoning fall outside what polynomial-time abstract interpretation can express without specific domain extensions. The BPF verifier is doing something close to bounded model checking under significant time pressure, and lock state is one of the cases that expose the gap.

The Fix Pattern

For virtually every spinlock-related verifier bug, the fix follows the same structure. Find where states_equal() fails to include the lock-held state in its comparison. Add it. Verify that the comparison is by pointer identity when the lock identity matters, not just by boolean presence. Test that programs previously accepted but unsafe are now rejected, and that previously-rejected programs that are actually safe are still accepted.

The complication is that overly conservative equivalence checks are not free. The BPF test suite and real-world programs like those used by Cilium and Katran exercise the verifier at the limits of what it can handle. A fix that causes previously-verifiable programs to hit the instruction limit is a real regression, even if it improves safety. The kernel developers working on this, including Alexei Starovoitov, Martin KaFai Lau, and Andrii Nakryiko, have to balance both sides.

New lock interactions arrive with each major verifier extension. Sleepable BPF programs (introduced in Linux 5.10) brought a new constraint: the verifier had to reject bpf_spin_lock calls in sleepable context, because sleeping with a spinlock held is a kernel invariant violation. bpf_for_each_map_elem() callbacks introduced a question of whether the caller’s lock state propagates into the callback’s verifier context. bpf_timer callbacks running in softirq context raised the question of whether a timer callback could deadlock against a BPF program on the same CPU that had already acquired a lock on the same map value. Each extension requires the verifier to reason about lock state across a new kind of context boundary.

What This Reveals About the Verifier Model

The BPF verifier is not a formal proof system. It’s an in-kernel static analyzer that has to run in polynomial time, on programs submitted by (in some configurations) unprivileged users, without ever producing false negatives on safety-critical properties. That’s a hard constraint set. The verifier accepts this by being conservative: it rejects programs it cannot prove safe, even if they might be safe at runtime.

Spinlock bugs are the cases where this conservatism had a hole. The verifier was accepting programs it hadn’t fully analyzed because state pruning discarded paths that weren’t equivalent. The fix is to tighten the equivalence predicate, at a modest cost in verification time, until the pruned paths genuinely are safe to prune.

For anyone running BPF programs in production, the practical takeaway is to keep the BPF-related kernel packages current. The verifier is actively maintained, and the patches addressing these classes of issues are not backported uniformly to all stable branches. Long-term stable kernels used in cloud environments are sometimes several verifier revisions behind the fixes in mainline.

For anyone writing BPF programs: the constraints around bpf_spin_lock exist because the verifier has to prove safety statically. Keeping locked sections minimal, avoiding helper calls inside locked regions, and never crossing BPF-to-BPF call boundaries while holding a lock are not just style preferences. They reflect the actual boundaries of what the verifier can track without creating the kind of state-pruning blind spot that these bugs exploit.