· 6 min read ·

State Pruning and the Spinlock: The Recurring Safety Gap in the BPF Verifier

Source: lobsters

The BPF verifier is marketed as the thing that makes eBPF safe to run in the kernel. Programs are statically analyzed before execution; anything the verifier cannot prove safe gets rejected. In practice this works remarkably well, but the verifier is a large and complex piece of software, and its complexity creates its own class of bugs. A recent write-up by Rohan Varma on fixing eBPF spinlock issues in the Linux kernel is a good case study in how these bugs work and why they keep recurring.

How the Verifier Tracks Spinlock State

bpf_spin_lock landed in Linux 5.1 via commit d83525ca62cf by Alexei Starovoitov. The data structure itself is minimal: struct bpf_spin_lock { __u32 val; }. It must be embedded in a BPF map value, not placed on the stack or in arbitrary memory. Programs acquire and release it via bpf_spin_lock() and bpf_spin_unlock().

The verifier tracks whether a lock is currently held via a field in bpf_func_state called active_spin_lock. This field holds the register ID of the pointer to the held lock, or zero if no lock is currently held. The tracking happens in process_spin_lock() inside kernel/bpf/verifier.c:

// On bpf_spin_lock():
if (cur->active_spin_lock) {
    verbose(env, "Illegal nesting of bpf_spin_lock\n");
    return -EINVAL;
}
cur->active_spin_lock = reg->id;

// On program exit:
if (env->cur_state->active_spin_lock) {
    verbose(env, "BPF program cannot exit with a held lock\n");
    return -EINVAL;
}

The verifier must prove three things on all execution paths: a lock cannot be acquired twice without being released first, a lock cannot be released without being held, and no lock can be held when the program exits. The third condition matters because on non-RT kernels, bpf_spin_lock() is implemented via __bpf_spin_lock_irqsave, which disables local IRQs on the acquiring CPU. If a program exits without calling bpf_spin_unlock(), IRQs stay disabled and the CPU stops responding to interrupts. The result is a kernel hang with no recovery path short of a reboot.

State Pruning: The Scalability Mechanism That Introduced the Bug

The BPF verifier works by abstract interpretation. It simulates program execution across all possible paths, tracking register types, value ranges, memory safety properties, and lock state at each instruction. This is sound but expensive. The kernel caps verification at 1,000,000 instructions processed via BPF_COMPLEXITY_LIMIT_INSNS. Programs with complex branching would routinely hit this limit without some form of pruning.

State pruning is the mechanism that makes verification tractable. When the verifier reaches an instruction it has visited before, it checks whether the current abstract state is “more constrained” than a previously cached state at the same instruction. If so, it prunes: it stops exploring that branch on the grounds that the cached state’s proof covers everything the current state could reach.

The comparison happens in states_equal() and func_state_equal(). These functions compare registers, frame structure, stack contents, reference counts, and other tracked properties. If all comparisons pass, the states are considered equivalent and the branch is pruned.

The bug was that active_spin_lock was not included in this comparison.

The consequence follows directly from the omission. The verifier first explores a path where active_spin_lock == 0 at instruction N, verifies the remaining code as safe, and caches that state. Later, it reaches instruction N via a different path, this time with active_spin_lock != 0 because a lock is held. It calls states_equal(), which returns true because the only difference between the two states is a field it does not check. Pruning fires. The remaining instructions are never verified. The program is accepted.

At runtime, when execution follows that second path, the program exits with a lock held. IRQs stay disabled. The kernel hangs.

The fix is a single additional comparison in states_equal():

static bool states_equal(struct bpf_verifier_env *env,
                         struct bpf_verifier_state *old,
                         struct bpf_verifier_state *cur)
{
    if (old->active_spin_lock != cur->active_spin_lock)
        return false;
    /* ... register and frame comparisons ... */
}

One line. But the cost is non-trivial: states differing in lock-held status can no longer be treated as equivalent for pruning purposes, which means more paths get verified for programs with complex branching near lock boundaries. Verification time increases, and programs sit closer to the complexity limit.

The PREEMPT_RT Variant

There is a second, distinct bug class here that does not involve state pruning at all. On kernels built with CONFIG_PREEMPT_RT, BPF spinlocks are not implemented with IRQ-disable. They become sleeping locks backed by rtmutex. The API is identical from the BPF program’s perspective, but the runtime semantics are completely different.

On an RT kernel, a BPF program holding a “spinlock” can be preempted. If another BPF program on the same CPU then tries to acquire the same lock, the result is a potential deadlock. The verifier cannot catch this: it has no model of the runtime scheduler, and the spinlock operations look exactly the same regardless of kernel configuration.

This is a harder problem than the state-pruning bug. The state-pruning bug was a missing comparison in a function. The PREEMPT_RT problem is that the verifier’s safety proofs are constructed against an execution model that does not match actual runtime behavior on RT kernels. The BPF + PREEMPT_RT integration effort tracked across LKML threads from 2021 onward addresses this by reworking the underlying lock primitive on RT kernels to use something that preserves non-preemptibility without IRQ-disable, so that the verifier’s model remains sound. It is the more architecturally difficult fix because it requires changing what the kernel does at runtime rather than what the verifier checks.

A Structural Pattern, Not a One-Off Bug

The spinlock pruning bug is not an isolated incident. The same class of problem has appeared repeatedly as new kernel features have added state to bpf_func_state:

  • active_rcu_lock, introduced with bpf_rcu_read_lock and bpf_rcu_read_unlock in Linux 5.18
  • Reference tracking for typed kernel pointers (kptrs) in Linux 5.19
  • BPF timers in Linux 5.15
  • BPF iterators in Linux 6.4

The pattern is consistent: implement a feature, add tracking state to bpf_func_state, update states_equal(), ship it, discover that states_equal() missed a subtle case, fix it, add regression tests. Eduard Zingerman has done systematic work on making states_equal() sound across these cases, but the underlying structural problem remains: there is no compiler or type system enforcement that flags a new field in bpf_func_state as requiring inclusion in the equivalence check. Developers have to remember to do it, and sometimes they do not.

This is a classic case of a code invariant that should be enforced by the language or tooling but is instead maintained by convention. In a codebase the size of the Linux kernel, conventions break down under the pressure of contributor turnover, time constraints, and the sheer scope of what the verifier tracks. The BPF selftests under tools/testing/selftests/bpf/ help catch regressions, but negative test coverage for verifier bugs is inherently incomplete: you have to know what the bug looks like before you can write a test that would have caught it.

What This Means for the BPF Safety Model

The BPF safety model rests on the claim that the verifier proves programs safe before they run. In practice that claim is qualified in important ways. The verifier is software; it has bugs. Most BPF program types require CAP_BPF or CAP_SYS_ADMIN, which limits exposure to privileged users who could crash the kernel through other means. But there is ongoing work to expand unprivileged BPF program types, and that work raises the security stakes for every verifier correctness bug considerably.

The spinlock case illustrates the gap between “the verifier checks this property” and “this property holds at runtime.” The verifier checked that active_spin_lock was zero at program exit, but it checked it on the wrong set of paths due to the pruning omission. The property it proved was true of a model that did not accurately represent the program’s actual control flow under pruning.

Proving memory safety, type safety, and termination for programs that directly manipulate kernel memory is genuinely difficult engineering. The BPF verifier does this well across an enormous range of programs. But the complexity required to make verification tractable, particularly state pruning, creates a surface where adding new tracked state correctly is harder than it looks from the outside. Every new feature that touches bpf_func_state requires a developer to also touch states_equal() in the right way, with no automated check that they did. The spinlock bug shows what happens when that surface is not fully covered: a program receives verifier approval it does not deserve, and the kernel pays the cost at runtime.

Was this interesting?