State Pruning and the Spinlock the BPF Verifier Forgot to Track

The BPF verifier is one of the more impressive pieces of infrastructure in the Linux kernel. Every BPF program submitted for loading gets subjected to a full symbolic execution pass before it ever runs. The verifier explores every reachable execution path, tracking register types and value ranges, ensuring memory accesses are in-bounds, guaranteeing that acquired resources get released. It is a real static analysis engine living inside the kernel itself.

But static analysis at scale requires shortcuts, and the most important shortcut the verifier takes is called state pruning. Understanding how state pruning works is essential to understanding why spinlock bugs in the verifier keep happening, and this recent investigation into one such bug illustrates the failure mode more clearly than most security write-ups do.

What State Pruning Actually Does

The verifier explores BPF programs through exhaustive symbolic execution. At each instruction, it maintains a bpf_verifier_state struct that captures everything it knows at that point: the type and value range of all 11 BPF registers, the state of each stack slot, acquired references, and various other bookkeeping. When it hits a conditional branch, it explores both sides, so the number of paths can grow exponentially.

Looped programs (supported since Linux 5.3 with bounded loops) make this worse. Without mitigation, even small programs could require billions of verification steps.

State pruning solves this by caching previously verified states. When the verifier reaches an instruction, it first checks whether it has already verified a state at that instruction that is more permissive than the current state. If the cached state is a superset of the current state in terms of knowledge, then anything safe in the less-constrained cached state is guaranteed safe for the more-constrained current state. The current path can be skipped.

The correctness of pruning depends entirely on what “more permissive” means. The function states_equal() in kernel/bpf/verifier.c implements this comparison. It checks registers via regsafe(), stack slots through stacksafe(), and reference tracking through comparisons of the refs array. If all of those pass, the verifier considers the states equivalent and prunes the current path.

The structural vulnerability is obvious once you see it: every safety invariant the verifier enforces must be represented in states_equal(). Any invariant that is not included in the comparison becomes invisible to the pruning oracle. The oracle can then declare two states equivalent when they are not, causing the verifier to skip paths it should verify.

Spinlocks in BPF

BPF spinlocks were introduced in Linux 5.1 by Alexei Starovoitov via commit d83525ca62cf. The mechanism is straightforward: a map value can embed a struct bpf_spin_lock, and BPF programs can call bpf_spin_lock() and bpf_spin_unlock() to serialize access to the surrounding data.

struct my_value {
    struct bpf_spin_lock lock;
    __u64 counter;
};

SEC("xdp")
int my_prog(struct xdp_md *ctx) {
    struct my_value *val = bpf_map_lookup_elem(&my_map, &key);
    if (!val)
        return XDP_PASS;
    bpf_spin_lock(&val->lock);
    val->counter++;
    bpf_spin_unlock(&val->lock);
    return XDP_PASS;
}

The verifier enforces several invariants around spinlocks. You cannot call bpf_spin_lock() while already holding a lock. You cannot call most other helpers while holding a lock. Most importantly, you must have released the lock by the time the program exits. A BPF program that exits while holding a spinlock would deadlock the next invocation on the same CPU core, since the lock is per-map-value and physically embedded in the map data.

To track this, the verifier maintains active_lock in bpf_verifier_state:

struct bpf_active_lock {
    void *ptr;  /* map or local kptr holding the lock */
    u32  id;    /* register ID of the base pointer */
};

When bpf_spin_lock is called, env->cur_state->active_lock is populated. When bpf_spin_unlock is called, it is cleared. At every BPF_EXIT instruction, the verifier checks that active_lock.ptr is null. This is how the “no exit while locked” invariant is enforced.

The Bug

The problem is that active_lock was not part of the states_equal() comparison. When the verifier’s pruning oracle compared two states to decide whether one could substitute for the other, it looked at registers and stack slots and reference counts, but it did not look at whether a spinlock was held.

This creates a concrete failure path. Consider a program with a conditional branch:

Path A reaches instruction N with no lock held. The verifier fully verifies path A from instruction N to exit. This state gets cached in explored_states at instruction N.
Path B reaches instruction N with a lock held. The verifier checks states_equal() between path B’s state and the cached path A state. Since active_lock is not compared, the states look equivalent. Path B is pruned.
Path B’s continuation, which eventually executes BPF_EXIT while still holding the lock, is never verified.
The program loads successfully. At runtime, it can exit while holding a spinlock.

This is the exact class of soundness failure that state pruning is supposed to prevent. The pruning oracle over-approximated equivalence by ignoring a dimension of state, causing it to skip a verification path that would have caught a real violation.

The fix is conceptually simple: add active_lock to the comparison in states_equal(). Two states should only be considered equivalent for pruning purposes if they have identical spinlock state. A state holding a lock on pointer P is not a superset of a state with no lock held, nor the reverse.

/* Simplified version of the required check */
if (!!cur->active_lock.ptr != !!old->active_lock.ptr)
    return false;
if (cur->active_lock.ptr &&
    (cur->active_lock.ptr != old->active_lock.ptr ||
     cur->active_lock.id  != old->active_lock.id))
    return false;

With this in place, path B in the scenario above would not match the cached path A state, pruning would not occur, and the verifier would continue down path B, reach the locked exit, and correctly reject the program.

A Recurring Pattern

The spinlock pruning bug is not an isolated incident. The BPF verifier has gone through this same cycle several times with different safety invariants.

Reference tracking went through the same failure mode earlier. When BPF gained the ability to acquire kernel references via kfuncs (typed pointers to kernel objects), those references needed to be tracked and released before exit. The refs array in bpf_verifier_state tracks acquired references. Early versions of states_equal() did not correctly compare refs arrays, meaning programs that leaked kernel references could have their exit paths pruned against reference-free states and pass verification. Patches from Kumar Kartikeya Dwivedi in the 6.1 and 6.2 timeframe tightened this considerably.

Dynptr state followed the same pattern. bpf_dynptr objects introduced in Linux 6.1 have an initialized/uninitialized distinction that affects safety. Initially, dynptr initialization state was not fully represented in the pruning comparison, allowing programs with uninitialized dynptrs on some paths to pass verification.

The structural reason this keeps happening is that the verifier is built incrementally. New features introduce new state that must be tracked at exit. But states_equal() must be explicitly updated to include every new dimension of safety-relevant state. There is no mechanism that automatically makes new state part of the pruning oracle. Each new feature requires a deliberate, separate decision to update the comparison function.

The BPF verifier documentation describes the state machine and pruning in general terms, but the correctness requirement for states_equal() is not stated as a formal invariant. It is an informal obligation on every developer who adds new state to bpf_verifier_state.

Why Privilege Mitigates the Impact

Loading BPF programs requires CAP_BPF or CAP_SYS_ADMIN in most kernel configurations. Unprivileged BPF was effectively disabled by default starting around Linux 5.15 due to ongoing concerns about the verifier’s attack surface. This substantially limits the practical security impact of verifier soundness bugs: an attacker who can already load BPF programs typically has significant system access already.

That said, verifier soundness matters for reasons beyond privilege escalation. Cloud environments where tenants can load BPF programs (via eBPF-based observability tools, for example) depend on the verifier to maintain kernel integrity. A tenant who can construct a BPF program that exits while holding a spinlock can cause the next BPF program invocation on the same CPU to deadlock, which is a denial-of-service vector against the host kernel’s BPF infrastructure.

What This Means for Verifier Development

The BPF verifier source in kernel/bpf/verifier.c is around 18,000 lines as of Linux 6.x. The states_equal() function and its callees are a relatively small part of that, but they are load-bearing for the correctness of every optimization the verifier performs to stay within its complexity limits.

The pattern of fixes suggests that the BPF community is aware of this structural issue. Patches to BPF locking have progressively tightened the active_lock representation: it started as a simple boolean, became a {ptr, id} pair to support multiple possible lock targets, and now needs to be fully included in equivalence checks. You can track this evolution through the BPF mailing list archives.

One longer-term direction that has been discussed is fuzzing the verifier’s pruning decisions directly: given two states that states_equal() considers equivalent, verify that a program accepted from one state is also safe when started from the other. Tools like syzkaller already target the BPF subsystem, and there have been proposals for verifier-specific property-based testing. The spinlock bug described in the article is exactly the kind of issue such testing could surface systematically, rather than requiring a developer to manually notice the gap.

For now, the fix is correct and contained. The broader obligation, though, is that every developer extending bpf_verifier_state with new safety-relevant fields needs to treat states_equal() as a required update, not an optional one. The verifier’s guarantees are only as strong as the weakest dimension its pruning oracle ignores.