Property-Based Verification Is Not Just Better Testing

Most codebases that claim to use property-based testing are doing something closer to structured fuzzing. The framework generates random inputs, runs them against your function, and checks that some invariant holds. If it runs 10,000 cases without finding a failure, you get some confidence. Maybe the framework shrinks a counterexample when it does find one. This is useful, and it is genuinely better than writing individual test cases by hand for many problem shapes. But it is not verification.

Adolfo Ochagavía’s recent post draws the distinction clearly by walking through a real situation where the goal was not just to gain confidence but to actually prove a property holds across a complete input space. That shift in intent changes everything about how you approach the problem, and it is a shift that most teams never make.

What Property-Based Testing Actually Gives You

QuickCheck, the Haskell library that launched the whole genre in 2000, was designed by Koen Claessen and John Hughes around a simple observation: the interesting bugs in functions that operate on structured data are rarely in the cases you think to write by hand. They show up at the edges of your mental model. Random generation with shrinking is a systematic way to probe those edges without requiring you to enumerate them yourself.

The core loop is: generate random inputs satisfying some precondition, run the function, assert the postcondition, shrink any failures to the smallest reproducing case. Hypothesis in Python, proptest in Rust, fast-check in TypeScript, and jqwik in Java all work on variations of this model. They are excellent at finding bugs. They are not designed to prove their absence.

The practical difference shows up in how you interpret a clean run. After 100,000 cases with no failure, you know the code passed 100,000 cases. You do not know it passes all inputs. For many applications, that is fine. For a parser handling untrusted input, a cryptographic primitive, or a network protocol implementation, “probably correct” is a weaker claim than you might want.

The Verification End of the Spectrum

Actual property-based verification uses exhaustive or bounded-exhaustive analysis. Instead of sampling the input space, the tool either enumerates all possible inputs (feasible only for small domains), uses symbolic execution to reason about all execution paths simultaneously, or applies bounded model checking to prove that no counterexample exists within some bound.

Kani, the Rust verifier from AWS, represents the current state of the art for this in the Rust ecosystem. It translates Rust MIR (mid-level intermediate representation) into a form that CBMC (the C Bounded Model Checker) can analyze, then runs bounded model checking over it. You write proofs that look syntactically similar to proptest tests, but the semantics are different: instead of sampling, Kani explores all paths up to some depth bound and either proves the property holds or returns a concrete counterexample.

#[kani::proof]
fn verify_add_no_overflow() {
    let a: u32 = kani::any();
    let b: u32 = kani::any();
    kani::assume(a <= u32::MAX / 2);
    kani::assume(b <= u32::MAX / 2);
    assert!(a.checked_add(b).is_some());
}

The kani::any() call does not generate a random value; it introduces an unconstrained symbolic value. The verifier reasons over all possible values simultaneously. If the assertion can fail for any combination satisfying the preconditions, the tool finds it. If it cannot, the tool proves it. AWS has used Kani to verify properties of the Rust standard library, the s2n-tls TLS implementation, and several internal services. The Kani verifier blog documents several of these efforts with concrete results.

Why the Tooling Looks So Similar

The surface similarity between property-based testing and property-based verification is not accidental. Both ask you to describe the shape of valid inputs and the invariants the code should satisfy. This is a useful abstraction for both goals. The difference is entirely in what happens after you write the property.

This convergence in interface creates a practical migration path. A codebase that already has proptest harnesses can often add Kani proofs without redesigning the test infrastructure, because the property descriptions transfer directly. The bolero crate in Rust makes this explicit: it provides a uniform API that can run the same property under libFuzzer, honggfuzz, or Kani, depending on what kind of analysis you want. Write the property once; choose the analysis depth separately.

#[test]
#[cfg_attr(kani, kani::proof)]
fn check_my_parser() {
    bolero::check!().with_type::<MyInput>().for_each(|input| {
        let result = parse(input);
        assert!(result.is_ok() || result.unwrap_err().is_recoverable());
    });
}

Under normal testing, this samples from the input space. Under Kani, it verifies within the configured bound. The same property description does both jobs, which means you pay the cost of careful property design only once.

When Verification Is Worth the Cost

Bounded model checking is not free. For large input spaces or complex control flow, verification can take hours or fail to complete within any practical time budget. The tools give you knobs for bounding loop iterations and recursion depth, but these bounds affect what you are actually proving. A proof valid for inputs up to 64 bytes is not a proof valid for all inputs, and you need to be honest about that scope in how you document the results.

This cost structure means verification earns its keep in specific circumstances. Protocol parsers are the clearest case. If you are implementing a parser for a well-specified wire format, you can often enumerate the properties the spec requires (length fields match actual lengths, reserved bits are zero, no truncated payloads allowed) and verify them exhaustively for all inputs within a reasonable size bound. The cost is high but one-time; the payoff is a documented, machine-checked claim that certain classes of vulnerabilities cannot exist in the covered input range.

Security-critical functions benefit similarly. The seL4 microkernel verification project, while using a different methodology (interactive theorem proving in Isabelle/HOL), demonstrated that full functional correctness for a real kernel component is achievable, at high cost but with strong guarantees. Property-based verification with tools like Kani operates at lower cost and weaker guarantees, but in the same spirit: you are making a machine-checked claim about all inputs, not just asserting that your tests passed.

Data structure invariants are another natural fit. A balanced tree, a memory allocator’s free list, a ring buffer with producer and consumer indices: these have tight invariants that hold for all valid states, and the state spaces are small enough for bounded verification to cover completely. For code that manipulates these structures, a Kani proof that the invariants are preserved under all valid operations is significantly stronger than any number of proptest runs.

The Testing Pyramid Does Not Capture This

The standard testing pyramid (unit, integration, end-to-end) describes a tradeoff between speed and confidence: fast cheap tests at the bottom, slow expensive tests at the top, with the implication that you want more of the former than the latter. Property-based verification does not fit this framing. It is slow and expensive like an integration test, but it produces qualitatively different information.

A passing unit test says “this input worked.” A passing Kani proof says “this property holds for all inputs satisfying these constraints, up to this bound.” The kind of claim is different, not just the degree. This is closer to a type system than to a test suite: you are adding machine-checked constraints to your codebase that document what the code is guaranteed to do, and those constraints get rechecked automatically whenever the code changes.

The Erlang community understood this relatively early. Ericsson and Volvo used PropEr and commercial QuickCheck for stateful system verification long before the current wave of bounded model checkers became accessible. Their experience showed that the effort of specifying properties precisely, which feels like overhead when you are used to writing examples, pays back compounding returns as the codebase grows and changes hands.

Ochagavía’s post is worth reading for the concrete details of where this approach was applied and what the verification effort revealed in practice. The broader point it illustrates is that property-based verification is not a quantitative improvement over property-based testing. It is a different tool, appropriate in different circumstances, and the distinction matters when you are deciding what confidence level your application actually requires.