· 6 min read ·

The Generator Is Half the Property

Source: lobsters

The technique has been explained many times. You describe what must always be true rather than what happens for a specific input. The framework generates thousands of cases. Shrinking narrows failures to their minimal form. The combination finds bugs that example-based tests miss.

What gets less attention is the generator. Every property-based test depends on a generator that defines the input space. The generator is, in a precise sense, half of the specification: the property says what must be true, and the generator says what “for all inputs” actually means. Tutorials tend to skip this because the default generators for primitive types look simple, but in real-world application of the technique, like the kind shown in Adolfo Ochagavía’s case study, designing the right generator takes most of the work.

What “For All Inputs” Actually Means

The canonical property-based test looks like this:

proptest! {
    #[test]
    fn sort_then_sort_is_idempotent(mut xs: Vec<i32>) {
        xs.sort();
        let first = xs.clone();
        xs.sort();
        assert_eq!(first, xs);
    }
}

Here the generator is implicit: proptest knows how to generate Vec<i32> values because i32 implements the Arbitrary trait, and vectors of arbitrary types are handled by composition. The input space is, roughly, all finite vectors of 32-bit signed integers. For a sorting function that operates on integers, that is a reasonable definition of the domain.

Real code operates on structured data with complex validity constraints. A DNS message parser receives byte sequences that must conform to a specific format. A configuration parser receives strings representing some subset of valid TOML or YAML. A payment processing function receives transaction records with field dependencies that are obvious to domain experts and invisible to a random generator. In these cases, using the default generator for Vec<u8> or String means spending most of the test budget on inputs the code would immediately reject, and shrinking toward shorter strings that are also immediately rejected.

Filtering Versus Shaping

The first instinct when dealing with constrained inputs is to filter with assume. Proptest exposes this through prop_assume!, and Hypothesis offers the equivalent with assume(). The pattern looks like:

proptest! {
    #[test]
    fn verify_my_parser(input: Vec<u8>) {
        prop_assume!(looks_like_valid_packet(&input));
        let result = parse_packet(&input);
        prop_assert!(result.is_ok());
    }
}

The failure mode here is quiet. If the filtering predicate rejects most random inputs, the framework burns through its budget on rejected cases and effectively tests almost nothing. Hypothesis raises an error when it detects excessive filtering; its documentation describes the condition as the strategy being unsatisfiable at the required rate. Proptest’s default behavior is to silently skip filtered cases up to a configured limit and count only the accepted ones toward the test budget. You run a thousand cases and believe you have coverage, but nine hundred seventy were filtered before the parser ever ran.

The correct approach is to shape the generator rather than filter its output. Instead of generating arbitrary byte vectors and selecting for packet-shaped ones, you write a generator that produces valid packets directly:

fn arbitrary_packet() -> impl Strategy<Value = Vec<u8>> {
    (0u16..512u16, any::<u8>(), prop::collection::vec(any::<u8>(), 0..512))
        .prop_map(|(payload_len, flags, payload_data)| {
            let mut packet = Vec::new();
            packet.extend_from_slice(&payload_len.to_be_bytes());
            packet.push(flags);
            let actual_len = payload_len as usize;
            packet.extend_from_slice(
                &payload_data[..actual_len.min(payload_data.len())]
            );
            packet
        })
}

This generator always produces structurally consistent packets. The test budget goes toward running the parser across the space of valid inputs, which is what verification of parser correctness actually requires.

The Shrinking Dependency

Generator design also determines the quality of shrinking, and shrinking is where a property-based framework’s diagnostic value comes from.

Proptest’s strategy-tree approach shrinks by reducing each component of the strategy independently. A Vec<u8> generator shrinks by reducing vector length and reducing individual byte values. A carefully composed generator shrinks along its compositional structure: a failing packet shrinks to a packet with fewer payload bytes, or a different flags value, and each shrunk form is itself a valid packet. The minimal counterexample is small and structurally meaningful.

When the generator is a filter on arbitrary bytes, shrinking often finds shorter byte sequences that the filter also rejects. The framework cannot reach a truly minimal failing case because valid shrunk inputs are sparse in the unconstrained space. It terminates with the best valid shrunk input it found, which may still be complex enough to obscure the cause of the failure. A failure on a well-shrunk input is a debugging target; a failure on a partially-shrunk blob of bytes requires a separate investigation before the counterexample is even interpretable.

The integrated shrinking model in Hypothesis avoids this class of problem by shrinking the underlying bytestream rather than the generated value. Any valid bytestream produces a value that is by construction within the generator’s output space, so every shrunk candidate is automatically valid. This is one of the most important engineering improvements over the original QuickCheck model from Claessen and Hughes, which required per-type shrink implementations and provided no structural guarantee about shrunk candidates. Proptest achieves similar results through its strategy-tree representation, which records the generation plan and can backtrack along it during shrinking.

Custom Generation as Domain Modeling

Writing a generator for complex structured types is, in effect, writing down what valid inputs look like. This makes the generator a form of executable domain specification, distinct from both the production code and the property assertion. When the generator is wrong, it produces inputs that miss the bugs; when it is right, it finds them. And when the domain changes, the generator needs to change with it, which surfaces disagreements between the production code and the domain model in a concrete way.

For wire protocols, this usually means encoding the grammar of the format as a compositional generator. For domain objects, it means encoding the business rules as generator constraints. A user record generator that only produces records with valid email addresses, non-negative balances, and consistent timestamps documents those constraints in a runnable form. If the production code handles inputs the generator would not produce, that gap is worth understanding explicitly.

Hypothesis’s @composite decorator and proptest’s prop_compose! macro both provide clean syntax for building generators that read like specifications:

prop_compose! {
    fn valid_user_record()(
        email in "[a-z]{3,12}@[a-z]{3,8}\\.com",
        balance in 0u64..1_000_000u64,
        created_at in 0i64..1_800_000_000i64
    ) -> UserRecord {
        UserRecord { email, balance, created_at }
    }
}

The generator is readable, composable, and encodes the domain constraints directly in the type of values it produces. Properties written against this generator are making a claim about the space of records matching these constraints, and that claim is documented in executable form alongside the property itself.

Coverage and the Limits of Shaped Generation

The trade-off with shaped generators is that you can accidentally exclude the inputs where bugs live. A generator that only produces well-formed packets will not find a parser bug triggered by a truncated header, because the generator does not produce truncated headers. Whether this is correct depends on the goal: testing correctness on valid inputs requires a valid-input generator; testing robustness to malformed inputs requires a different generator or a different property.

Real verification efforts typically use multiple generator configurations for the same function, each addressing a different part of the input space. The bolero crate in Rust makes this compositional approach explicit by providing a single harness interface that can drive the same property under libFuzzer (coverage-guided generation that explores the input space heuristically), proptest (strategy-based generation), or Kani (symbolic execution that reasons over all inputs simultaneously). The input generation strategy is a separate concern from the property being verified, and keeping that separation makes it straightforward to combine approaches: use proptest during development for fast feedback, layer in fuzzing to explore the corners of the shaped space, and apply bounded model checking to prove the property exhaustively within a depth bound.

What Real-World Application Reveals

A real-world case of property-based verification, applied to a production problem rather than a tutorial exercise, tends to make the generator problem visible in a way that toy examples do not. The properties that matter in real code are often clear once you identify them. The generators that accurately represent the input domain require sustained design work, explicit decisions about what “valid input” means, and maintenance as the domain evolves.

Specifying the input space precisely enough to support reliable property-based verification is itself a significant part of what you are verifying: that your understanding of the domain is complete enough to make a claim about all inputs within it. The generator is the executable form of that understanding, and treating it as a first-class artifact rather than boilerplate is what separates a property-based test that provides genuine evidence from one that merely samples the obvious cases.

Was this interesting?