· 5 min read ·

From Assertions to Proofs: The History of Specifications That Run

Source: lobsters

The claim in Gabriella Gonzalez’s recent post on Haskell for All sounds deceptively simple: a sufficiently detailed specification is code. Sit with it long enough and it reveals a claim about the entire history of formal methods, from Hoare’s 1969 paper to today’s proof assistants. The line between spec and implementation has been dissolving for fifty years. What makes the observation interesting now is how far the tools have come.

The Separation at the Beginning

Tony Hoare’s “An Axiomatic Basis for Computer Programming” introduced what became known as Hoare triples: {P} C {Q}, where P is a precondition, C a command, and Q a postcondition. The precondition and postcondition together form a specification. The command is the implementation. They inhabit different mathematical universes: the spec is a logical claim, the code is a computational artifact.

For the next two decades, formal specification remained almost entirely in the mathematical domain. Languages like Z notation and VDM (Vienna Development Method) allowed rigorous behavioral specifications, but running them required writing a separate implementation. The spec was a document, however precise.

Bertrand Meyer changed this incrementally with Eiffel in 1986. Design by contract brought require (preconditions), ensure (postconditions), and invariant into the language. A method like:

put (x: G; key: STRING)
    require
        key /= Void
    do
        -- implementation
    ensure
        item (key) = x
    end

states its contract directly alongside its implementation. Contract violations raise exceptions at runtime. The contracts are checked, not verified statically. A method body could still satisfy the type checker while failing its postcondition at runtime. The distance between spec and implementation had narrowed, but they remained separate artifacts.

Property-Based Testing as Executable Spec

QuickCheck, introduced by Koen Claessen and John Hughes in 2000, reframed property-based specifications as something that could run automatically. A property like:

prop_reverseReverse :: [Int] -> Bool
prop_reverseReverse xs = reverse (reverse xs) == xs

is a machine-readable specification. QuickCheck generates hundreds of random inputs and checks whether the property holds for each. The spec is an executable predicate, not documentation that could drift out of sync with the code.

This approach spread widely. Hypothesis brought it to Python, fast-check to JavaScript, and jqwik to Java. In each case, the same structure: write a precise behavioral claim as a predicate, let the framework generate counterexamples. Property-based testing still maintains the separation between spec and implementation, but the spec now runs alongside the code and can falsify it.

Types as Lightweight Specifications

Haskell’s type system pushes the boundary further. A signature like:

sort :: Ord a => [a] -> [a]

is a specification. The Ord a constraint is a precondition: elements must be comparable. The return type is a postcondition: you get back a list of the same element type. GHC enforces this at compile time. Any implementation that type-checks satisfies at least part of the contract.

The limitation is that the type specifies shape, not full behavior. Both id and const [] satisfy that signature. The type is necessary but not sufficient.

LiquidHaskell closes part of this gap with refinement types:

{-@ sort :: Ord a => xs:[a] -> {v:[a] | isSorted v && len v == len xs} @-}

The return type now carries a predicate: the result must be sorted and have the same length as the input. An SMT solver checks statically that any implementation with this signature satisfies the predicate. The spec is strong enough to rule out id and const []. The distance between specification and verified implementation has narrowed to a single annotation.

The Curry-Howard Correspondence

The theoretical foundation for the strongest version of this claim is the Curry-Howard isomorphism, which establishes a formal correspondence between types and logical propositions, and between programs and proofs. Writing a program of type A -> B constitutes a proof that A implies B. This is not an analogy; it is a precise mathematical result with consequences for how we construct and verify software.

In Agda, Coq, and Lean 4, you write specifications as types and then provide programs that inhabit those types. The type checker verifies that your program constitutes a valid proof of the specification. The distinction between spec and implementation collapses: the program is the proof of the spec, and the type is the spec.

Lean 4 has been particularly active. Work on tools like Leanstral explores using fine-tuned language models to assist in constructing formal proofs, combining machine learning with mechanized verification. The practical consequence is that stating a specification as a theorem and obtaining machine-checked evidence of correctness is becoming tractable for a broader audience.

Where the Argument Has Limits

The claim holds cleanly for pure functions over well-typed algebraic data. It gets more complicated for software that ships at scale.

OpenAPI specifications describe REST APIs with enough precision that tools like openapi-generator produce client SDKs and server stubs. The spec becomes code in a narrow sense. But the spec says nothing about query behavior, caching, rate limiting, or business logic. The generated code is structural scaffolding. The specification was precise enough to produce shape but not behavior.

Protocol Buffers follow the same pattern. A .proto file specifies message formats and service interfaces precisely enough that protoc generates correct serialization code across a dozen languages. That generated code is genuinely useful; it also says nothing about what the service should do with the messages once deserialized.

The pattern is consistent: specifications that describe structure become code for structure. Specifications that describe behavior require verification machinery to become behavioral guarantees. And achieving a specification precise enough to capture full behavior tends to require roughly as much work as writing the implementation itself.

The Practical Implications

The most immediately actionable version of this argument, for engineers who are not writing theorem provers, lives in the space between QuickCheck and LiquidHaskell. Write your invariants as executable predicates. Use a type system to encode preconditions and postconditions where you can. For distributed systems, TLA+ and Alloy let you model and check behavioral properties before committing to an implementation. Amazon’s documented use of TLA+ on DynamoDB and S3 found subtle bugs in designs that had passed thorough code review, bugs that testing alone would not have caught.

Precision in a specification is not overhead layered on top of the implementation. It is the implementation at a different level of abstraction. Every step toward making a spec more precise, whether through a richer type signature, a verified property, or a formal proof, reduces the gap between what you mean and what the computer does. At some level of precision, that gap reaches zero, and the spec compiles.

Was this interesting?