· 6 min read ·

From Contract to Proof: The Long Convergence of Specifications and Code

Source: lobsters

The idea explored in this haskellforall post is one that formal methods researchers have understood for decades but that rarely penetrates everyday software practice: the transition from specification to implementation is better understood as a point on a continuum of precision than as a boundary between two different kinds of things. Specifications do not transform into code when you hand them to a developer; they become code when they accumulate enough detail to run.

That framing changes how you should think about every artifact sitting between an English requirements document and a type-annotated function body.

The Precision Continuum

Take a sort function as a reference point. In prose: “the function returns elements in ascending order.” This is a specification, and it leaves a great deal implicit. It says nothing about stability, about how equal elements compare, or about empty inputs.

In Python with Hypothesis:

from hypothesis import given
import hypothesis.strategies as st
import collections

@given(st.lists(st.integers()))
def test_my_sort(xs):
    result = my_sort(xs)
    assert all(result[i] <= result[i+1] for i in range(len(result) - 1))
    assert collections.Counter(result) == collections.Counter(xs)

The specification is now machine-readable and falsifiable. The first assertion encodes “sorted order”; the second encodes “same multiset as input,” which rules out implementations that return a fresh constant or silently drop elements. You can run this against thousands of generated inputs and build mechanical confidence. The specification and implementation remain separate artifacts, though.

In Liquid Haskell, refinement types pull the specification into the type signature directly:

{-@ sort :: xs:[a] -> {v:[a] | isSorted v && len v == len xs} @-}
sort :: Ord a => [a] -> [a]

The refinement {v:[a] | isSorted v && len v == len xs} is a predicate logic statement embedded in the type. The Liquid Haskell verifier, backed by an SMT solver, checks that the implementation satisfies this type at compile time. The specification is now load-bearing in the type system; there is no separate test for this property.

Go further to Idris 2 or Lean 4:

sort : (xs : List a) -> (ys : List a ** (Sorted ys, Permutation xs ys))

The type encodes the full functional specification: the result must be sorted, and must be a permutation of the input. A term inhabiting this type is simultaneously an implementation and a proof that the implementation is correct. Writing the code is the same activity as constructing the proof.

The Mathematical Foundation

This convergence has a precise name: the Curry-Howard correspondence. Haskell Curry observed in the 1930s that certain proof systems and type systems share identical structure. William Howard formalized this observation in an unpublished 1969 manuscript. The isomorphism it establishes:

  • Types correspond to propositions
  • Programs correspond to proofs
  • Evaluation of a program corresponds to simplification of a proof

The consequence for specifications: in an expressive enough type system, the proposition “this sort function returns a sorted permutation of its input” can be stated as a type. A well-typed program returning a value of that type is a formal proof of the proposition. Specification and code are the same formal object, described at different levels of detail.

Per Martin-Löf’s intuitionistic type theory, developed beginning in 1972, provided the dependent type machinery that makes this practical at scale. Every major proof assistant, including Coq, Agda, Lean 4, and Idris, builds on this foundation. Lean 4’s mathlib4 contains tens of thousands of formalized theorems where each theorem and its proof is ordinary Lean source code, with no distinction between the mathematical object and the program that represents it.

The Historical Arc

The movement toward executable specification has a long practical history, running parallel to the theoretical machinery.

Bertrand Meyer’s Design by Contract in Eiffel (1986) was an early step. Preconditions, postconditions, and class invariants were written in the source language and checked at runtime. The specification lived in the same file as the implementation and was enforced mechanically, but only at runtime and only for the inputs that tests happened to exercise.

Koen Claessen and John Hughes introduced QuickCheck in a 2000 ICFP paper. Their key insight was that universal properties, statements of the form “for all inputs satisfying X, the output satisfies Y,” could be generated and shrunk automatically. QuickCheck moved specifications into the testing phase while keeping them machine-readable and falsifiable. Property-based testing libraries in every major language, Hypothesis for Python, fast-check for TypeScript, jqwik for Java, follow this model.

Leslie Lamport’s TLA+ (1994) took a different path: a specification language designed for model checking distributed systems, with no expectation that it would run as production code. AWS has published accounts of using TLA+ to find bugs in distributed protocols before implementation. The model checker verifies safety and liveness properties over finite state spaces, but a separate implementation is always required.

Dafny, from Microsoft Research, sits closer to the convergence point. You annotate methods with preconditions, postconditions, and loop invariants; the Dafny verifier, backed by the Z3 SMT solver, attempts to prove them automatically. Verification failures surface as compile errors. The specification and implementation are interleaved in the same source file, and the distinction between the two has largely dissolved.

Practical Implications Below the Proof Level

The full dependent-types approach carries genuine costs. Proof obligations that cannot be discharged automatically require manual effort, type checking becomes undecidable in the general case, and productive use of Agda or Coq requires significant investment before it becomes practical on real code.

The gradient still matters at every level below proof assistants.

TypeScript’s type system, extended with discriminated unions, branded types, and template literal types, carries more specification burden than a plain JavaScript interface. Rust’s ownership and borrowing system encodes a memory safety specification that the borrow checker enforces statically, with no runtime overhead. Neither is dependent types, but both represent movement toward making the type system carry more of the specification load.

Property-based tests are machine-readable specifications that a computer can falsify. The discipline of writing a universal property, rather than a specific example, forces you to articulate what should hold for all inputs. This often surfaces ambiguity in informal requirements: when you try to write “for all inputs X, the output satisfies Y” and realize you cannot state Y precisely, the imprecision was always there.

JSON Schema, OpenAPI definitions, and Protobuf schemas are partial specifications for data shapes and API contracts. Code can be generated from them, implementations can be validated against them, and they constrain the space of valid behavior in ways that prose documentation cannot. They sit further down the precision spectrum than dependent types, but they are on the same spectrum.

The Gap That Persists

The Curry-Howard correspondence establishes that proofs and programs are the same in certain formal systems. It establishes nothing about whether the specification you proved correct corresponds to what users need.

A sort function proven to return a sorted permutation can still optimize for the wrong performance characteristics, handle Unicode incorrectly for a particular locale, or impose the wrong stability guarantee for a particular use case. Formal verification closes the gap between specification and implementation; the gap between specification and user needs belongs to requirements engineering, not type theory.

What precise specifications do is make that second gap visible. When you have to write a formal postcondition for a function, the implicit assumptions in the informal requirement surface as explicit constraints. Gaps in the requirement appear as proof obligations you cannot discharge because the behavior in question has not been defined.

The arc from Design by Contract through QuickCheck and Liquid Haskell to Lean 4 is the arc of closing one particular gap: the distance between saying what code should do and mechanically verifying that it does so. What the haskellforall observation captures is that this arc has a limit, and at that limit, specification and code are the same text.

Was this interesting?