· 6 min read ·

Specifications Are Incomplete Programs: The Precision Continuum from DbC to Coq

Source: lobsters

The Haskell for All post on sufficiently detailed specs states something that formal methods researchers have understood for decades but that most working programmers never encounter: add enough precision to a specification and you stop having a spec. You have a program. The observation sounds provocative, but it is almost a tautology once you trace the history of how specifications have evolved in programming language design. Every step toward more rigorous specification has been a step toward more executable code.

The Beginning: Contracts as Partial Specifications

The earliest practical incarnation of this idea came from Bertrand Meyer in the mid-1980s. His language Eiffel, introduced in 1985 and documented in Object-Oriented Software Construction (1988), introduced Design by Contract: a system of require clauses (preconditions), ensure clauses (postconditions), and invariant blocks that sit directly in the class body alongside the implementation.

withdraw (amount: INTEGER)
    require
        positive_amount: amount > 0
        sufficient_funds: amount <= balance
    do
        balance := balance - amount
    ensure
        balance_decreased: balance = old balance - amount
    end

The postcondition here, balance = old balance - amount, is precise enough that there is essentially one correct implementation. A proof system could, in principle, derive the body from the require and ensure clauses. Eiffel did not do that, but it named a relationship between specification and code that would define the next forty years of programming language research.

Contracts are checked at runtime in Eiffel, not at compile time. They are specifications that execute, but only as assertions, not as derivations. This is the first rung of the ladder.

The Middle Ground: Properties as Runnable Specifications

QuickCheck, introduced by Koen Claessen and John Hughes in their ICFP 2000 paper, operationalized specifications as runnable predicates. A property like length (reverse xs) == length xs is a partial specification of reverse.

prop_reverseLength :: [Int] -> Bool
prop_reverseLength xs = length (reverse xs) == length xs

prop_reverseReverse :: [Int] -> Bool
prop_reverseReverse xs = reverse (reverse xs) == xs

You can run these against thousands of randomly generated inputs and gain high confidence. When QuickCheck finds a failing case, it shrinks the input to the smallest counterexample automatically. The properties are not a complete specification of reverse, they only constrain its behavior, but they are executable and machine-checkable.

The gap between QuickCheck and a proof is the gap between confidence and certainty. Property-based testing remains the pragmatic middle ground for precisely this reason. You say something meaningful about your program, you gain substantial assurance, and you don’t need a type theory PhD to do it.

The concept has spread across every language ecosystem: Hypothesis in Python, fast-check in JavaScript, PropEr in Erlang, RapidCheck in C++. The underlying idea is the same everywhere: elevate your specification from a comment to a predicate, make the predicate executable, and let a generator find the failures.

Closing the Gap: Refinement Types

Liquid Haskell, developed by Ranjit Jhala, Niki Vazou, and colleagues and described in their ICFP 2014 paper, moves preconditions from runtime to compile time by embedding SMT-checked predicates directly into types.

{-@ head :: {v: [a] | len v > 0} -> a @-}
head (x:_) = x
head []    = error "impossible"

{-@ append :: xs:[a] -> ys:[a] -> {v:[a] | len v == len xs + len ys} @-}
append []     ys = ys
append (x:xs) ys = x : append xs ys

The GHC plugin discharges these predicates using the Z3 SMT solver. The compiler proves statically that the empty-list branch of head is unreachable at every call site where the refinement is satisfied. The postcondition on append states that the output length is the sum of the input lengths; Z3 verifies this holds for every possible input.

This is still not a full behavioral specification. The predicates must fall within a decidable fragment of first-order logic, roughly quantifier-free linear arithmetic and uninterpreted functions. But within that fragment, the specification is enforced at compile time, not left to documentation or runtime assertions.

Where the Boundary Dissolves: Dependent Types

The place where the spec-code distinction genuinely collapses is in dependently typed languages. Idris (Edwin Brady, first published 2013, Idris 2 released 2020 with Quantitative Type Theory) allows types to depend on runtime values, which means you can write a type that completely characterizes a function’s behavior.

The standard example is length-indexed vectors:

data Vect : Nat -> Type -> Type where
  Nil  : Vect 0 a
  (::) : a -> Vect n a -> Vect (S n) a

append : Vect m a -> Vect n a -> Vect (m + n) a
append Nil       ys = ys
append (x :: xs) ys = x :: append xs ys

The type Vect m a -> Vect n a -> Vect (m + n) a is a theorem: it asserts that appending a vector of length m to a vector of length n yields a vector of length m + n. The implementation follows from the type by structural induction. If you write a different implementation, the typechecker rejects it. The spec and the code converge to the same artifact.

The same principle appears in safe indexing:

-- Fin n is the type of natural numbers strictly less than n.
-- index can never go out of bounds; the type enforces it.
index : Fin n -> Vect n a -> a
index FZ     (x :: _)  = x
index (FS k) (_ :: xs) = index k xs
-- The case (index _ Nil) is impossible because Fin 0 is uninhabited.
-- No branch required; the typechecker knows.

This is the Curry-Howard correspondence made concrete. William Howard’s 1969 manuscript, published in a 1980 festschrift, established the formal isomorphism: propositions in intuitionistic logic correspond to types in typed lambda calculus, proofs correspond to programs, and type-checking is proof-checking. Philip Wadler’s 2015 survey in Communications of the ACM remains the clearest accessible treatment. With dependent types, you can state theorems precisely enough that type-checking becomes proof-verification in the full logical sense.

The Production End: Coq and Verified Extraction

The most mature point on this spectrum is Coq (developed at INRIA since 1984, now also called Rocq as of 2024). Coq is a proof assistant based on the Calculus of Inductive Constructions. You write proofs of mathematical theorems, and then extract the computational content as executable OCaml or Haskell via the Extraction mechanism, developed by Pierre Letouzey in his 2002 PhD thesis.

Extraction Language OCaml.
Recursive Extraction insertion_sort.

The extracted code contains no proofs; they are erased as proof-irrelevant. What remains is the algorithm, whose correctness is guaranteed by the type theory.

This is not a research curiosity. CompCert, Xavier Leroy’s formally verified C compiler (POPL 2006), ships extracted OCaml generated from Coq proofs. The backend is not tested for correctness; it is proven correct. Fiat Cryptography, MIT’s framework for generating verified elliptic curve implementations, ships code extracted from Coq that now runs inside Chrome’s BoringSSL and Go’s TLS stack. The path from spec to deployed code is fully formal.

The Practical Reading

The spectrum runs continuously from informal English documentation, which carries zero machine-checkable content, through DbC assertions, property-based tests, refinement types, dependent type signatures, and full formal proofs. At each step, adding precision means the specification contains more information that a machine can check or use to derive the implementation.

OpenAPI 3.1 and Protocol Buffers sit near the bottom of this spectrum. An OpenAPI schema specifies the surface of an HTTP API, generating typed client libraries and server stubs from the same source, but it says nothing about what the server does with the data. It is precise enough to generate substantial infrastructure and nothing more. A Dafny function annotated with preconditions and postconditions sits higher: the Dafny verifier discharges proof obligations via Z3 and then compiles to Java, Go, C#, or Python. A Coq proof with extraction sits at the top: the code is the proof.

The claim that a sufficiently detailed spec is code is not a provocation. It is a description of what you get when you move along this gradient far enough. Precision is the only variable. Every machine-checkable precondition is one that cannot be violated silently. Every postcondition the compiler can verify is a test you don’t have to write and a failure mode you cannot introduce. The logical endpoint of the journey is a specification so precise that the only remaining step is to run it.

Was this interesting?