· 6 min read ·

When the Spec Becomes the Program

Source: lobsters

Gabriel Gonzalez’s recent post on Haskell for All puts plainly something that formal methods people have known for decades but rarely state so directly: if you make a specification precise enough, it stops being a specification and becomes code. This is not a metaphor. It is a mathematical fact with a name.

Most programmers encounter this idea as a provocative claim and then move on. It’s worth staying with it longer, because understanding why it’s true changes how you think about types, tests, and the entire upstream half of software development.

The Gradient Nobody Draws

Specs exist on a spectrum of precision. At one end you have natural language: “the function should return the largest element.” At the other end you have a Lean proof term that constructs the output and simultaneously proves it satisfies every required property. Along the way you pass through informal pseudocode, UML diagrams, property-based test suites, type signatures, contracts, refinement types, and dependently typed programs.

What changes as you move along this spectrum is not the intent — it stays the same — but the amount of ambiguity. A natural-language spec has enough wiggle room to be interpreted many ways. A Haskell type signature like maximum :: Ord a => NonEmpty a -> a eliminates most of those interpretations. A dependent type that says the output is greater than or equal to every element of the input eliminates all of them.

When ambiguity reaches zero, the spec and an implementation of it become the same thing. A fully precise specification can be compiled and run.

Curry-Howard Made This Formal in 1969

The theoretical underpinning is the Curry-Howard correspondence, independently observed by Haskell Curry in 1934 and William Howard in 1969. Their insight was that propositions in logic correspond exactly to types in typed lambda calculus, and proofs of those propositions correspond exactly to programs of those types.

This means writing a type is stating a theorem. Writing a program of that type is proving the theorem. A type checker is a proof verifier. These are not analogies; they are the same structure viewed from two disciplines that developed in parallel.

In practice, most mainstream type systems only expose a thin slice of this correspondence. Java’s type system can express “this method returns an integer” but not “this method returns an integer greater than its input.” Haskell’s type system reaches further, expressing things like “this function works uniformly for any type” (parametric polymorphism), which by the parametricity theorem constrains what the function can possibly do to a very small set of behaviors.

Consider id :: a -> a. There is exactly one total function with this type. The type is so precise that it specifies the implementation completely, up to definitional equality. The spec is the code.

Dependent Types Are the Endpoint

Dependent types, as found in Agda, Idris, Lean 4, and Coq, allow types to depend on values. This dissolves the remaining gap between specification and implementation.

In Idris you can write:

index : Fin n -> Vect n a -> a

The type Fin n is the type of natural numbers strictly less than n. The type Vect n a is a vector of exactly n elements of type a. This function cannot possibly produce an out-of-bounds access. The type makes the error unrepresentable, not just unlikely. The spec — “access element i of a vector of length n where i < n” — is encoded directly in the type.

You can go further. Here is a Lean 4 signature for sorted insertion:

def insert (x : α) (xs : List α) (h : Sorted xs) : 
  { ys : List α // Sorted ys ∧ x ∈ ys ∧ ∀ z, z ∈ xs → z ∈ ys } := ...

The return type asserts that the output is sorted, contains x, and contains every element of the input. This is the complete functional correctness specification. Checking whether a program has this type is checking whether it meets the spec. There is no separate verification step.

The Middle Ground: Refinement Types and Property Tests

Full dependent types carry a cost: writing proofs is hard, type inference becomes undecidable in general, and the tooling is immature compared to mainstream languages. Two practical compromises occupy the middle of the spectrum.

Liquid Haskell adds refinement types to Haskell — types annotated with logical predicates that SMT solvers can check automatically. You write:

{-@ maximum :: {v:[a] | len v > 0} -> a @-}
maximum :: Ord a => [a] -> a

The {v:[a] | len v > 0} annotation says the input list must be nonempty, enforced statically by Z3 or CVC4 at compile time. No runtime check needed, no partial function problem. The specification lives in the type and is verified, not tested.

QuickCheck and its descendants (Hypothesis in Python, fast-check in JavaScript) sit at a different point: they let you write executable specifications as properties and then test them against randomly generated inputs. The property:

prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs

is a specification. It cannot be mechanically compiled into the sort function, but it is precise enough to serve as a correctness oracle. Thousands of random inputs give you substantial confidence, though not proof.

The gap between QuickCheck and full verification is the gap between “tests many cases” and “checks all cases.” Both are incomparably better than natural language, but only one closes the loop completely.

Where It Breaks Down

Rice’s theorem establishes that any non-trivial semantic property of programs is undecidable. Full dependent types sidestep this by requiring termination proofs and constraining the language, which is why Agda and Coq are not Turing-complete in their core calculi. The moment you allow general recursion, the type checker can no longer be a complete verifier.

This is why Lean 4 has a sorry escape hatch, why Coq has Admitted, and why Agda has --allow-unsolved-metas. The theoretical ideal — a type system so expressive that every correct program type-checks and every incorrect one does not — is unreachable in a language with general computation. You choose a point on the tradeoff curve.

Practical languages with dependent features, like Rust’s lifetime system or TypeScript’s conditional types, make this tradeoff aggressively in favor of usability. Rust cannot express “this function returns a sorted list” but it can express “this function transfers ownership exactly once” and verify it statically. A narrow slice of the spec space, but an enormously useful one.

What This Changes

If you accept the core claim — that precision collapses the spec/code distinction — then several things follow.

First, writing types carefully is not pedantry. A type signature is the cheapest form of specification that can be machine-checked. String -> String is almost no spec at all. NonEmpty Text -> ValidatedEmail is a meaningful contract. The precision of your types is the precision of your spec.

Second, the traditional software engineering pipeline of requirements, then spec, then implementation, then tests, then verification is a workaround for imprecise intermediate artifacts. When your spec is executable, the pipeline compresses. The spec runs. The tests are derived from the spec. The gap where bugs hide — the space between what you meant and what you wrote — shrinks.

Third, the Haskell community has been arguing for this workflow for thirty years, and languages like Lean 4 are making it practical at industrial scale. Amazon uses TLA+ to formally specify distributed system protocols because the cost of an underspecified distributed algorithm, discovered in production, is measured in data loss. Microsoft’s seL4 verification shows that a fully verified kernel is achievable and deployable.

The distance between a vague requirements document and a runnable program is not a fixed feature of software development. It is a function of the tools you use and the precision you demand of yourself at each stage. The spec does not become code by magic. It becomes code when you stop tolerating ambiguity.

Was this interesting?