· 6 min read ·

The Double Cost of Keeping Spec and Code Apart

Source: lobsters

The haskellforall post from this week states the convergence thesis cleanly: a specification that is precise enough to be unambiguous and complete has already done the computational work, and is therefore a program. That thesis is correct, and the intellectual lineage behind it is long. What the thesis undersells is the economic argument, which is sharper than it might first appear.

Precision is not free. Writing down exactly what a function must do, with enough detail to rule out every wrong implementation, costs roughly the same cognitive effort as writing the function itself. The mental model you need to specify behavior precisely is the same mental model you need to implement it correctly. When you keep the spec and the code as separate artifacts in separate languages, you pay that cost twice, and then you pay a third time, continuously, to keep them synchronized as the system evolves.

This is not a small tax. It is why formal specification languages like Z notation and early TLA+ saw limited industrial adoption despite their expressive power. The engineers who could write a correct TLA+ model were largely the same engineers who could write the correct implementation. Paying both costs, then maintaining a gap between them, is a hard sell.

The Long History of Convergence

Hoare’s 1969 axiomatic semantics paper introduced the idea of attaching logical assertions to code, preconditions and postconditions that describe what a routine requires and what it guarantees. The spec and the code sat in the same artifact, adjacent to each other. The problem was that the assertions were comments. They carried no enforcement weight at compile time, and only partial weight at runtime through assertion checks.

Bertrand Meyer’s Eiffel and Design by Contract in the 1990s made the assertions a first-class language feature and added runtime checking. That was a meaningful step: the spec was now executable in the sense that violations would surface at runtime. But the gap between the contract and the implementation remained. A contract that says a sort routine returns a list of the same length as its input, in ascending order, containing the same elements, is a precise spec. The implementation still has to be written separately, and the compiler cannot tell you whether it satisfies the contract without running it.

The Curry-Howard correspondence closes that gap from a different direction. Haskell Curry noticed in the 1930s and 1950s that type systems and logical systems had structural parallels. William Howard formalized the connection in a 1969 paper that was not published until 1980: types are propositions, programs are proofs. A function that typechecks is a constructive proof that its type is inhabited.

The practical consequence in Haskell is what Philip Wadler called theorems for free. A function with the type (a, b) -> a can only be fst. The type variable a is universally quantified over all types, so the only way to produce a value of type a from inputs of type a and b is to return the a you were given. The type has constrained the implementation space to a single point. That is not metaphor; it is a direct consequence of parametricity. The spec and the implementation have merged.

Dependent Types Push This Further

Dependent types allow types to depend on values, which means you can encode arbitrarily precise behavioral specifications in the type signature. In Idris, a sort function can carry its full contract in its type:

sort : (xs : List Int) -> (ys : List Int ** (IsSorted ys, Permutation xs ys))

This type says the function takes a list xs and returns a list ys along with proofs that ys is sorted and that ys is a permutation of xs. The only way to produce a term of this type is to actually sort the list. The spec is the type, and writing a valid implementation is writing the proof that the spec is satisfiable. The synchronization problem disappears because the two things are one thing.

Agda works the same way. The dependent type system in both languages is expressive enough to encode the full specification of almost any algorithm, which means the compiler enforces the contract at every call site.

F* and the Industrial Middle Ground

F* takes this into a setting designed for real verification work. Refinement types in F* let you attach predicates directly to types:

val sort : l:list int -> Tot (m:list int{sorted m /\ permutation l m})

The type of sort states that it terminates (the Tot effect) and that its return value m satisfies sorted m and permutation l m. The F* verifier, backed by the Z3 SMT solver, checks that any implementation you provide actually proves these predicates. The type is the theorem, the implementation is the proof, and the toolchain enforces the connection. Liquid Haskell brings the same idea to Haskell via refinement types checked by an SMT solver, letting you annotate existing code with machine-checked preconditions and postconditions without leaving the language.

Property-Based Testing as the Accessible Version

Not every project is ready for dependent types or SMT-backed refinement checkers. QuickCheck, introduced by Koen Claessen and John Hughes in 2000, offered a lighter-weight approach: express the properties your code should satisfy as executable predicates, then let the framework generate random inputs and try to falsify them.

Python’s Hypothesis extends this with stateful testing and shrinking. A sort property looks like this:

from hypothesis import given
from hypothesis import strategies as st

@given(st.lists(st.integers()))
def test_sort_properties(xs):
    result = my_sort(xs)
    assert len(result) == len(xs)
    assert all(result[i] <= result[i+1] for i in range(len(result) - 1))
    assert sorted(result) == sorted(xs)  # same elements

This is a spec encoded in the implementation language, executed against the implementation automatically. The spec is not proven correct the way dependent types prove it, but it is checkable, it lives next to the code, and it breaks the build when violated. The synchronization cost drops significantly because the spec is a test that runs in the same CI pipeline as everything else.

Where the Gap Remains

TLA+ is the honest counterexample to the convergence thesis. A TLA+ model describes the behavior of a distributed system at a level of abstraction that is genuinely separate from the implementation. You can verify properties of the model using the TLC model checker or the TLAPS proof system, and those verification results do not transfer to the implementation by construction. The implementation has to be written separately and has to be argued, informally or formally, to conform to the model.

This is a real and useful gap in some domains. When the system involves multiple processes, network partitions, and subtle liveness conditions, the model-level reasoning is different in kind from the implementation-level reasoning. The economic argument still applies, you pay twice, but for distributed systems correctness the payment is often worth it. The gap is a deliberate design choice, not an oversight.

The Economic Upshot

The tools that collapse the spec-code boundary eliminate the synchronization tax. Rust’s borrow checker is a spec for ownership and lifetimes encoded directly in the type system; you cannot write code that violates it and have it compile. Dependent types in Idris and Agda make the contract part of the type that the compiler checks at every call site. F* and Liquid Haskell let you annotate existing code with checkable refinements. QuickCheck and Hypothesis let you write executable property specs that run continuously against every build.

Separate specification languages had limited industrial uptake not because engineers did not care about correctness, but because the cognitive cost of writing a precise spec is close to the cost of writing correct code, and paying both costs for the same behavior is a hard problem in engineering economics. The languages and tools that encode specs as types, as properties, or as contracts in the implementation language solve that problem by collapsing it. The spec is the code, the checker enforces the correspondence, and the synchronization budget goes to zero.

The haskellforall thesis is right. The economic argument for why it matters in practice is that the tools which embody it are not just intellectually cleaner; they are cheaper to maintain over the lifetime of a system than any workflow that keeps the spec and the implementation as separate documents.

Was this interesting?