From Hoare Triples to Protobuf: Seventy Years of Collapsing the Spec-to-Code Gap

Gabriel Gonzalez makes the case on Haskell for All that a sufficiently detailed specification is code. The argument is correct, and it has been correct for about seventy years. What makes it worth tracing in full is that each decade added a different mechanism for collapsing the gap, and the failure modes are different each time. Understanding the arc is more useful than knowing the conclusion.

The Claim Has a Birth Date

Tony Hoare published “An Axiomatic Basis for Computer Programming” in 1969 (ACM DL). The paper introduced what we now call Hoare triples: {P} C {Q}, where P is a precondition, C is a command, and Q is a postcondition. The notation says: if P holds before executing C, then Q holds after. This was not a documentation convention. It was a formal logic for reasoning about program correctness, and the triples were meant to be derived and checked mechanically.

Dijkstra extended this shortly after with weakest preconditions. Instead of checking whether a given P was strong enough, you could compute the weakest P that guarantees Q after executing C. The specification was now the endpoint of a derivation, not just a human-written annotation. The implication was clear to those paying attention: if you could state Q precisely enough, the implementation was, in principle, recoverable from it.

Eiffel Made It Executable

Bertrand Meyer gave Hoare’s ideas their first widely-deployed programming-language form with Design by Contract in Eiffel, formally introduced in his 1988 book. The syntax was direct:

put (x: G; key: STRING)
    require
        key_not_empty: not key.is_empty
    do
        -- implementation
    ensure
        inserted: item(key) = x
    end

Preconditions and postconditions lived in the source file, next to the code they specified. In debug builds, they were checked at runtime. External tools could analyze them statically. Meyer’s central observation was that a contract precise enough reduces the implementation space to a small set of sensible options. The spec and the implementation converge.

Design by Contract influenced C#‘s Code Contracts, Java’s assertion infrastructure, and the invariant idioms in most safety-critical software. It did not become universal, largely because runtime contract checking has a cost, and most teams treated that cost as optional overhead rather than a correctness investment. The idea was sound. The adoption curve was not.

Curry-Howard: The Theoretical Spine

While Meyer was building Eiffel, type theorists were assembling the theoretical explanation for why specs and code converge. The Curry-Howard correspondence, named for observations by Haskell Curry and William Howard spanning 1934 to 1969, establishes a precise isomorphism: propositions in intuitionistic logic correspond to types in typed lambda calculus, and proofs of those propositions correspond to programs of those types.

Philip Wadler’s 2015 paper “Propositions as Types” (ACM DL) is the clearest modern account. Under this correspondence, writing a type signature is stating a theorem. Writing a program that satisfies the type is proving the theorem. A type checker is a proof verifier. These are not analogies; they are the same mathematical structure, independently discovered in two disciplines that spent decades not talking to each other.

The practical consequence is that the expressiveness of a type system determines how much of a spec it can carry. A type like String -> String expresses almost nothing. A parametrically polymorphic type like a -> a expresses quite a lot, because by parametricity there is exactly one total function with that type: the identity. The type is so precise that it specifies the implementation completely.

Dependent Types: Precision at the Logical Limit

Dependent types, as found in Idris 2, Agda, Lean 4, and Coq, allow types to depend on values. This extends Curry-Howard to its natural conclusion. Consider the length-indexed vector from Idris:

data Vect : Nat -> Type -> Type where
    Nil  : Vect 0 a
    (::) : a -> Vect n a -> Vect (S n) a

map : (a -> b) -> Vect n a -> Vect n b
map f Nil       = Nil
map f (x :: xs) = f x :: map f xs

The type Vect n a is a vector of exactly n elements of type a, where n is a value-level natural number carried in the type. The signature for map says the output vector has the same length as the input. This is not a comment or a docstring; it is a checked invariant. An implementation that drops an element or adds one does not compile. The length-preservation property of map is part of its specification, and the specification is part of its type.

Lean 4’s Mathlib library carries this further into industrial-scale mathematics. Theorems about linear algebra, number theory, and topology are stated as types and proved as programs. The specification is the theorem statement; the implementation is the proof term; the type checker verifies that the proof is valid. The gap between spec and code is zero by construction.

Dependent types carry a real cost. Writing proofs is hard. Type inference becomes undecidable in the general case. The tooling is immature compared to mainstream languages. These are genuine tradeoffs, not engineering friction waiting to be optimized away.

Where Haskell Sits on the Spectrum

Haskell occupies the productive middle ground between mainstream type systems and full dependent types. GADTs and the DataKinds extension allow types to carry some value-level information. The singletons library provides a systematic way to lift value-level computations to the type level, enabling a significant subset of dependent-type patterns in a language with mature tooling and a large ecosystem.

For cases where you want the spec-in-the-type approach without full proof obligations, LiquidHaskell adds refinement types: types annotated with logical predicates, discharged automatically by an SMT solver at compile time. A non-empty list refinement on head looks like this:

{-@ head :: {v : [a] | len v > 0} -> a @-}
head :: [a] -> a
head (x:_) = x
head []    = error "unreachable"

The annotation {v : [a] | len v > 0} asserts that the input list has length greater than zero. LiquidHaskell passes this constraint to Z3 at compile time. If every call site can be verified to supply a non-empty list, the error branch is statically unreachable, and LiquidHaskell will confirm that. The partial function problem is not solved by being careful at runtime; it is solved by making the violation a type error.

Property-Based Testing as the Pragmatic Middle

Not every codebase can or should adopt refinement types or dependent types. Property-based testing, introduced by Claessen and Hughes in QuickCheck in 2000 (ACM DL), occupies a pragmatically accessible position on the same spectrum. Instead of unit tests over specific inputs, you write properties that any correct implementation must satisfy:

import Test.QuickCheck
import Data.List (sort)

prop_reverseInvolution :: [Int] -> Bool
prop_reverseInvolution xs = reverse (reverse xs) == xs

prop_sortOrdered :: [Int] -> Bool
prop_sortOrdered xs =
  let ys = sort xs
  in and (zipWith (<=) ys (drop 1 ys))

prop_reverseInvolution is a specification, not a test case. It says something universal about reverse: applying it twice is the identity. prop_sortOrdered states what sorted output must look like across all lists, not just the three examples a developer might write by hand. QuickCheck generates hundreds of random inputs and checks each property mechanically.

The gap between QuickCheck and full verification is the gap between “tests many cases” and “checks all cases.” Both are incomparably better than natural language specifications. Only one closes the loop completely. Hypothesis brings the same approach to Python; fast-check to JavaScript; PropEr to Erlang. The pattern has spread because executable specifications are more useful than documented ones, regardless of whether they carry proof-level guarantees.

The Industrial Collapse: Protobuf, OpenAPI, SQL DDL

The same convergence happened quietly in industry tooling, without the type-theory vocabulary. Consider a Protobuf schema:

message UserEvent {
  string     user_id   = 1;
  int64      timestamp = 2;
  EventType  type      = 3;
  repeated string tags = 4;
}

This is a specification. It describes the shape of data with enough precision that protoc can generate serialization and deserialization code in a dozen languages from it. The schema is the spec; the generated code is derived from the spec; the wire format is enforced by the schema. The gap between the data contract and its implementation is zero in the generated layer.

OpenAPI specs and SQL DDL schemas operate on the same principle. A SQL DDL statement like NOT NULL or CHECK (age > 0) is a formally stated invariant. A FOREIGN KEY constraint is a referential integrity specification, enforced by the database engine on every write. Nobody in the database world calls this formal methods, but structurally it is the same thing: a precise enough statement of what the data must satisfy, enforced mechanically by the runtime.

These tools reached mass adoption not because their users were familiar with Hoare logic, but because the payoff was immediate. Generate the serialization code from the schema and it cannot diverge from the schema. Enforce the constraint in the database and the application cannot violate it. The spec-to-code gap collapses because the tooling makes precision cheap enough to be worth it.

The Gap Is a Precision Problem

The practical takeaway from this seventy-year arc is narrow but useful. The gap between a specification and its implementation is not a fixed feature of software development; it is a function of how precisely you can state what you want, and how much of that precision your tools can check mechanically.

A natural-language comment is the low-precision end. A Lean proof term is the high-precision end. Between them, in roughly ascending order of rigor, sit type annotations, interface definitions, Protobuf schemas, QuickCheck properties, LiquidHaskell refinements, TLA+ models, and dependent types with proof obligations. Each step up the gradient converts something you would otherwise re-verify by reading into something checked automatically on every build.

The tools at every point on this gradient are better now than they were a decade ago. SMT solvers are faster and more capable. Lean 4’s ergonomics are meaningfully better than Coq’s were in 2010. Property-based testing has spread from Haskell into every major language. The cost of moving up the gradient has been falling steadily.

Hoare’s 1969 observation was that if you could state your postcondition precisely enough, the implementation was recoverable from it. That was true in 1969. The difference now is that “precisely enough” is achievable with off-the-shelf tools for a much larger class of software than it was when Hoare wrote the paper.