Gabriel Gonzalez’s article on haskellforall opens with a deceptively simple claim: a sufficiently detailed specification is code. The claim sounds like wordplay until you follow it to its logical conclusion, and that conclusion reshapes how you think about every layer of software development, from type annotations to formal verification.
The Spectrum from Intent to Execution
A product requirement written in plain English occupies one end of a spectrum. “Users should be able to reset their passwords” tells you the goal but nothing about the mechanism. On the other end sits machine code, pure mechanism with no explicit intent. Between these poles, every artifact in the development process sits somewhere along the gradient: design documents, pseudocode, API contracts, type signatures, test suites, formal proofs.
What Gonzalez is pointing at is that this gradient has no clean break. There is no magic line where “documentation” becomes “code.” As you add precision to a specification, you eventually cross a threshold where a machine can execute it directly.
The cleanest illustration is regular expressions. A regex like \d{4}-\d{2}-\d{2} is a specification for what constitutes a valid date string. It is also directly executable. Nobody debates whether it is a “spec” or “code” because the question is irrelevant, and the same pattern appears at every level of the stack.
Type Systems as Lightweight Specs
Haskell’s type system is the obvious starting point for this argument, partly because Haskell types are unusually expressive and partly because the language has a long tradition of treating types seriously as documentation.
Consider a function signature like:
sortBy :: (a -> a -> Ordering) -> [a] -> [a]
This specifies that sortBy takes a comparison function and a list, and returns a list of the same element type. The type constrains behavior without fully specifying it. It does not say the output will be sorted; it just constrains the shapes involved.
Move to something like:
sort :: Ord a => [a] -> [a]
Here the Ord constraint says the function requires a total ordering on a. Still not fully specified, but the space of valid implementations has narrowed considerably.
Dependent types take this to the limit. In Idris or Agda, you can write:
sort : (xs : List a) -> (ys : List a ** (IsSorted ys, IsPermutation xs ys))
This signature specifies that the output ys is both sorted and a permutation of the input xs. The type system enforces these properties at compile time. At this point, the type signature is essentially the entire specification, and any function that type-checks is, by construction, a correct sort implementation.
Curry-Howard Makes It Formal
The Curry-Howard correspondence is the theoretical backbone here. The correspondence establishes that propositions in logic correspond to types in type theory, and proofs correspond to programs. A type annotation is a logical proposition; a well-typed program is a proof of that proposition.
Under this correspondence, writing a specification and writing a program are the same activity at different levels of precision. A vague spec is a weak logical statement. A precise spec is a strong one. A fully formal spec is a theorem, and a proof of that theorem is an executable program.
This is not metaphorical. In proof assistants like Coq and Lean 4, the distinction between specification and implementation collapses almost entirely. You write a statement about what you want, then you write a term that inhabits the corresponding type. The verification is the program.
Property-Based Testing as Executable Spec
Property-based testing is probably the most accessible point on the spectrum for everyday development. QuickCheck, Haskell’s property testing library, lets you write executable specifications as properties:
prop_sortIdempotent :: [Int] -> Bool
prop_sortIdempotent xs = sort (sort xs) == sort xs
prop_sortPreservesLength :: [Int] -> Bool
prop_sortPreservesLength xs = length (sort xs) == length xs
These are specifications. They say something precise about what sort must do. They are also directly runnable. The boundary between documentation and test is gone.
The same idea appears in fast-check for TypeScript, Hypothesis for Python, and PropEr for Erlang. Across ecosystems, the pattern is the same: write down what must be true, and let the framework generate counterexamples.
Property-based testing also makes explicit something that unit tests obscure. A unit test is a single data point. A property is a universal claim. When you write prop_sortPreservesLength, you are writing a specification fragment, something your implementation must satisfy for all inputs, not just the ones you remembered to try.
System-Level Specs: TLA+ and Alloy
At the distributed systems level, this dynamic shows up in tools like TLA+, Leslie Lamport’s specification language. TLA+ specs are formal, machine-checkable, and executable in the sense that the model checker can explore all reachable states.
AWS has publicly documented their use of TLA+ to specify distributed protocols. The specs are not documentation sitting alongside code. They are checked artifacts that find real bugs before implementation begins.
Alloy works similarly for structural properties. You write a model of your system’s state space, and the Alloy analyzer checks whether your invariants hold. Again, the spec is the artifact, and it runs.
The gap between these tools and running production code is real, but it is an operational gap, not a conceptual one. The TLA+ spec for a consensus protocol and the Go implementation of that protocol describe the same behavior at different levels of abstraction. One runs on the model checker; the other runs on hardware. The underlying specification is shared.
Refinement Types: Closing the Gap in Production Code
Liquid Haskell brings refinement types to ordinary Haskell. You annotate functions with refinement predicates that an SMT solver checks at compile time:
{-@ safeDiv :: Int -> {v:Int | v /= 0} -> Int @-}
safeDiv :: Int -> Int -> Int
safeDiv x y = x `div` y
The annotation says the second argument must be non-zero. This is a specification embedded directly in the type, checked by Z3 at compile time. No runtime check needed, and no defensive guard cluttering the implementation.
Dafny from Microsoft Research takes this further, building an entire language around pre/postcondition verification. Code written in Dafny carries its specification inline, and the verifier checks that the implementation satisfies the spec before compilation proceeds. The specification is not adjacent to the code; it is interleaved with it.
The LLM Angle
There is a contemporary dimension worth noting. As LLM-based code generation becomes routine, the relationship between specs and code shifts again.
When you prompt an LLM with a detailed natural language description, you are writing a specification. The LLM produces an implementation. The more precise your description, the more constrained the output. At the extreme, if your description is precise enough to have only one valid interpretation, you have effectively written the code in a higher-level notation and used the model as a compiler.
This is already how some teams work. Write the TLA+ model, use it to drive test generation, use the tests to verify LLM-generated implementations. The formal spec becomes the ground truth that everything else is checked against, including the generated code.
The observation from the original article extends naturally here. The spec was always doing the real work. The code was always just the spec at a level of precision the machine could execute directly.
What This Changes
The practical takeaway is less about adopting proof assistants and more about where you invest your precision. A vague spec generates ambiguity at every downstream step. Developers interpret it differently. Tests cover different cases. The implementation drifts from the intent without anyone noticing.
A precise spec, whether it is a type signature, a property test, a formal model, or a structured prompt, does real work regardless of whether it is technically “code.” The tools that make specs executable, QuickCheck, Liquid Haskell, TLA+, Dafny, Lean, have been around long enough that the argument is not theoretical.
The question is only how far along the precision gradient you need to go for a given problem, and whether you are being honest about what your current spec actually specifies.