Looking back at Martin Fowler’s conversation with Unmesh Joshi and Rebecca Parsons from January 2026, the framing that stands out most is how consistently all three participants treat the “what” as the settled side of the divide. The premise is that software development requires managing cognitive load by keeping the what and the how distinct. But embedded in that premise is an assumption worth examining: that the “what” is the tractable part, the thing you hand to a translator, while the “how” is where the complexity lives.
That assumption has never been entirely true, and LLMs are making its falsity visible.
What Formal Semantics Was For
Rebecca Parsons brings programming language theory to the conversation, and her angle matters more than it might initially appear. She describes the what/how separation in formal terms: a program’s semantics (what it means) versus its operational implementation (how those semantics execute). Denotational semantics and operational semantics are two formal frameworks for making this distinction precise.
The reason these frameworks had to be invented at all is that natural language is insufficient for specifying what software should do. Dana Scott and Christopher Strachey developed denotational semantics in the late 1960s precisely because informally described semantics led to contradictions and ambiguities. When you need to know exactly what a program means, natural language descriptions fail; mathematical notation does not.
This is nearly six decades of evidence that the “what” is not the easy part.
Most software development obscures this. When you write code yourself, ambiguity in your mental specification gets resolved implicitly in the act of implementation. You cannot write code that is ambiguous about what it does: the code either does one thing or another; you are forced to choose. The specification problem is hidden inside the implementation process.
LLMs separate specification from implementation in a way that makes the ambiguity visible.
What the Loop Teaches About the “What”
The “loop” in the Fowler conversation’s title refers to the iterative relationship between what and how: when implementation reveals unexpected behavior, you reach into the how to diagnose it, and that contact updates your mental model of what you wanted. Joshi’s companion piece, The Learning Loop and LLMs (November 2025), argues that this loop is how developers build the evaluation capacity to use LLM output safely. The argument is largely about learning the “how.”
There is an equally important form of learning running in the same loop that receives less attention: learning to specify the “what” precisely.
Consider what happens when an LLM generates code that does something plausible but wrong. The instinctive diagnosis is that the model did not understand what you meant. The more careful diagnosis is often that your prompt was genuinely ambiguous about something you had not considered.
// Prompt: "return the user's recent orders"

// Generation A: the 10 most recent orders by date
const orders = await db.orders.findMany({
  where: { userId },
  orderBy: { createdAt: 'desc' },
  take: 10
});

// Generation B: all orders from the last 30 days
const thirtyDaysAgo = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000);
const orders = await db.orders.findMany({
  where: { userId, createdAt: { gte: thirtyDaysAgo } }
});
Both implementations are reasonable interpretations of “recent orders.” They produce different results in every non-trivial case. The prompt did not specify what “recent” means. Neither generation is wrong given the prompt; the prompt is incomplete given the domain.
Noticing this, and understanding exactly why both generations are defensible, teaches something about the specification rather than the implementation. That is a different form of learning from what Joshi describes, and it runs in the same loop.
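The divergence is easy to make concrete without a database. A minimal sketch, with plain arrays standing in for the orders table and both function names invented for illustration:

```typescript
type Order = { id: number; createdAt: Date };

const DAY = 24 * 60 * 60 * 1000;

// Interpretation A: "recent" means the 10 newest orders by date.
function recentByCount(orders: Order[]): Order[] {
  return [...orders]
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())
    .slice(0, 10);
}

// Interpretation B: "recent" means every order from the last 30 days.
function recentByWindow(orders: Order[], now: Date): Order[] {
  return orders.filter((o) => now.getTime() - o.createdAt.getTime() <= 30 * DAY);
}

// An active user with one order per day for 50 days:
// recentByCount returns 10 orders, recentByWindow returns 31
// (days 0 through 30 inclusive). Same prompt, different result sets.
const now = new Date();
const orders: Order[] = Array.from({ length: 50 }, (_, i) => ({
  id: i,
  createdAt: new Date(now.getTime() - i * DAY),
}));
```

For an inactive user with three orders last week, the two functions return identical results, which is exactly why the ambiguity survives casual testing.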
Specification Debt
Ward Cunningham’s technical debt metaphor describes the accumulated cost of deliberate implementation shortcuts. There is an analogous phenomenon in specifications: decisions about what software should do that were never made explicitly, only implicitly in the act of writing code. The developer who built the orders page knew intuitively what “recent” meant in their application context. That knowledge is now load-bearing but invisible.
LLMs encounter this specification debt as ambiguity in prompts. Every time you reach for an LLM to extend a system you built, you are forced to externalize some of the implicit decisions that were previously encoded only in your mental model and in the code itself. If you cannot specify those decisions precisely, the LLM has no way to respect them. A more capable model given an ambiguous specification generates a more fluent implementation of one possible interpretation; the precision of the output is bounded by the precision of the input.
Andrej Karpathy’s description of “vibe coding” from early 2025 captures one end of this spectrum: accepting LLM output without reading it, treating generation as a black box that produces something approximately right. It works for prototypes where the specification is loose by design. It fails systematically in production systems where specifications are dense with implicit decisions accumulated over years of development.
What Better Specification Looks Like
Developers who get consistent results from LLMs tend to write prompts that look less like prose descriptions and more like partial formal specifications: they include examples, they name the failure cases, they state constraints explicitly.
// Vague:
"add pagination to the users list"
// Precise:
"add cursor-based pagination to the users list,
using created_at as the cursor column, with a default
page size of 20. Do not use offset pagination.
The response must include a next_cursor field, null when
there are no more results. Handle the case where the
cursor row has been deleted by returning results after
that timestamp."
The second version does the work that a formal specification does: it eliminates degrees of freedom. It forces the specification to make decisions that were previously made implicitly in the implementation. The developer who can write it has already done the design work; the LLM is producing a first-draft implementation of a complete spec.
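To see how few degrees of freedom the precise prompt leaves, here is an in-memory sketch of the specified behavior. The function name, types, and array-backed store are hypothetical stand-ins; the cursor semantics, page size, next_cursor field, and deleted-cursor handling come straight from the prompt:

```typescript
type User = { id: number; created_at: Date };
type Page = { users: User[]; next_cursor: string | null };

// Cursor-based pagination per the spec: created_at is the cursor
// column, default page size 20, next_cursor is null when exhausted.
function listUsers(users: User[], cursor: string | null, pageSize = 20): Page {
  const sorted = [...users].sort(
    (a, b) => a.created_at.getTime() - b.created_at.getTime()
  );
  // Comparing by timestamp value rather than row identity means a
  // deleted cursor row degrades gracefully: we still return results
  // strictly after that timestamp, as the spec requires.
  const after = cursor === null
    ? sorted
    : sorted.filter((u) => u.created_at.getTime() > new Date(cursor).getTime());
  const page = after.slice(0, pageSize);
  return {
    users: page,
    next_cursor:
      after.length > pageSize
        ? page[page.length - 1].created_at.toISOString()
        : null,
  };
}
```

Nothing here was the sketch’s decision: the ordering column, the page size, the sentinel value, and the failure handling were all fixed by the prompt, which is the point.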
This is what test-driven development was originally about, before it became primarily a testing practice. Kent Beck described TDD as a design discipline: writing the test first forces you to specify the behavior before you implement it. The test is the “what”; the implementation is the “how.” A test cannot be ambiguous the way prose can be, because it will either pass or fail.
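A test makes the “recent orders” ambiguity from earlier impossible to leave open. The function and fixture below are hypothetical, but the discipline is the one Beck describes: the assertions commit to one reading before any implementation exists:

```typescript
type Order = { id: number; createdAt: Date };

// Implementation written to satisfy the assertions below: "recent" is
// pinned to "the 10 newest by createdAt", not a time window.
function recentOrders(orders: Order[]): Order[] {
  return [...orders]
    .sort((a, b) => b.createdAt.getTime() - a.createdAt.getTime())
    .slice(0, 10);
}

// Fixture: twelve orders, one per month. The two readings of "recent"
// diverge sharply here, and the assertions choose the count-based one.
const MONTH = 30 * 24 * 60 * 60 * 1000;
const orders: Order[] = Array.from({ length: 12 }, (_, i) => ({
  id: i,
  createdAt: new Date(Date.now() - i * MONTH),
}));

if (recentOrders(orders).length !== 10)
  throw new Error("expected the 10 newest, not a 30-day window");
if (recentOrders(orders)[0].id !== 0)
  throw new Error("expected newest first");
```

Prose could gloss “recent” either way; the assertions cannot.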
Developers who use LLMs most effectively in production work are, in many cases, doing a form of specification-first design. They are writing the constraints and examples and edge cases before requesting the implementation. The loop from Fowler’s conversation is running, but the learning happening in it is as much about specifications as about implementations.
Where the Skill Shift Lands
The Fowler conversation frames the core challenge as building systems that survive change by managing cognitive load. The what/how loop is the mechanism. What the conversation leaves implicit is that managing cognitive load at the “what” level requires as much discipline as managing it at the “how” level.
The skill shift LLMs are demanding is not only “understand the how layer so you can evaluate LLM output,” though that remains true and Joshi argues it compellingly. It is also “develop the specification precision to express the what layer completely enough for translation to be reliable.”
These are related skills. Understanding how a system works gives you the vocabulary to specify it precisely; knowing what you want tells you which parts of the how are relevant. The loop runs both ways, as Fowler, Joshi, and Parsons recognize throughout the conversation.
But the practical implications differ. Building “how” knowledge requires engaging with implementations, debugging, and working through failures at the implementation level. Building “what” knowledge requires a discipline of explicit specification: naming constraints, stating assumptions, writing examples, and recognizing when a description is genuinely ambiguous rather than intuitively clear.
Specification languages like TLA+ and Alloy have existed for decades in formal methods research, adopted primarily in safety-critical domains where getting the “what” wrong is too costly to discover in production. The rest of the industry has historically relied on the implicit precision that comes from writing the implementation yourself. LLMs are now confronting that wider industry with the same specification problem, and the informal approach is showing its limits.
Formal semantics was invented because natural language descriptions of software behavior are insufficient. LLMs have not changed that fact. They have made the insufficiency unavoidable in everyday development work, which is a different thing from solving it.