The What/How Loop Has Been Running Since Assembly, and LLMs Just Changed the Stakes
Source: martinfowler
In January 2026, Martin Fowler, Rebecca Parsons, and Unmesh Joshi published a conversation about LLMs and software abstraction that deserves more attention than it got. The framing they use, the what/how loop, is deceptively simple: you specify what you want at one level of abstraction, something fills in the how at the level below, and you iterate. That loop predates LLMs by about seventy years. Understanding that history makes the present moment clearer.
The Staircase
Every major shift in programming history has been a renegotiation of where the what/how line sits. Assembly language moved it above raw machine code. Fortran moved it above registers and jumps, letting programmers write something closer to mathematical notation. John Backus, who designed Fortran, worried at the time that abstracting away from machine code would degrade programmer discipline. It did not. It relocated the discipline to a different level.
SQL moved the line further. You declare what data you want; the query planner chooses join order, index strategies, and scan types. Developers never see B-tree traversals unless the abstraction leaks. Garbage-collected languages moved memory management below the line entirely. React, Terraform, and Kubernetes pushed state management into declarative specifications: describe desired state, let the tool realize it.
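The SQL split is easy to see in practice. A minimal sketch using Python's built-in sqlite3 module: the query text is the what, and EXPLAIN QUERY PLAN exposes the how the planner chose, which changes when an index becomes available even though the query does not. The schema here is hypothetical; exact plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 10, i * 1.5) for i in range(100)],
)

# The what: a declarative statement of which rows we want.
query = "SELECT * FROM orders WHERE customer_id = 3"

# The how, before an index exists: the planner falls back to a full scan.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Adding an index changes the how; the query text is untouched.
conn.execute("CREATE INDEX idx_customer ON orders (customer_id)")
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before)  # a SCAN of the table
print(plan_after)   # a SEARCH using idx_customer
```

The developer renegotiated nothing: the same what now gets a different how, chosen below the line.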
Each step follows the same pattern. The how at one level becomes cheap enough that it stops being the interesting problem, and the what above it becomes the new bottleneck. This is not a coincidence. It is structural.
Fred Brooks described the underlying mechanism in 1986 in “No Silver Bullet”. He distinguished accidental complexity, the friction imposed by tools and notation, from essential complexity, the inherent difficulty of the problem being solved. Every step up the abstraction staircase absorbs accidental complexity. None of them touch essential complexity. The hard part of building correct software has always been understanding what correct means, not writing the syntax that expresses it.
What LLMs Actually Change
LLMs are the newest step on this staircase, but they differ from prior steps in two ways that matter.
First, scope. Earlier abstraction tools were domain-specific. SQL handles queries. Compilers handle one language. Terraform handles infrastructure state. LLMs handle arbitrary how generation across REST handlers, database schemas, test scaffolding, UI components, async processors, and configuration files simultaneously. The breadth exceeds anything that came before.
Second, the nature of the guarantee. Compilers and query planners are deterministic translators. Given the same input, they produce the same output. More importantly, a compiler that accepts your code makes a formal guarantee: the output will faithfully execute the semantics of the input language. A query planner guarantees relational correctness. LLMs are probabilistic translators. The same prompt can produce three different implementations, each of which may contain subtle errors that look structurally plausible.
Parsons frames this in the Fowler conversation using programming language theory. The what/how distinction maps to denotational semantics, what a program means, versus operational semantics, how it executes. Compilers provide a formally correct bridge between these. LLMs provide a statistically likely one. This is not a minor technical distinction. The entire developer workflow around trusting compiler output assumes the formal guarantee. That assumption does not carry over.
The Bottleneck Shifts Upward Again
The history of abstraction tools demonstrates consistently that when how generation gets cheaper, the bottleneck relocates to what specification. Fourth-generation languages made database queries cheap to generate; specifying exactly what query to produce became the bottleneck. CASE tools tried to generate code from diagrams; analysts lacked the conceptual vocabulary to write adequate specs. DSLs require accurately modeling a domain before you can use them.
With LLMs, the same thing is happening at a larger scale and a faster pace.
The CCMenu experiment, documented by Erik Doernenburg on martinfowler.com, illustrates what this looks like in practice. An agent generated code that was technically correct for the specified feature. It worked. But it violated the architectural what of the existing codebase: module structure, abstraction boundaries, naming conventions. These were embedded in the design accumulated over a decade, not in the prompt. Tests did not catch it. Linters did not catch it. Doernenburg caught it because he had built the project himself and held the reference architecture in his head.
This points to a specific failure mode. The structural what, the intent at feature level, was in the prompt. The architectural what, the system-level properties the code must preserve, was not. LLMs optimize for the specification they receive. They cannot optimize for the specification that was never written.
GitClear’s 2024 analysis found that code duplication nearly doubled year-over-year in repositories using AI coding tools. This is precisely what you would expect if the structural what is specified in prompts but the architectural what is not. Agents generate features correctly; they degrade architecture silently.
The What Has to Become More Precise
The practical response is not to stop using LLMs. It is to recognize that the what, the specification side of the loop, now has to carry more weight than it did when developers wrote the how themselves.
When a developer writes code, there is a continuous feedback loop between intent and implementation. Implicit decisions get made during typing, and the developer sees them. When an LLM writes the code, those same decisions get made inside the translation and handed back as output. The developer sees the result, not the decision. Evaluating the result requires understanding what decisions were available, which means understanding the how even when you are not writing it.
Joshi’s companion piece on the learning loop makes this point directly. Developers who shortcut the how-level learning via LLMs lose the evaluative capacity to detect incorrect translations. A developer who has implemented pagination from scratch, hit N+1 queries, and traced a slow index scan has a mental model. That model is what makes LLM output evaluation possible. The productivity gain is real in the short term; the professional development cost accretes silently.
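The N+1 pattern mentioned above is concrete enough to sketch. Assuming a hypothetical customers/orders schema in sqlite3, a trace callback counts statements, making the difference between per-row queries and a single join visible rather than anecdotal:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"c{i}") for i in range(20)])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 20, 10.0) for i in range(100)])

# Count every statement sent to the database from here on.
statements = []
conn.set_trace_callback(lambda sql: statements.append(sql))

# N+1: one query for the page of customers, then one more per customer.
page = conn.execute("SELECT id FROM customers LIMIT 10").fetchall()
for (cid,) in page:
    conn.execute("SELECT COUNT(*) FROM orders WHERE customer_id = ?",
                 (cid,)).fetchone()
n_plus_one = len(statements)  # 1 + 10

statements.clear()
# The same information as a single joined query.
conn.execute("""
    SELECT c.id, COUNT(o.id) FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id LIMIT 10
""").fetchall()
single = len(statements)  # 1

print(n_plus_one, single)
```

A developer who has hit this pattern recognizes it in generated code at a glance; one who has only ever prompted for "list customers with order counts" has no reason to look.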
Domain-driven design offers a relevant but incomplete answer here. Eric Evans’ ubiquitous language creates shared vocabulary between domain experts and developers, where each term carries stable semantics agreed upon within a bounded context. LLMs absorb domain vocabulary from training data. They learn that Order relates to Customer. They do not learn the bounded-context semantics that give these terms precise meaning within a specific system. An LLM generating a feature involving “active customers” will select one interpretation from its training distribution, not from the bounded context of the system being modified.
What bridges this gap is making semantic contracts machine-verifiable. Strong types that encode bounded context distinctions, so BillingActiveStatus and AccessActiveStatus cannot be accidentally substituted. Architecture fitness functions, as Ford, Parsons, and Kua describe in Building Evolutionary Architectures, are executable tests verifying structural and behavioral properties of a codebase. When an LLM crosses a bounded context boundary, the fitness function fails in CI before any human reviews it.
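A minimal sketch of the typing half of this, with hypothetical billing and access contexts: “active” means something different in each, so each meaning gets its own type, and mixing them becomes an error a machine can catch rather than a judgment call left to review. The runtime guard stands in for what a static checker such as mypy would flag before the code ever ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BillingActiveStatus:
    # Billing context: active means there is a paid invoice.
    has_paid_invoice: bool


@dataclass(frozen=True)
class AccessActiveStatus:
    # Access context: active means recent use of the product.
    logged_in_last_30_days: bool


def charge_renewal(status: BillingActiveStatus) -> bool:
    # Runtime guard mirroring the static contract: only the
    # billing-context notion of "active" is meaningful here.
    if not isinstance(status, BillingActiveStatus):
        raise TypeError("charge_renewal requires a billing-context status")
    return status.has_paid_invoice


# The correct context passes.
assert charge_renewal(BillingActiveStatus(has_paid_invoice=True)) is True

# Generated code that grabbed the wrong "active" fails loudly, in CI,
# before any human reviews it.
try:
    charge_renewal(AccessActiveStatus(logged_in_last_30_days=True))
except TypeError as err:
    print("caught:", err)
```

A fitness function generalizes the same move from individual values to whole-codebase properties: an executable assertion that, say, the billing module never imports from the access module.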
The Temporal What Nobody Writes Down
There is a category of specification that the current conversation about LLMs and abstraction systematically underweights: the temporal what. Structural intent is about what entities and operations a system should have. Temporal intent is about what happens when two things happen in sequence, or race, or partially fail.
Consider a welcome message feature: send a welcome when a user joins. The structural what is clear. The temporal what involves ordering constraints: write to the database before making the API call, so that a network failure does not produce a welcome message for a user the system does not know about. That constraint is not in “send welcome when user joins.” It is the kind of thing that gets learned by debugging a race condition at 2am, not by reading documentation.
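The ordering constraint can be sketched with an in-memory stand-in for the database and a fake mail API that can fail. All names here are hypothetical; the point is only that the persist step must complete before the side effect fires, so a mail failure leaves a known user without a welcome rather than a welcome without a user.

```python
class MailError(Exception):
    """Stands in for a network failure when calling the mail API."""


db = {}  # in-memory stand-in for the user table


def send_welcome(email, mail_is_up):
    if not mail_is_up:
        raise MailError("network failure")


def join(email, mail_is_up=True):
    # Temporal what: persist the user first. If the mail call fails,
    # the system still knows who the user is and can retry later.
    db[email] = {"email": email, "welcomed": False}
    try:
        send_welcome(email, mail_is_up)
        db[email]["welcomed"] = True
    except MailError:
        # User exists; a background job can retry the welcome.
        pass


join("a@example.com", mail_is_up=True)
join("b@example.com", mail_is_up=False)

print(db["a@example.com"]["welcomed"])  # welcomed
print("b@example.com" in db)            # user survives the mail failure
```

Reverse the two operations and the failure mode inverts: the mail goes out, the write never lands, and the system has greeted someone it cannot name. Nothing in “send welcome when user joins” distinguishes the two implementations.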
Type systems and interfaces capture structural what well. The temporal what, which concerns ordering constraints, side-effect sequencing, and failure behavior, is almost never written down. Formal methods tools like TLA+ and Alloy exist precisely for specifying this class of property, but they are rarely used outside safety-critical domains. Most teams hold temporal specifications in their heads or in runbooks. LLMs cannot access either.
This is a genuine gap that the current tooling does not address, and it may be the area where probabilistic translation causes the most damage in production systems.
The Loop Does Not End
The Fowler conversation’s most useful contribution is refusing the framing that LLMs will end programming. The what/how loop does not end when a layer gets automated. It moves. The what that used to be a happy path through a domain problem now has to be precise enough that a probabilistic translator can fill in the correct how consistently, including the temporal constraints, the bounded-context semantics, the architectural invariants, and the failure behavior that developers previously encoded implicitly as they wrote.
That is more precision than the discipline has historically required of specifications. Joel Spolsky’s law of leaky abstractions holds that all non-trivial abstractions leak. When they leak, you need to understand what they abstract in order to fix them. LLMs leak. Every probabilistic translator leaks. The question is whether the developers working with them have built enough how-understanding to navigate the leaks.
Fortran did not end the need to understand machine behavior. SQL did not end the need to understand query execution. LLMs will not end the need to understand the code they generate. The loop keeps running. The line just moved again.