LLMs Changed Where the What/How Loop Breaks, Not Whether It Does
Source: martinfowler
Back in January, Martin Fowler published a conversation between himself, Rebecca Parsons, and Unmesh Joshi about how LLMs reshape the abstractions we build in software. The central idea is what they call the “what/how loop”: the iterative process of defining intent at one level of abstraction and filling it in with implementation at the next level down. Their framing treats this as the core mechanism for managing cognitive load in software development, and argues that LLMs are changing how we navigate it.
That framing is right. But it understates the loop’s age. The what/how loop is not something AI introduced to software engineering. It has been the field’s foundational structural problem since its beginning, and every major paradigm in programming history has been a renegotiation of where the loop sits and who or what crosses it. Seeing the Fowler conversation through that historical lens helps explain both why LLMs feel genuinely different from prior automation waves and why they keep breaking in the same places.
The Loop Is Fifty Years Old
The goto debate in the late 1960s was not, at its core, an argument about code readability. It was an argument about abstraction levels. Dijkstra’s 1968 letter to the Communications of the ACM argued that goto conflated the “what” of control-flow intent with the “how” of raw jumps in the instruction stream. Structured constructs, if/else/while/for, restored the separation: the loop structure is the “what”; the compiled machine code is the “how.” A programmer could reason about control flow without holding machine state in mind simultaneously.
David Parnas formalized the same principle in 1972 with information hiding. Modules should hide design decisions that were likely to change. The public interface is the “what”; the implementation is the “how.” The reason to hide the “how” is not aesthetic: the “how” is the part that evolves, and every other module that depends on it pays the cost of that evolution unless a stable “what” sits between them.
Bertrand Meyer’s Design by Contract in the late 1980s made this formally explicit. Pre-conditions, post-conditions, and invariants are a machine-readable “what.” The implementation is whatever satisfies the contract. The “how” is unconstrained provided the “what” is met.
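To make the idea concrete, here is a minimal Design-by-Contract sketch in TypeScript, the language used later in this piece. The Account class and its conditions are illustrative inventions, not from Meyer or the source; TypeScript has no native contract syntax, so the contract is expressed as runtime checks:

```typescript
// A hedged sketch of Design by Contract: the checks are the machine-readable
// "what"; the body between them is the unconstrained "how".
class Account {
  private balance: number;

  constructor(initial: number) {
    if (initial < 0) throw new Error("precondition violated: initial >= 0");
    this.balance = initial;
  }

  deposit(amount: number): void {
    // Pre-condition: part of the "what"
    if (amount <= 0) throw new Error("precondition violated: amount > 0");

    const before = this.balance;
    this.balance += amount; // the "how": any implementation meeting the contract is acceptable

    // Post-condition and invariant: the rest of the "what"
    if (this.balance !== before + amount) throw new Error("postcondition violated");
    if (this.balance < 0) throw new Error("invariant violated: balance >= 0");
  }

  getBalance(): number {
    return this.balance;
  }
}
```

The body of deposit could be swapped for any other implementation; as long as the checks pass, the contract is satisfied.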
Object-oriented programming, despite its cultural associations with objects and messages, is fundamentally an argument about where to draw the what/how line. The class interface is the “what.” The method bodies are the “how.” Polymorphism, the property that makes OOP’s flexibility possible, is simply a mechanism for substituting one “how” for another under a stable “what.”
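The substitution mechanism can be shown in a few lines. The Shape, Circle, and Rectangle names below are illustrative, not from the source:

```typescript
// One stable "what" (the interface), two substitutable "how"s (the classes).
interface Shape {
  area(): number; // callers depend only on this contract
}

class Circle implements Shape {
  constructor(private radius: number) {}
  area(): number { return Math.PI * this.radius ** 2; } // one "how"
}

class Rectangle implements Shape {
  constructor(private w: number, private h: number) {}
  area(): number { return this.w * this.h; }            // another "how"
}

// Caller code reasons entirely at the "what" level; either "how"
// can be substituted without the caller changing.
function totalArea(shapes: Shape[]): number {
  return shapes.reduce((sum, s) => sum + s.area(), 0);
}
```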
None of this was invented by LLMs, and none of it was ever fully resolved.
The Loop Has a Structural Failure Mode
The what/how loop fails in a specific and repeatable way: whenever a layer of “how” generation becomes cheaper, the bottleneck shifts upward. The “what” at the level above has to become more precise, and that precision is hard.
Fourth-generation languages in the 1980s made generating database queries cheap. The bottleneck shifted to specifying what the query should produce. Business analysts, who were expected to replace developers, found that precise specification required the same kind of thinking as programming. The syntax got easier; the conceptual work did not.
Domain-specific languages make generating implementations cheap for a specific problem class, but writing a good DSL requires accurately modeling the “what” of an entire domain, which is a harder problem than writing any single implementation within it. The CASE tools of the late 1980s and early 1990s tried to automate “how” generation from visual diagrams and mostly failed, not because the generation mechanism was wrong, but because the analysts using the tools lacked the conceptual vocabulary to specify systems precisely enough for code generation to work.
This is not a failure of the automation tools. It is a structural property of the loop: every time you push “how” generation down, the “what” specification one level up becomes the binding constraint. The bottleneck relocates; it does not disappear.
What LLMs Specifically Change
LLMs change the economics of the loop at an unusual scope. Where 4GLs automated one class of “how” generation and DSLs automate another, LLMs work across arbitrary “how” generation: REST handlers, database schemas, UI components, async event processors, test scaffolding, configuration files. The breadth exceeds any prior automation tool.
The Fowler conversation correctly identifies that this shifts the human’s role toward specifying “what” more carefully at every level. You are less likely to write the implementation and more likely to specify the interface, the contract, the behavioral expectation. The implementation gets generated. The specification has to be correct.
But the loop does not change structurally. It changes where it breaks.
With hand-written code, the loop fails when a developer loses track of the “what” while implementing the “how”: they write code that technically works but drifts from the intent. With LLM-generated code, the loop fails when the “what” specification is underspecified: the LLM generates a plausible “how” that satisfies the literal prompt but violates the surrounding “what” context that the prompt never stated.
Erik Doernenburg’s experiment with CCMenu, documented on the Fowler site, is a concrete instance of this failure. The agent generated a correct “how” for the feature as specified: the feature worked. But it violated the architectural “what” of the existing codebase. Module structure, abstraction boundaries, naming conventions: these are “what” specifications embedded in the code’s existing design. They were not in the prompt. The LLM had no way to honor them unless they were explicitly stated, so it produced a technically correct but architecturally incongruent result.
A GitClear analysis from 2024 tracking AI-assisted commits across a large corpus found that code duplication nearly doubled year over year among teams using AI coding tools. That statistic is the loop failure rate, made measurable. The agent optimizes for the “what” at the feature level while degrading the “what” at the architecture level, because the architecture-level “what” was not in the specification it received.
The “What” Has to Go All the Way Down
The practical implication is that the “what” specification now has to reach further into the system than it used to.
Consider a TypeScript codebase with a clearly defined interface:
interface PaymentProcessor {
  charge(amount: Money, method: PaymentMethod): Promise<Receipt>;
  refund(receiptId: string, amount: Money): Promise<RefundConfirmation>;
}
That interface communicates the “what” to the LLM without requiring it to read the implementation. When the agent is asked to add a new payment flow, the interface constrains what a correct “how” looks like. The agent cannot accidentally create a parallel implementation of payment processing without explicitly breaking the interface contract.
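For illustration, one conforming “how” might look like the sketch below. The source shows only the interface, so the concrete shapes of Money, PaymentMethod, Receipt, and RefundConfirmation, and the InMemoryProcessor stub itself, are all assumptions:

```typescript
// Assumed shapes for the types the interface references.
type Money = { amount: number; currency: string };
type PaymentMethod = { kind: "card" | "bank"; token: string };
type Receipt = { receiptId: string; charged: Money };
type RefundConfirmation = { receiptId: string; refunded: Money };

interface PaymentProcessor {
  charge(amount: Money, method: PaymentMethod): Promise<Receipt>;
  refund(receiptId: string, amount: Money): Promise<RefundConfirmation>;
}

// A stub "how" that exists only to show the contract being met; any generated
// implementation that satisfies PaymentProcessor is an acceptable substitute.
class InMemoryProcessor implements PaymentProcessor {
  private receipts = new Map<string, Money>();

  async charge(amount: Money, method: PaymentMethod): Promise<Receipt> {
    const receiptId = `r-${this.receipts.size + 1}`;
    this.receipts.set(receiptId, amount);
    return { receiptId, charged: amount };
  }

  async refund(receiptId: string, amount: Money): Promise<RefundConfirmation> {
    if (!this.receipts.has(receiptId)) throw new Error("unknown receipt");
    return { receiptId, refunded: amount };
  }
}
```

The point is not this particular stub but the constraint: an agent asked to add a payment flow against this interface must either satisfy the contract or visibly break it.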
Contrast that with a JavaScript codebase where payment processing lives in a utils.js file as a collection of functions with no formal interface. The agent can read those functions, but it cannot infer from the code alone what the “what” is. It will generate something plausible based on the immediate prompt. Whether that something fits the design intent depends on luck and the quality of the prompt.
This is why TypeScript codebases tend to outperform JavaScript ones in AI-assisted coding tasks. It is not that LLMs understand TypeScript better. It is that TypeScript codebases make the “what” accessible at lower reading cost. The harness engineering framing that Birgitta Böckeler discusses on the Fowler site points at the same mechanism: small focused modules with descriptive names, standard interface patterns, and strong types all compress “what” information into forms the LLM can access without tracing through the “how.”
Rahul Garg’s design-first collaboration pattern is a procedural implementation of the same principle. You do not generate code until the “what” is explicit enough to be stated as an interface. Misunderstandings caught during interface design are cheap. Misunderstandings caught after 300 lines of generated code are not.
The Frontier Keeps Moving
One thing the Fowler conversation gestures at but does not fully explore is that the frontier does not stabilize. What counts as “what” versus “how” changes as tooling improves.
A year ago, implementing a REST API handler was “how.” With enough scaffolding, it is now “what”: you specify the endpoint behavior, the handler gets generated. The same progression has happened repeatedly in the history of the field. Assembly was “how” until compilers took over, and assembly became “what” for a narrower class of problems. Memory management was “how” until garbage collection took over for most contexts. Type checking was “how” until type systems automated it.
The loop does not end when a layer gets automated. It moves. Human responsibility migrates upward, the required precision at the new boundary increases, and the skills that matter are the ones that can produce accurate “what” specifications for increasingly complex systems.
Unmesh Joshi, one of the conversation’s co-authors, wrote a related piece noting that developers who delegate “how” to LLMs before understanding it themselves lose the ability to evaluate whether the generated “how” is correct. That is the same loop, expressed as a learning problem: you cannot close the what/how loop for an LLM if you do not understand the “how” well enough to verify the output.
The loop is not new. Dijkstra was writing about it in 1968. What is new is the breadth of the frontier and the pace at which it is moving. The structural challenge, specifying “what” precisely enough that the “how” generation succeeds, is the original challenge of software engineering. LLMs do not dissolve it. They relocate it upward and widen the surface area where it can fail.