The what/how distinction sits at the center of most abstraction work in software. When you write SQL, you specify what data you want, not how the database retrieves it. When you write a Kubernetes manifest, you declare the desired state and let the scheduler figure out placement. When you write a type signature in Haskell, you describe what a function should produce and let the compiler verify the how. The pattern is everywhere, and so is the tension: the more you separate intent from implementation, the more you lose direct control over correctness, performance, and behavior.
Martin Fowler, Rebecca Parsons, and Unmesh Joshi took up this tension in their January 2026 conversation, asking specifically what LLMs add to the picture. Their framing is worth engaging with directly. They treat the challenge as building systems that survive change by managing cognitive load, and they see the what/how mapping as the mechanism for doing that. This is a more useful lens than the typical “LLMs write code for you” framing, and it also has more history behind it than most discussions of LLM-assisted development acknowledge.
The Problem Is Older Than LLMs
The dream of expressing what you want rather than how to achieve it is as old as programming. Grace Hopper’s vision for machine-independent programs was rooted in it. COBOL was sold, partly, as a business-readable specification language. Fourth-generation languages in the 1980s promised that business analysts could express requirements and receive working software. Most of these efforts fell short not because the idea was wrong but because the translation from what to how turned out to require precision that natural language and semi-formal business logic could not provide.
SQL succeeded where 4GLs largely failed because it narrowed the scope dramatically. Set-based relational queries have a clean declarative semantics; general business process logic does not. The query optimizer can enumerate join strategies because the semantic contract of “give me these rows” is well-defined. There is no equivalent optimizer for “process this insurance claim correctly.”
Fowler’s earlier work on domain-specific languages explored the same territory through a different approach: build a vocabulary that lets domain experts express what in terms that map cleanly onto a how that developers control. The constraint there was identical: DSLs work best in narrow, well-understood domains where the abstraction boundary is stable. Behavior-Driven Development, with its Gherkin syntax, attempted something similar for test specifications, trying to write what-level statements that could be mechanically linked to executable test steps.
What Changes With LLMs
LLMs do not solve the semantic narrowness problem. They route around it through a different mechanism: they have enough statistical knowledge about code, natural language, and domain conventions that they can attempt a what-to-how translation even when the domain is broad and the specification is incomplete. This is structurally different from DSLs or 4GLs. The generated how is often wrong, sometimes subtly wrong, but the cost of generating a first attempt is low enough that iterating toward correctness becomes tractable in a way it was not before.
The conversation between Fowler, Parsons, and Joshi frames this as a loop, and that framing is precise. The loop is: express what you want, receive a generated how, evaluate it, refine your what, repeat. This is not a one-shot translation. It is a conversation that progressively clarifies intent and narrows the implementation space. The word “loop” matters here because it implies iteration is load-bearing, not incidental.
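The shape of that loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `what_how_loop`, `fake_generate`, and `evaluate` are hypothetical names, and `fake_generate` is a stand-in for a model call, not any real API.

```python
from typing import Callable, Optional

# A candidate "how": here, any function from a list of ints to a list of ints.
Candidate = Callable[[list[int]], list[int]]


def what_how_loop(
    spec: str,
    generate: Callable[[str], Candidate],
    evaluate: Callable[[Candidate], Optional[str]],
    max_rounds: int = 5,
) -> tuple[Candidate, str]:
    """Generate a how from the what, evaluate it, refine the what, repeat."""
    for _ in range(max_rounds):
        candidate = generate(spec)
        problem = evaluate(candidate)  # None means the evaluation passed
        if problem is None:
            return candidate, spec
        # Fold the finding back into the specification for the next round.
        spec = f"{spec}; {problem}"
    raise RuntimeError("no acceptable candidate within max_rounds")


def fake_generate(spec: str) -> Candidate:
    """Hypothetical stand-in for a model: it only reacts to one keyword."""
    if "duplicates" in spec:
        return lambda xs: sorted(set(xs))
    return lambda xs: sorted(xs)


def evaluate(candidate: Candidate) -> Optional[str]:
    """Return a description of what is wrong, or None if nothing is."""
    if candidate([2, 1, 2]) != [1, 2]:
        return "remove duplicates"
    return None


accepted, final_spec = what_how_loop(
    "return the values in ascending order", fake_generate, evaluate
)
```

The point of the sketch is that the specification that finally passes is not the one you started with: the evaluation step is what surfaces the missing requirement.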
This changes the economics of abstraction translation in a specific way. Before LLMs, closing the gap between what and how required either a programmer who understood both the problem domain and the implementation space, or a formal specification language that could be mechanically compiled. Both are expensive. LLMs make first-draft translation cheap, which shifts the programmer’s job toward evaluation and refinement rather than initial construction. Whether that shift is net positive depends entirely on the quality of the evaluation step.
Cognitive Load as the Underlying Motivation
The cognitive load framing from the conversation is where the real insight lives. Abstractions exist because working memory is finite. When you call Collections.sort() in Java, you do not need to hold a TimSort implementation in your head; you need the contract: it leaves the list in order, stably, in O(n log n) time. The implementation is hidden, and that hiding is structural, not cosmetic. It lets you reason about the sort as a unit.
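Reasoning from the contract rather than the implementation can be made concrete. A minimal sketch in Python, whose built-in `sorted()` documents a stability guarantee; the sample records and the helper names `is_ordered` and `is_stable` are illustrative assumptions.

```python
def rank(record):
    """Sort key: the numeric rank in position 1 of each record."""
    return record[1]


def is_ordered(items, key):
    """Contract part 1: keys never decrease from one item to the next."""
    return all(key(a) <= key(b) for a, b in zip(items, items[1:]))


def is_stable(original, result, key):
    """Contract part 2: items with equal keys keep their original order."""
    for k in {key(x) for x in original}:
        before = [x for x in original if key(x) == k]
        after = [x for x in result if key(x) == k]
        if before != after:
            return False
    return True


records = [("carol", 2), ("alice", 1), ("bob", 2), ("dave", 1)]
by_rank = sorted(records, key=rank)

# Nothing below depends on how the sort is implemented, only on its contract.
assert is_ordered(by_rank, rank)
assert is_stable(records, by_rank, rank)
```

Both checks would keep passing if the underlying algorithm were swapped for any other stable sort, which is exactly what lets a caller treat the sort as a unit.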
LLMs can participate in this compression in both directions. Going down the abstraction stack, they expand a high-level what into low-level implementation details, allowing the programmer to stay at the conceptual level longer. Going up the stack, they summarize existing how code into what descriptions, which is useful when reading unfamiliar codebases or building a mental model of a large system. Both directions genuinely reduce cognitive load, which is why the productivity improvements in LLM-assisted development are real and measurable in many contexts.
The risk in the downward direction is that the generated how creates hidden complexity. If the programmer never engages with the implementation, the abstraction leaks in ways they cannot anticipate. Consider a case that comes up regularly in practice: you ask an LLM to generate a caching layer, it produces something that looks correct, and you move on. Later, you discover the cache invalidation logic is wrong in a specific edge case involving concurrent updates. The generated code satisfied your what, but the how contained a subtle race condition that you might well have caught had you written it yourself.
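One well-known shape such a bug can take is a read-through cache where a slow reader repopulates the cache after a writer has already invalidated it. The sketch below is hypothetical and not taken from any particular generated code; it replays the interleaving by hand, in a single thread, so the outcome is deterministic.

```python
db = {"price": 100}  # hypothetical backing store
cache = {}           # hypothetical cache in front of it


def read_from_db(key):
    """Reader, step 1: on a cache miss, load the value from the store."""
    return db[key]


def fill_cache(key, value):
    """Reader, step 2: store the loaded value in the cache."""
    cache[key] = value


def write(key, value):
    """Writer: update the store, then invalidate the cached entry."""
    db[key] = value
    cache.pop(key, None)


# The problematic interleaving, replayed step by step:
loaded = read_from_db("price")  # reader misses the cache and loads 100
write("price", 120)             # writer updates; there is nothing to invalidate yet
fill_cache("price", loaded)     # reader caches the value it loaded earlier

# The cache now holds 100 while the store holds 120, and no later
# invalidation is pending to correct it.
```

Each function is reasonable on its own, and a test that exercises reads and writes one at a time passes. The defect only exists in the ordering between them, which is why it survives a quick look at code that "looks correct".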
This is not unique to LLMs. It is the same risk that comes with any black-box dependency. The difference is that a standard library has been reviewed, tested at scale, and documented; generated code has not. The abstraction boundary is in the same place, but the trust properties are different.
The Loop Requires Verification Infrastructure
If the what/how loop depends on evaluating the generated how, then the quality of that evaluation is the rate-limiting step. Most development workflows do not make evaluation easy, and this is where many LLM-assisted development practices fall short in practice.
Unit tests provide partial evaluation: they check specific cases, not general correctness. Type systems provide structural verification, particularly in languages with expressive type systems like Rust, TypeScript with strict null checks, or Haskell. Property-based testing frameworks like Hypothesis for Python or fast-check for JavaScript check invariants across generated inputs, which is a closer fit for verifying generated code than example-based tests.
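To show what checking invariants across generated inputs looks like, here is a hand-rolled sketch that uses only the standard library. It is a simplified stand-in for what Hypothesis or fast-check automate (they add input shrinking and much better generators). The functions `dedupe` and `sorted_dedupe` are hypothetical examples of generated code.

```python
import random


def dedupe(items):
    """Hypothetical generated function: drop repeats, keep first occurrences."""
    seen = set()
    out = []
    for x in items:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


def sorted_dedupe(items):
    """A plausible-looking variant that silently loses the original order."""
    return sorted(set(items))


def check_properties(fn, trials=500, seed=0):
    """Assert the invariants on many random inputs; return the trial count."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 20))]
        ys = fn(xs)
        assert len(ys) == len(set(ys)), xs   # no duplicates remain
        assert set(ys) == set(xs), xs        # nothing lost, nothing invented
        assert fn(ys) == ys, xs              # applying it twice changes nothing
        first_seen = [xs.index(y) for y in ys]
        assert first_seen == sorted(first_seen), xs  # first-occurrence order kept
    return trials


check_properties(dedupe)

try:
    check_properties(sorted_dedupe)
    caught = False
except AssertionError:
    caught = True
```

An example-based test such as `dedupe([1, 1, 2]) == [1, 2]` passes for both variants. Only the ordering invariant, checked over many inputs, separates them.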
Code review is another evaluation mechanism, but it faces a specific challenge with LLM-generated code: the reviewer may not know what the original what was. When a developer writes code, there is usually shared context about the intent. When the code was generated by an LLM, the intent lives in a prompt that may not be visible in the diff. This is a documentation problem as much as a review problem, and it suggests that good what/how loop practice requires preserving the what alongside the how, whether as a comment, a specification file, or a structured prompt in version control.
The most effective setups I have seen treat the evaluation step as a first-class concern: type-level constraints that the compiler enforces, property tests covering the core invariants, and explicit documentation of the what adjacent to the generated how. Without that infrastructure, the loop degenerates into: specify what, accept whatever was generated, discover problems in production.
Where the Abstraction Breaks Down
The what/how loop works best when the what is precise enough to significantly constrain the how. Natural language is notoriously imprecise. The gap between what the developer meant and what the LLM understood is a source of systematic errors: code that looks correct but is not. The LLM produces confident, well-structured code that satisfies the literal words of the prompt while missing the intent. Better models do not fully resolve this. It is a problem of specification precision that predates LLMs entirely, and it is the reason formal methods researchers have spent decades arguing that you eventually need machine-checkable specifications.
Fowler’s broader body of work returns repeatedly to the boundary between human-readable and machine-checkable specifications. The conversation about LLMs and the what/how loop sits in that same tradition. LLMs extend the range of what you can specify informally while still getting useful output, but they do not remove the fundamental trade-off between informal expressiveness and formal precision.
There is also a second-order effect worth noting. If LLMs generate the how routinely, the developer community’s collective familiarity with implementation details will erode over time. Junior developers who grow up using LLM-assisted workflows may have less intuition about the how than developers who wrote everything from scratch. This is arguably fine, in the same way that most web developers today do not need to understand TCP congestion control. But the analogy only holds if the abstractions are as stable and well-verified as TCP. Generated application code is not.
What Remains Human Work
Fowler, Parsons, and Joshi point toward a conclusion that is consistent with the history of abstraction tooling: LLMs change the mechanics of moving between abstraction levels, but they do not change the need to understand both levels. The programmer who can only work with the what and has no model of the how cannot evaluate whether the generated translation is correct. They can run the tests, but they cannot reason about the edge cases the tests do not cover.
This is the same conclusion that emerged from 4GL debates in the 1980s and from no-code platform debates in the 2010s. The tools that successfully raise the abstraction level are the ones that give practitioners enough visibility into the implementation to recognize when something is wrong. SQL succeeded partly because EXPLAIN exists, because query plans are readable, and because developers who use databases heavily develop intuition about execution even without writing the optimizer themselves.
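That visibility is easy to see in practice. The sketch below uses Python's built-in `sqlite3` module and SQLite's `EXPLAIN QUERY PLAN` statement; the `claims` table, its columns, and the index name are made up for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO claims (customer_id, amount) VALUES (?, ?)",
    [(i % 10, i * 1.5) for i in range(100)],
)

# The what: which rows we want. Nothing here says how to find them.
query = "SELECT id, amount FROM claims WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the human-readable detail column of the query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


before = plan(conn, query, (3,))  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_claims_customer ON claims (customer_id)")
after = plan(conn, query, (3,))   # expected to mention the new index

print(before)
print(after)
```

The query text is identical in both calls. Only the plan changes, and the developer can read that change without knowing how the optimizer reached it.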
The what/how loop is a productive frame for LLM-assisted development precisely because it acknowledges that generation and evaluation are both necessary. LLMs make the generation step faster and cheaper. The evaluation step, and the verification infrastructure that makes it reliable, remain the work that determines whether the loop produces software worth running.