Back in January, Martin Fowler sat down with Unmesh Joshi and Rebecca Parsons to talk about how LLMs interact with the what/how distinction in software design. Two months on, the conversation holds up as a useful framing, but I think it undersells how far back the what/how loop goes, and what it means to have LLMs operating inside it.
The what/how problem is not a modern concern. It is arguably the oldest recurring problem in programming, and every major paradigm shift in the field can be read as a new attempt to answer the same question: how do we let developers express intent without being buried in mechanism?
The Loop Before LLMs
Dijkstra made an early systematic case for separating what from how in “Notes on Structured Programming,” published in 1972. His argument was not just about eliminating goto; it was about building programs as hierarchies where each level is a coherent “what” relative to the level below it. Stepwise refinement, the methodology he and Wirth developed, is literally an iterative what/how decomposition: you write what you want at a high level, then refine it into how to achieve it at the next level down.
SQL is probably the most commercially successful application of this principle. You write SELECT * FROM orders WHERE customer_id = 42 and describe what you want; the query planner decides how to retrieve it. Robert Kowalski formalized the split even more crisply in 1979 with the equation Algorithm = Logic + Control: logic is the what (declarative specification of relationships), control is the how (search strategy). Logic programming languages let you write the logic and delegate the control to the runtime.
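To make the SQL point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders table, its columns, and its rows are invented for illustration. The same declarative query keeps returning the same rows after an index is added, while the hand-written loop hard-codes one retrieval strategy.

```python
import sqlite3

# Hypothetical in-memory table, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(42, 19.99), (7, 5.00), (42, 120.50)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ? ORDER BY id"

# The what: describe the rows wanted; the planner picks the access path.
declarative = conn.execute(QUERY, (42,)).fetchall()

# The how: spell out the scan and the filter by hand.
imperative = []
for row_id, customer_id, total in conn.execute(
    "SELECT id, customer_id, total FROM orders ORDER BY id"
):
    if customer_id == 42:
        imperative.append((row_id, total))

# Adding an index may change the how; the what is untouched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after_index = conn.execute(QUERY, (42,)).fetchall()

print(declarative == imperative == after_index)
```

The query text never mentions scans or indexes, which is the point: the retrieval strategy can change underneath it without any caller noticing.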
David Parnas’s 1972 paper “On the Criteria To Be Used in Decomposing Systems into Modules” put the principle in terms of modules and information hiding: each module hides a design decision (a how) behind an interface that expresses only what the module does. The interface/implementation split in the object-oriented languages that followed traces back to this idea.
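A small sketch of Parnas-style information hiding, written in Python with typing.Protocol. The OrderStore interface and both implementations are hypothetical names made up for this example; the hidden design decision here is the storage layout.

```python
from typing import Protocol


class OrderStore(Protocol):
    """The what: the only operations callers may rely on."""

    def add(self, order_id: int, total: float) -> None: ...

    def total_for(self, order_id: int) -> float: ...


class DictOrderStore:
    """One how: a dict keyed by order id."""

    def __init__(self) -> None:
        self._totals = {}

    def add(self, order_id: int, total: float) -> None:
        self._totals[order_id] = total

    def total_for(self, order_id: int) -> float:
        return self._totals[order_id]


class ListOrderStore:
    """Another how: an append-only list scanned from the newest entry."""

    def __init__(self) -> None:
        self._rows = []

    def add(self, order_id: int, total: float) -> None:
        self._rows.append((order_id, total))

    def total_for(self, order_id: int) -> float:
        for stored_id, total in reversed(self._rows):
            if stored_id == order_id:
                return total
        raise KeyError(order_id)


def invoice_line(store: OrderStore, order_id: int) -> str:
    # Depends only on the interface, so either store can be swapped in.
    return f"Order {order_id}: {store.total_for(order_id):.2f}"


for store in (DictOrderStore(), ListOrderStore()):
    store.add(1, 19.99)
    print(invoice_line(store, 1))
```

Swapping the dict for the list changes a design decision without changing invoice_line, which is the criterion Parnas proposed for drawing module boundaries.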
Eric Evans’s domain-driven design restated it again in 2003, this time as a cultural commitment: the domain model (what the business does, in business terms) must not be polluted by infrastructure concerns (how it communicates with databases or networks). The ubiquitous language is, precisely, a shared vocabulary for expressing what without reference to how.
Fowler’s own work revisits the theme constantly. The Fluent Interface pattern is about making code read like a description of desired state. The DSL book he wrote with Parsons argues that domain-specific languages exist to raise the abstraction level so domain experts can express what they want without needing to understand how it is accomplished. The semantic model at the center of that book, the object graph that represents intent, is defined independently of any surface syntax. The what can be captured in multiple notations; the how is delegated to the model underneath.
Why This Keeps Mattering
Cognitive load explains the persistence of this problem better than any purely aesthetic argument. George Miller’s 1956 paper argued that human short-term memory holds roughly seven chunks, plus or minus two, at a time (later research puts the figure closer to four). A well-chosen abstraction collapses multiple implementation details into a single named concept, so it occupies one slot in working memory regardless of how many lines it contains. A poorly chosen abstraction, or an abstraction that leaks (Spolsky’s Law of Leaky Abstractions), forces you to hold the contents of the abstraction and the higher-level context at the same time, and you run out of working memory fast.
This is why Fowler’s extract-function refactoring is not merely aesthetic tidying. The name is the what; the body is the how. The calling code holds one chunk instead of N chunks. Every abstraction boundary that holds up under use is reducing cognitive load for future readers, including future you.
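As a rough illustration of extract-function, here is the same calculation before and after, in Python. The 10% discount over 100 is a made-up business rule, and all function names are hypothetical.

```python
def order_summary_inline(items):
    # Before: the caller carries the how of totalling and discounting.
    subtotal = 0.0
    for _name, price, qty in items:
        subtotal += price * qty
    if subtotal > 100:
        subtotal *= 0.9
    return f"Total due: {subtotal:.2f}"


def discounted_total(items):
    """The name is the what; this body is the how."""
    subtotal = sum(price * qty for _name, price, qty in items)
    return subtotal * 0.9 if subtotal > 100 else subtotal


def order_summary(items):
    # After: the caller holds one chunk, the name discounted_total.
    return f"Total due: {discounted_total(items):.2f}"


items = [("widget", 60.0, 2), ("bolt", 0.5, 4)]
print(order_summary_inline(items))
print(order_summary(items))
```

Both versions produce the same string; the difference is how much of the discount rule a reader of order_summary has to keep in mind.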
Where LLMs Enter the Loop
LLMs change the picture in a specific and limited way. Pre-LLM, the what/how gap was bridged by programmers. You understood the intent (what), and you wrote the code (how). The translation was the core of the job. LLMs automate that translation in a general way that high-level languages, visual programming tools, and fourth-generation languages could not.
Andrej Karpathy’s observation from 2023 that English is becoming the hottest new programming language was pointing at exactly this: natural language can now function as a what layer that sits above traditional code. You describe intent; the model generates implementation. The abstraction stack gains a new level at the top.
Microsoft’s Semantic Kernel framework is the most explicit commercial implementation of this idea. In its original terminology, it formally separates semantic functions (described in natural language, the what) from native functions (implemented in code, the how). The kernel routes between them. The LLM sees descriptions; it never sees implementations. That is a strict what/how separation enforced by framework architecture.
OpenAI’s function calling schema works the same way. The schema (name, description, parameters) is the what; the function body is the how. The LLM generates calls based solely on descriptions. It cannot, and should not, inspect implementations to decide whether to use them.
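The schema/body split can be sketched without calling any real API. In the Python below, the tool, its data, the registry, and the shape of the simulated call are all assumptions made for illustration; a real provider's wire format differs in detail (for example, arguments often arrive as a JSON-encoded string). Only the schema dictionary would be shown to a model.

```python
import json

# The what: name, description, parameters. This is all a model would see.
# The tool itself is hypothetical.
GET_ORDER_TOTAL_SCHEMA = {
    "name": "get_order_total",
    "description": "Return the total amount for a single order.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "integer"}},
        "required": ["order_id"],
    },
}

_ORDERS = {1: 19.99, 3: 120.50}  # invented data


def get_order_total(order_id: int) -> float:
    # The how: this body is never sent to the model.
    return _ORDERS[order_id]


# Maps schema names to implementations on the application side.
REGISTRY = {"get_order_total": get_order_total}


def dispatch(call_json: str) -> float:
    """Run a call shaped like {"name": ..., "arguments": {...}}."""
    call = json.loads(call_json)
    return REGISTRY[call["name"]](**call["arguments"])


# Stand-in for what a model might emit after reading only the schema.
simulated_call = json.dumps(
    {"name": "get_order_total", "arguments": {"order_id": 3}}
)
result = dispatch(simulated_call)
print(result)
```

Replacing the dictionary lookup with a database query would change nothing the model sees, because the schema is the entire contract.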
The Gap Moves, It Does Not Close
Here is where I think Fowler’s conversation is worth extending. LLMs do not eliminate the what/how gap; they relocate it.
Before LLMs, the gap was between human intent and code. You carried the intent in your head and wrote code that expressed it, imperfectly, in a way that future readers would have to reconstruct. The gap lived between the programmer’s mental model and the artifact they produced.
With LLMs generating the how from your what, the gap shifts. It now lives between your natural language specification and the generated code. That gap is harder to inspect, not easier. When you wrote the code yourself, you at least knew what you meant when you wrote each line. When the LLM writes the code, you have to verify that the generated how correctly implements your what, and you are doing that verification across a translation you did not perform.
This demands more discipline in specifying the what, not less. The discipline of writing good prompts is partly the discipline of writing good specifications. Few-shot prompting (showing examples of desired behavior without specifying an algorithm) is declarative programming applied to natural language. Chain-of-thought prompting asks the model to externalize its how, which can make verification easier. Writing a SPEC.md before asking an LLM to generate an implementation is Readme-Driven Development with a sharper edge: the spec is now the primary artifact and the code is derivative and regenerable.
The Verification Problem Gets Harder
The deepest tension in LLM-assisted development is this: formal verification and mechanical checking require that the what be expressed formally enough to compare against the how. When both are informal (natural language intent, generated code), there is no mechanical check that the how correctly implements the what.
This is why property-based testing (QuickCheck, Hypothesis) is more valuable alongside LLM code generation than it was before. A property is a machine-checkable what specification. When the LLM generates a function, you can run the property tests to verify the what/how alignment, at least partially. It is also why the combination of formal specification languages and LLMs is an active research area: if you can express the what in TLA+ or Coq, you have a mechanically checkable specification to validate generated implementations against.
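To show what a machine-checkable what can look like, here is a hand-rolled property check in Python using only the standard library's random module. It is a simplified stand-in for what Hypothesis or QuickCheck do (they add smarter input generation and shrinking), and dedupe_keep_order stands in for a function an LLM might have generated; both names are hypothetical.

```python
import random


def dedupe_keep_order(items):
    """Stand-in for a generated implementation under test."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def check_properties(func, trials=200, seed=0):
    """Check the spec against random inputs; raises AssertionError on failure."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(0, 9) for _ in range(rng.randint(0, 20))]
        result = func(list(data))
        # Property 1: the output contains no duplicates.
        assert len(result) == len(set(result))
        # Property 2: no element is lost or invented.
        assert set(result) == set(data)
        # Property 3: elements appear in order of first occurrence.
        first_seen = [data.index(x) for x in result]
        assert first_seen == sorted(first_seen)
    return True


print(check_properties(dedupe_keep_order))
```

The three assertions say nothing about sets, loops, or any algorithm. An implementation such as sorted(set(items)) satisfies the first two properties but fails the third, which is the kind of what/how mismatch a natural language review can easily miss.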
Fowler and Parsons have spent their careers arguing that the what/how distinction is where the hard design work lives. LLMs change who performs the translation but not where the difficulty concentrates. The work of specifying what a system should do precisely enough to generate correct implementations from those specifications is, if anything, more demanding now than it was when programmers were the only translators available.
Abstraction design was always the job. The loop between what and how is just more visible when an LLM is running it.