Generated Code Is Only Half the Abstraction

Back in January, Martin Fowler, Unmesh Joshi, and Rebecca Parsons published a conversation about how LLMs are reshaping the abstractions we build software around. The framing they use is the what/how loop: the ongoing challenge of separating what a system should do from how it does it, and managing cognitive load in the process. Two months on, the ideas there have clarified some things I had been thinking about loosely.

The What/How Divide Is Older Than Any Tool

Every major abstraction in programming history can be read as an attempt to separate intent from mechanism. SQL is the clearest case. You write:

SELECT name FROM users WHERE age > 30 ORDER BY created_at DESC;

The query planner decides whether to use an index, how to traverse it, and how to sort the results. You state the what; the engine owns the how. The interface between them is a formal grammar with documented semantics.

Declarative UI frameworks converged on the same logic. In React, you describe what the UI should look like given some state, and the reconciler works out which DOM mutations are needed to get there. The separation is clean enough that the renderer is swappable. You can target the DOM, React Native, or a canvas renderer without touching component logic, because the what layer is insulated from the how.

Infrastructure tooling followed the same pattern. Terraform lets you describe desired state:

resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1f0"
  instance_type = "t3.micro"
}

The what is the resource block. The how is the diff, plan, and apply cycle that Terraform manages. You do not need to track the EC2 API call sequence; you only need to know what you want to exist.

The same logic appears in build systems like Bazel, in regular expressions, in CSS, and in Unix pipelines. Each time, the benefit is the same: you give up direct control over the mechanism in exchange for being able to reason at the level of intent without carrying implementation details in working memory. John Sweller’s cognitive load theory gives this a name. Good abstractions reduce extraneous cognitive load, the burden imposed by the representation itself, so you can spend your attention on understanding the problem domain.

Where LLMs Fit, and How They Differ

LLMs extend this pattern, but they do it in a way that is structurally different from every prior abstraction in the lineage.

Traditional what/how abstractions have typed, bounded interfaces. SQL has a grammar; Terraform has a schema with validation; React has a component contract enforced by the renderer. The what layer is precise even when it reads as high-level. You know exactly what you are committing to when you write a resource block, and the tooling validates it.

LLMs accept what descriptions in natural language, which is untyped, context-dependent, and carries implicit assumptions that neither party has enumerated. When I describe a function to a coding assistant and ask for an implementation, the what is a prose paragraph, and the how is generated code that may or may not honor the assumptions I did not think to state. The interface between intent and implementation is not a formal language with a specification; it is a prompt with a probabilistic model on the other side.
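To make the unstated-assumption problem concrete, here is a small sketch in Python. The prose request "remove duplicates from a list" is satisfied by both functions below, yet they disagree about ordering. The function names are illustrative only and do not come from any library.

```python
def dedupe_keep_first(items):
    """Remove duplicates, keeping the first occurrence of each item in order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


def dedupe_sorted(items):
    """Remove duplicates, returning the surviving items in sorted order."""
    return sorted(set(items))


data = [3, 1, 3, 2, 1]
print(dedupe_keep_first(data))  # [3, 1, 2]
print(dedupe_sorted(data))      # [1, 2, 3]
```

Neither result is wrong relative to the prompt. Which one is right depends on an assumption about ordering that the prose never stated.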

This means LLMs are not just another point on the abstraction spectrum. They represent a different shape of abstraction, one where the contract between what and how is not explicit, not typed, and not inspectable.

For a lot of work, this does not matter much. If I am building a Discord bot and I need a function that parses a slash command argument string into a typed struct, I can describe the format, hand it to a coding assistant, and get working code in a few seconds. The how is generated; I stayed in the what. For mechanical, bounded tasks, the cognitive load reduction is genuine and the fuzziness of the interface does not cause problems.
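As a sketch of that kind of bounded task, assume a hypothetical argument format of whitespace-separated key:value pairs, such as "user:alice days:7". The BanArgs struct, its fields, and the parse_ban_args function are invented for illustration and are not part of any Discord library.

```python
from dataclasses import dataclass


@dataclass
class BanArgs:
    """Hypothetical typed arguments for an imagined /ban command."""
    user: str
    days: int = 1
    reason: str = ""


def parse_ban_args(raw: str) -> BanArgs:
    """Parse whitespace-separated key:value pairs into BanArgs.

    Assumed format only; values cannot contain spaces in this sketch.
    """
    fields = {}
    for token in raw.split():
        key, sep, value = token.partition(":")
        if not sep or not value:
            raise ValueError(f"expected key:value, got {token!r}")
        if key not in ("user", "days", "reason"):
            raise ValueError(f"unknown argument {key!r}")
        fields[key] = value
    if "user" not in fields:
        raise ValueError("missing required argument 'user'")
    return BanArgs(
        user=fields["user"],
        days=int(fields.get("days", 1)),  # int() raises ValueError on bad input
        reason=fields.get("reason", ""),
    )


print(parse_ban_args("user:alice days:7"))
```

The task is small enough that the full contract fits in the docstring, which is part of why the fuzzy interface causes little trouble here.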

The Artifact That Does Not Get Saved

The problem surfaces at maintenance time, and this is where the Fowler, Parsons, and Joshi conversation is pointing at something that deserves more attention.

When traditional abstractions erode, when a query starts returning wrong results or a Terraform plan proposes unexpected changes, you can fall through to the implementation in a structured way. PostgreSQL has EXPLAIN ANALYZE, and most other SQL engines have an equivalent. Terraform has terraform plan, with -out to save the plan, and detailed diffs. React has the component tree in DevTools. There are documented paths from the what layer into the how layer for debugging.

More importantly, the what is always preserved as a legible artifact. The query is still there. The resource block is still there. The component’s render function is still there. You can read the intent, compare it to the behavior, and reason about the gap.

LLM-generated code does not automatically preserve the what. The how is ordinary code, which is readable. But the trace from intent to implementation is not captured in any artifact unless someone deliberately saves it. The prose description, the contextual assumptions, the reasoning about why this approach rather than another, all of it lives in a chat session that gets closed.

Six months later, when behavior needs to change, the developer who encounters that code is reading an implementation and trying to infer intent from it. That is exactly the inverse of what the what/how separation was designed to achieve. The abstraction ran in reverse: instead of preserving intent and treating implementation as replaceable, the implementation was preserved and the intent was discarded.

Architecture Decision Records exist because teams kept losing the reasoning behind architectural choices when people left or context shifted. The problem with LLM-generated code is the same problem at a smaller granularity, happening continuously across a codebase.

Treating the What as a First-Class Artifact

The practical response to this is straightforward in principle and inconsistently applied in practice: keep the what alongside the how.

Some teams are already doing a version of this with CLAUDE.md or AGENTS.md files that describe intent for AI coding tools. That is a start, but those files tend to capture project-level context, not the intent behind individual generated functions or modules. The gap is at a finer granularity.

Concretely, this means a few things:

First, detailed comments on generated code should describe the intent that produced it, not just what the code does. There is a convention in many codebases to avoid restating the code in comments, which is good advice for hand-written code but misses the point for generated code. The comment is not meant to explain the how; it is meant to preserve the what.

Second, the prompts or specifications used to generate significant pieces of code should be version-controlled alongside the code. Some teams keep these in a specs/ or prompts/ directory. The practice is not yet standard, but the rationale is the same as for any other design document.

Third, tests serve as executable what descriptions, and this is where the traditional testing argument for LLM-generated code actually has teeth. A test suite does not just verify that generated code works; it encodes what the code is supposed to do in a form that survives the session. When you regenerate or modify the implementation, the tests tell you whether the what was honored. This is one reason test coverage becomes more important, not less, on teams that use LLMs heavily: the tests are doing double duty as the preserved intent.
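Here is one way the three practices might fit together for a single generated function, sketched in Python. The function, the specs/slugify.md path, and the stated requirements are all hypothetical. The point is that the docstring records why the code exists, and the assertions record what it must keep doing if the implementation is regenerated.

```python
import re


def slugify(title: str) -> str:
    """Turn a post title into a URL slug.

    Intent (preserved from the generating spec, hypothetical path
    specs/slugify.md): slugs appear in public URLs, so they must be
    lowercase ASCII letters, digits, and single hyphens only, with no
    leading or trailing hyphen. An empty result is allowed; callers
    are expected to fall back to the post ID.
    """
    lowered = title.lower()
    # Collapse every run of characters outside a-z and 0-9 into one hyphen.
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    return hyphenated.strip("-")


def test_slugify_preserves_intent():
    # Each assertion restates one requirement from the spec.
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  What/How  Loop ") == "what-how-loop"
    assert slugify("---") == ""
    assert re.fullmatch(r"[a-z0-9]+(-[a-z0-9]+)*", slugify("A b  C"))


test_slugify_preserves_intent()
```

If the body of slugify were regenerated from scratch, the docstring and the test would still say what any replacement has to honor.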

What Survives Change

Fowler, Parsons, and Joshi frame the engineering challenge as building systems that survive change. LLMs are changing where the what/how boundary can live, and the ability to regenerate implementations from descriptions is a genuine advance. If your what is well-preserved and your tests are solid, regenerating a module is far less risky than it used to be.

The underlying principle does not change, though. Systems survive change when the what is durable and the how is replaceable. LLMs make the how more replaceable than ever; that is the opportunity. The risk is assuming that this comes for free, that you can skip preserving the what because the LLM will regenerate it on demand.

It will not regenerate your intent. It will generate code that plausibly matches whatever description a future developer provides at that moment, with whatever context is available then. Whether that matches the original intent depends entirely on whether the original intent was preserved somewhere it can actually be consulted.

The what/how loop has been the central organizing challenge of software abstractions for decades. LLMs do not resolve it; they add a powerful new tool that makes the discipline around it both more consequential and easier to neglect.