
LLMs and the Abstraction Loop That Has Been Running Since Assembly

Source: martinfowler

Back in January, Martin Fowler, Unmesh Joshi, and Rebecca Parsons sat down to talk about how LLMs are reshaping the abstractions we build software with. Their framing centers on a recurring challenge in software: the tension between what a system should do and how it should do it. They call this the what/how loop, and they frame managing it as the core of building systems that survive change. The conversation is worth reading, but it also points at something larger than LLMs specifically. The what/how loop is not a new problem LLMs introduced. It is the oldest structural problem in programming, and understanding its history makes the LLM moment clearer.

The Loop Predates LLMs by Decades

Programming has always been a negotiation between intent and implementation. You want to sort a collection (what); you need to choose a sort algorithm, reason about stability, consider memory constraints (how). A well-designed abstraction hides the how and lets you think at the what level. Every major leap in programming languages and paradigms has been an attempt to push that boundary upward.

Structure and Interpretation of Computer Programs, Abelson and Sussman’s foundational MIT text, opens with the observation that programs must be written for people to read, and only incidentally for machines to execute. That sentence is a claim about the what/how split: the readable part is the what, the executable machinery is the how. The discipline of software engineering since then has largely been about making the two correspond.

SQL is one of the most successful what/how abstractions ever deployed. You write SELECT customer_id, SUM(amount) FROM orders WHERE created_at > '2025-01-01' GROUP BY customer_id and the query optimizer decides join order, index selection, and scan strategy. You never see the B-tree traversal or the buffer pool management. The abstraction holds well enough that most application developers never need to think about it, except when it leaks, which it does, reliably, the moment data grows past a certain size or a query plan regresses after a statistics update.
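The boundary is directly observable: most SQL engines will show you the how they chose for a declarative what. A minimal sketch using Python's built-in sqlite3 module; the schema and index names are illustrative, not from a real system:

```python
import sqlite3

# Illustrative schema mirroring the query in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id INTEGER, amount REAL, created_at TEXT)"
)
conn.execute("CREATE INDEX idx_created ON orders (created_at)")

# The "what": a declarative aggregate query.
query = """
    SELECT customer_id, SUM(amount) FROM orders
    WHERE created_at > '2025-01-01' GROUP BY customer_id
"""

# The "how": ask the engine which plan it chose (index search vs. full
# scan, temp B-tree for the GROUP BY, and so on).
for row in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```

When the abstraction leaks, this plan output is exactly the how you are forced to read.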

Joel Spolsky captured this failure mode in 2002 with the Law of Leaky Abstractions: all non-trivial abstractions leak. The what/how loop never fully resolves; it only moves the boundary. When the abstraction breaks, you have to understand the how anyway. This is as true for SQL query optimizers as it will be for LLM-generated code.

React’s declarative rendering model is a more recent version of the same bargain. You describe what the UI should look like at any moment (the what); React figures out the DOM diffing and reconciliation (the how). Infrastructure as Code tools like Terraform and Kubernetes take the same approach at the infrastructure layer: you declare what you want, and the orchestrator figures out the sequence of API calls to achieve it. The pattern is consistent across decades of tooling.

Where LLMs Fit Into This History

LLMs are often described as code generators. That is accurate but not the most useful frame. The more precise description is that LLMs can operate across multiple what/how levels simultaneously, and they make the iteration loop between those levels dramatically faster.

Consider what happens when you prompt a model with “write a function that retries an HTTP request with exponential backoff.” You are expressing a what. The model produces a how. The interesting part is what happens next. The generated implementation surfaces implicit decisions you did not specify: does it retry on all exceptions or only transient ones, does it cap the retry count, does it preserve the original error on final failure? Each of these is a place where the how exposes ambiguity in the what. You refine the prompt. The loop runs again.

import time

import requests

# First pass: vague "what" produces a vague "how"
def retry_request(url):
    for i in range(3):
        try:
            return requests.get(url)
        except:
            time.sleep(1)

# The implementation forces questions:
# Which exceptions should trigger a retry?
# Fixed delay or exponential backoff?
# What should happen when all retries are exhausted?

# Refined "what": retry on transient errors with exponential backoff,
# raise the original exception on final failure
def retry_request(url, max_retries=3, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response
        except (requests.Timeout, requests.ConnectionError) as e:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

The second version came from seeing the first version’s implicit decisions. That is the loop. LLMs did not invent it; they compressed it. In traditional development, the cycle between a specification and its implementation runs over hours or days, mediated by meetings, tickets, and pull request reviews. With an LLM, it runs in seconds.

Cognitive Load and What Changes Hands

The Fowler conversation frames the challenge as managing cognitive load, which is the right framing. Systems survive change when the humans building them can maintain coherent mental models of what the system does. Good abstraction is cognitive load management: it puts complexity somewhere you do not have to think about it unless something breaks.

What LLMs change is which cognitive load gets offloaded. Before LLMs, a developer had to hold implementation details in working memory while building. The specific syntax for a retry library’s API, the exact parameters for a backoff algorithm, the boilerplate for a database transaction, all of that occupied mental space alongside the higher-level design. LLMs can absorb the how-level details and regenerate them on demand. What needs to be held is the what: the intent, the constraints, the edge cases that matter.

This shifts the locus of mastery in a meaningful way. Deep knowledge of the how matters less when the how can be regenerated: the exact method signatures in a library, the specific flags for a compiler option. Deep knowledge of the what matters more: what properties a robust retry strategy needs, what consistency guarantees a transaction must provide, what invariants a data structure must maintain.

Fred Brooks distinguished between accidental complexity and essential complexity in his 1986 essay No Silver Bullet. Accidental complexity is the friction introduced by tools and representations: syntax noise, boilerplate, the impedance mismatch between your mental model and the language you are writing in. Essential complexity is the inherent difficulty of the problem itself. LLMs absorb a substantial portion of accidental complexity. They do not touch essential complexity.

The Leaky Abstraction Problem, Amplified

Spolsky’s law applies to LLM-generated code as directly as it applies to SQL query planners. The generated how will diverge from your what in subtle ways, and when it does, you need to understand the how to fix it. The challenge is that LLM-generated how can be more opaque than hand-written how, because it was not constructed incrementally from your mental model of the system.

This creates a specific failure mode: asymmetric abstraction. You work at the what level, the LLM generates the how, but when the abstraction breaks, you are forced back into the how with no navigational memory. The implementation was materialized from a probability distribution, not grown from your understanding. You may not know where to look.

One response to this is to invest more heavily in the what layer. If the what is specified precisely enough, it becomes possible to verify that the generated how satisfies it, and to regenerate the how when it does not. This pushes toward approaches that were always good practice but become more structurally important: type-driven development, property-based testing, contract-first API design. These are ways of making the what precise enough to serve as a specification against which the how can be checked.
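One way to make the what checkable is property-style testing: state invariants that must hold for any valid input, then sample inputs and verify them. A minimal standard-library sketch, using a hypothetical backoff_delays helper that mirrors the retry example above (a library such as Hypothesis automates the sampling and shrinking):

```python
import random

def backoff_delays(max_retries, base_delay):
    # Hypothetical "how" under test: the sequence of sleeps between
    # attempts, mirroring the exponential backoff example above.
    return [base_delay * (2 ** attempt) for attempt in range(max_retries - 1)]

# Property-style checks: the "what" stated as invariants over all inputs,
# verified here against 100 random samples.
for _ in range(100):
    retries = random.randint(1, 10)
    base = random.uniform(0.1, 5.0)
    delays = backoff_delays(retries, base)

    # One delay between each pair of consecutive attempts.
    assert len(delays) == retries - 1
    # Each delay doubles the previous one.
    assert all(b == 2 * a for a, b in zip(delays, delays[1:]))
    # Delays never decrease.
    assert delays == sorted(delays)
```

The invariants, not the implementation, are the durable artifact: a regenerated how either satisfies them or is rejected.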

Intentional Programming, Approached Differently

In the 1990s, Charles Simonyi led a project at Microsoft Research called intentional programming. The core idea was that programs should be stored and manipulated at the level of programmer intent rather than syntactic text. You would define domain-specific notations for your problem space and reason about the program in terms of what it means, not how it reads in a general-purpose language. The project was technically ambitious and never shipped as a mainstream tool.

LLMs are a different approach to the same intuition. They do not formalize intent into a structured representation; they work with natural language, which is far more accessible and far less precise. The tradeoff is that natural language expresses intent quickly but cannot enforce it, while formal systems like type systems and formal specifications can enforce constraints but require upfront investment.

The what/how loop in LLM-assisted development probably looks different at different stages of a project. Early, when requirements are exploratory and vague, natural language what combined with LLM-generated how enables rapid iteration and helps you discover what you actually need. Later, as requirements stabilize, the informal what needs to harden into something more durable: types, tests, contracts. The LLM transitions from co-designer to code generator operating on well-specified inputs.

What This Means for Architecture

If LLMs can generate reliable how from what, the durable artifact in a codebase is the what. This is already playing out in infrastructure. Terraform modules and Kubernetes manifests succeed precisely because they are declarative specifications of desired state, not scripts of how to reach it. They can be version-controlled, reviewed, and understood independently of the systems that interpret them.

Something analogous may develop in application code. Tests are currently the best candidate for what artifacts: they specify behavior precisely enough to generate against, they are cheap to write relative to formal specifications, and they provide immediate feedback on whether the generated how satisfies the stated what. The trend toward behavior-driven development takes on new significance when the thing being driven is a language model rather than a hand-written implementation.
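As a sketch of what behavior-as-specification might look like for the earlier retry example: the assertions below pin down the what, and the implementation body is the regenerable part. retry_request here is a hypothetical variant that takes the transport function as a parameter, so the spec runs without a network.

```python
def retry_request(get, max_retries=3):
    # Minimal stand-in "how" so the spec is executable; in the workflow
    # described above, this body is what the LLM would (re)generate.
    for attempt in range(max_retries):
        try:
            return get()
        except ConnectionError:
            if attempt == max_retries - 1:
                raise

def flaky(failures):
    # A fake transport that raises `failures` transient errors, then succeeds.
    state = {"calls": 0}
    def get():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("transient")
        return "ok"
    return get

# Spec 1: failures inside the retry budget are absorbed.
assert retry_request(flaky(2), max_retries=3) == "ok"

# Spec 2: exhausting the budget surfaces the original error.
try:
    retry_request(flaky(3), max_retries=3)
    raise AssertionError("expected ConnectionError")
except ConnectionError:
    pass
```

The assertions are the what artifact: reviewable, version-controlled, and indifferent to which how satisfies them.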

Code review changes too. If the how is generated, reviewing it for style or implementation preferences is less valuable than verifying it correctly implements the intended what. The review question shifts from “is this the right way to write this” to “does this implementation satisfy the invariants we care about.”

None of this is settled. The Fowler conversation is valuable because it locates the question correctly: not whether LLMs will replace programmers, but where the what/how boundary sits now and who is responsible for each side. That boundary has moved continuously throughout the history of computing. Assembly programmers managed hardware registers and memory layouts so that C programmers did not have to. C programmers managed memory allocation so that garbage-collected language programmers did not have to. SQL hid join algorithms so that application programmers did not have to think about them. Each shift in the boundary changed what knowledge mattered, not whether knowledge mattered.

LLMs are moving the boundary again, upward and toward intent. The loop keeps running.
