Code as a Thinking Tool: Why Programming Languages Won't Disappear When Agents Write Them

Martin Fowler’s site recently published “What is Code” by Unmesh Joshi, framing a question that’s been quietly gnawing at the industry: if agents write the code, do we still need source code at all? Joshi’s answer leans on a distinction worth taking seriously. Code is two things at once. It’s instructions a machine executes, and it’s a conceptual model of the problem domain that humans reason with. The first job could, in principle, be handed off to a sufficiently capable model that compiles intent straight to bytes. The second job cannot, and that’s the part this post wants to dig into.

The two-faces idea isn’t new, but it matters more now

The framing echoes the line from the preface of Abelson and Sussman’s SICP: “Programs must be written for people to read, and only incidentally for machines to execute.” Knuth made the same point with literate programming in 1984, treating a program as a document addressed to humans first. Peter Naur went further in his 1985 essay “Programming as Theory Building”, arguing that the real artifact of programming is not the source text but the mental theory the programmers hold about what the system does and why. When the theory dies, the program dies with it, even if the source survives.

Naur’s thesis lands harder in 2026 than it did in 1985. If an LLM writes a thousand lines that pass tests, the source exists, but the theory does not. Nobody holds it. Nobody can answer the next question about it without re-deriving it from the diff. That’s the core problem behind Joshi’s piece.

Why the “vocabulary” point is the interesting one

Joshi spends time on something that sounds soft but isn’t: building a vocabulary to talk to the machine. The classic reference here is Eric Evans’ Domain-Driven Design, with its insistence on a ubiquitous language shared by domain experts and code. The names of types, functions, and modules are not decoration. They’re the load-bearing structure of the conceptual model.

A quick, concrete example. Consider two implementations of the same logic:

def p(o, c):
    if o.s == 1 and c.b >= o.t:
        c.b -= o.t
        o.s = 2
        return True
    return False

versus

def settle(order: Order, account: Account) -> bool:
    if order.status is OrderStatus.PENDING and account.balance >= order.total:
        account.balance -= order.total
        order.status = OrderStatus.SETTLED
        return True
    return False

Both compile to nearly identical bytecode: the same loads, comparisons, and stores in the same order, differing mainly where the second version compares against enum members instead of bare integers. To a machine the difference is rounding error. To a human reading the second one for the first time, the entire domain shows up: there is a thing called an order, it has a status that moves through PENDING and SETTLED, settlement requires sufficient balance, and the operation reports whether it happened. The vocabulary is the model. Strip it out and you still have working instructions, but the theory is gone.
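The “rounding error” claim can be checked directly in CPython. The sketch below assumes CPython and compares the terse `p` with an untyped, integer-status copy of `settle` (keeping integers in both removes the extra loads the enum comparison would add, so the two differ only in their names). CPython keeps identifiers in side tables on the code object and the instruction bytes refer to them by index, so renaming everything leaves the instruction stream unchanged.

```python
import dis


def p(o, c):
    # Terse version: single-letter names, magic status codes.
    if o.s == 1 and c.b >= o.t:
        c.b -= o.t
        o.s = 2
        return True
    return False


def settle(order, account):
    # Same logic and structure, domain vocabulary. Statuses stay as ints
    # here so the two functions differ only in their names.
    if order.status == 1 and account.balance >= order.total:
        account.balance -= order.total
        order.status = 2
        return True
    return False


def shape(fn):
    # Opcode names plus raw integer arguments: the "instructions" without
    # the human-readable names those arguments index into.
    return [(ins.opname, ins.arg) for ins in dis.get_instructions(fn)]


print("same instruction bytes:", p.__code__.co_code == settle.__code__.co_code)
print("same opcode/arg sequence:", shape(p) == shape(settle))
# The vocabulary lives only in these side tables.
print("attribute names:", p.__code__.co_names, "vs", settle.__code__.co_names)
print("local names:", p.__code__.co_varnames, "vs", settle.__code__.co_varnames)
```

CPython still retains the names because attribute lookup and tracebacks need them. A stripped native binary usually does not, which is the gap reverse-engineering tools have to fill.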

This is also the reason “just decompile the binary” was never a real argument against keeping source code. A decompiled binary gives you instructions without vocabulary, which is why reverse engineering tools like Ghidra spend so much effort recovering symbols and inferring types. The lost vocabulary is the expensive part.

Programming languages as thinking tools

Joshi’s second move is to call programming languages thinking tools, not just compilation targets. Iverson made this case explicitly in his 1979 Turing Award lecture “Notation as a Tool of Thought”, presenting APL’s notation not merely as a way to instruct the machine but as a way to think about array operations in the first place. Haskell’s type system does the same kind of work, and the “Type-Driven Development” tradition treats types as a medium for reasoning before they’re a medium for checking. Rust’s borrow checker is a thinking tool for ownership; the lifetimes it reasons about are erased before the compiler emits machine code.

This is the part of the picture that doesn’t disappear when agents write the code. Even if a model produces the final tokens, the language those tokens live in shapes what the model and the human reviewer can reason about. A pile of generated Python with no types is harder to audit than the same logic in TypeScript with discriminated unions, not because the runtime cares but because the reviewer’s eye and the model’s own next pass have less to grip.

What the LLM workflow actually changes

The practical shift, as Simon Willison keeps documenting, is that prose specifications and code-generating prompts are becoming a real layer of the stack. There’s a temptation to read this as “the prompt is the new source code,” and some commentators have run with it. The honest version is more constrained. A prompt is lossy. Re-running the same prompt against a different model, or against the same model on a different day, produces different code. If the prompt were truly the source, builds wouldn’t be reproducible: a system like Nix or Bazel could hash the prompt, but the same hash would no longer reliably identify the same output.
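The reproducibility point can be made concrete with a toy content-addressed key. This is a deliberately simplified sketch, not how Nix or Bazel actually compute their hashes; `build_key` and the two “generated” snippets are hypothetical. Keying on the prompt yields a stable key, yet two generations from that prompt are different artifacts with different hashes, so the key stops identifying what was built.

```python
import hashlib


def build_key(*inputs: str) -> str:
    # Toy stand-in for a content-addressed build key: hash every input in
    # order, with a NUL separator so ("ab", "c") and ("a", "bc") differ.
    h = hashlib.sha256()
    for item in inputs:
        h.update(item.encode("utf-8"))
        h.update(b"\0")
    return h.hexdigest()


prompt = "Settle a pending order if the account can cover the total."

# Two hypothetical generations from the same prompt on different runs.
generation_a = "def settle(order, account): ...  # variant A"
generation_b = "def settle_order(o, acct): ...  # variant B"

# Keying on the prompt: every run maps to the same key...
print(build_key(prompt) == build_key(prompt))
# ...but the artifacts that key is supposed to stand for differ.
print(build_key(generation_a) == build_key(generation_b))
```

Hashing the generated source instead restores a one-to-one mapping between key and artifact, which is roughly why build systems key on the source they compile.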

What the prompt captures is intent. What the code captures is the resolved model. You need both, and you need a way to keep them in sync. This is roughly the bet behind tools like Aider and Cursor, where the diff between intent and code is treated as a first-class artifact. It’s also why spec-driven development keeps resurfacing: people are rediscovering that prose alone doesn’t pin behavior down tightly enough to let them delete the code underneath.
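In Python, one lightweight way to keep prose intent and resolved code in a single artifact is a doctest: the docstring states the rule in words and then pins it with examples that the standard `doctest` module executes. This illustrates the general idea only; it is not the mechanism Aider, Cursor, or any spec-driven tool uses, and the `Order` and `Account` classes are minimal hypothetical stand-ins.

```python
import doctest
from dataclasses import dataclass


@dataclass
class Account:
    balance: int


@dataclass
class Order:
    total: int
    status: str = "PENDING"


def settle(order: Order, account: Account) -> bool:
    """Settle a pending order if the account can cover its total.

    Intent, in prose: only PENDING orders settle, and never by overdrawing.

    >>> acct = Account(balance=100)
    >>> settle(Order(total=30), acct), acct.balance
    (True, 70)
    >>> settle(Order(total=500), acct), acct.balance
    (False, 70)
    >>> settle(Order(total=10, status="SETTLED"), acct)
    False
    """
    if order.status == "PENDING" and account.balance >= order.total:
        account.balance -= order.total
        order.status = "SETTLED"
        return True
    return False


if __name__ == "__main__":
    # Executes the docstring examples and reports any mismatch.
    doctest.testmod()
```

If someone edits the code so it drifts from the prose, the examples fail, which is a small, local version of keeping intent and code in sync.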

The ergonomic case for keeping source code around

There’s a tempting endpoint where the agent owns the binary and humans only ever see English. A few reasons that endpoint is further away than it looks:

Debugging is a model-fitting problem. When production breaks, the debugger steps through code, not through a prompt. The recent “AI-assisted debugging” literature consistently finds that engineers still anchor on stack traces and source lines; the model helps, but it helps by reading the same artifact the human is reading.
Diffs are how teams negotiate change. Code review is a conversation conducted in the medium of source diffs. Replace the source with a prompt and the unit of review becomes “trust the model’s interpretation of this paragraph,” which is exactly the artifact engineers tend to push back on in current AI PRs. GitHub’s own Copilot Workspace presents plans alongside code partly for this reason.
Verification needs a fixed target. Formal methods, fuzzers, property tests, and static analysis all operate on source or compiled artifacts. SMT solvers don’t reason about prompts. Until verifiers can take English as input and emit proofs, the code is the surface tools work on.
Modules outlive the people who wrote them. Naur’s theory-building point cuts both ways. The code is a partial, durable encoding of the theory; it’s how the next person reconstructs enough of the model to make a safe change.
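As a small illustration of why verification wants a fixed target, here is a randomized property check against the settlement logic from earlier. It uses only the standard library’s `random` module rather than a property-testing framework such as Hypothesis, and the three invariants (never overdraw, balance changes by exactly the order total or not at all, settle exactly when pending and affordable) are assumptions about what settlement should mean, not rules stated in Joshi’s article. The checker can do its job only because there is concrete code to run.

```python
import random
from dataclasses import dataclass


@dataclass
class Account:
    balance: int


@dataclass
class Order:
    total: int
    status: str = "PENDING"


def settle(order: Order, account: Account) -> bool:
    # Same rule as the earlier example, with string statuses for brevity.
    if order.status == "PENDING" and account.balance >= order.total:
        account.balance -= order.total
        order.status = "SETTLED"
        return True
    return False


def check_settle_properties(trials: int = 1_000, seed: int = 0) -> int:
    """Run seeded random trials; return how many settled.

    Raises AssertionError on the first violated invariant.
    """
    rng = random.Random(seed)
    settled = 0
    for _ in range(trials):
        account = Account(balance=rng.randint(0, 1_000))
        order = Order(total=rng.randint(0, 1_000),
                      status=rng.choice(["PENDING", "SETTLED"]))
        before = account.balance
        was_pending = order.status == "PENDING"
        ok = settle(order, account)
        # Invariant 1: settlement never overdraws the account.
        assert account.balance >= 0
        # Invariant 2: the balance drops by exactly the total, or not at all.
        assert account.balance == (before - order.total if ok else before)
        # Invariant 3: it settles exactly when pending and affordable.
        assert ok == (was_pending and before >= order.total)
        settled += ok
    return settled


print("settled in", check_settle_properties(), "of 1000 trials")
```

Because the generator is seeded, a failing run can be replayed exactly, which matters when the failure has to be debugged against source lines.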

Where this lands

Joshi’s piece is short on prescriptions and long on framing, which is appropriate for a question this open. The framing I find most useful is to stop treating “will there be code” as a yes/no question. The instructions-to-the-machine layer will increasingly be generated, recompiled, and regenerated. The conceptual-model layer, which is what good naming, types, and module boundaries actually encode, has to live somewhere humans can read and edit it, because that’s where the theory of the system gets stored.

If anything, the LLM era raises the price of vocabulary work. Models pattern-match on names. Bad names produce bad generations, which produce more bad names, which decay the model that the next prompt depends on. The teams that get the most out of agentic coding are the ones whose codebases were already legible: tight domain language, narrow types, modules that mean what they’re called. The teams that try to skip that work and prompt their way through are doing the equivalent of running a compiler on a corrupted symbol table.

For a longer treatment of the theory-building angle, Naur’s original paper is still the best forty minutes you can spend. For the domain-language side, Evans’ DDD reference is free and concise. Read those alongside Joshi’s article and the shape of the answer comes into focus: source code isn’t going away because the part of it that matters most was never really about the machine.