Code as a Thinking Tool: Why Source Won't Vanish When Agents Write It

Martin Fowler’s site just published “What is Code,” a piece by Unmesh Joshi that takes a question I’ve been turning over for months and gives it a useful frame. The question is whether source code survives the agent era. The frame is that code has two jobs, not one: it tells a machine what to do, and it captures a conceptual model of the problem. If you only see the first job, handing everything to an LLM looks like obvious progress. If you take the second seriously, the picture is messier.

I want to push on this from a slightly different angle than the article does. Joshi is mostly making the case that programming languages are thinking tools. I agree with that, but I think the more interesting argument is about where the model actually lives, and what happens when you move it.

The two-purpose view is older than it looks

The idea that code is a model, not just instructions, has been around for a long time. Harold Abelson’s line from SICP is the one everyone quotes: “Programs must be written for people to read, and only incidentally for machines to execute.” Knuth’s literate programming push in the 80s came from the same instinct, that the artifact you maintain is an explanation aimed at humans, with executable code as a byproduct. Eric Evans’ Domain-Driven Design made the model-in-the-code claim central to a whole methodology; the ubiquitous language exists precisely because Evans noticed teams kept losing the mapping between business concepts and the code that implemented them.

What Joshi adds is the agent-era stakes. If you accept that the code is the model, then “the LLM will write it” is not a labor-saving claim. It is a claim about where the model now lives. And that has consequences.

Natural language is a lossy specification format

The pitch for English-as-code is that we will describe what we want in prose and an agent will generate the implementation. The problem is that prose is a worse representation of a domain model than a typed program, and this is not a temporary limitation of current models.

Consider a tiny domain rule: an order can be cancelled only if it has not yet shipped, except for fraud cases, where cancellation is always permitted but requires a supervisor’s approval. In TypeScript:

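// Assumed for illustration: ids are plain strings here.
type OrderId = string;
type SupervisorId = string;

// Each status carries only the data that exists in that state.
type Order =
  | { status: 'placed'; id: OrderId }
  | { status: 'shipped'; id: OrderId; trackingNumber: string }
  | { status: 'cancelled'; id: OrderId; reason: CancellationReason };

// A fraud cancellation cannot be built without a supervisor's approval.
type CancellationReason =
  | { kind: 'customer-request' }
  | { kind: 'fraud'; approvedBy: SupervisorId };

function cancel(order: Order, reason: CancellationReason): Order { /* ... */ }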

The types do work that the prose does not. They make the fraud-approval coupling structural; you cannot construct a fraud cancellation without a supervisor id. They enumerate every state an order can be in, which gives the transition rules a closed set of cases to be checked against. They give every downstream caller a way to ask the compiler whether a change is safe. Rewriting that as a paragraph and asking an agent to regenerate the implementation each time loses every one of those properties. The model degrades into vibes about the model.
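To make that concrete, here is one way the elided body of cancel might be filled in. It is a sketch under assumptions, not the canonical implementation: the CancelResult type, the error strings, the plain-string ids, and the choice to reject an already-cancelled order are all mine. Note what the types enforce (a fraud reason always carries a supervisor id, and the switch must cover every status) and what still lives in ordinary runtime logic (the shipped-versus-placed rule).

```typescript
// Assumption: ids are plain strings; a real codebase might brand them.
type OrderId = string;
type SupervisorId = string;

type Order =
  | { status: 'placed'; id: OrderId }
  | { status: 'shipped'; id: OrderId; trackingNumber: string }
  | { status: 'cancelled'; id: OrderId; reason: CancellationReason };

type CancellationReason =
  | { kind: 'customer-request' }
  | { kind: 'fraud'; approvedBy: SupervisorId };

// Hypothetical result type: the cancelled order, or a rejection message.
type CancelResult =
  | { ok: true; order: Order }
  | { ok: false; error: string };

function cancel(order: Order, reason: CancellationReason): CancelResult {
  // The switch is exhaustive over Order['status']; adding a new status
  // makes this function fail to compile until the new case is handled.
  switch (order.status) {
    case 'cancelled':
      // Assumption: cancelling twice is treated as an error.
      return { ok: false, error: 'order is already cancelled' };
    case 'shipped':
      // Shipped orders can only be cancelled for fraud. The type of `reason`
      // guarantees a supervisor id is present in that case.
      if (reason.kind !== 'fraud') {
        return { ok: false, error: 'shipped orders can only be cancelled for fraud' };
      }
      return { ok: true, order: { status: 'cancelled', id: order.id, reason } };
    case 'placed':
      return { ok: true, order: { status: 'cancelled', id: order.id, reason } };
  }
}
```

An agent regenerating this function can get the body wrong, but it cannot return a fraud cancellation that lacks an approver without the compiler objecting.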

This is the point Hillel Wayne keeps making in his work on formal specification: precision in a specification is not bureaucratic overhead; it is the thing that lets you reason about whether a change broke an invariant. TLA+, Alloy, dependent types, and refinement types all exist because natural language is too ambiguous to pin a real system down.

Agents are great at the first purpose and terrible at the second

LLMs are genuinely good at translating intent into syntactically valid code. The SWE-bench numbers tell that story: Anthropic’s Claude Sonnet 4.5 system card reports SWE-bench Verified scores in the high 70s, and GitHub’s own Copilot Workspace research shows agents resolving real issues end-to-end. That covers the instruction-to-machine purpose well.

The model-of-the-domain purpose is where they struggle. Anyone who has watched an agent refactor a codebase has seen it: the agent will happily rename a concept halfway through, conflate two types that the author kept separate for a reason, or invent an abstraction that papers over a distinction the business actually cares about. The agent is fluent in the syntax of your language and approximately fluent in the idioms of your stack, but it does not know your domain, and it has no incentive to keep the model coherent across changes.
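One way to make “two types the author kept separate for a reason” survive an agent’s edits is to make the separation visible to the compiler. The sketch below uses the common TypeScript branding idiom; the Brand helper, the id formats, and the approvalNote function are hypothetical names for illustration.

```typescript
// Hypothetical branding helper: `__brand` exists only at the type level,
// so branded values are ordinary strings at runtime.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

type CustomerId = Brand<string, 'CustomerId'>;
type SupervisorId = Brand<string, 'SupervisorId'>;

// These constructors are the only place a raw string becomes a domain id.
const customerId = (raw: string): CustomerId => raw as CustomerId;
const supervisorId = (raw: string): SupervisorId => raw as SupervisorId;

function approvalNote(customer: CustomerId, approver: SupervisorId): string {
  return `fraud cancellation for ${customer} approved by ${approver}`;
}

const note = approvalNote(customerId('cust-17'), supervisorId('sup-3'));

// Swapping the arguments would be rejected by the compiler, because a
// SupervisorId is not assignable to a CustomerId even though both are strings:
// approvalNote(supervisorId('sup-3'), customerId('cust-17'));
```

The distinction costs nothing at runtime, and conflating the two ids no longer depends on a reviewer noticing.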

Simon Willison’s running notes on coding with LLMs keep returning to the same observation: the human’s job shifts from typing to reviewing, and reviewing requires a model in your own head that you can compare against the generated code. If you no longer have that model, you cannot catch the agent’s drift.

What this means for source code as an artifact

I do not think source code disappears, but I think the parts of it that matter will shift. Agents will write more and more of the boilerplate, the glue, the third CRUD endpoint, and the test scaffolding, and it will matter less whether a human ever reads that code carefully. The parts that encode the domain model (the type definitions, the state machines, the invariants, the names) become more valuable, not less. They are the interface humans use to think about the system, and they are the contract the agent has to respect.

You can already see teams optimizing for this. The Effect ecosystem in TypeScript is a bet that pushing more of the domain into the type system pays off in maintainability. A large part of Rust’s appeal is that the type system carries enough of the model that the compiler becomes a collaborator. Ghosts of Departed Proofs and similar Haskell patterns are about encoding invariants so precisely that wrong programs do not type-check. These are all moves in the same direction: make the model legible to the machine so the model survives contact with whoever, or whatever, edits the code next.
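A small TypeScript-flavored version of “wrong programs do not type-check” is a transition table expressed as a type. This is a sketch, not how Effect or Ghosts of Departed Proofs work; the Transitions table and the transition function are hypothetical, and the table assumes that shipped orders can still be cancelled (the fraud path) and that cancelled is terminal.

```typescript
type Status = 'placed' | 'shipped' | 'cancelled';

// Human-owned table: for each status, the statuses it may move to.
type Transitions = {
  placed: 'shipped' | 'cancelled';
  shipped: 'cancelled';
  cancelled: never; // terminal: no value can be passed as the target
};

// `to` is constrained by `from`, so an illegal move is a compile error
// rather than a runtime check someone has to remember to write.
function transition<From extends Status>(from: From, to: Transitions[From]) {
  return { from, to };
}

const step = transition('placed', 'shipped');

// Neither of these would compile:
// transition('shipped', 'placed');   // 'placed' is not a legal target
// transition('cancelled', 'placed'); // nothing is assignable to never
```

Whoever edits the code next, human or agent, has to change the table itself to introduce a new transition, which is exactly the kind of change a reviewer should see.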

The pessimistic version of agent-driven development is the one where teams stop maintaining the model because the agent can regenerate code on demand. That world is the big ball of mud at planetary scale, with no one able to reason about why the system does what it does. The optimistic version is the one where humans spend more time on the model layer, because that is where their judgment is irreplaceable, and agents do the translation to executable form.

A small prediction

My guess is that we end up with a clearer split between two kinds of code. There will be a thin, carefully maintained layer of types, schemas, state machines, and protocol definitions that humans own. Around it will be a much larger layer of implementation that agents generate, regenerate, and discard. The artifact you commit will increasingly look like a specification with the implementation as a derived target, which is close to what literate programming was reaching for in the first place.
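Here is a rough sketch of what that split could look like in a single file. Everything in it is illustrative: the simplified Order and Reason types (supervisor approval is left out to keep it short), the Cancel contract, the checkCancel conformance check, and the two stand-in implementations are assumptions, not a description of any existing tool.

```typescript
// Human-owned layer: domain types and the contract implementations must meet.
type Order =
  | { status: 'placed'; id: string }
  | { status: 'shipped'; id: string; trackingNumber: string }
  | { status: 'cancelled'; id: string };

type Reason = 'customer-request' | 'fraud';

// Contract for any cancel implementation; null means "rejected".
type Cancel = (order: Order, reason: Reason) => Order | null;

// Hypothetical conformance check: returns the invariants an implementation breaks.
function checkCancel(impl: Cancel): string[] {
  const problems: string[] = [];
  const placed: Order = { status: 'placed', id: 'o-1' };
  const shipped: Order = { status: 'shipped', id: 'o-2', trackingNumber: 'T-9' };

  const onRequest = impl(placed, 'customer-request');
  if (onRequest === null || onRequest.status !== 'cancelled') {
    problems.push('placed orders must be cancellable on request');
  }
  if (impl(shipped, 'customer-request') !== null) {
    problems.push('shipped orders must not be cancellable on request');
  }
  const fraud = impl(shipped, 'fraud');
  if (fraud === null || fraud.status !== 'cancelled' || fraud.id !== shipped.id) {
    problems.push('fraud cancellation must succeed and keep the order id');
  }
  return problems;
}

// Generated layer: stand-ins for code an agent might write and later replace.
const faithful: Cancel = (order, reason) =>
  order.status === 'shipped' && reason !== 'fraud'
    ? null
    : { status: 'cancelled', id: order.id };

// This version has drifted from the model: it cancels anything.
const drifted: Cancel = (order) => ({ status: 'cancelled', id: order.id });

const faithfulProblems = checkCancel(faithful);
const driftedProblems = checkCancel(drifted);
```

The generated layer can be thrown away and rewritten at will; the types and the check are what get reviewed and versioned with care.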

If that is right, the future of source code is not less of it but a different shape. The model survives because someone has to hold it. The instructions get cheaper because someone, or something, else can write them. Joshi’s piece is worth reading because it gives this distinction a vocabulary; the part I would add is that the distinction is already showing up in which languages and which patterns teams are choosing today.