
Code as a Thinking Tool: Why Joshi's Dual View Matters in the LLM Era

Source: martinfowler

Unmesh Joshi’s recent essay on martinfowler.com asks a question that sounds almost philosophical until you sit with it: if agents write the code, will there even be source code in the future? His answer hinges on a distinction worth taking seriously. Code is two things welded together. It is instructions to a machine, and it is a conceptual model of a problem domain expressed in a vocabulary humans negotiate with each other.

The machine-instruction view is the one most non-programmers assume is the whole story. You type symbols, a compiler or interpreter turns them into something the CPU executes, the program runs. By that definition, if an LLM emits bytecode or WebAssembly directly from an English prompt, source code becomes a vestigial intermediate. The conceptual-model view says something different. The source is where domain concepts get names, boundaries, invariants, and relationships. Strip that away and you do not have a leaner system. You have a system nobody can reason about.

The vocabulary problem is older than LLMs

Eric Evans made this point in his 2003 book Domain-Driven Design with the term Ubiquitous Language. The idea was that the words used in conversations with domain experts should be the same words used in the code. If a trader says “limit order,” there should be a LimitOrder class, and it should behave the way a trader expects a limit order to behave. The translation tax between domain speech and code is where bugs hide.
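To make the LimitOrder example concrete, here is a minimal Rust sketch of a type that carries the trader's meaning. Only the name LimitOrder comes from the text above; the Side enum, the integer-cents price unit, the field names, and the can_execute_at method are assumptions made for illustration, not anything taken from Evans or Joshi.

```rust
/// Which side of the market the order is on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Buy,
    Sell,
}

/// A limit order: trade a quantity at the limit price or better, never worse.
/// Field names and the cents unit are hypothetical choices for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LimitOrder {
    side: Side,
    limit_price_cents: u64,
    quantity: u32,
}

impl LimitOrder {
    /// Refuse to construct orders a trader would call meaningless.
    pub fn new(side: Side, limit_price_cents: u64, quantity: u32) -> Result<Self, String> {
        if limit_price_cents == 0 {
            return Err("limit price must be positive".to_string());
        }
        if quantity == 0 {
            return Err("quantity must be positive".to_string());
        }
        Ok(Self { side, limit_price_cents, quantity })
    }

    /// Number of units the order asks to trade.
    pub fn quantity(&self) -> u32 {
        self.quantity
    }

    /// A buy may execute at or below its limit; a sell at or above it.
    pub fn can_execute_at(&self, market_price_cents: u64) -> bool {
        match self.side {
            Side::Buy => market_price_cents <= self.limit_price_cents,
            Side::Sell => market_price_cents >= self.limit_price_cents,
        }
    }
}
```

The point of the sketch is that the rule a trader would state out loud, "at my price or better," lives in one named place, and the constructor rejects orders that the domain does not recognize.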

Naur made an even stronger claim in his 1985 paper Programming as Theory Building. The program’s source code is not the program. The program is the theory in the heads of the people who wrote it. Source code is the most precise artifact that theory produces, but it is downstream of the understanding. When the team that built a system disperses, the theory dies even if the code keeps running, and the next team has to rebuild it from scratch by reading the code and guessing at intent.

Joshi’s framing connects these threads. The reason code is a thinking tool is that writing it forces you to commit to a vocabulary, and committing to a vocabulary forces you to decide what the domain actually contains. Types, function signatures, and module boundaries are not decorations on top of behavior. They are the shape of your theory.

Why this matters when an agent writes the code

If you accept that the source is the theory, then the question of whether LLMs replace source code reframes itself. The LLM is not a substitute for the theory. It is, at best, a fast typist. If you skip the theory-building step and let the model produce behavior directly from a vague prompt, you get something that runs, sometimes, until it encounters an input the prompt-writer did not imagine. There is no artifact to inspect that captures what the system is supposed to do, because the prompt is too underspecified and the generated code is too detached from any shared vocabulary.

This matches what people are seeing in practice. The METR study on AI tools and developer productivity found that experienced open-source maintainers were on average nineteen percent slower when using AI assistance on tasks in repositories they knew well, even though they had forecast a speedup of about twenty-four percent beforehand and still believed afterward that the tools had made them about twenty percent faster. The interpretation in the paper points at the cost of reviewing, correcting, and integrating model output, but a complementary reading is that mature codebases have a strong theory baked into them, and generated code that does not respect that theory creates friction.

GitHub’s own research on Copilot found that developers finished about fifty-five percent faster on a constrained JavaScript task, writing an HTTP server, where the theory is shallow and well-known. The interesting result is the gap between the two studies. Where the conceptual model is small and standardized, generation helps a lot. Where the conceptual model is large and project-specific, the help shrinks or inverts.

Programming languages as thinking tools

The languages we use shape what theories we can express comfortably. Iverson’s 1979 Turing lecture, Notation as a Tool of Thought, made this case for APL: a sufficiently expressive notation lets you see structure that would be invisible in a verbose form. Rust’s borrow checker is a more recent example. The type system encodes a theory of ownership that you would otherwise have to maintain in your head, and the compiler refuses to build programs that violate it. The notation is doing some of the thinking.
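A small Rust sketch can show what it means for the type system to carry a theory of ownership. The Token type and the spend and peek functions are hypothetical names invented for this illustration. The only thing the sketch relies on is standard Rust move semantics.

```rust
/// Hypothetical example type: a token that should be spent at most once.
#[derive(Debug)]
pub struct Token {
    amount_cents: u64,
}

impl Token {
    pub fn new(amount_cents: u64) -> Self {
        Self { amount_cents }
    }
}

/// Takes the token by value: ownership moves in, so the caller cannot reuse it.
pub fn spend(token: Token) -> u64 {
    token.amount_cents
}

/// Takes a shared borrow: the caller keeps ownership and can still spend later.
pub fn peek(token: &Token) -> u64 {
    token.amount_cents
}

// Spending twice is rejected at compile time rather than caught in review:
//
//     let t = Token::new(500);
//     spend(t);
//     spend(t); // error[E0382]: use of moved value: `t`
```

Here the "spend once" rule is not a comment or a convention. It is stated in the signature of spend, and the compiler checks every call site against it.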

Haskell does this for effects, with the IO monad and friends forcing you to make side effects visible in types. TLA+ does this for concurrent and distributed systems: Lamport’s approach treats the specification as the primary artifact and the code as a refinement of it. In all of these, the language is not just a way to talk to the machine. It is a constraint on the theories you can express, and the constraint is the feature.

When an LLM emits code, it tends to default to the lowest-common-denominator idiom of the target language. Generated Rust often leans on clone() and Arc<Mutex<_>> to sidestep the borrow checker’s harder questions. Generated Haskell often pulls everything into IO. Generated TypeScript drifts toward any and structural escape hatches. The model is solving the machine-instruction problem and leaving the conceptual-model problem to you. This is not a flaw in any single model. It is what happens when the prompt does not encode the theory.
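The Rust case can be sketched with a contrived pair of functions. This is not captured model output, and the function names are hypothetical. Both versions compute the same number. Only the second one says anything about who owns the data and what the function is allowed to do with it.

```rust
use std::sync::{Arc, Mutex};

/// The escape-hatch shape: shared, lockable, mutable state, even though
/// nothing here is shared across threads or mutated.
pub fn total_len_shared(names: Arc<Mutex<Vec<String>>>) -> usize {
    let guard = names.lock().unwrap();
    // Cloning each string just to measure it avoids thinking about borrows.
    guard.iter().map(|n| n.clone().len()).sum()
}

/// The same behavior with the ownership decision stated in the signature:
/// the caller keeps the data, and this function only reads it.
pub fn total_len_borrowed(names: &[String]) -> usize {
    names.iter().map(|n| n.len()).sum()
}
```

Both functions compile and both return the right answer, which is exactly the problem. The first signature tells a reader that the list might be shared and mutated elsewhere, a claim about the domain that nobody actually decided to make.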

A practical consequence for how we work with agents

The takeaway from Joshi’s essay, pushed a little further, is that the high-leverage human activity in an LLM-assisted workflow is not writing characters into a file. It is curating the vocabulary, the types, the module boundaries, the invariants. These are the things the agent cannot infer from a prompt because they encode decisions about what the domain is.

This lines up with how the more disciplined adopters are using these tools. Simon Willison’s running notes on coding with LLMs emphasize specifying types, examples, and constraints in detail before generation. The Aider project builds around a repo map that gives the model the existing vocabulary so generated code fits the theory already in place. Cursor’s rules files and Claude Code’s CLAUDE.md serve a similar purpose, telling the agent which words mean what in this codebase.

There is a tempting shortcut where you skip the theory and ask the agent for behavior. It works for throwaway scripts and shallow domains. It compounds badly in anything that has to be maintained, because each generation widens the gap between what the code does and what anyone on the team believes it does. Naur’s theory dies on contact with code nobody built a theory for.

Will there be source code in the future

Probably yes, and for the reason Joshi points at. Source code is the most precise notation we have invented for committing to a conceptual model, and the act of writing it is partly how the model gets built in the first place. The form may change. We may move toward higher-level specifications, more constraints expressed as types or proofs, less ceremony around the parts that are mechanical translation. The role of humans may shift toward writing the parts of the theory that are hardest to infer, and letting agents handle the rest. But the theory does not generate itself from a prompt, and the place we write it down is still going to look a lot like source code.

The failure mode to watch for is treating generated code as the artifact. It is not. It is the output of a process whose real artifact is the shared understanding in the team and the vocabulary that understanding has committed to. Lose that, and what the model produces is no longer code in the sense Joshi means. It is just instructions, and instructions without a theory are not maintainable by anyone, human or otherwise.
