Code as a Thinking Tool: Why Joshi's Dual-Purpose Theory Matters in the LLM Era

Unmesh Joshi published a piece on Martin Fowler’s site asking a question that sounds philosophical but turns out to be very practical: if agents write our code, will there even be source code in the future? His answer hinges on a claim worth pulling apart. Code, he says, is two things at once. It is instructions to a machine, and it is a conceptual model of the problem domain. Those two purposes are intertwined but distinct, and the distinction is what determines whether LLM-generated software is sustainable or a slow-motion disaster.

I want to take that frame and push on it. The dual-purpose view is not new, but it has been quietly load-bearing in a lot of programming language design, and the current wave of agentic coding makes the stakes of getting it wrong much higher.

The two-purpose view has deep roots

The instruction-versus-model split shows up in different vocabulary across the field. Harold Abelson and Gerald Sussman put it in the preface to SICP, in a line that has become a cliche precisely because it captures something true: “Programs must be written for people to read, and only incidentally for machines to execute.” Fred Brooks made a related claim in No Silver Bullet, distinguishing essential complexity, which lives in the problem domain, from accidental complexity, which lives in our tools. Eric Evans built Domain-Driven Design around the idea that the code should literally speak the language of the domain expert, with a “ubiquitous language” shared by the model and the team.

What Joshi is doing is reframing the same observation for an audience that is about to hand the instruction-writing part to machines. If code is only instructions, then yes, an LLM that emits working bytes is a complete replacement for a programmer. If code is also a model that humans use to think with, then handing the keyboard to an agent without preserving the model is a quiet form of bankruptcy.

Programming languages as thinking tools

The “thinking tool” claim is worth taking literally. Kenneth Iverson made it explicit in his 1979 Turing Award lecture, Notation as a Tool of Thought, arguing that APL’s notation was not just compact syntax but a medium for reasoning about array problems that other notations made awkward. The Sapir-Whorf-flavored version of this idea, that the language you use shapes what you can think, is contested in linguistics but far less controversial among programmers. Anyone who has tried to express a recursive tree traversal in COBOL, or a stateful UI in pure Haskell without effects, knows the language changes what feels natural.

Concrete examples are easy to find. Rust’s borrow checker is not just an instruction-level mechanism; it forces the programmer to think about ownership and lifetimes in a way C does not. The type-driven workflow in languages with rich type systems, where you write the signature first and let the compiler tell you what’s missing, is documented at length in Edwin Brady’s Type-Driven Development with Idris. Erlang’s actor model shapes how its users decompose concurrent systems; the language and the design approach are inseparable. Even Go’s deliberate refusal to add features is a thinking constraint: the language pushes you toward small interfaces and explicit error handling because it gives you nothing else.

A language that is only good at producing machine instructions, with no expressive purchase on the domain, leaves the model in the programmer’s head. That works for one developer on a weekend project. It does not work for a team maintaining a system over five years, and it especially does not work when the original author was a stochastic process that has now forgotten everything.

What LLMs actually produce

The optimistic framing of agentic coding is that LLMs handle the instruction layer so humans can focus on the model. The pessimistic framing is that LLMs are very good at producing fluent instructions and quietly indifferent to whether those instructions reflect any coherent model. Both can be true at once.

Studies of AI-assisted development have started to quantify the problem. A 2024 GitClear analysis of roughly 153 million changed lines of code reported a sharp rise in code churn and copy-paste duplication coinciding with AI coding tool adoption, with the share of code reverted or rewritten within two weeks projected to roughly double in 2024 relative to its 2021, pre-AI baseline. The METR study released in July 2025 found that experienced open-source developers using AI tools took on average 19 percent longer to complete tasks in codebases they knew well, despite believing they were faster. Neither study measures conceptual coherence directly, but both are consistent with the same reading: LLMs produce a lot of locally plausible code that does not cohere as a model.

This is exactly the failure mode Joshi’s framing predicts. The instructions run. The model rots.

Vocabulary as the load-bearing artifact

The practical implication is that the parts of programming we should be most careful about handing to agents are the naming, the boundaries, and the vocabulary. These are not implementation details; they are the model.

DDD practitioners have argued this for two decades. The bounded context, the aggregate, the value object: these are decisions about how to carve up the problem, and they are encoded in code through the names of types, modules, and functions. When an LLM generates a 300-line file full of names like process, handle, manager, and helper, it is producing instructions without a model. The compiler is happy. The next person to read it has to reconstruct from scratch what the system thinks it is doing.
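To make the contrast concrete, here is a minimal sketch of the same arithmetic written twice: once as bare instructions, once in a domain vocabulary. Everything in it is hypothetical and chosen only for illustration: the ordering domain, the names Money, OrderLine, and Order, and the 10 percent loyalty discount are my assumptions, not anything from Joshi's piece.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Tuple


# Instruction-shaped version: it runs, but the names carry no domain meaning.
def process(data, flag):
    total = sum(Decimal(item["amt"]) * item["qty"] for item in data)
    return total * Decimal("0.9") if flag else total


# Model-shaped version: the same arithmetic, expressed in a hypothetical
# ordering vocabulary (Money, OrderLine, Order, loyalty discount).
@dataclass(frozen=True)
class Money:
    """Value object: an amount in a single, implied currency."""
    amount: Decimal

    def __add__(self, other: "Money") -> "Money":
        return Money(self.amount + other.amount)

    def scaled(self, factor: Decimal) -> "Money":
        return Money(self.amount * factor)


@dataclass(frozen=True)
class OrderLine:
    unit_price: Money
    quantity: int

    def subtotal(self) -> Money:
        return self.unit_price.scaled(Decimal(self.quantity))


# Assumed business rule, for illustration only: loyal customers pay 90%.
LOYALTY_DISCOUNT = Decimal("0.9")


@dataclass(frozen=True)
class Order:
    """Aggregate: the one place where an order total is computed."""
    lines: Tuple[OrderLine, ...]
    placed_by_loyal_customer: bool = False

    def total(self) -> Money:
        subtotal = sum(
            (line.subtotal() for line in self.lines),
            Money(Decimal("0")),
        )
        if self.placed_by_loyal_customer:
            return subtotal.scaled(LOYALTY_DISCOUNT)
        return subtotal
```

Both versions compute the same number. Only the second tells a reader that there is such a thing as an order line, that money is not just a float, and that the discount is a named rule rather than a magic 0.9 behind a boolean flag.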

The interesting question is whether agents can be made to participate in the vocabulary work rather than work around it. Some recent tooling efforts point in this direction. Cursor’s project rules, Anthropic’s CLAUDE.md convention, and the broader practice of seeding agents with architecture decision records (ADRs) are all attempts to give the model layer somewhere durable to live, so the agent generates instructions in service of an existing vocabulary rather than reinventing one per session.

Where this leaves source code

Joshi’s question, will there be source code in the future, is the wrong one to answer literally. Of course there will be source code; the question is what role humans play in writing it. The dual-purpose theory suggests a reasonable division: agents write more of the instruction-shaped parts, humans own more of the model-shaped parts, and the artifacts we produce as humans shift toward the vocabulary, the boundaries, the invariants, and the tests that pin down what the system is supposed to be.

That shift is already visible in how thoughtful teams use these tools. The valuable human output is the prompt, the spec, the type signature, the failing test, the ADR. The generated implementation is cheap and replaceable. This inverts the historical economics of software, where the implementation was the expensive artifact and the spec was usually missing or stale.
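A small sketch of what that division of labor can look like in practice, under assumptions of my own: a hypothetical Account contract and an executable spec are the human-owned artifacts, and InMemoryAccount stands in for the replaceable implementation an agent might generate. None of these names come from a real library.

```python
from typing import Callable, Protocol


class InsufficientFunds(Exception):
    """Domain error: a withdrawal would take the balance below zero."""


class Account(Protocol):
    """Human-owned contract. Any implementation has to satisfy it."""

    def balance(self) -> int: ...
    def deposit(self, cents: int) -> None: ...
    def withdraw(self, cents: int) -> None: ...


def check_account_invariants(make_account: Callable[[], Account]) -> None:
    """Executable spec: pins down what an Account is, not how it is built."""
    account = make_account()
    assert account.balance() == 0

    account.deposit(500)
    account.withdraw(200)
    assert account.balance() == 300

    try:
        account.withdraw(301)
    except InsufficientFunds:
        pass
    else:
        raise AssertionError("an overdraft must be rejected")

    # A rejected withdrawal must leave the balance untouched.
    assert account.balance() == 300


class InMemoryAccount:
    """Stand-in for the cheap part: one of many possible implementations."""

    def __init__(self) -> None:
        self._cents = 0

    def balance(self) -> int:
        return self._cents

    def deposit(self, cents: int) -> None:
        self._cents += cents

    def withdraw(self, cents: int) -> None:
        if cents > self._cents:
            raise InsufficientFunds(f"balance is {self._cents}, asked for {cents}")
        self._cents -= cents


# The spec accepts any factory, so the implementation can be regenerated freely.
check_account_invariants(InMemoryAccount)
```

The implementation class could be thrown away and regenerated tomorrow. The Protocol and the invariant check are the parts that would hurt to lose, because they are where the decision "an account can never go negative" actually lives.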

The risk, and Joshi gestures at this without quite naming it, is that teams will optimize for the visible output of agents (running code, closed tickets, green CI) and let the invisible output (the conceptual model, the shared vocabulary) atrophy. A codebase in that state still works, until it doesn’t, and by then the people who could have rebuilt the model have moved on. The defense is to treat the model as a first-class deliverable, not a side effect of typing.
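One modest way to make the invisible output visible is to check vocabulary the same way CI checks tests. The sketch below is an assumption about what such a check could look like, not an existing tool: it uses Python's standard ast module to flag function and class names built from generic words. The word list is illustrative and would need tuning for any real codebase.

```python
import ast
import re
from typing import List, Tuple

# Illustrative list of generic words; a real team would curate its own.
VAGUE_WORDS = {"process", "handle", "manager", "helper"}


def vague_definitions(source: str) -> List[Tuple[int, str]]:
    """Return (line, name) for each function or class whose name uses a vague word."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Split snake_case and CamelCase names into lowercase words.
            words = {w.lower() for w in re.findall(r"[A-Za-z][a-z]*", node.name)}
            if words & VAGUE_WORDS:
                flagged.append((node.lineno, node.name))
    # ast.walk does not guarantee source order, so sort by line number.
    return sorted(flagged)
```

A check like this cannot tell a good name from a bad one. It can only flag the absence of any domain word, which makes it a prompt for a conversation about the model, not a substitute for having one.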

Code is instructions. Code is also a way of thinking about a problem in a form precise enough that a machine will accept it. The second part is the part worth protecting.