Automating the Coder, Not the Engineer

Every few years, the software industry discovers that programming is ending. In 1959, it was COBOL, which grew out of Grace Hopper’s FLOW-MATIC work and was designed by an industry committee in the hope that business managers could read, and perhaps write, their own programs without needing specialist translators. In the 1980s, it was fourth-generation languages and CASE tools, which were supposed to let non-programmers define applications visually. In the 2010s, it was low-code platforms, which were finally, definitively going to close the gap between intent and implementation.

Each wave was real, meaningful, and ultimately partial. The floor rose. Routine programming became more accessible. Boilerplate got cheaper. But the ceiling stayed put, because every previous abstraction worked by encoding known patterns. COBOL encoded business data processing idioms. 4GLs encoded database application structure. Low-code platforms encoded common web application architectures. Anything outside those patterns still required someone who could reason from first principles about how software behaves.

Simon Willison’s recent piece on coding after coders describes a shift that shares surface features with these previous waves but has a different mechanism underneath. The difference matters.

Why This Wave Has a Different Mechanism

Previous automation waves worked by lifting specific, well-characterized patterns above the level of manual implementation. They created better tools for producing known things. What large language models do differently is operate on unfamiliar problem structures with some degree of competence, not by recognizing a known pattern, but by generalizing across a vast space of similar patterns to produce something plausible for a new case.

This means the floor has risen sharply again. Standard web applications, CRUD APIs, data processing pipelines, most tooling: these are now accessible at a much lower cost of implementation. The gap between describing what you want and having running code that does it has narrowed substantially for everything that resembles something in the training distribution.

The ceiling is still present, and it has moved.

What the Ceiling Looks Like Now

The ceiling is where a problem’s constraints are not well-represented in any existing corpus, where the interaction between components is subtle enough that a plausible-looking implementation can be silently wrong, or where the correctness criteria cannot be expressed as a test that the AI can write alongside the implementation.
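A toy sketch of the "silently wrong" case, in Python. The function names are hypothetical and the scenario is invented for illustration: a generated helper reads cleanly but drops the final window, and a test written from the same mental model passes anyway. Only a check derived independently from the specification catches it.

```python
def trailing_means_plausible(values, window):
    # Hypothetical generated code: reads cleanly, but the range stops one
    # step early, so the final window is silently dropped.
    return [sum(values[i - window:i]) / window
            for i in range(window, len(values))]


def trailing_means_reference(values, window):
    # One mean per full window; there are len(values) - window + 1 of them.
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def cowritten_test(fn):
    # A test sharing the implementation's blind spot: it checks the first
    # window and the value range, but never the number of results.
    out = fn([1, 2, 3, 4], 2)
    return out[0] == 1.5 and all(1 <= x <= 4 for x in out)


def independent_test(fn):
    # Expected values worked out by hand from the specification.
    return fn([1, 2, 3, 4], 2) == [1.5, 2.5, 3.5]
```

Here `cowritten_test` accepts both versions, while `independent_test` rejects the plausible one. The real ceiling cases are harder than this, since the independent oracle may not be writable as a test at all, but the failure has the same shape.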

Concrete examples help. The ongoing work to give CPython a practical JIT compiler involves getting register allocation right across a stencil-based copy-and-patch compilation pipeline. The correctness requirements live in the intersection of how the CPU’s register file works, how the Python bytecode interpreter manages its execution state, and how the stencil approach handles code with variable register pressure. PEP 744 describes the architecture, but the bugs requiring fixes across Python 3.13 and 3.14 emerged from interactions between constraints, none of which is obscure in isolation. An AI can generate plausible JIT code. Deciding whether that code is correct requires reasoning about the whole system, and that reasoning does not emerge from reviewing the generated output.

Similarly, the eBPF verifier’s state pruning algorithm involves a contract between what the verifier tracks and what the runtime can guarantee, and bugs in that contract have surfaced periodically as the verifier’s capabilities expanded. The formal properties of spinlock handling within pruned states are subtle, and the consequences of getting them wrong range from memory safety violations to privilege escalation. Reviewing generated BPF programs for verifier-bypass conditions requires building a mental model of the verifier’s own reasoning process, which is not derivable from reading the generated output alone.

These are ceiling cases. AI contribution is meaningful for scaffolding and pattern work; the correctness reasoning cannot be delegated.

The Coder-Engineer Distinction

Willison’s framing points at something worth naming precisely: the coder and the engineer are different roles that have historically been bundled together, and AI is unbundling them.

The coder’s job is translation: take a specification, however implicit, and produce implementation. This is the part being automated. It requires implementation fluency, familiarity with APIs and idioms, and the ability to produce syntactically and semantically correct code quickly. These are real skills, but they are the skills most directly in scope for current LLM capabilities.

The engineer’s job is comprehension: understand a system well enough to specify what it should do, evaluate whether an implementation actually does that, and reason about failure modes that are not obvious from the implementation itself. This is the part not being automated, and the part that becomes more critical as generated code becomes the norm. If the code is written by AI, the only way to know whether it is correct is to understand it more deeply than the generator does.

This maps onto a real economic gradient. The market for translating specifications into code is being compressed. The market for understanding systems deeply enough to specify and evaluate them is expanding. The two skill sets do not develop in lockstep, and they are not built by the same learning path.

The Learning Gap

The traditional path into engineering, where you learn by writing a lot of code, making mistakes, debugging those mistakes, and gradually building an accurate mental model of how software behaves, produced the comprehension skills that supervisory work now demands. Writing code was the training mechanism for understanding code.

If the inner loop, writing implementation, is increasingly handled by AI, then the training signal that built comprehension is also being reduced. The developers who are competent at evaluating AI output today became competent by writing code manually for years. The developers beginning their careers in the supervisory model are building their understanding through a different and less tested path.

The goal of programming education was never to produce coders in the narrow sense; it was to produce people who understand software. The coding was in service of that understanding. Removing the coding without replacing the learning mechanism it provided is the specific risk in this transition. Being thoughtful about how early-career development gets restructured is therefore the most consequential challenge the field faces, more consequential than which tools to adopt.

This shows up concretely in how you learn to debug. The intuition that something is wrong before you know what is wrong, the sense that a function’s behavior doesn’t match what the calling code expects, comes from having been wrong before in recoverable ways and having traced those failures back to their source. That intuition is what lets you look at AI-generated output and notice that the indexing is off by one, or that the lock ordering is inverted in the edge case. You build it by breaking things yourself, not by accepting output that happens to work.
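The lock-ordering case can be sketched in a few lines of Python. This is a minimal illustration under assumptions: `Account`, `lock_order`, and `transfer` are hypothetical names, and account IDs are assumed to be unique. The version a reviewer should be suspicious of acquires `src.lock` then `dst.lock` in argument order, which inverts the order whenever two transfers run in opposite directions. The sketch below always acquires locks in a fixed global order instead.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class Account:
    account_id: int  # assumed unique across accounts
    balance: int
    lock: threading.Lock = field(default_factory=threading.Lock)


def lock_order(a, b):
    # Fixed global order by account_id, independent of argument order.
    return (a, b) if a.account_id <= b.account_id else (b, a)


def transfer(src, dst, amount):
    if src is dst:
        return  # never acquire the same non-reentrant lock twice
    first, second = lock_order(src, dst)
    with first.lock, second.lock:
        if src.balance < amount:
            raise ValueError("insufficient funds")
        src.balance -= amount
        dst.balance += amount
```

A single-threaded test passes for both the argument-order version and this one, which is why the inverted ordering tends to be caught by a reader who has been bitten by it before, not by output that happens to work.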

The End of One Kind, the Beginning of Another

Willison is right that something is ending. The programmer-as-translator, whose primary value was fluency in implementation, is being automated. That role was never the whole job, but it was a large enough component that losing it represents a genuine structural change in what professional programming means.

What follows depends on whether the field builds the infrastructure to develop the comprehension skills that remain irreplaceable. Formal methods, specification practices, and deep systems work are all seeing renewed interest, and the timing is not coincidental. The abstraction ceiling finally moved, and what sits above it turns out to be harder than it looked from below.