The Abstraction Ratchet: Why Programming Transforms Rather Than Ends
Source: simonwillison
Simon Willison recently engaged with a piece titled “Coding After Coders: The End of Computer Programming as We Know It”, which revives one of the more persistent arguments in recent tech discourse: that AI has made human programmers structurally obsolete, or at least has degraded the skill to something unrecognizable from what it was a decade ago.
The argument keeps surfacing because it keeps finding evidence. Matt Welsh’s influential 2023 essay in Communications of the ACM argued that future systems would be trained rather than explicitly programmed, and that the programmer’s role would narrow dramatically. Since then, the tools have only become more capable. GitHub Copilot’s own productivity research found roughly 55% faster task completion on well-defined tasks. Cognition’s Devin reached 13.86% on SWE-bench at launch in 2024, a benchmark that requires resolving real GitHub issues in real repositories. These are not trivial numbers.
But the stronger version of the argument, that human programming judgment is becoming unnecessary, runs into problems when you examine what the tools are good at and where they consistently fail.
Where the Abstraction Has Always Been Moving
Every major shift in how software gets written has generated a version of this conversation. When Fortran arrived in 1957, hand-written assembly was what “real programming” meant; John Backus himself worried the abstraction would degrade the discipline. When C replaced assembly for systems work in the 1970s, the same concerns surfaced. When garbage-collected scripting languages made memory management optional for most development work, the worry was that this would produce a generation of programmers who did not understand what was happening in the machine.
What happened in each case was not the end of programming but a shift in the definition. The work that previously required intimate knowledge of instruction sets became the work of compiler engineers. The work that required manual memory management became the domain of runtime developers and language designers. The intellectual real estate freed up by each abstraction filled with new complexity at a higher level of the stack.
AI coding tools are running this same pattern, substantially faster, with one key difference: previous abstraction jumps required learning a new formal notation, a higher-level language. These tools accept natural language and intention directly. The threshold for producing working code has dropped far enough that the gap between “person who can describe what they want” and “person who can build it” is genuinely narrowing.
What Changes in Practice
Building a Discord bot in 2023 meant spending real time on scaffolding: setting up slash command registration, understanding the gateway intents system, wiring up the event handler model correctly. The discord.js documentation is thorough, but there is still a ramp. A substantial fraction of the early work was learning the API surface before writing any application logic.
In 2025, that friction is essentially gone. You describe the bot’s behavior to a model, get working scaffolding in seconds, and spend your time on the parts that require decisions: how state should persist across restarts, what happens when a downstream API goes down, what the user-facing error messages should communicate. These problems require understanding your users and your system rather than reading API documentation, and that judgment is far harder for a model to replicate than producing scaffolding.
Removing those frictions has changed what a single developer can maintain. But the judgment problems remain, and concentrating on them is where the work now lives.
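One of those judgment calls, persisting state across restarts, can be sketched with nothing but the standard library. This is a minimal illustration, not any particular bot’s implementation; the `BotState` class and the `state.json` path are hypothetical names chosen for the example:

```python
import json
import os
import tempfile


class BotState:
    """Durable key-value state that survives process restarts."""

    def __init__(self, path="state.json"):
        self.path = path
        try:
            with open(self.path) as f:
                self._data = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            # Missing or corrupt file: start fresh rather than crash on boot.
            self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value):
        self._data[key] = value
        # Write to a temp file and atomically rename, so a crash
        # mid-write never leaves a half-written state file behind.
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(self.path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(self._data, f)
        os.replace(tmp, self.path)
```

The atomic-rename detail is exactly the kind of decision a prompt rarely specifies and a model rarely volunteers: the naive version works until the process dies mid-write.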
The Vibe Coding Failure Mode
Andrej Karpathy coined “vibe coding” in early 2025 to describe the practice of accepting AI-generated code without reading it carefully, relying on the model and iterating when things break. As an approach to throwaway scripts and personal prototypes, it works reasonably well. As a description of how production systems should be built, it has a consistent failure pattern.
The issue is that code accumulates. A working program produced by vibe coding is still a program someone will have to debug during a production incident. Vibe-coded systems have recognizable architectural weaknesses: error handling that covers the happy path and leaves edge cases silent, logging that captures what was convenient rather than what is diagnostic, implicit assumptions baked into the scaffold that become painful when requirements change.
The model does not know your operational context. It does not know that your Discord bot’s upstream API occasionally returns malformed JSON, or that rate limits reset on a schedule that differs from what the documentation claims. It does not know which edge cases your users will encounter first.
```python
# What the model generates
response = await api.fetch_data(user_id)
return response["data"]

# What production requires
try:
    response = await api.fetch_data(user_id)
    data = response.get("data")
    if data is None:
        logger.warning(  # structlog-style structured logging
            "fetch_data returned no data field",
            user_id=user_id,
            response_keys=list(response.keys()),
        )
        return default_value
    return data
except asyncio.TimeoutError:  # aiohttp surfaces client timeouts as asyncio.TimeoutError
    metrics.increment("api.timeout", tags={"endpoint": "fetch_data"})
    raise RetryableError("upstream timeout")
```
The difference between these is not syntax knowledge. It is operational awareness and understanding of failure modes. AI tools do not synthesize that awareness; they generate code consistent with patterns in their training data, which skews toward clean-path implementations.
What Does Not Compress in Systems Work
The areas of programming most resistant to AI assistance are precisely those where reasoning about an entire system matters. Distributed systems debugging, where a latency regression might originate in a network partition, a lock contention pattern, or a query planner change, requires holding a mental model of a large system under load. Designing protocols that must remain backward-compatible across multiple deployed versions requires reasoning about organizational dynamics as much as technical constraints.
Systems programming remains particularly expensive to generate correctly. Memory safety issues, use-after-free bugs, and data races are exactly the category where LLMs produce plausible-looking but subtly wrong code with confidence. The model cannot run the code under adversarial conditions; it can only pattern-match against what correct-looking code resembles. These two capabilities diverge at precisely the points that matter most for security and reliability.
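A Python analogue of this failure class is the check-then-act cache: code that looks obviously correct in isolation but duplicates work under concurrency, because the `await` between the check and the write is a suspension point. This is a deliberately minimal sketch with hypothetical names, contrasting the naive version with one that reserves the slot synchronously:

```python
import asyncio

calls = 0


async def fetch(key):
    """Stand-in for an expensive upstream call; counts invocations."""
    global calls
    calls += 1
    await asyncio.sleep(0.01)
    return key.upper()


naive_cache = {}


async def get_naive(key):
    # Looks correct, but every coroutine that checks the cache before
    # the first fetch completes will issue its own duplicate fetch.
    if key not in naive_cache:
        naive_cache[key] = await fetch(key)
    return naive_cache[key]


futures = {}


async def get_dedup(key):
    # Reserve the slot with no await between check and write, so
    # concurrent callers share the single in-flight future.
    if key not in futures:
        futures[key] = asyncio.ensure_future(fetch(key))
    return await futures[key]


async def main():
    global calls
    calls = 0
    await asyncio.gather(*(get_naive("a") for _ in range(5)))
    naive_calls = calls
    calls = 0
    await asyncio.gather(*(get_dedup("a") for _ in range(5)))
    return naive_calls, calls


naive_calls, dedup_calls = asyncio.run(main())
```

Both versions pass a sequential unit test. Only the second survives five concurrent callers, which is the kind of property a reviewer has to reason about rather than observe.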
This is the reasoning behind the Redox OS project’s decision to ban LLM-generated code from their kernel outright, as their contributing guidelines now state. For a safety-focused microkernel, the cost of subtle correctness failures is high enough that the verification overhead exceeds the productivity benefit. That calculus does not apply to every project, but it clarifies something important: the hardest parts of programming are the parts where the model’s tendency to produce plausible rather than correct output is most dangerous.
The Skill Set That Changes
If syntax production and scaffolding are commoditized, what remains valuable is the capacity to direct AI output toward correct and maintainable outcomes. That is not a simpler skill than writing code manually; it is a different one.
Specifying behavior precisely enough for an AI to produce correct output is harder than writing the code yourself for problems you understand well. Reviewing generated code critically, identifying where the model has made plausible but wrong assumptions, maintaining architectural coherence across a partially generated codebase: these take time to develop and matter more as the tools become more capable.
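One concrete form that review skill takes is writing the specification as executable edge cases before accepting a generated implementation. A minimal sketch, using a hypothetical `truncate_message` helper and Discord’s real 2000-character message limit as the example:

```python
def truncate_message(text, limit=2000, ellipsis="…"):
    """Truncate to Discord's 2000-char message limit, appending an
    ellipsis only when truncation actually happened."""
    if len(text) <= limit:
        return text
    return text[: limit - len(ellipsis)] + ellipsis


# Executable spec: the edge cases a reviewer should pin down
# before trusting any generated implementation.
assert truncate_message("short") == "short"            # no-op below the limit
assert len(truncate_message("x" * 3000)) == 2000       # never exceeds the limit
assert truncate_message("x" * 2000) == "x" * 2000      # boundary: exactly at limit
assert truncate_message("x" * 2001).endswith("…")      # marker only when cut
```

The boundary case at exactly 2000 characters is where plausible generated code tends to be off by one; stating it as an assertion turns a vague review instinct into something the model’s output must actually satisfy.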
There is also a new class of failure to manage. When you write code yourself, your understanding of the system is implicit in how you wrote it. When you direct an AI, the code can look correct while encoding misunderstandings you did not catch during review. The gap between “code that passes tests” and “code that reflects accurate understanding of the problem” has always existed, but AI generation widens it in ways that are harder to detect.
The role under pressure is not “programmer” in the general sense. It is “programmer whose primary value was producing correct syntax faster than colleagues.” That role has been compressing since high-level languages arrived; AI tools have accelerated the final compression. The engineer who understands systems well enough to specify what correct behavior looks like, and to verify that generated code achieves it, is not in worse shape. The available leverage has increased.
What the Conversation Points At
What Willison’s engagement with “Coding After Coders” makes legible is that these transitions are genuinely disruptive without being terminal. Developers who built careers around fast syntax production will find that edge less competitive. The developer who built a career around understanding systems, communicating about them precisely, and making good decisions under uncertainty faces a different market. The surrounding work has been partially automated, which changes the economics of their time without eliminating the work itself.
The historical pattern is consistent enough to be predictive. Each abstraction jump produced warnings that real programming was ending and a narrowing of what counted as load-bearing expertise. Each time, the discipline migrated rather than disappeared. The work worth doing moved higher up the stack and became harder to specify, not easier.
Programming does not end in this process. It concentrates toward the parts that have always been the hard parts and away from the parts that were always going to be automated eventually. The abstraction ratchet clicks forward, and what remains is the judgment that cannot be compressed.