The announcement cycle around AI and programming has developed its own rhythm. A new model drops, coding benchmarks improve, and someone publishes a piece arguing that software developers are obsolete. Simon Willison’s engagement with “Coding After Coders: The End of Computer Programming as We Know It” arrives in this tradition, but the question deserves more than reflexive dismissal or credulous agreement.
I’ve spent the past few years building Discord bots, writing systems code, and watching what AI tools actually do when applied to a production codebase. The “end of programming” framing isn’t wrong so much as it’s asking the wrong question.
The Death That Keeps Not Happening
This isn’t the first time programming’s end has been announced with confidence.
In the 1980s, fourth-generation languages (4GLs) promised that business users could specify what they wanted in near-English terms without needing programmers. COBOL was partly designed to be readable by managers. The CASE (Computer-Aided Software Engineering) tools of that era generated code from diagrams and had serious commercial backing from IBM and others. Visual Basic in the early 90s genuinely lowered the barrier, and the no-code platforms of the 2010s, Bubble and Webflow among them, still let non-programmers ship working software today.
None of these killed programming. Each one expanded who could build software and what got built, which increased demand for professional developers rather than reduced it. The Bureau of Labor Statistics tracked consistent growth in software development jobs through each of these waves.
The honest question isn’t whether AI tools are different from these predecessors. They clearly are. The question is whether the difference is one of degree or kind, and whether this time the outcome is different.
What AI Actually Changes
The genuine difference with current AI coding tools is that they handle ambiguity in a way earlier automation couldn’t. Earlier tools required precise specifications in a formal language, which is roughly as hard as writing the code itself. GitHub Copilot, Cursor, Claude Code, and similar tools can take something like “add rate limiting to this endpoint, with a sliding window per user” and produce something that compiles and often works correctly on the first attempt.
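To make the example concrete, here is a sketch of the kind of code such a prompt typically yields: an in-memory sliding-window limiter. The class and method names are mine, and a production version would need shared storage rather than process-local state.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Illustrative in-memory sliding-window rate limiter, per user."""

    def __init__(self, max_requests: int, window_seconds: float):
        self._max = max_requests
        self._window = window_seconds
        self._hits = {}  # user_id -> deque of request timestamps

    def allow(self, user_id: str) -> bool:
        now = time.time()
        q = self._hits.setdefault(user_id, deque())
        # Drop timestamps that have aged out of the window.
        while q and now - q[0] >= self._window:
            q.popleft()
        if len(q) >= self._max:
            return False
        q.append(now)
        return True
```

Code of roughly this shape often does work on the first attempt; the interesting question, as the next sections argue, is what it quietly leaves out.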
GitHub’s own research measured a 55% reduction in task completion time for specific coding exercises. Those numbers hold up in practice for certain classes of work. Integrations that would have taken a couple of days now take an afternoon, and the code quality is often fine.
Andrej Karpathy coined the term “vibe coding” in early 2025 to describe a mode where you mostly describe what you want and iterate on AI output without carefully reading the generated code. It’s a real practice and works for certain classes of problem: throwaway scripts, prototypes, one-off data transformations. Where it stops working is when something goes wrong and you have no mental model of what the code is doing.
The Specification Problem Relocates, Not Disappears
The hard parts of software development were never about typing code. They were about knowing what to build, how to structure it, what the edge cases are, and how it will fail. AI doesn’t solve these problems. It relocates them.
When you write a function yourself, the act of writing forces you to confront exactly what the function should do. You have to specify the inputs, the outputs, the error cases. When you prompt an AI for the same function, you still have to specify those things, just in natural language instead of a type signature. The specification doesn’t get easier; it becomes less formal, which makes it harder to reason about precisely.
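For contrast, here is what the same kind of specification looks like when it is forced into a type signature. Everything here is illustrative (the names are mine), and the body is deliberately left unwritten; the point is the decisions the signature forces.

```python
from dataclasses import dataclass

@dataclass
class RateLimitDecision:
    allowed: bool
    retry_after_seconds: float  # 0.0 when the request is allowed

def check_rate_limit(user_id: str, now: float) -> RateLimitDecision:
    """Sliding window per user. Writing even this much forces the
    questions the prose prompt lets you defer: what exactly is the
    window, what happens at its boundary, what does the caller get
    back when a request is rejected?"""
    raise NotImplementedError
```

The natural-language prompt and the signature encode the same specification; the signature just refuses to let the ambiguity hide.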
Consider a concrete example. You want a function that deduplicates incoming webhook events:
# Prompt: "deduplicate webhook events by event ID, expire entries after 1 hour"
import time
from threading import Lock

class EventDeduplicator:
    def __init__(self, ttl_seconds=3600):
        self._seen = {}
        self._lock = Lock()
        self._ttl = ttl_seconds

    def is_duplicate(self, event_id: str) -> bool:
        now = time.time()
        with self._lock:
            if event_id in self._seen:
                if now - self._seen[event_id] < self._ttl:
                    return True
            self._seen[event_id] = now
            return False
This looks correct. It takes a lock; it handles expiry. But it has a memory leak: it never evicts expired entries, only checks them, so under a high-volume event stream self._seen grows without bound. The AI didn't lie; it produced something that matches your specification. Your specification was incomplete. And the specification problem gets harder, not easier, when the code arrives finished and you have to audit it after the fact instead of confronting the gaps while writing it.
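One way to close that gap is to evict as part of every check. The sketch below (the class name is mine) uses the simplest fix, a linear sweep per call; it is not the fastest approach, just the one that makes the missing requirement visible.

```python
import time
from threading import Lock

class BoundedEventDeduplicator:
    """Revision of the deduplicator above: same interface, but expired
    entries are evicted, so memory stays bounded by the number of
    events seen within one TTL window."""

    def __init__(self, ttl_seconds=3600):
        self._seen = {}
        self._lock = Lock()
        self._ttl = ttl_seconds

    def is_duplicate(self, event_id: str) -> bool:
        now = time.time()
        with self._lock:
            # Sweep expired entries before checking. O(n) per call;
            # a periodic background sweep, or an insertion-ordered
            # structure popped from the front, would amortize this.
            self._seen = {eid: ts for eid, ts in self._seen.items()
                          if now - ts < self._ttl}
            if event_id in self._seen:
                return True
            self._seen[event_id] = now
            return False
```

Notice that nothing in the original prompt distinguishes these two implementations. The requirement "and don't leak memory doing it" lived in the author's head.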
Where the Systems Layer Gets Stubborn
Application-layer code is where AI performs best: CRUD operations, REST endpoint handlers, string manipulation, UI components, configuration parsers, test boilerplate. These have well-defined patterns that appear frequently in training data, and the failure modes tend to be shallow.
Systems programming is a different environment. Writing a correct async executor, a lock-free data structure, a network protocol implementation, or anything that touches shared mutable state under concurrency requires holding a lot of invariants in mind simultaneously. The correctness criteria are stricter because a wrong answer doesn’t produce a bad UI; it corrupts state, causes races, or deadlocks.
The Rust Nomicon exists precisely because the failure modes in unsafe code are non-obvious and the compiler can't catch all of them. Generating unsafe Rust with an LLM and shipping it without careful review is genuinely risky. LLMs can sketch these structures, but the sketch needs review by someone who understands why the invariants matter. Whether future models close this gap is an open question; the difficulty lives in the invariants themselves, not in producing plausible-looking code.
The same gradient shows up in Discord bot development. The application layer (slash command handlers, message formatters) is straightforward to generate. The parts that deal with reconnection logic, rate limit backpressure, and concurrent state across shards require understanding the Discord Gateway protocol at a level where AI-generated code frequently goes wrong in subtle ways.
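Reconnection logic is a good example of code that looks trivial and isn't. Here is a generic sketch of exponential backoff with full jitter; the function name and constants are illustrative, not Discord's documented values.

```python
import random

def backoff_delays(base=1.0, cap=60.0, attempts=5):
    """Yield 'full jitter' reconnect delays: a random wait drawn from
    [0, min(cap, base * 2**n)] for each successive attempt n."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield random.uniform(0, ceiling)
```

Full jitter matters because a mass disconnect otherwise produces synchronized retries: every client that dropped at the same moment reconnects at the same moment, again and again. That failure mode isn't visible in the function; it's visible only to someone who has reasoned about the fleet.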
The Auditor’s Burden
If you accept that AI will handle an increasing share of first-draft code, the programmer’s role shifts toward directing and auditing rather than authoring. That’s a more plausible description of where we are now, and Willison’s piece seems to land somewhere in this territory.
But directing and auditing require understanding the thing being directed and audited. A music producer who can’t evaluate whether a take is good isn’t a producer; they’re a bystander. A developer who can’t read generated code, spot a subtle security issue, or reason about performance characteristics isn’t directing the AI. They’re hoping it doesn’t break anything.
The value of deep technical knowledge hasn’t decreased. The ability to evaluate a large volume of generated code quickly, recognize which patterns carry risk, and catch cases where an LLM is confidently wrong has become more valuable. The bottleneck shifted from writing to reviewing, and reviewing requires the same understanding that writing always did.
This is not a comforting message for people who were hoping AI would lower the floor enough to skip the hard parts. Those parts were never syntax.
What Actually Changes
What does shrink is the category of entry-level work that consisted primarily of translating a clear specification into straightforward code: scaffolding CRUD endpoints, writing integration layers that connect two APIs with documented behavior, adding boilerplate to an established pattern. That work exists now because it’s necessary, but it was never where the difficulty of software development lived.
Matt Welsh’s original 2023 ACM essay argued that the programming paradigm itself would shift, from writing explicit logic to training and prompting models. That’s a more interesting thesis than “programmers become obsolete”; it’s closer to what’s actually happening. The skill set required is evolving toward specifying systems, evaluating outputs, and understanding failure modes at a level of abstraction that’s one step removed from the code.
What doesn’t change is the need to understand what you’re building, how the underlying systems behave, and what will go wrong in production. The question “what does this code actually do” remains the central question of software development. The answer now often starts with “it’s what the AI generated, and here’s what I found when I traced through it.”
That’s a real change in how programming feels day to day. It isn’t the end of the field.