The Syntax Was Never the Bottleneck

Simon Willison’s piece on the end of programming is worth reading alongside the debate it joins. Willison has spent years as one of the more careful observers of what LLM tools actually do for working programmers, as opposed to what the marketing claims, and his framing reflects that care. The title borrows a certain maximalist tone, but the argument underneath is more surgical than the headline suggests.

The instinctive response from working programmers is to reach for the 4GL analogy. In the 1980s, fourth-generation languages, FOCUS, Natural, and NOMAD among them, were going to eliminate programmers. Business analysts would write their own reports. DBAs would become obsolete. What happened instead: demand for programmers grew, the tools created their own specialist ecosystems, and the abstraction floor rose without shrinking the total surface area of software work. People deploy this story as reassurance that LLM coding tools will follow the same arc.

The analogy has real problems, and those problems are worth being specific about.

Fourth-generation languages automated the translation of structured intent into database queries and formatted reports. They were effective at that narrow, well-defined class of task. They could not reason about arbitrary program logic, read an existing codebase to locate a bug, or synthesize solutions from underspecified descriptions. Their scope was bounded in ways that made it a safe bet they would not eliminate programmers.

Current LLM tools generate working code across a broad range of domains, translate between languages, explain unfamiliar codebases, propose fixes for ambiguous error messages, and draft architecture documents. The scope is qualitatively different. The 4GL argument does not collapse, but it stops doing the work people need it to do.

What the historical record more reliably shows is that every major abstraction shift eliminates the most mechanical layer of the previous skill while leaving the judgment layer intact and more exposed. When C replaced assembly for most application development, assembly programmers did not vanish. Systems software, embedded work, and performance-critical paths still needed them. The population of programmers doing nothing more than manually translating structured logic into machine instructions mostly migrated up the stack. The work that remained in the assembly layer required understanding what the hardware was doing, not just how to instruct it.

When Python and higher-level languages became dominant for web and data work, C programmers did not disappear either. What changed was who needed to understand the full stack. The C layer was still present; it just became less frequently the appropriate tool. The programmers doing interesting work in C were doing it because the problem required it, not because there was no alternative.

LLM coding tools are doing something structurally similar to the syntax layer of programming. For a large class of tasks, translating reasonably clear intent into syntactically correct, functionally plausible code is something a model can handle. A Discord bot that parses slash commands, hits an API, formats a response, and handles rate limiting is something I can prototype in a single conversation and come away with running code. The mechanical translation step is largely automated.

The question worth attending to is what “functionally plausible” means in practice.

import discord  # assumes py-cord, whose Bot exposes slash_command and ctx.respond

bot = discord.Bot()

@bot.slash_command(name="lookup", description="Look up a user profile")
async def lookup(ctx: discord.ApplicationContext, username: str):
    # A model produces this. It compiles. It runs.
    # It does not know your rate limit budget, your auth model,
    # your error handling conventions, the unicode edge case
    # your team found six months ago, or the security boundary
    # that makes this endpoint safe or not safe to expose.
    # api_client and format_profile stand in for project code defined elsewhere.
    result = await api_client.get_user(username)
    await ctx.respond(format_profile(result))

The model does not know your system. It does not know the implicit contracts between services, the edge cases discovered in production, the performance constraints that matter at your actual scale, or the security requirements your deployment environment imposes. It generates code that is plausible given the description. Evaluating whether that code is correct, appropriate, and safe for a specific context requires understanding the system. That understanding is what programming has always been in its most valuable form; syntax was always downstream of it.
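To make one of those gaps concrete, here is a minimal sketch of the kind of input check a reviewer who knows the system might put in front of the API call. The character policy, the length bounds, and the name normalize_username are assumptions made up for illustration; a real service would have its own rules.

```python
import re
import unicodedata

# Hypothetical policy: 3 to 32 characters drawn from lowercase ASCII
# letters, digits, and underscore.
USERNAME_RE = re.compile(r"[a-z0-9_]{3,32}")


def normalize_username(raw: str) -> str:
    """Return a canonical username, or raise ValueError if it breaks the policy."""
    # NFKC folds compatibility forms such as fullwidth letters into their
    # plain equivalents, so lookalike spellings map to the same name.
    cleaned = unicodedata.normalize("NFKC", raw).strip().casefold()
    if not USERNAME_RE.fullmatch(cleaned):
        raise ValueError(f"invalid username: {raw!r}")
    return cleaned
```

In the handler, this would run before the profile lookup, with the ValueError turned into a short reply to the user rather than an unhandled exception. None of it is hard to write. The point is that nothing in the original prompt would have told a model it was needed.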

The skills that become more central as a result are not traditional programming skills in the sense of “knows how to write idiomatic Rust” or “can implement a binary search tree from memory.” They are closer to: can you read code and determine whether it does what it claims? Can you specify intent precisely enough that a model generates something useful? Can you evaluate the security implications of a function you did not write? Can you debug a system where you wrote a small fraction of the code?
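A small illustration of the first question, using hypothetical function names and assuming Discord's 2,000-character cap on a normal message. A helper that truncates long replies is an easy thing to ask a model for. The first version below reads correctly at a glance and still violates the limit it exists to enforce; the second is roughly what a reviewer who checked the claim would end up with.

```python
DISCORD_LIMIT = 2000  # assumed cap on a normal message, in characters


def truncate_plausible(text: str, limit: int = DISCORD_LIMIT) -> str:
    """Looks right, but the ellipsis is appended after cutting to the limit."""
    if len(text) > limit:
        return text[:limit] + "..."  # result is limit + 3 characters long
    return text


def truncate_checked(text: str, limit: int = DISCORD_LIMIT) -> str:
    """Never returns more than `limit` characters, ellipsis included."""
    suffix = "..."
    if len(text) <= limit:
        return text
    if limit < len(suffix):
        # No room for an ellipsis at all; a hard cut is the only option.
        return text[:limit]
    return text[: limit - len(suffix)] + suffix


if __name__ == "__main__":
    long_reply = "x" * 2001
    print(len(truncate_plausible(long_reply)))  # over the limit
    print(len(truncate_checked(long_reply)))    # within the limit
```

Catching this does not take deep expertise, only the habit of asking what the function promises and then testing the boundary rather than the happy path.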

This is closer to code review as a primary skill than code authorship; closer to architecture than implementation; closer to testing and verification than writing. Andrej Karpathy’s framing of vibe coding, from February 2025, captured something real about this new workflow mode, but it also pointed at a risk: working with generated code without deeply understanding it is a different cognitive mode from constructing code yourself, and it may not build the same underlying model of the system.

That is the concern sitting at the center of Willison’s piece, and I think it is the one worth taking seriously. The question of whether there will be software engineers in ten years is probably the wrong framing. Software will be built. Systems will need to be understood, evaluated, secured, and maintained. The judgment layer does not disappear.

The more uncomfortable question is about the apprenticeship pipeline. The traditional path to deep programming expertise runs through writing a lot of code. You write enough that you develop an intuitive model of how systems behave. You debug enough failures that you recognize patterns. You read enough source code that you understand what good and bad look like at a structural level. That path was never efficient, but it worked.

If fewer people spend time writing code from scratch because models handle that step, fewer people are building the low-level mental models that make expert judgment possible. Code review is a skill that presupposes having written code. Architecture decisions improve when the architect has debugged the thing they are designing. The judgment layer depends on the syntax layer to develop, even if it does not depend on it to operate.

A generation of developers who primarily work by specifying and evaluating model output will be operating at a genuinely different cognitive level than developers who spent years in the syntax layer. Whether that produces equivalent depth of understanding is an open question, and nobody has good evidence either way yet.

My working position, based on building with these tools over the past couple of years: the shift is real, the productivity gains are real for certain classes of work, and the judgment layer is not going anywhere. The open problem is whether the pipeline that produces people with good judgment will still function when the path through the syntax layer is no longer the default route. That is a harder problem than the headline question, and it does not have a clean historical analogy.

The 4GL story ended well partly because the jobs that were automated away were not the jobs that built the skills needed for the next level of work. The question now is whether that still holds, and the honest answer is that we do not know yet.