What Skeptical AI Agent Coding Looks Like When Someone Documents It Carefully

Source: simonwillison

Simon Willison has been one of the more measured voices in the AI space, which is part of what makes his deep-dive into AI agent coding worth reading. The piece was published in late February and documents his experience from the inside, with enough specificity to be useful rather than just another hot take in either direction.

The framing matters here. Willison is not someone who dismisses AI tooling, but he is someone who has consistently pushed back on overclaiming. So when he spends a long post documenting what it actually looked like to let an agent run on a real task, the details carry weight. He is not trying to convert anyone. He is reporting what he saw.

What stands out to me, reading this as someone who spends a fair amount of time building things with agent-adjacent workflows, is how much the value seems to depend on documentation and verification overhead. Agents do work. They also confidently produce plausible-looking results that are subtly wrong in ways that take longer to catch than they would have taken to write correctly. The net time savings depends on the task, your familiarity with the domain, and how quickly you can review output.

I have run into this building Discord bots. Letting a model scaffold a feature is fast. Trusting that scaffold without reading it carefully is where the time debt accumulates. The cases where I come out ahead are the ones where I already understand the code well enough to skim the output and spot issues. The cases where I do not are the ones where I would have been better off writing it myself.

Willison’s approach, documenting the process in exhaustive detail, is itself a useful practice regardless of your position on AI tooling. Understanding when you are saving time versus deferring review costs is something most developers have not systematized. The post is a case study in that kind of accounting.

The skeptic framing in the title is honest, but the piece is not a takedown. It reads more like field notes from someone willing to update their priors based on what they actually observed. That is a rarer thing than it should be in writing about this space.

If you have been putting off forming an opinion on AI agent coding because the discourse is too polarized to be useful, Willison’s post is a good place to start. It is specific, it is honest about limitations, and it does not try to sell you anything.