
The Backlog You Never Clear: What Eight Years of Deferred Projects Says About AI-Assisted Development

Source: simonwillison

Some ideas sit in your notes for years. Not because they are too hard, not because you lack the skill, but because the math never works out. You know exactly what you want to build, you could spec it in an afternoon, but the gap between that spec and a working implementation costs more time than you can justify for something that serves fifty people, or just yourself, or a niche you care about but nobody else does.

Simon Willison, co-creator of Django, creator of Datasette, and one of the more careful writers on practical AI use, published a post this week describing roughly three months of building with AI assistance that cleared a backlog he had been accumulating for eight years. The title alone carries a lot of weight: eight years of wanting, three months of building. That ratio is the thing worth sitting with.

Willison is not a beginner. He has been writing Python professionally for over two decades, maintains a sprawling ecosystem of open source tools, and publishes Today I Learned posts at a pace that is frankly unsettling. The bottleneck for him was never ability. It was the overhead of starting each new project: context-switching into it, holding its architecture in memory, writing the scaffolding, running into the third or fourth unfamiliar library, and deciding whether the output was worth the hours invested. For every idea he shipped, there were probably five that stalled at exactly that friction point.

AI tools compressed that friction. Not eliminated, compressed.

Why Experience Matters Here

There is a version of the “AI helps you code faster” story that focuses on beginners or non-programmers. That version gets most of the coverage. But Willison’s experience points at something different: experienced developers with deep, well-understood backlogs may be the cohort that benefits most, at least right now.

Here is why. When you ask a coding assistant to help you build something you know well, you can evaluate its output immediately. You know when the generated code is subtly wrong. You catch the hallucinated API. You recognize the pattern that looks correct but will break under load. You are using the AI as an accelerant inside a framework of judgment you already have.

Contrast that with a beginner using the same tool. They can get code that runs, but evaluating whether it is correct, idiomatic, or maintainable requires exactly the knowledge they are still building. The tool can obscure that gap as easily as it can close it.

Willison has written about this distinction before, and his tooling reflects it. His LLM command-line tool, which lets you run prompts against local and remote models from the terminal, is designed for people who already know what they want and need a fast path to get there. The tool is not trying to explain programming to you; it is trying to get out of your way.

# A typical Willison-style workflow: pipe context directly into the model
git diff HEAD~1 | llm -s "Explain what changed and flag anything suspicious"
cat schema.sql | llm "Write a Python function using sqlite-utils to insert a new row"

This pattern works because the person running it can read the output critically. The AI is handling the translation layer between intent and syntax, not doing the thinking.
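To make "reading the output critically" concrete, here is a hypothetical sketch that is not taken from Willison's writing. It uses Python's standard-library sqlite3 module instead of sqlite-utils so that it runs without extra dependencies, and the table and function names (notes, insert_note_naive, insert_note) are invented for illustration. The first function is the kind of plausible-looking draft an assistant might return; the second is what a reviewer who knows SQL would likely ask for.

```python
import sqlite3


def insert_note_naive(conn: sqlite3.Connection, title: str) -> None:
    # Plausible-looking draft: builds the SQL by string formatting.
    # It works for simple input but breaks when the title contains a quote,
    # and it is open to SQL injection.
    conn.execute(f"INSERT INTO notes (title) VALUES ('{title}')")


def insert_note(conn: sqlite3.Connection, title: str) -> int:
    # Reviewed version: a parameterized query lets sqlite3 handle quoting.
    cursor = conn.execute("INSERT INTO notes (title) VALUES (?)", (title,))
    conn.commit()
    return cursor.lastrowid


def demo():
    # Hypothetical in-memory table used only for this illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT)")
    try:
        insert_note_naive(conn, "it's fine")
        naive_failed = False
    except sqlite3.OperationalError:
        # The apostrophe ends the SQL string early and causes a syntax error.
        naive_failed = True
    row_id = insert_note(conn, "it's fine")
    return naive_failed, row_id
```

Both versions pass a casual test with a title like "hello". Only someone who already knows what to look for notices that the first one fails on an apostrophe.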

The Three-Month Accounting

When someone says they built eight years of ideas in three months, the natural question is: what actually got built, and does it hold up?

Willison’s output over the past year or so gives some evidence. He shipped shot-scraper features, extensions to sqlite-utils, various Datasette plugins, and a range of smaller utilities that would have been reasonable weekend projects if weekends were free and infinite. The pattern across these is consistent: small tools with tight scope, clear interfaces, and well-understood problem domains. None of them required inventing a new algorithm. All of them required knowing enough about the problem to write a good spec.

That last point matters. The AI-assisted workflow Willison describes is essentially spec-driven. You write a clear description of what you want, you iterate on the output, you apply your own judgment to the result. The three months were not three months of prompting; they were three months of making decisions, with the implementation layer moving faster than it used to.

This mirrors what I have found building Discord bots. The parts that used to take the longest were the parts I could describe precisely but had to type out carefully: argument parsing, permission checks, the boilerplate around slash command registration, the retry logic around rate limits. None of that is interesting work. All of it takes time. With a coding assistant handling the scaffolding, the time compresses and the interesting work expands to fill the session.
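As an illustration of the kind of boilerplate meant here, the following is a minimal sketch of retry logic around rate limits. It is not tied to any real Discord library: RateLimited, call_with_retry, and the retry_after attribute are hypothetical names chosen for the example, and the sleep function is injectable so the logic can be exercised without actually waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised when the remote API asks the caller to slow down."""

    def __init__(self, retry_after: float) -> None:
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(
    action: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action, waiting and retrying whenever it raises RateLimited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except RateLimited as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(exc.retry_after)  # wait as long as the server asked
    # Only reached when the loop body never ran.
    raise ValueError("max_attempts must be at least 1")
```

None of this is difficult to write, which is the point: it is easy to describe precisely, tedious to type, and easy to review once it exists.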

The Maintenance Horizon

The reasonable skeptical question about AI-assisted development is about longevity. Code you generated quickly, with help, that you reviewed but did not write line by line: will you be able to maintain it in two years?

Willison has been transparent about this concern in his writing. His answer, roughly, is that the test is whether you understand the code well enough to explain it and modify it, not whether you wrote every character. A lot of software fails the maintenance test because it was written quickly and poorly, not because it was generated. The quality bar is the same; the path to getting there is different.

There is also an argument that cuts the other way. Projects that would have stayed unbuilt are now built. An unmaintained tool that exists is, in many cases, more useful than a perfectly maintained tool that was never written. Willison’s smaller utilities fall into this category: they solve real problems for a real (small) audience, and the maintenance surface is manageable because the scope is tight.

The sqlite-utils philosophy applies here. Small, composable, well-documented, opinionated about scope. Tools built in that mold tend to age better regardless of how they were written, because the interface is stable and the implementation is not doing too much.

What Eight Years Represents

There is something worth acknowledging about the eight-year figure that goes beyond productivity metrics. A backlog of ideas that spans eight years is a record of judgment: what was worth wanting, what was worth keeping on the list, what survived long enough to still seem valuable when the tools finally caught up.

Not all of those ideas deserved to be built. Some of them were right-direction-wrong-timing ideas that aged out. Some were superseded by someone else’s project. But the ones that survived eight years of periodic reconsideration are probably the ones with genuine staying power. Building them with AI assistance does not diminish that judgment; it acts on it.

For anyone maintaining a similar list, the practical takeaway from Willison’s experience is not “use Claude and ship faster” in the abstract. It is more specific than that. The tools work best when you already understand the problem deeply, when you can evaluate the output critically, and when the scope is tight enough that you can hold the whole thing in your head. Those constraints are not limitations of the current tools; they are descriptions of the work that was always worth doing.
