
Writing Software With LLMs Has a New Bottleneck, and It's Not the Code

Source: hackernews

Stavros Korokithakis’s write-up on how he uses LLMs landed on Hacker News with over 500 points and generated the kind of comment thread that only happens when something resonates with people doing the same thing every day. The piece is worth reading on its own terms, but what it represents is more interesting than any single workflow detail: a growing body of practitioners converging on similar patterns independently, which usually means those patterns reflect something real about the problem structure.

The most important thing Stavros describes, and the thing that keeps appearing in every honest account of LLM-assisted development, is that the bottleneck has moved. It used to be typing the code. Now it’s specifying your intent precisely enough that the model can execute it without producing something technically correct but semantically wrong.

The Specification Problem

This shift sounds minor until you hit it in practice. Writing a spec for an LLM is different from writing one for a human colleague. A human colleague can ask clarifying questions, infer intent from context, and push back when something seems off. An LLM will confidently implement exactly what you described, including the parts you forgot to describe.

I’ve run into this building Discord bots. The model will produce a perfectly functional message handler that does what I asked, but doesn’t handle rate limits because I didn’t mention rate limits. It’s not that the model doesn’t know about Discord rate limits. It’s that it treated my spec as complete, and a complete spec with no mention of rate limits means rate limits are not a concern.
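The missing piece is usually small once you name it. As a sketch of what "mention rate limits in the spec" buys you, here is a minimal per-user cooldown tracker; the class name and limits are hypothetical, not from the article, and a real bot would also rely on its library's built-in REST rate-limit handling:

```python
import time


class Cooldown:
    """Per-user rate limiter: allow at most `limit` calls per `window` seconds.

    Illustrative only; names and numbers are assumptions, not the article's code.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self._calls = {}  # user_id -> list of recent call timestamps

    def allow(self, user_id, now=None):
        """Return True if this call is within the limit, recording it if so."""
        now = time.monotonic() if now is None else now
        # Keep only timestamps still inside the sliding window.
        recent = [t for t in self._calls.get(user_id, []) if now - t < self.window]
        if len(recent) >= self.limit:
            self._calls[user_id] = recent
            return False
        recent.append(now)
        self._calls[user_id] = recent
        return True
```

A message handler would then gate on `cooldown.allow(message.author.id)` before doing work. The point is not this particular implementation but that the behavior only appears if the spec asks for it.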

This is why the practitioners who get the most out of LLMs tend to work spec-first. Not necessarily writing a formal document, but forcing themselves to externalize their assumptions before generating any code. The discipline of writing “this function should handle the case where the user provides an empty list” before asking for the implementation is the same discipline as writing the test first in TDD. Both force you to think about the contract before you think about the implementation.
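The empty-list example can be made concrete. A minimal sketch, with a hypothetical `summarize_scores` function: the contract is written as executable assertions before any implementation exists, which is exactly the spec-first discipline described above.

```python
# The contract, externalized before any code is generated. The function name
# and behavior are hypothetical, not taken from the article.
def test_summarize_scores():
    # The empty case is specified explicitly, not left to the model to guess.
    assert summarize_scores([]) == {"count": 0, "mean": None}
    assert summarize_scores([2.0, 4.0]) == {"count": 2, "mean": 3.0}


# An implementation satisfying that contract, as the model might produce it
# once the edge case is stated:
def summarize_scores(scores):
    if not scores:
        return {"count": 0, "mean": None}
    return {"count": len(scores), "mean": sum(scores) / len(scores)}


test_summarize_scores()
```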

Test-driven development never became universal partly because writing the test first felt unnatural when you weren’t sure what the implementation would look like. LLM-assisted development has an interesting inversion: you’re often not sure what the implementation will look like either, but the model is generating it so fast that the cost of a wrong implementation is mostly the cost of noticing it’s wrong and correcting your spec.

Context Is the Resource You’re Actually Managing

The other thing Stavros and most experienced LLM users converge on is context management. The context window is not infinite, and what you put in it shapes what you get out. This is less obvious than it sounds.

The naive approach is to open a chat session, paste in your codebase, and ask for changes. This works for small projects. As projects grow, you face a tradeoff: include more context and risk hitting the window limit or degrading the model’s attention on the parts that matter, or include less and risk the model making changes that conflict with code it hasn’t seen.

The better approach, which I’ve settled on for my own work, is treating each model interaction as a scoped operation with explicitly defined inputs and outputs. Rather than “here is my whole bot, please add a command that does X”, it becomes: here is the interface this new command needs to implement, here are the data structures it will touch, here is how similar commands are structured, here is the spec for this command. The model then has a complete picture of the local problem without needing the global codebase.
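The scoped operation can be as simple as assembling those four pieces deliberately rather than dumping files. A minimal sketch; the function and section names are illustrative, not a real prompt-templating library:

```python
def build_scoped_prompt(interface, data_structures, similar_command, spec):
    """Assemble a scoped request: the model sees only what the local change
    needs, not the whole codebase. Section names are illustrative."""
    sections = [
        ("Interface the new command must implement", interface),
        ("Data structures it will touch", data_structures),
        ("How similar commands are structured", similar_command),
        ("Spec for this command", spec),
    ]
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```

The useful part is the forcing function: if you can't fill in one of the four sections, you've found a gap in your own understanding before the model papers over it.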

This is structurally similar to how good software is written anyway. Functions with clear interfaces, limited external dependencies, and well-defined behavior are easier to reason about, easier to test, and easier to hand off to an LLM. Projects that are already well-structured are much easier to extend with LLM assistance. The models aren’t making bad code harder to work with; they’re just making the cost of bad structure more immediately visible.

The Verification Loop

The third convergent pattern is what happens after generation. The model produces code. You run it. Something doesn’t work, or it works but not the way you wanted. You correct the spec and generate again. This loop is fast enough that people often run it three or four times to home in on the right implementation.

What makes this work well or poorly is the quality of your feedback signal. If your test is “does it seem to work when I poke at it manually”, the loop is slow and unreliable. If your test is an automated suite that runs in two seconds, the loop is fast and you can let the model attempt fixes with high confidence that you’ll know immediately whether they worked.

This is the argument for writing tests alongside or before using LLMs to generate implementations. The tests aren’t just validation; they’re communication. A failing test is a more precise spec than a paragraph of prose, because it’s executable and unambiguous.
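The precision claim is easy to see in miniature. A sketch with a hypothetical `parse_duration` helper: each assertion pins one behavior, and a failure reports the exact input and both values, which is a tighter spec than any prose description of the bug.

```python
def parse_duration(text):
    """Parse '90s', '2m', or a bare number into seconds. Hypothetical helper."""
    if text.endswith("m"):
        return int(text[:-1]) * 60
    if text.endswith("s"):
        return int(text[:-1])
    return int(text)


def test_parse_duration():
    # Each assertion is one unambiguous clause of the contract. When one
    # fails, the error names the input and the expected and actual values.
    assert parse_duration("90s") == 90
    assert parse_duration("2m") == 120
    assert parse_duration("45") == 45


test_parse_duration()
```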

For my Discord bot work, I’ve started writing integration tests that spin up a real bot instance against a test server before asking the model to implement new features. The test failures give the model concrete feedback rather than my prose description of what went wrong. The difference in output quality is significant.
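A real version of this spins up an actual bot against a test server, which can't be shown self-contained here; the shape of the feedback signal, though, can be sketched with an in-memory test double. All names below are hypothetical:

```python
class FakeChannel:
    """In-memory stand-in for a Discord channel: records what the bot sends.
    A real integration test would use a live bot instance and test server."""

    def __init__(self):
        self.sent = []

    def send(self, content):
        self.sent.append(content)


def handle_ping(channel):
    # Hypothetical command under test.
    channel.send("pong")


def test_ping_round_trip():
    channel = FakeChannel()
    handle_ping(channel)
    # A failure here shows exactly what the bot sent, ready to hand back to
    # the model as concrete feedback.
    assert channel.sent == ["pong"]


test_ping_round_trip()
```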

What Still Requires Human Judgment

The things LLMs handle poorly cluster around architectural decisions, tradeoffs that require understanding the full system, and cases where the right answer is “don’t do this at all”.

A model asked to implement a caching layer will implement a caching layer. It won’t tell you that the real bottleneck is a missing index on a database query and the caching layer will mask the problem rather than solve it. It won’t have opinions about whether the added complexity is worth the marginal performance gain for your specific use case. These require knowing things about the system that aren’t in the context window and can’t be.
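The missing-index diagnosis is something you can check in seconds before reaching for a cache. A sketch using SQLite's query planner; the table and column names are hypothetical stand-ins for whatever query is actually slow:

```python
import sqlite3

# Hypothetical schema standing in for the slow query; the point is the
# diagnostic step, not this particular table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, guild_id INTEGER, body TEXT)"
)
query = "SELECT body FROM messages WHERE guild_id = ?"

# Without an index the planner scans every row; a caching layer in front of
# this query would hide the symptom without fixing it.
before_detail = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()[0][-1]
print(before_detail)  # a full-table scan

conn.execute("CREATE INDEX idx_messages_guild ON messages (guild_id)")
after_detail = conn.execute("EXPLAIN QUERY PLAN " + query, (1,)).fetchall()[0][-1]
print(after_detail)  # an indexed search
```

Two lines of `EXPLAIN QUERY PLAN` output settle a question that a generated caching layer would have quietly buried.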

Similarly, models are good at implementing patterns they’ve seen frequently and poor at reasoning about novel constraints. If you’re building something with unusual requirements, like a bot that needs to maintain strict ordering guarantees across distributed message handling, the model will produce something that looks reasonable but may have subtle correctness issues that only surface under specific load patterns. The model doesn’t know what it doesn’t know about your specific constraints.

This is where the experienced developers diverge from the inexperienced ones in LLM-assisted workflows. It’s not about prompting skill. It’s about knowing when the model’s confidence is warranted and when it isn’t, which requires understanding the domain well enough to evaluate the output.

The Craft Is Still There

Something Stavros’s post captures that I think gets lost in the broader discourse is that working effectively with LLMs is a craft. It’s a different craft from the one we had before, but it’s not easier or less demanding. The people who say LLMs have replaced programming skill are usually thinking about the wrong skill.

The skill of translating intent into precise specifications, managing what context the model needs to do useful work, knowing when to accept generated code and when to push back, and structuring systems so that LLM-assisted modification is tractable: these are real skills that take time to develop. They compound in the same way programming skills compound. The developer who has been working this way for two years will outperform the developer who picked it up last month, not because they know better prompts, but because they’ve internalized the failure modes and built habits that avoid them.

What’s changed is the mix. Less time on the mechanics of implementation, more on design, specification, and verification. Whether that’s an improvement depends on what you enjoyed about programming in the first place. For me, the interesting parts were always the design problems. Having the implementation step accelerated mostly means I can get to more of them.
