Thinking Before You Prompt: The Real Work in LLM-Assisted Development

Source: hackernews

There is a genre of LLM workflow post that catalogues prompt templates, lists the right models for the right tasks, and generally frames the skill as one of interface ergonomics. Stavros Korokithakis’s recent piece sits closer to the useful end of this spectrum: he’s actually describing how he thinks, not just what he types. That distinction matters more than it first appears.

I’ve been building Discord bots and doing occasional systems work with heavy LLM assistance for the past year. The thing that took longest to internalize was that the workflow changes aren’t really about the LLM. They’re about what you have to do before you ever open the chat window.

The Specification You Were Going to Write Anyway

When you write a function by hand, the act of writing forces a kind of specification. You define the parameters, decide what the return type should be, work out the edge cases as you go. The thinking and the coding are interleaved, which is fine because the feedback loop is tight.

LLMs break that loop. If you give a vague prompt, you get a vague implementation that requires substantial editing, and now you’re reviewing someone else’s code with incomplete context instead of writing code you already understand. The workflow works best when you’ve already done the design work before prompting.

In practice, this means writing a short spec before asking for code. Not a formal document, just a few sentences that answer: what does this take, what does it return, what are the constraints, what can go wrong? Once you have that written down, the prompt almost writes itself, and the output is much easier to evaluate because you have a clear target to check against.

# Before prompting:
# - Input: list of webhook events with IDs and timestamps
# - Output: deduplicated list, keeping first occurrence
# - Constraint: must handle events arriving out of order
# - Edge cases: empty list, duplicate IDs with different timestamps, events older than 1 hour

def deduplicate_events(events: list[Event]) -> list[Event]:
    ...
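
For contrast, here is a sketch of what a correct implementation of that spec might look like. The `Event` fields (`id`, `timestamp`) come from the spec above, but the concrete dataclass is an assumption, and the "events older than 1 hour" edge case is left out because the spec doesn't say what the required behavior is.

```python
from dataclasses import dataclass


@dataclass
class Event:
    id: str
    timestamp: float  # seconds since epoch (assumed representation)


def deduplicate_events(events: list[Event]) -> list[Event]:
    # Sort by timestamp first, so "first occurrence" means the earliest
    # event even when events arrive out of order.
    seen: set[str] = set()
    result: list[Event] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.id not in seen:
            seen.add(event.id)
            result.append(event)
    return result
```

The point is not this particular implementation; it's that with the spec written down, you can check any generated candidate against it line by line.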

This sounds obvious, and in some sense it is. The discipline is doing it consistently, because the temptation is to go straight to the prompt and iterate when it comes back wrong. Iteration works, but it’s slower and tends to accumulate cruft: the conversation fills up with wrong attempts, and later completions are influenced by the entire history, including the parts that were incorrect.

Context Is Not the Same as Information

The most common advice about LLM coding is to give the model more context: include the relevant files, paste in the error message, share the surrounding code. This advice is correct, but it elides an important distinction. Context the model can use and information you’ve dumped into the prompt are not the same thing.

Models attend to context unevenly. Research on long-context performance has consistently found that retrieval accuracy degrades for content in the middle of long contexts, a problem researchers call “lost in the middle.” In practice, this means that pasting in 2,000 lines of surrounding code when only 30 are relevant is not neutral. It’s actively harmful, because the relevant parts are now diluted.

The discipline is doing the filtering yourself. Before prompting, identify exactly which parts of the codebase are relevant to the task. Paste only those. If you’re implementing a new method on a class, paste the class definition and the interface it implements, not the entire module. The model will perform better, and the constraint forces you to think about the dependency structure, which is itself useful.

This connects to a practice some teams have started calling “repo maps”, where you generate a compressed representation of your codebase’s structure rather than pasting raw code. Aider, the command-line AI coding tool, builds these automatically: it uses tree-sitter to extract function signatures, class definitions, and call relationships, then provides the model with a navigational skeleton rather than full source. The model can request specific files it needs. This is a more principled solution to the filtering problem than manual paste selection.
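The shape of the idea can be sketched in a few lines with Python's `ast` module: extract only top-level class and function signatures, eliding bodies. (Aider's actual implementation uses tree-sitter and ranks symbols by relevance; this toy `skeleton` function is just an illustration of the compressed representation, not Aider's API.)

```python
import ast


def skeleton(source: str) -> str:
    """Return top-level class and function signatures with bodies elided."""
    tree = ast.parse(source)
    lines: list[str] = []
    for node in tree.body:
        if isinstance(node, ast.ClassDef):
            lines.append(f"class {node.name}:")
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    args = ", ".join(a.arg for a in item.args.args)
                    lines.append(f"    def {item.name}({args}): ...")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            lines.append(f"def {node.name}({args}): ...")
    return "\n".join(lines)
```

Running this over a module gives the model a table of contents it can navigate, at a small fraction of the token cost of the raw source.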

Fresh Context and the Conversation Decay Problem

Conversational AI interfaces encourage long threads, and long threads are bad for code generation. Not because the models forget (they don’t, within their context window), but because wrong turns accumulate and the model becomes anchored to earlier, incorrect approaches.

Cursor's Composer has a distinct mode for this: you can start a fresh context for each task, keeping only the files you explicitly include, and Claude Code handles it similarly. The workflows that seem to work best treat each bounded task as a fresh conversation: gather context, generate, review, commit, then start fresh for the next task.

Starting fresh is psychologically uncomfortable because it feels like throwing away the history you’ve built up. But that history is often a liability: a conversation where you’ve spent four exchanges debugging a wrong approach is not valuable context for the next implementation; it’s noise the model will try to stay consistent with.

The Critical Review Step Is Not Optional

Every effective LLM coding workflow I’ve seen or used has a mandatory review step before anything gets committed. This sounds obvious and is routinely skipped, which is how subtle bugs ship.

LLM-generated code has a specific failure signature: it is locally plausible but globally wrong. Each line looks reasonable, the logic in any given block makes sense, and the function does approximately what was asked. The errors live in the interactions between components: the function that never evicts its cache, the lock that doesn’t cover all the shared state, the error path that forgets to decrement a counter.
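The cache case is worth making concrete. Both classes below read plausibly in isolation; the difference only matters under sustained traffic. (This is a hypothetical illustration of the failure mode, not a transcript of any particular generated output.)

```python
from collections import OrderedDict


class UnboundedCache:
    """Every line looks fine in review; the dict grows without bound."""

    def __init__(self):
        self._data = {}

    def get(self, key, compute):
        if key not in self._data:
            self._data[key] = compute(key)
        return self._data[key]


class LRUCache:
    """Same interface, but bounded: evicts the least recently used entry."""

    def __init__(self, maxsize=128):
        self._data = OrderedDict()
        self._maxsize = maxsize

    def get(self, key, compute):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
        else:
            self._data[key] = compute(key)
            if len(self._data) > self._maxsize:
                self._data.popitem(last=False)  # drop the oldest entry
        return self._data[key]
```

No single line of `UnboundedCache` is wrong; the bug is a missing property of the whole, which is exactly what a line-by-line reading fails to catch.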

This kind of error is harder to catch than a syntax error or an obvious logic mistake precisely because the code looks like correct code. It requires reading with the question “what assumption is this making that might not hold” rather than “does this look right.” That question requires knowing enough about the domain to have opinions about the assumptions.

For the Discord bot work I do, this is particularly sharp around the Gateway protocol’s rate limiting behavior. LLMs generate reconnection and backpressure code that looks structurally correct but frequently mishandles the interaction between identify rate limits and the heartbeat cycle. The generated code works in the happy path and fails exactly when you need it to work: under connection instability. Catching this requires knowing what the protocol actually guarantees, which the model doesn’t reliably know.

Where the Workflow Breaks Down

LLM-assisted development degrades gracefully up to a point and then falls off sharply. The useful region is roughly: bounded tasks with clear interfaces, operating in well-trodden domains, where correctness is verifiable by running the code.

Outside that region, the model’s confidence becomes a liability. Algorithmic work where the correct approach isn’t obvious, stateful systems with complex invariants, anything touching concurrency in languages without strong memory safety guarantees: these categories require understanding that the model doesn’t reliably have and that its confident presentation can obscure.

Simon Willison has written about this as a skill of knowing when to trust the output. That’s part of it. The other part is knowing when to abandon the LLM workflow entirely and just write the code, because for certain problems the cost of generating, reviewing, and correcting is higher than the cost of writing with understanding from the start.

The cases where I write code without LLM assistance have gotten narrower over time, but they haven’t gone to zero. Complex state machine logic, anything involving unsafe memory operations, protocol implementations where the spec is the source of truth rather than existing examples: these I still write by hand. Not because the models can’t generate plausible code for them, but because “plausible” isn’t good enough and I can’t verify correctness without understanding the code, which means I effectively have to write it anyway.

What This Requires

The pattern that emerges from effective LLM coding is not primarily about prompt engineering. It’s about doing more thinking upfront, being more disciplined about context selection, and maintaining the technical depth to evaluate what gets generated.

The developers who struggle with LLM-assisted coding often struggle because they want the tool to replace the thinking, and it can’t. The developers who get the most out of it are using it to accelerate the translation of clear thinking into working code, and they’ve retained enough technical understanding to catch the places where the translation went wrong.

That’s a less exciting description than “AI writes your code,” but it’s what’s actually happening in the workflows that work.
