
Prompting Is Not the Skill: Writing Specifications for LLM-Assisted Development

Source: hackernews

Stavros Korokithakis published a workflow post recently that generated 519 Hacker News points and over 500 comments. Much of that discussion circled the same axis: context management, task scoping, and the frustration that comes from getting output that looks right but does not fit the system. What those debates have in common is that they are really debates about specification, even when nobody uses that word.

The single thing that separates productive LLM coding sessions from frustrating ones is how precisely you have specified what you want before the model starts generating. Not which model, not which tool, not even the stylistic quality of the prompt. Precision of specification.

The Failure Mode Without One

When a developer types “write a function that handles user authentication” into an LLM chat, something goes wrong before the model generates a single token. The request is under-specified. What kind of authentication? What does the user object look like? What persistence layer? What error handling is expected? What are the security requirements?

The model will answer regardless. It will make plausible choices for each unspecified dimension, drawing on patterns from training data. The output will look reasonable and might even work in isolation. The problem surfaces when the generated code makes assumptions that conflict with decisions already made elsewhere in the codebase. You have received a solution to a slightly different problem than the one you actually had.

This is the dominant failure mode in LLM-assisted development. It has nothing to do with the model being weak. It is a specification problem, and the model cannot fix it for you.

The Components of an Effective Spec

A specification for an LLM differs from a specification for a human colleague. Human colleagues have ambient project knowledge. They have seen the codebase, attended the design discussions, absorbed the conventions. They fill gaps from context. An LLM has none of that unless you provide it explicitly.

A useful LLM spec has four components.

Purpose: What this code needs to accomplish, defined at the interface rather than the implementation. Describe the observable behavior, not how to produce it.

Constraints: What must be preserved. Which existing interfaces must be respected. What performance characteristics matter. Which invariants must hold. If there is something the implementation must not do, say so explicitly.

Relevant context: Which parts of the codebase are directly involved. This is the component most developers skip. If your function needs to interact with a database layer, include the relevant schema or interface definition. If it integrates with an existing service, include that service’s signature. Do not include everything; include what the implementation decisions will depend on.

Definition of done: What a correct output looks like. If you have tests, reference or include them. If you do not, describe the cases the implementation must handle, including the failure cases.
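When tests exist, they are the most precise form a definition of done can take. As a minimal sketch (the truncate_middle function and its behavior are hypothetical, invented for illustration), the asserts are the spec the implementation must satisfy, failure cases included:

```python
def truncate_middle(s: str, max_len: int) -> str:
    """Shorten s to max_len chars, replacing the middle with '...'."""
    if max_len < 5:
        raise ValueError("max_len must be at least 5")
    if len(s) <= max_len:
        return s
    keep = max_len - 3               # characters left after the ellipsis
    head = (keep + 1) // 2           # favor the start on odd splits
    tail = keep - head
    return s[:head] + "..." + s[-tail:]

# Definition of done, as executable cases:
assert truncate_middle("short", 10) == "short"        # no-op when it fits
assert truncate_middle("abcdefghij", 7) == "ab...ij"  # middle elided
try:
    truncate_middle("abc", 2)                         # failure case is part of the spec
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for max_len < 5")
```

Handing a model the asserts alongside the prose removes an entire class of "plausible but wrong" interpretations.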

Here is a concrete comparison. A weak prompt:

Write a reminder command for my Discord bot that lets users set reminders

A spec with the same goal:

Implement a /remind slash command handler for a Discord bot using discord.py 2.x.

Arguments:
- duration: string in the format "10m", "2h", "1d"
- message: string, max 200 characters

Requirements:
- Parse duration using parse_duration(s: str) -> timedelta in utils/time.py
- Store via ReminderService.create_reminder defined in services/reminders.py
- Respond with an ephemeral confirmation showing the scheduled fire time
- Return an error response if duration parsing fails

Do not implement the delivery side. Only the command handler.

The second version tells the model which existing functions to use, which interfaces to satisfy, what the success response looks like, and where the boundary of this task is. A model given the first version will invent a persistence layer, pick an error handling approach, and write delivery code you did not want. A model given the second version has a bounded, answerable problem.
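The spec above leans on an existing parse_duration helper in utils/time.py. The name and signature come from the example spec; the implementation below is a hypothetical sketch of what such a helper might look like, included to show the kind of interface the spec pins down:

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def parse_duration(s: str) -> timedelta:
    """Parse strings like '10m', '2h', '1d' into a timedelta.

    Raises ValueError on anything that does not match, so the
    command handler can return a clean error response.
    """
    match = re.fullmatch(r"(\d+)([mhd])", s.strip())
    if not match:
        raise ValueError(f"invalid duration: {s!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{_UNITS[unit]: amount})
```

Because the spec names this function and its failure mode, the model does not need to invent a parsing scheme, and the handler it writes can treat ValueError as the single error path.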

How Tools Surface the Spec Layer

Different tools have different models for managing the context that surrounds your per-task specification.

Aider separates repository context from session context explicitly. You use /add to pull specific files into scope for the current task and /drop to remove them. There is also an optional repository map (--map-tokens) that gives the model an index of the full codebase without flooding the context with file contents. The separation is manual and explicit, which gives you control at the cost of overhead.

Claude Code uses a CLAUDE.md file at the repository root for persistent project context: conventions, key commands, architectural notes, things the model should always know about the project. Session-specific context supplements this per interaction. The design tries to separate “always know this about the project” from “know this for this specific task.”
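For the Discord bot project above, a CLAUDE.md might look something like this (the contents are illustrative, not prescriptive):

```markdown
# Project notes

- Python 3.11, discord.py 2.x; run the bot with `python -m bot`
- Tests: `pytest -q`; all new command handlers need tests
- Slash command handlers live in commands/, one file per command
- Persistence goes through services/ classes, never raw DB calls
- Durations are parsed with utils/time.py, not ad hoc regexes
```

Notes like these cover the "always know this" layer; the per-task spec still has to supply everything specific to the change at hand.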

Cursor uses @-mentions to pull files, documentation, or external references into context for a given interaction. This approach is more fluid and suits exploratory work; the tradeoff is that the context boundary is less explicit.

All three are solving the same problem: the model needs project knowledge, the context window is finite, and irrelevant context degrades output quality as reliably as missing context does. Your per-task specification sits on top of whatever persistent project context your tool manages. The tool cannot replace the specification; it can only provide the background against which the specification operates.

The Spec as Independent Engineering Practice

Writing a precise specification before implementing anything is independently valuable, with or without an LLM. This is why the practice keeps reappearing in software engineering under different names: design by contract, interface-first design, test-driven development. They share a common mechanism: forcing explicit specification before implementation catches design errors while they are still cheap to resolve. You surface ambiguity before it becomes a bug.

LLMs add a practical incentive to do this work that was not previously present. A vague direction costs an hour of cleanup when a human implements it. A vague prompt costs a session of increasingly confused back-and-forth, with the model accumulating corrections and partial attempts in a context that degrades with each exchange. The feedback loop is compressed, and the penalty for skipping the spec is more immediate.

Developers who adopt spec-first LLM workflows often report that the practice bleeds into their work generally. Writing down what a function should do, what it should not do, and what invariants surround it, before touching the keyboard, is a habit that makes code better regardless of who or what produces the implementation.

Where the Workflow Gets Hard

The spec-first model works well for new implementation, where the problem is genuinely about producing something that does not exist yet. It becomes harder in two common scenarios.

Debugging is the hardest case. The task is not “implement X” but “determine why Y behaves unexpectedly.” This is exploratory and hypothesis-driven, and it resists tight specification by nature. The approach that works is to use the LLM for specific sub-questions within a debugging session, such as “given this stack trace and this function signature, what are the plausible causes of this exception type?” rather than delegating the debugging process as a whole.

Large refactors have the same problem in a different form. “Make this code better” is not a spec. The productive approach is to decompose the refactor into concrete, specifiable steps with defined interfaces: extract this logic into a function with this signature, move this class to this module and update all call sites. Each step can be specified precisely. The whole refactor cannot.

The Skill That Transfers

The HN discussion around Stavros’s post was long partly because the underlying skill is genuinely hard to articulate. It is not prompting skill in the stylistic sense. It is the engineering discipline of being precise about what you want before you start building. That discipline predates LLMs and will outlast them. LLMs make it more consequential, because the penalty for vagueness is immediate and the reward for precision is a well-fitted implementation delivered in seconds rather than minutes.

For most developers, the bottleneck in LLM-assisted development is not learning better prompt patterns. It is building the habit of writing the specification first, as a first-class artifact, before the prompt exists at all.
