
Treating the Spec as an Interface: What GSD Gets Right About AI Workflow Design

Source: hackernews

Most AI coding tools give you a capable model and an empty text box. What you do with that is your problem. The implicit assumption is that productivity comes from model quality, and that better models will produce better results regardless of how you structure the work. That assumption has not aged well.

The Get Shit Done system starts from a different premise: the bottleneck is not the model, it is the workflow. GSD combines meta-prompting, context engineering, and spec-driven development into a methodology that treats AI-assisted development as a discipline rather than a dialogue. The HN thread around its release accumulated over 230 points and 128 comments, which suggests the underlying problem resonates even if the specific implementation is debatable.

The Spec as Interface Contract

Systems programmers spend a lot of time thinking about interfaces before implementations. You define a struct’s fields and method signatures. You write the .h header before the .c body. You specify the API surface of a library before writing its internals. The discipline exists because thinking about interfaces forces you to clarify what a component promises to callers, separate from how it delivers on that promise.

The spec in GSD serves the same role, but the “caller” is an AI model rather than a downstream module. When you ask a model to implement something without a spec, it fills the gap between your vague requirement and a concrete implementation using its own priors. Those priors are trained on millions of codebases and are generally reasonable, but they are not your codebase, your constraints, or your users. A spec narrows that gap before implementation begins.

The format GSD uses is markdown, which is the right choice for two reasons. Models read markdown naturally, having been trained on vast quantities of it. And markdown is low-friction enough that developers will actually write it, unlike formal specification notations such as TLA+ or Z notation, which carry substantial cognitive overhead.

A spec in this system captures what a feature should do, what the interfaces look like, what edge cases matter, and what success means. That last item is underappreciated. When you define success criteria upfront, you have something to evaluate generated code against, rather than eyeballing the output and deciding it looks plausible.
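
A spec of this shape can be very short. The following sketch is invented for illustration (the feature, endpoint, and numbers are assumptions, not taken from GSD), but it shows all four elements, including explicit success criteria:

```markdown
## Spec: CSV export for reports

### Behavior
- Authenticated users can download any report they own as CSV.

### Interface
- `GET /reports/{id}/export?format=csv` returns `text/csv`
  with a `Content-Disposition: attachment` header.

### Edge cases
- Empty report: return a header row only, not a 404.
- Fields containing commas or quotes are escaped per RFC 4180.

### Success criteria
- Output round-trips through a standard CSV parser without data loss.
- Exporting a 10,000-row report completes in under 5 seconds.
```

The success-criteria section is what turns review from "looks plausible" into a checklist the generated code either passes or fails.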

Meta-Prompting as a Type System for Intent

Meta-prompting is the part of GSD for which most tools have no equivalent.

A meta-prompt does not ask the model to produce the final artifact. It asks the model to reason about the task before producing anything. “Given this spec and these project constraints, what is the correct implementation sequence, and what ambiguities need resolution before I start?” The output of that prompt becomes context for subsequent work.
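
In practice a meta-prompt can be a reusable template. This one is illustrative, not copied from GSD's own prompt files:

```markdown
Before writing any code, answer the following about the attached spec:

1. What is the correct implementation sequence, and why that order?
2. Which requirements are ambiguous or underspecified? List each one
   as a question I can answer before you start.
3. What existing code does this touch, and what could it break?

Do not produce an implementation until every question in (2) is resolved.
```

The answers become persistent context for the implementation prompts that follow, rather than reasoning the model performs invisibly and discards.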

This is structurally similar to what a type system does for code: it makes implicit assumptions explicit and checkable at a level above the implementation. When a Rust function signature says it takes &str and returns Result<Vec<u8>, io::Error>, it is encoding intent that would otherwise live only in the implementer’s head. Meta-prompting does the same for a model’s implementation plan. The plan is legible, reviewable, and correctable before any code exists.
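
The inline signature above can be made concrete. The function and its body below are placeholders invented for illustration; the point is that the signature alone tells a caller, human or model, what to supply and what failures to plan for:

```rust
use std::io;

// The signature is the contract: callers must supply valid UTF-8 text
// and must handle a possible I/O error. That holds regardless of what
// the body does.
fn encode(payload: &str) -> Result<Vec<u8>, io::Error> {
    // Placeholder body for illustration.
    Ok(payload.as_bytes().to_vec())
}

fn main() {
    let bytes = encode("spec").expect("encoding failed");
    println!("{}", bytes.len()); // prints 4
}
```

A meta-prompt's output plays the same role for an implementation plan: intent made explicit and reviewable before any body exists.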

The alternative, which is how most people work with AI tools, is to let the model plan and implement simultaneously. The planning is invisible. When the output is wrong, you cannot tell whether the model misunderstood the task, made a poor decomposition decision, or hit a genuine ambiguity in the spec. Meta-prompting separates these so each failure mode is diagnosable.

What This Adds Beyond CLAUDE.md and Cursor Rules

Context management via static instruction files is already widespread. Claude Code has CLAUDE.md, Cursor has .cursorrules, GitHub Copilot has .github/copilot-instructions.md. These files inject project-level context into every session: coding conventions, library preferences, file organization, things the model should never do. They are effective for their purpose.
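
A typical file of this kind is a flat list of project facts and prohibitions. This sketch is invented, not copied from any real project:

```markdown
# CLAUDE.md

- This is a Go 1.22 service; do not introduce cgo dependencies.
- Database access goes through `internal/store`; never call
  `database/sql` directly from handlers.
- Errors are wrapped with `fmt.Errorf("...: %w", err)`, never
  logged and swallowed.
- New endpoints need a table-driven test in the same package.
```

Everything in it is baseline state: true at the start of every session, regardless of the task at hand.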

The limitation is that they are static. A CLAUDE.md file tells the model what the project looks like at baseline. It does not tell the model how to think through a novel problem in that project, what process to follow when requirements are ambiguous, or how to review its own output against original intent. Context engineering in the fuller sense means managing all of those layers, not just the project conventions layer.

Aider approaches this differently, using a repository map that parses the codebase for function signatures, class names, and imports to give the model structural awareness without requiring manual curation. That produces solid context for understanding how things connect; it does not produce context for understanding why things are the way they are or what a new feature should accomplish.

GSD’s context is hand-crafted, which means it is more expensive to maintain but carries intent that cannot be inferred from structure. The two approaches are not competing; they solve different parts of the context problem. A well-maintained spec document and a structural repo map together give a model more to work with than either alone.

The Token Economics Are Real

Meta-prompting and spec injection add tokens to every request. On a small feature, the overhead is measurable. On a complex, multi-session feature with ambiguous requirements, the overhead is a rounding error compared to the tokens spent in correction loops.

Research on SWE-bench tasks, which benchmark models against real GitHub issues, consistently shows that agents working from precise specifications outperform those working from issue descriptions alone, often by substantial margins. The performance gap is not primarily about model capability; it is about information quality going in. More precise input means fewer iterations, fewer hallucinated assumptions, and less time spent diagnosing why the generated code does not match what was wanted.

For solo projects and small scripts, this overhead is rarely worth it. For anything complex enough that you would otherwise spend time in review and revision cycles, the spec-first investment usually pays back in reduced correction cost.

Where the Discipline Breaks Down

GSD is a methodology without enforcement. Nothing in the toolchain prevents you from skipping the spec step and going straight to implementation. No test fails. No compiler complains. The workflow depends entirely on the developer maintaining the discipline to write the spec before the code.

This is the same problem that code review norms face. The process is valuable when followed, but it adds friction that makes shortcuts tempting under time pressure. Teams that make spec-writing a genuine first-class step, rather than optional documentation, are teams that have explicitly agreed the upfront cost is worth the downstream consistency.

For solo developers, the most practical version of this is probably lighter-weight: a brief intent document that captures what you want, what constraints apply, and what done looks like, written before any generation happens. Not every feature needs a full spec. But any feature where you find yourself iterating through multiple rounds of AI-generated code and corrections is a feature that would have benefited from writing the spec first.

The Pattern Behind the Tool

GSD is not the only project exploring this space. Tools like Plandex and Devon approach long-horizon AI coding tasks with various forms of planning and context management. Amp has built spec-first workflows into its core interface. The common thread is that serious AI-assisted development requires something above the individual prompt, a layer that manages how context accumulates and how intent propagates through a multi-step task.

GSD’s specific contribution is making that layer explicit and tool-agnostic. Because it lives in markdown files and prompt templates, it works across Claude Code, Cursor, Aider, or any interface where you can paste text. That portability means the workflow survives model changes and tool shifts, which matters more than any specific tool integration.

The deeper observation is that AI coding productivity is primarily a workflow engineering problem. Model capability is the ceiling; workflow quality determines how close you get to it.
