Treating the Spec as a Model Input: The Insight Behind GSD

The popular framing for AI-assisted development is that it’s a writing tool. You describe what you want, the model generates code, and you review and edit. The mental model is a faster keyboard. GSD (Get Shit Done), a project that drew 237 points and substantive discussion on Hacker News, argues that this framing causes predictable, structural failures, and that three practices together can fix them: context engineering, spec-driven development, and meta-prompting.

The argument is worth taking seriously not because GSD is particularly novel, but because it formalizes patterns that experienced practitioners have independently converged on. When multiple people working across different tools, codebases, and languages arrive at the same workarounds, that convergence suggests the patterns reflect something real about how the underlying models work.

The Problem GSD Is Solving

AI coding tools have compressed the inner loop of software development: write code, run tests, fix errors, repeat. That part is genuinely faster. What hasn’t changed is the work surrounding it: decomposing features into implementable units, making architectural decisions explicit before they’re encoded in code, and maintaining coherent context across sessions that don’t share state. GSD calls this the “middle loop,” borrowing terminology from Martin Fowler’s recent writing on how AI is restructuring engineering work. The middle loop is where most delivery problems live, and AI coding tools do little to address it.

The specific failure mode is worth understanding precisely. Given an underspecified request, an LLM doesn’t fail visibly; it fills the gaps silently with statistically plausible decisions. Ask for a /remind command for a Discord bot without specifying persistence requirements, and you’ll get a working implementation that uses in-memory storage. The code passes any tests you write against it. It ships to production and works until the bot restarts. The model didn’t make an error; it made a decision you didn’t know you were delegating. This is a different failure mode from a syntax error or a wrong algorithm: an architectural choice made without your awareness, at a moment when correcting it would have cost almost nothing.
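To make the delegated decision concrete, here is a minimal Python sketch of both storage choices for the /remind example. The class names, the Reminder fields, and the use of SQLite are assumptions for illustration (the Discord wiring is omitted entirely); this is not code from GSD or from any particular bot.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Reminder:
    user_id: int
    message: str
    due_at: float  # Unix timestamp (assumed representation)


class InMemoryReminderStore:
    """What an underspecified request tends to produce: state lives in the process."""

    def __init__(self) -> None:
        self._pending: list[Reminder] = []

    def add(self, reminder: Reminder) -> None:
        self._pending.append(reminder)

    def pending(self) -> list[Reminder]:
        return list(self._pending)


class SqliteReminderStore:
    """What 'restore pending reminders from DB on restart' demands."""

    def __init__(self, path: str) -> None:
        self._conn = sqlite3.connect(path)
        # Creating the table only if missing lets a restarted process reuse existing rows.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS reminders "
            "(user_id INTEGER, message TEXT, due_at REAL)"
        )
        self._conn.commit()

    def add(self, reminder: Reminder) -> None:
        self._conn.execute(
            "INSERT INTO reminders VALUES (?, ?, ?)",
            (reminder.user_id, reminder.message, reminder.due_at),
        )
        self._conn.commit()

    def pending(self) -> list[Reminder]:
        rows = self._conn.execute(
            "SELECT user_id, message, due_at FROM reminders ORDER BY due_at"
        ).fetchall()
        return [Reminder(*row) for row in rows]

    def close(self) -> None:
        self._conn.close()
```

Constructing a fresh InMemoryReminderStore, which is all a restart amounts to, returns an empty pending list; reopening SqliteReminderStore on the same file returns what was written. Both classes pass a test that adds a reminder and reads it back within one process, which is exactly why the first one slips through.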

Specs as Model Inputs, Not Documentation

The spec-driven component of GSD rehabilitates spec-first thinking, but with a different justification than traditional software engineering offered. The traditional argument for writing specs was that they catch misunderstandings early and serve as reference documentation. Both arguments are correct, and both are routinely ignored because the upfront cost is visible and the benefit is diffuse.

GSD’s argument is more immediate: the spec is a model input, and the quality of the model’s output depends heavily on the precision of that input. This isn’t about documentation overhead. It’s about what information the model has access to at inference time.

The spec format GSD proposes is deliberately lean: purpose, inputs, constraints, error cases. The constraints section is the load-bearing part. Writing “All pending reminders must be restored from DB on restart” in the constraints list is the difference between getting the in-memory implementation and getting the correct one. The constraint is in the spec; it ends up in the code. Traditional spec-writing never had this closed loop, because the spec was read by humans who might or might not carry it into their implementation decisions.
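A lean spec in this shape is easy to keep as data and render into a prompt. The sketch below assumes a hypothetical Spec dataclass whose four fields mirror the sections named above; only the persistence constraint comes from the text, and the other example entries are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical container for the four sections of a lean spec."""
    purpose: str
    inputs: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    error_cases: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the spec as plain markdown to paste into a prompt."""
        def section(title: str, items: list[str]) -> str:
            body = "\n".join(f"- {item}" for item in items) or "- (none)"
            return f"## {title}\n{body}"

        return "\n\n".join([
            f"## Purpose\n{self.purpose}",
            section("Inputs", self.inputs),
            section("Constraints", self.constraints),
            section("Error cases", self.error_cases),
        ])


# Illustrative spec; only the first constraint is taken from the article.
remind_spec = Spec(
    purpose="Add a /remind command that pings a user at a given time.",
    inputs=["duration (e.g. 10m, 2h)", "message text"],
    constraints=[
        "All pending reminders must be restored from DB on restart",
        "Reminders fire at most once",
    ],
    error_cases=["Unparseable duration: reply with usage help"],
)

print(remind_spec.to_markdown())
```

Because the constraint is a line in the rendered text rather than something a reviewer is expected to remember, it reaches the model every time the spec is sent.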

SWE-bench research consistently shows that agents working from well-specified intent substantially outperform those working from implicit context, even when both have equivalent codebase access. The information available at planning time shapes the solution more than raw capability.

The Meta-Prompting Insight

The most technically interesting part of GSD is the meta-prompting layer, which addresses something easy to overlook: every code generation request involves two sequential steps that happen invisibly inside the model. The model first constructs a plan, resolving implicit questions about architecture, persistence, error handling, and data model, and then generates code that reflects that plan. You see the code; the plan it was built from never surfaces.

Meta-prompting separates these steps. Instead of asking the model to implement the spec, you ask it to produce a reviewable implementation plan:

Given this spec, produce an implementation plan that covers:
- Sequence of implementation steps
- Any ambiguities in the spec and how you'd resolve them
- Data model decisions
- Error handling strategy
- Test cases

Do not generate any code yet.
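If you reuse that prompt often, it can be wrapped around any spec text programmatically. A minimal sketch, assuming a hypothetical build_planning_prompt helper and a spec already rendered as markdown; placing the spec first and the no-code instruction last is a choice made here, not something GSD prescribes.

```python
PLANNING_INSTRUCTIONS = """\
Given this spec, produce an implementation plan that covers:
- Sequence of implementation steps
- Any ambiguities in the spec and how you'd resolve them
- Data model decisions
- Error handling strategy
- Test cases

Do not generate any code yet."""


def build_planning_prompt(spec_markdown: str) -> str:
    """Put the spec first and the planning-only instructions after it."""
    return f"SPEC\n----\n{spec_markdown.strip()}\n\n{PLANNING_INSTRUCTIONS}\n"


if __name__ == "__main__":
    print(build_planning_prompt("## Purpose\nAdd a /remind command."))
```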

The plan reveals the model’s interpretation of the spec before any code exists. If the model resolves an ambiguity in a way you didn’t intend, you correct it in one exchange and regenerate the plan. Caught at the plan level, a constraint mismatch costs one message. Caught after implementation, it costs a full cycle including code review, revision, and retesting.
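The review cycle described above amounts to a small loop: generate a plan, review it, fold corrections back into the prompt, and only implement once the plan is accepted. This sketch assumes the model is any callable from prompt text to response text and that review is a callable returning a list of corrections; both are hypothetical stand-ins for a real LLM client and a human reviewer.

```python
from typing import Callable

# Hypothetical: any function that takes a prompt and returns the model's text.
Model = Callable[[str], str]
# Hypothetical: returns corrections for a plan; an empty list means "approved".
Reviewer = Callable[[str], list[str]]


def plan_until_approved(model: Model, base_prompt: str, review: Reviewer,
                        max_rounds: int = 3) -> tuple[str, int]:
    """Regenerate the plan with accumulated corrections until review passes.

    Returns the approved plan and the number of planning calls it took.
    Raises RuntimeError if the plan is still rejected after max_rounds.
    """
    corrections: list[str] = []
    for round_no in range(1, max_rounds + 1):
        prompt = base_prompt
        if corrections:
            prompt += "\n\nCorrections to apply:\n" + "\n".join(
                f"- {c}" for c in corrections
            )
        plan = model(prompt)
        issues = review(plan)
        if not issues:
            return plan, round_no
        corrections.extend(issues)  # carry every correction into the next round
    raise RuntimeError(f"plan still rejected after {max_rounds} rounds: {corrections}")
```

With a fake model that keeps reminders in memory until corrected, the loop accepts the second plan, so the persistence mistake costs one extra planning call and no code.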

This is the underlying logic of BDD and TDD applied to the model’s planning step rather than to the output’s behavior. Write the plan, review the plan, implement against the plan. The discipline is the same; the target is different.

For anyone building bots or small services with an AI coding assistant, this pattern is worth internalizing. The model’s first-pass plan for something like a rate limiter or a background job scheduler will almost always make choices about storage, retry behavior, and failure modes that the request left open, and many of those choices will be wrong for your specific context. Seeing them in a plan before they’re baked into code is much cheaper than finding them in a code review.
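One way to get ahead of those silent resolutions is to check, before requesting a plan, whether the spec’s constraints mention the usual decision areas at all. The sketch below is a crude keyword heuristic; the SILENT_DECISIONS table and its keywords are assumptions chosen for illustration, and a hit only means a topic is mentioned, not that it is decided well.

```python
# Hypothetical checklist: decision areas a model tends to resolve silently when
# the spec is quiet about them. Plain substring matching; crude by design.
SILENT_DECISIONS: dict[str, tuple[str, ...]] = {
    "storage": ("persist", "database", "db", "redis", "in-memory", "storage"),
    "retry behavior": ("retry", "retries", "backoff", "at most once", "at least once"),
    "failure modes": ("on failure", "if unavailable", "timeout", "fail open", "fail closed"),
}


def unaddressed_decisions(constraints: list[str]) -> list[str]:
    """Return the decision areas that no constraint appears to mention."""
    text = " ".join(constraints).lower()
    return [
        area
        for area, keywords in SILENT_DECISIONS.items()
        if not any(keyword in text for keyword in keywords)
    ]
```

For a rate limiter whose constraints pin down storage (Redis) but say nothing about retries or what happens when the store is unreachable, the check flags retry behavior and failure modes: the two questions worth adding to the spec or raising in the plan review.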

Context Engineering and the Living Anchor

The third component addresses a more subtle problem: context degrades across a session as the model accumulates conversation history, tool outputs, and intermediate results. Liu et al.’s 2023 “Lost in the Middle” research found that language models use information at the beginning and end of long contexts far more reliably than information in the middle; material that drifts to the middle sees substantially degraded recall.
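One practical response to that finding is to control where things sit when you assemble context yourself. A minimal sketch, assuming a hypothetical assemble_context helper: the anchor goes first, the current request goes last, and only recent history occupies the middle. This ordering is an inference from the paper’s result, not a recipe the paper or GSD specifies.

```python
def assemble_context(anchor: str, history: list[str], request: str,
                     max_history: int = 20) -> str:
    """Order context so the durable material sits at the edges of the window.

    The anchor (live decisions) goes first and the current request goes last;
    history is trimmed to the most recent max_history turns and sits between them.
    """
    # history[-0:] would return the whole list, so handle a zero budget explicitly.
    recent = history[-max_history:] if max_history > 0 else []
    parts = [
        "# Context anchor", anchor.strip(),
        "# Recent history", *recent,
        "# Current request", request.strip(),
    ]
    return "\n\n".join(parts)
```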

Static project files like CLAUDE.md or Cursor’s .cursorrules handle stable project invariants well. They describe what the project is. They don’t capture what was decided during a session: an approach that was ruled out and why, a constraint that was negotiated, a scope boundary that was drawn to keep the current task tractable. GSD introduces a “context anchor” for this purpose, a structured document that evolves during a session and tracks the live decisions that static project files don’t capture.

The anchor format includes: Active Constraints, Decisions Made with rationale, Ruled-Out Paths, Current Scope, and Open Questions. The rationale field is not optional decoration; it’s the part that prevents the model from re-litigating decisions in a later exchange when the original context has compressed out of the window.
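The anchor’s five sections map naturally onto a small data structure. The sketch below uses hypothetical ContextAnchor and Decision types and refuses to record a decision without a rationale, enforcing the point above in code; GSD itself ships markdown templates, so treat this as one possible mirror of the format rather than its definition.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    what: str
    rationale: str  # kept so a later exchange doesn't re-litigate the choice


@dataclass
class ContextAnchor:
    """Hypothetical in-code mirror of the anchor's five sections."""
    active_constraints: list[str] = field(default_factory=list)
    decisions: list[Decision] = field(default_factory=list)
    ruled_out: list[str] = field(default_factory=list)
    current_scope: str = ""
    open_questions: list[str] = field(default_factory=list)

    def decide(self, what: str, rationale: str) -> None:
        """Record a decision; an empty rationale is rejected."""
        if not rationale.strip():
            raise ValueError("a decision without a rationale invites re-litigation")
        self.decisions.append(Decision(what, rationale))

    def to_markdown(self) -> str:
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none)"

        decisions = "\n".join(
            f"- {d.what} (why: {d.rationale})" for d in self.decisions
        ) or "- (none)"
        return "\n\n".join([
            "## Active Constraints\n" + bullets(self.active_constraints),
            "## Decisions Made\n" + decisions,
            "## Ruled-Out Paths\n" + bullets(self.ruled_out),
            "## Current Scope\n" + (self.current_scope or "(unset)"),
            "## Open Questions\n" + bullets(self.open_questions),
        ])
```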

Rahul Garg’s work on context anchoring at MartinFowler.com formalizes similar patterns and is worth reading alongside GSD. The convergence between that work and GSD on the same structural solutions suggests these aren’t personal preferences; they’re responses to constraints that are fixed in the current generation of models.

The distinction between a static CLAUDE.md and a live context anchor maps onto a familiar systems distinction: configuration versus state. CLAUDE.md is configuration, updated infrequently, describing invariants. The context anchor is state, updated continuously during a session, describing what has changed. Mixing them produces the same class of bugs that mixing configuration and state always produces: stale state masquerading as invariants, or invariants that silently change.
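Python’s frozen dataclasses make the configuration/state split explicit. In this sketch, ProjectConfig and SessionState are hypothetical types standing in for CLAUDE.md and the context anchor: the first rejects mutation, the second is expected to change, and the prompt header keeps them in separately labeled sections.

```python
from dataclasses import dataclass, field, FrozenInstanceError


@dataclass(frozen=True)
class ProjectConfig:
    """Invariants, playing the CLAUDE.md role: loaded once, not mutated mid-session."""
    language: str
    test_command: str


@dataclass
class SessionState:
    """Live decisions, playing the context-anchor role: expected to change."""
    decisions: list[str] = field(default_factory=list)


def context_header(config: ProjectConfig, state: SessionState) -> str:
    """Keep the two kinds of context in separate, labeled prompt sections."""
    decisions = "\n".join(f"- {d}" for d in state.decisions) or "- (none yet)"
    return (
        "## Project invariants\n"
        f"- language: {config.language}\n"
        f"- tests: {config.test_command}\n\n"
        "## Session state\n"
        f"{decisions}"
    )


if __name__ == "__main__":
    config = ProjectConfig(language="Python", test_command="pytest")
    state = SessionState()
    state.decisions.append("Persist reminders in SQLite")  # state changes freely
    try:
        config.language = "Go"  # an invariant trying to change silently
    except FrozenInstanceError:
        print("config is read-only; edit CLAUDE.md deliberately instead")
    print(context_header(config, state))
```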

The Adoption Problem

GSD’s own documentation acknowledges the comparison to TDD, and it’s accurate. Both require upfront work with no immediately visible output, and both are the first things dropped under deadline pressure. TDD’s benefit, though, compounds over time in ways that are hard to demonstrate in a single session.

There is one meaningful difference: GSD’s return on investment is visible in the same session. A well-specified prompt with an explicit implementation plan produces noticeably better first-pass output than an underspecified request. The feedback loop is tight enough that the discipline pays off before the session ends, not in a future sprint.

The maintenance concern raised in the HN discussion is real: a stale spec is potentially worse than no spec if the model treats it as authoritative when the codebase has moved on. GSD’s answer is to use meta-prompting to regenerate context documents rather than hand-patching them forward, treating them as generated artifacts rather than curated records. That shifts the synchronization burden but doesn’t eliminate it.
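Regeneration still needs a trigger. A minimal sketch, assuming you record a fingerprint of the source files when a context document is generated: if the files change, the document is flagged as stale and due for regeneration. The fingerprint and is_stale names are hypothetical, and the regeneration step itself (a meta-prompt run) is left out.

```python
import hashlib
from pathlib import Path


def fingerprint(paths: list[Path]) -> str:
    """Hash the names and contents of the files a context document was generated from."""
    digest = hashlib.sha256()
    for path in sorted(paths):  # sort so argument order doesn't change the hash
        digest.update(str(path).encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def is_stale(recorded_fingerprint: str, paths: list[Path]) -> bool:
    """True when the sources changed since the document was last regenerated."""
    return fingerprint(paths) != recorded_fingerprint
```

This only tells you when to regenerate; someone still has to run the regeneration and review the result, which is the residual synchronization burden described above.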

For teams already using Claude Code, Cursor, or Aider, GSD is worth adopting incrementally. Start with the spec format for any non-trivial task. Add the meta-prompt planning step when the task involves architectural decisions. The context anchor becomes valuable once sessions get long enough that you notice context drift. None of it requires tooling changes; it’s markdown and discipline.

The project ships as a set of markdown templates, including ready-to-use prompt templates for each phase, with no install required. The investment is front-loaded in learning the pattern, not in setting up infrastructure.