
Spec First, Code Later: The Workflow Layer That AI Coding Tools Don't Give You

Source: hackernews

When Andrej Karpathy popularized the term “context engineering” as a replacement for “prompt engineering,” the framing shift was useful even if the underlying practice was not entirely new. Prompt engineering focused on the wording of individual instructions. Context engineering treats the entire information environment surrounding a model as the thing you are optimizing, with individual prompts as one input among many.

Most practitioners who have moved past copy-paste prompting have internalized this at some level. But knowing that context matters and having a systematic way to manage it are different things. The Get Shit Done (GSD) system is an attempt to bridge that gap. It combines three techniques that are individually well understood but rarely composed into a coherent workflow: meta-prompting, context engineering, and spec-driven development.

What “Context Engineering” Actually Means in Practice

The CLAUDE.md file in Claude Code, the .cursorrules file in Cursor, GitHub Copilot’s .github/copilot-instructions.md — these are all implementations of the same idea. You write project-specific conventions into a file, and the tool injects it into every session so the model has baseline project knowledge without you reconstructing it from scratch each time.

That works well for constraints: “use zod for validation, not joi,” “database access only through lib/db.ts,” “we are on TypeScript 5.4 and can use decorators.” It works less well for behavior during a task. The static instruction file tells the model what the project looks like; it does not tell it how to think through a problem, when to ask for clarification, how to structure its output, or what to do when it hits ambiguity.
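As an illustration, a static instruction file of this kind might look like the following. The file name and contents are hypothetical, assembled from the constraints mentioned above, not taken from any particular project:

```markdown
# Project conventions (injected into every session)

## Stack
- TypeScript 5.4; decorators are allowed
- Validation with zod, not joi

## Architecture
- All database access goes through lib/db.ts; never import the driver directly

## Style
- Prefer named exports in lib/
```

Files like this are cheap to maintain precisely because they encode stable constraints; what they cannot express is the per-task process that the rest of the article is about.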

Context engineering in the fuller sense includes all of those layers: what the project is, what the conventions are, what the task requires, and what process to follow while working on it. GSD treats all four as explicit artifacts rather than implicit expectations.

Spec-Driven Development as a Workflow, Not Just a Philosophy

Spec-driven development predates AI coding tools. The idea is that writing a specification before writing code forces you to think through what you are building, surfaces ambiguities before they manifest as implementation bugs, and gives you a reference to test against. Test-driven development is one flavor. Writing a detailed RFC before touching code is another. Kent Beck’s core insight from TDD — that specifying behavior before implementing it produces better-designed code — applies at higher levels of abstraction too.

With AI as an implementation partner, spec-driven development gains a different kind of leverage. When you give a model a vague requirement, it produces something plausible-looking that may not match what you actually wanted. When you give it a detailed spec, the gap between intent and output narrows considerably. The model has more to work with and less room to fill in gaps with its own assumptions.

GSD makes spec creation a first-class step rather than optional documentation. Before any code generation begins, you write a spec file in markdown (because the model reads it naturally) that defines what the feature should do, what the interfaces look like, what edge cases exist, and what success means. That spec becomes part of the context for every subsequent prompt in the task, so the model works from the same reference document throughout rather than reconstructing intent from message history.
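A spec file in this style might be structured as follows. The feature, headings, and details are illustrative, not GSD's actual template:

```markdown
# Spec: Rate-limited invite links

## Behavior
- An admin can generate an invite link valid for 7 days.
- A link can be redeemed at most 50 times; further attempts return 410 Gone.

## Interfaces
- POST /invites        → { url, expiresAt }
- POST /invites/:token/redeem → 200 | 410 | 404

## Edge cases
- Redemption at or after the expiry timestamp is rejected.
- Concurrent redemptions must not exceed the cap (the check-and-decrement
  must be atomic).

## Success criteria
- Every behavior above is covered by an integration test.
- No database access outside lib/db.ts.
```

The point of the structure is that each section answers a question the model would otherwise answer for itself: what to build, what shape it takes, what can go wrong, and when it is done.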

This is not fundamentally different from good software practice. What changes is that the spec is written partly for the model to read, not just for human reviewers. Precision that was optional when humans could infer intent becomes load-bearing when the reader is a language model.

Meta-Prompting: The Layer Most Workflows Skip

Meta-prompting is where GSD differentiates itself most clearly from simpler context management approaches.

A meta-prompt is a prompt whose purpose is to produce another prompt or to structure the generation process rather than produce the final artifact directly. Instead of asking “implement this feature,” a meta-prompt might ask “given this spec and these constraints, what information do you need before implementing this feature, and what is the correct implementation sequence?” The output of that prompt becomes context for the prompts that follow.
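A planning meta-prompt in this spirit might read something like this; the wording is illustrative, not a template from the GSD repository:

```markdown
You are planning, not implementing. Given the spec below and the project
conventions file, answer in order:

1. What information is missing from the spec that you would otherwise
   have to guess? List each gap as a question.
2. Propose an implementation sequence as numbered steps. Each step should
   be independently reviewable and name the files it touches.
3. For each step, state the main trade-off or risk.

Do not write any implementation code in this response.

<spec>
(paste spec here)
</spec>
```

The explicit "do not implement" instruction matters: it separates the planning output, which you want to review, from the code generation you have not yet authorized.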

The value is that it externalizes the planning step. When you ask a model to implement something directly, the planning and the implementation happen together and are largely invisible to you. The model makes implicit choices about sequencing, decomposition, and trade-offs. Meta-prompting makes those choices explicit and reviewable before any code is generated. You can course-correct at the plan level rather than the code level, which is a much cheaper place to fix mistakes.

In practice, a meta-prompting workflow for a non-trivial feature looks something like this:

  1. Write or generate a spec document
  2. Use a meta-prompt to produce an implementation plan from the spec
  3. Review and revise the plan
  4. Execute each step in the plan with the spec and plan as persistent context
  5. Use a meta-prompt to review what was generated against the original spec
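Step 5 can be made concrete with a review prompt along these lines (again illustrative, not a prescribed template):

```markdown
Compare the implementation below against the original spec. For each
requirement in the spec, state whether it is implemented, partially
implemented, or missing, citing the relevant file and function.
Then list any behavior in the implementation that the spec does not
ask for. Report gaps only; do not propose fixes yet.
```

Asking for a requirement-by-requirement verdict, rather than a general impression, is what turns the review from casual approval into a diff against intent.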

Step 5 is skipped in most ad-hoc workflows, which is why AI-generated code frequently passes casual review but drifts subtly from the original intent. A systematic review prompt closes that gap.

How GSD Composes These Into a System

The GSD repository organizes this as a set of template files and conventions rather than a framework with code dependencies. This is a deliberate choice: if the workflow lives in markdown files and prompt templates, it works with any AI coding tool rather than being coupled to one provider or interface. The same spec file and meta-prompt templates work whether you are in Claude Code, Cursor, or Aider.

The structure includes templates for project context (similar to CLAUDE.md but more comprehensive in scope), spec templates that guide you through what to capture before implementation, meta-prompt templates for planning and review phases, and workflow documentation describing the sequence.
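One plausible way to lay such files out on disk, sketched here purely as an illustration rather than the repository's actual layout:

```text
.project/
├── context.md           # project-wide conventions and architecture
├── specs/
│   └── invite-links.md  # one spec file per feature
├── prompts/
│   ├── plan.md          # planning meta-prompt template
│   └── review.md        # spec-vs-implementation review template
└── workflow.md          # the sequence: spec → plan → execute → review
```

Because everything is plain markdown, the same tree can be read by any tool that accepts files as context, which is the portability argument made above.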

This is closer to a methodology than a library. Adoption requires writing files, not installing packages. The downside is that the workflow depends on discipline rather than tooling to enforce it, so teams need to decide whether the structure is worth maintaining explicitly across the codebase.

Compared to Aider’s repo map, which generates structural context automatically by parsing the codebase for function signatures and class names, GSD’s context is entirely hand-crafted. That is more expensive to maintain but produces more precise context for the model. An automated repo map knows your function signatures; a hand-crafted spec document knows your intent. The two approaches are complementary. Results on SWE-bench-style tasks point the same way: agents given well-specified intent substantially outperform those working from implicit context, even when the codebase structure is equally available.

The Honest Trade-Off

The system works well if you are prepared to do the upfront work. Writing a spec before every feature adds friction that feels unnecessary for small changes. The meta-prompting workflow adds latency to each task. Teams under time pressure will skip steps, and once you start skipping the spec step, you are left with ad-hoc prompting plus extra process overhead.

The payoff is most visible on complex features with non-obvious requirements, or in teams where multiple people are working with AI tools and need shared context rather than individual session context. A well-maintained spec document means anyone on the team, using any AI tool, starts from the same understanding of what should be built. That consistency is harder to quantify than token counts, but the correction loops it eliminates are real.

The deeper point that GSD illustrates is that AI coding productivity is primarily a workflow problem. Model quality matters. Tool ergonomics matter. But the largest variance in output quality comes from how systematically context and intent are communicated before a single line of code is generated. Most teams are leaving that variance on the floor, session after session, and attributing the friction to model limitations rather than to the absence of structure.

The techniques GSD brings together — spec-first development, deliberate context management, meta-prompting as a planning layer — each have prior art going back years. The contribution here is composing them into a workflow that is concrete enough to actually follow.
