Externalizing the Plan: What Meta-Prompting Actually Does to Your AI Coding Loop

Every implementation request to an AI coding tool contains two steps that happen in sequence, with the handoff between them invisible to the reviewer: planning first, implementation second. The planning step determines almost everything about what gets generated: how the problem is decomposed, what assumptions are made, which constraints are treated as binding and which get silently relaxed. Meta-prompting separates these two steps so the planning becomes a reviewable artifact before implementation begins.

This is the third and least discussed component of the Get Shit Done system (GSD), which describes itself as a meta-prompting, context engineering, and spec-driven development workflow. The 237-point Hacker News thread around it skewed heavily toward recognition from people who had arrived at similar practices independently. The context engineering and spec-driven components drew most of the discussion. Meta-prompting, the component that actually changes where mistakes get caught, got less attention than it deserves.

The Invisible Planning Step

When you send “implement the /remind command with a 7-day maximum and per-user limits” to a model, what happens first is not code generation. The model resolves a set of questions that your prompt left implicit: should the timer survive a bot restart, how should duration strings be parsed, where do reminders get stored, what happens when the storage call fails. It constructs an implicit plan that answers these questions in some order, then generates code that reflects those answers.

You see the code. The planning is gone.

This is not a limitation of current models that will eventually be fixed with better reasoning. It is structural: natural language implementation requests do not distinguish between “plan then implement” and “plan-while-implementing.” The model collapses both into a single generation pass because nothing in the prompt asks it to do otherwise. The decisions embedded in that pass are made based on statistical likelihood given the training distribution, not based on your actual requirements.
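To make the implicit decisions concrete, here is a minimal sketch of just one of them, the duration parser for /remind. The accepted format (`2h30m`), the unit set, and the function name `parse_duration` are all hypothetical choices made for illustration; only the 7-day maximum comes from the request. Each comment marks a decision the one-line prompt never stated.

```python
import re
from datetime import timedelta

MAX_DURATION = timedelta(days=7)                   # stated in the request
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}    # assumption: no seconds, no weeks
TOKEN = re.compile(r"(\d+)([mhd])")


def parse_duration(text: str) -> timedelta:
    """Parse strings like '2h30m' into a timedelta (hypothetical format)."""
    cleaned = text.strip().lower()                 # assumption: case-insensitive input
    tokens = TOKEN.findall(cleaned)
    # Assumption: reject input with leftover characters instead of guessing.
    if not tokens or "".join(n + u for n, u in tokens) != cleaned:
        raise ValueError(f"unrecognized duration: {text!r}")
    total = timedelta(seconds=sum(int(n) * UNIT_SECONDS[u] for n, u in tokens))
    if total <= timedelta(0):                      # assumption: zero is an error
        raise ValueError("duration must be positive")
    if total > MAX_DURATION:                       # assumption: exactly 7 days is allowed
        raise ValueError("duration exceeds the 7-day maximum")
    return total
```

A different but equally plausible parser might accept `1w`, allow spaces, or clamp to seven days instead of raising. None of those choices would be visible in the request that produced them.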

Better models make this harder to detect, not easier. A capable model makes confident implicit decisions that produce code which passes casual review. The planning it did was plausible. Whether it was correct is only visible to someone who already had a precise intent and can compare the output to it.

What a Meta-Prompt Does

A meta-prompt changes the task. Instead of asking the model to implement something, you ask it to produce a plan from a spec. The output of that request is explicit, reviewable, and correctable before a single line of code is generated.

A concrete example, derived from the GSD workflow, illustrates the difference. Given a task spec for the /remind command, a direct implementation prompt yields a code artifact. A meta-prompt framed as follows yields something reviewable:

Given this spec, produce an implementation plan that covers:
- The sequence of steps in implementation order
- Any ambiguities in the spec that need resolution before coding
- The data model, including what persists across restarts
- Error handling strategy for each error case
- What tests would verify correctness

Do not generate any code yet.

The output is a plan document. That document surfaces the model’s interpretation of the spec: how it understood the restart requirement, what data shape it planned for reminders, how it intended to handle the duration parser. You can read it in two minutes and catch misalignments before they become code.

More importantly, the plan reveals questions the spec left implicit. If the model produces a plan that says “reminders stored in memory as a Map” and your spec says “restored from database on restart,” the contradiction is visible at the plan level rather than buried in a code review. Fixing a misunderstood constraint in a plan takes one message. Fixing it after code generation takes a full cycle of review, correction, and re-generation.
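The meta-prompt above can be assembled mechanically so that every task gets the same plan sections. The following is a small sketch, not GSD's own tooling: the names `PLAN_SECTIONS` and `build_plan_prompt` are hypothetical, and the wording is copied from the example prompt.

```python
# Sections the plan must cover, taken from the example meta-prompt.
PLAN_SECTIONS = [
    "The sequence of steps in implementation order",
    "Any ambiguities in the spec that need resolution before coding",
    "The data model, including what persists across restarts",
    "Error handling strategy for each error case",
    "What tests would verify correctness",
]


def build_plan_prompt(spec: str) -> str:
    """Wrap a task spec in a plan-only meta-prompt (hypothetical helper)."""
    bullets = "\n".join(f"- {section}" for section in PLAN_SECTIONS)
    return (
        "Given this spec, produce an implementation plan that covers:\n"
        f"{bullets}\n\n"
        "Do not generate any code yet.\n\n"
        f"Spec:\n{spec.strip()}\n"
    )


if __name__ == "__main__":
    print(build_plan_prompt("/remind: 7-day maximum, per-user limits, restored from database on restart"))
```

Keeping the section list in one place means a team can add a section once, for example a rollback strategy, and have it appear in every plan request.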

The Review Problem With Implementation-First Workflows

Code review is expensive compared to plan review, and not just in time. Code embeds reasoning in a form that makes it hard to evaluate whether the reasoning was correct. You can check whether the code does what it says. Checking whether what it does matches what was intended requires reconstructing the intent from the request and then tracing through the implementation to verify alignment, which is precisely the step that most code review skips under time pressure.

A plan review is different. The plan states the model’s interpretation of the requirements in prose before implementation. You are comparing a statement of intent to a spec, not a code artifact to an implicit requirement. The alignment check is direct and cheap.

SWE-bench research has consistently shown that agents working with well-specified intent dramatically outperform those working from implicit context. That result holds even when the codebase structure is equally available to both. The specification is doing work that codebase navigation cannot substitute for. A meta-prompting workflow makes specification an input to planning, and planning an input to implementation, rather than collapsing all three into a single generation pass.
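The separation of specification, planning, and implementation can be sketched as two model calls with a review gate between them. This is an illustration under stated assumptions: `Model` and `Reviewer` are hypothetical callables standing in for whatever tool and review process you use, and the prompt wording is invented for the example.

```python
from typing import Callable

Model = Callable[[str], str]       # hypothetical: prompt in, generated text out
Reviewer = Callable[[str], bool]   # hypothetical: True means the plan is approved


class PlanRejected(Exception):
    """Raised when the reviewer does not approve the plan."""


def plan_then_implement(spec: str, model: Model, approve: Reviewer) -> str:
    # Pass 1: ask for a plan only, so the interpretation of the spec is visible.
    plan = model(
        "Produce an implementation plan for this spec. "
        f"Do not generate any code yet.\n\nSpec:\n{spec}"
    )
    # Gate: implementation never starts from an unreviewed plan.
    if not approve(plan):
        raise PlanRejected("revise the spec or the plan before implementing")
    # Pass 2: implement against both the spec and the approved plan.
    return model(
        "Implement the approved plan step by step.\n\n"
        f"Spec:\n{spec}\n\nPlan:\n{plan}"
    )
```

The important property is that a rejected plan costs one model call, and the second call cannot happen without an explicit approval.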

How GSD Structures the Loop

The four-phase GSD workflow shows where meta-prompting fits:

Bootstrap: Describe project intent in natural language, then use a meta-prompt to generate spec files, architecture docs, and a CLAUDE.md. Review and revise the outputs.
Task planning: Describe a feature, then use a meta-prompt to generate a task spec informed by the project context. Review the spec before any implementation begins.
Implementation: Feed the spec plus a meta-prompt for the implementation plan to the model. Review the plan, then implement step by step.
Context maintenance: Use a meta-prompt to update the context anchor with decisions made during implementation, regenerating coherent documentation rather than patching stale notes forward.

The meta-prompting step appears at the beginning of every phase, not just the implementation phase. At bootstrap, it generates the documentation that drives everything else. At task planning, it generates the spec. At implementation, it generates the plan. At context maintenance, it regenerates coherent context from the current state rather than accumulating forward from the initial state.

This structure means context compounds rather than drifts. A CLAUDE.md maintained by a meta-prompting step reflects the project as it is now, not as it historically grew. The typical hand-maintained version reflects the project’s history: you add a note when something went wrong, append a constraint after a model mistake, and six months in the file is a palimpsest with no organizing principle. Regenerating from a description of the current state produces a document with the structure and granularity that models handle well.
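The difference between regenerating and appending can be shown with a deliberately simplified analogy. In GSD the regeneration is done by a meta-prompt; the sketch below swaps the model for a plain template to isolate the idea that the document is rebuilt from current state. `ProjectState` and `render_context` are hypothetical names, and the section layout is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectState:
    """Current facts about the project, not a log of how they came about."""
    name: str
    constraints: list[str] = field(default_factory=list)
    decisions: dict[str, str] = field(default_factory=dict)  # topic -> current choice


def render_context(state: ProjectState) -> str:
    """Rebuild the whole context document from the current state."""
    lines = [f"# {state.name}", "", "## Constraints"]
    lines += [f"- {c}" for c in sorted(set(state.constraints))]
    lines += ["", "## Decisions"]
    lines += [f"- {topic}: {choice}" for topic, choice in sorted(state.decisions.items())]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    state = ProjectState(name="remind-bot", constraints=["Reminders capped at 7 days"])
    state.decisions["storage"] = "in-memory Map"
    state.decisions["storage"] = "database, restored on restart"  # supersedes the old choice
    print(render_context(state))
```

Because a superseded decision is overwritten in the state instead of appended to the file, the rendered document never carries both the old and the new answer.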

The Connection to How Senior Engineers Actually Work

The practices that GSD formalizes are not novel. Senior engineers working on complex features typically do not jump directly from requirement to implementation. They sketch a plan, often on paper or in a doc, identifying the sequence, the edge cases, the dependencies. They share it with a colleague before writing code. The planning is the part that gets discussed in design reviews, not the code.

Junior engineers skip this step under time pressure and produce implementations that require significant revision because the planning embedded in the code was wrong. The code has to be substantially rewritten, not because the syntax was wrong but because the approach was wrong.

AI coding tools democratize implementation speed while providing no structure that enforces or even encourages the planning step that senior engineers apply before implementation begins. The model will produce code immediately for any request, whether or not the planning embedded in that code is correct. Meta-prompting reintroduces the planning step as an explicit phase with a reviewable output, which is what design reviews and technical specs did for human implementation partners.

Princeton’s SWE-agent research documented a related point at the tool level: deliberately designed tools with explicit context, line numbers, and structured error messages improved resolve rates substantially over naive shell access. The scaffolding around the model shapes its behavior as much as the model’s capability does. Meta-prompting is scaffolding for the planning step.

The Discipline Cost, Honestly

The meta-prompting step adds latency to every feature. For small, clearly specified tasks where the only plausible implementation is obvious, the plan review is overhead without payoff. Teams under time pressure will skip it, and skipping it selectively on “obvious” features means the discipline degrades exactly when requirements are implicitly complex.

GSD is a methodology, not a library. The templates are markdown files. Nothing enforces the workflow; you have to apply it consistently yourself. The existing tools already give you CLAUDE.md for Claude Code, .cursorrules for Cursor, and auto-generated repository maps in Aider. GSD adds a workflow layer above those tools that none of them provide by default.

The payoff is most visible on features with non-obvious requirements, on tasks where multiple people need shared understanding of what was decided before implementation, and in correction loops. Mistakes caught in a plan cost one exchange. Mistakes caught in code review cost a full implementation cycle. Over a sprint with ten features, moving even three or four of those catches from code review to plan review recovers a meaningful amount of time.
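The trade-off in this section can be put into rough arithmetic: plan review is paid on every feature, while the saving only appears on the mistakes it catches early. The function below is a back-of-the-envelope sketch, and every number in the usage example is an illustrative assumption, not a measurement.

```python
def net_minutes_recovered(
    features: int,
    shifted_catches: int,
    plan_review_min: float,
    plan_fix_min: float,
    code_fix_min: float,
) -> float:
    """Savings from catching mistakes at plan level, minus review overhead."""
    overhead = features * plan_review_min                     # paid on every feature
    savings = shifted_catches * (code_fix_min - plan_fix_min)  # paid back per early catch
    return savings - overhead


if __name__ == "__main__":
    # Assumed figures: 2-minute plan read, 5-minute plan fix, 45-minute code-level fix.
    print(net_minutes_recovered(10, 3, 2, 5, 45))   # 100
    print(net_minutes_recovered(10, 0, 2, 5, 45))   # -20
```

With these assumed figures, three shifted catches more than cover the review overhead for ten features, while a sprint with no catches shows the pure cost of the discipline.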

The Hacker News response suggests experienced teams are already doing something like this informally. GSD’s contribution is making it concrete enough to adopt deliberately rather than rediscover session by session through accumulated frustration.