
The Contract Your AI Coding Session Is Actually Missing

Source: hackernews

The GSD system landed on Hacker News this week with 237 points and 128 comments, and the reception told a clear story: most of the reaction came from practitioners who had independently arrived at some version of these patterns themselves, through failure rather than reading. That convergence across different tools, teams, and domains is the most useful thing about the project. It suggests the patterns reflect real constraints in how language models work, not personal preference.

GSD stands for Get Shit Done. It is not a library, not a CLI tool, and not a package you install. It is a methodology built from markdown templates and workflow conventions that works with any AI coding assistant: Claude Code, Cursor, Aider, GitHub Copilot. It combines three practices: context engineering, spec-driven development, and meta-prompting. Each addresses a different failure mode in how developers currently use AI coding tools.

The Context Anchoring Problem

Most developers using AI coding tools have experienced a version of this: you establish a constraint early in a conversation, generate some code, and several exchanges later the model violates the constraint. Not because the model forgot, exactly, but because attention is not uniform across a long context window.

Liu et al.’s 2023 paper “Lost in the Middle” established this empirically. The models they tested used information most reliably when it appeared at the beginning or end of the context window, with substantial degradation for information placed in the middle. A constraint introduced three messages into a conversation sits somewhere in the middle by the time code generation is happening fifteen messages later. The model’s attention has moved on.

The GSD fix is a “context anchor”: a single document injected at a consistent position that the model reliably attends to. For Claude Code it is CLAUDE.md; for Cursor it is .cursorrules; for GitHub Copilot it is .github/copilot-instructions.md. The content is deliberately different from what you would write for a human reader. An Architecture Decision Record explains reasoning, context, and alternatives considered. A context anchor states the decision and its constraints, optimized for token efficiency rather than comprehension. It communicates not “why we made this choice” but “this choice was made, here are the constraints it implies, do not violate them.”
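To make the contrast with an ADR concrete, here is a minimal sketch of rendering a context anchor from a list of settled decisions. The `Decision` shape, the `render_anchor` function, and the `MUST:` line format are illustrative assumptions, not something GSD prescribes; the point is only that the output carries decisions and constraints and nothing else.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One settled decision plus the constraints it implies (hypothetical shape)."""
    choice: str
    constraints: list[str] = field(default_factory=list)


def render_anchor(decisions: list[Decision]) -> str:
    """Render decisions as terse imperative lines: no rationale, no alternatives."""
    lines = ["# Project constraints (do not violate)"]
    for decision in decisions:
        lines.append(f"- {decision.choice}")
        for constraint in decision.constraints:
            lines.append(f"  - MUST: {constraint}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example decisions are invented for illustration.
    print(render_anchor([
        Decision("Reminders persist in a database",
                 ["restore pending reminders on bot restart"]),
        Decision("Times are stored in UTC",
                 ["convert to the user's timezone only for display"]),
    ]))
```

The rendered text would then be written to whichever file your tool reads, such as CLAUDE.md.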

This is a meaningful distinction. Human readers infer, generalize, and fill in implications. Models fill gaps silently with statistically plausible choices, which may or may not match your intent. The difference only becomes visible in production.

The Spec as a Model Input

Spec-driven development is not new. What GSD adds is a reframing: the spec is not written for the developer or the team; it is written for the model. The spec is a model input that directly controls what gets generated.

This changes what you put in the spec and how precise you are. Consider the canonical example from the project: a /remind Discord command.

## Command: /remind
**Purpose**: Schedule a message in the current channel after a specified duration.
**Inputs**:
- `time`: Duration string (e.g., "30m", "2h", "1d")
- `message`: The reminder text
**Constraints**:
- Maximum reminder duration: 7 days
- Maximum active reminders per user: 10
- Times stored in UTC, displayed in user's registered timezone
- All pending reminders must be restored from the database on bot restart
**Error cases**:
- Invalid duration format: reply with examples
- Duration exceeds maximum: reply with the limit
- User at reminder cap: reply with the current count and an offer to list them
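Because every input and limit in the spec above is stated explicitly, the validation layer follows almost mechanically. The following is a rough sketch under stated assumptions: `ReminderError`, `parse_duration`, and `check_reminder_cap` are hypothetical names, the error wording is invented, and rejecting a zero-length duration is my assumption since the spec does not cover it.

```python
import re
from datetime import timedelta

MAX_DURATION = timedelta(days=7)   # "Maximum reminder duration: 7 days"
MAX_ACTIVE_REMINDERS = 10          # "Maximum active reminders per user: 10"

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
# Up to five digits keeps timedelta construction well inside its valid range.
_DURATION_RE = re.compile(r"(\d{1,5})([mhd])")


class ReminderError(ValueError):
    """Carries a user-facing reply for one of the spec's error cases."""


def parse_duration(text: str) -> timedelta:
    """Parse strings such as "30m", "2h", "1d" and enforce the 7-day limit."""
    match = _DURATION_RE.fullmatch(text.strip().lower())
    if match is None or int(match.group(1)) == 0:
        # Assumption: a zero duration is treated as an invalid format.
        raise ReminderError('Invalid duration. Examples: "30m", "2h", "1d"')
    duration = timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})
    if duration > MAX_DURATION:
        raise ReminderError("Duration exceeds the maximum of 7 days")
    return duration


def check_reminder_cap(active_count: int) -> None:
    """Raise if the user already holds the maximum number of active reminders."""
    if active_count >= MAX_ACTIVE_REMINDERS:
        raise ReminderError(
            f"You have {active_count} active reminders "
            f"(limit {MAX_ACTIVE_REMINDERS}). Want me to list them?"
        )
```

Each branch maps to one line of the spec, which is what makes the generated code easy to review against it.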

The “restored from the database on bot restart” line is the load-bearing constraint. Without it, every model I have tested defaults to in-memory storage. It is the statistically plausible choice: simple, fast, and correct in all cases except the one that matters in production. The model has no mechanism for inferring persistence requirements that are not stated. A human developer would think to ask; the model generates.

Bot development makes this failure mode unusually visible because bots restart. The Discord API rate-limits reconnections, processes get killed during deploys, servers get rebooted. A bot that loses state on restart is broken in a way that may not surface in testing but surfaces immediately when users find their scheduled reminders gone after a deploy. The spec constraint is not a nice-to-have; it is the difference between a working feature and a silently broken one.
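A minimal sketch of what honoring the restart constraint might look like, using Python's standard `sqlite3` module. The table layout and the `open_store`, `add_reminder`, `load_pending`, and `mark_delivered` helpers are assumptions made for illustration; a real bot would call `load_pending` once at startup and reschedule each row with whatever scheduler its Discord library provides.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS reminders (
    id         INTEGER PRIMARY KEY,
    user_id    TEXT NOT NULL,
    channel_id TEXT NOT NULL,
    message    TEXT NOT NULL,
    due_utc    TEXT NOT NULL,               -- ISO 8601, always UTC
    delivered  INTEGER NOT NULL DEFAULT 0
)
"""


def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the reminder database at the given file path."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_reminder(conn: sqlite3.Connection, user_id: str, channel_id: str,
                 message: str, due: datetime) -> int:
    """Persist a reminder before scheduling it, so a restart cannot lose it."""
    if due.tzinfo is None:
        raise ValueError("due must be timezone-aware")
    due_utc = due.astimezone(timezone.utc).isoformat(timespec="seconds")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO reminders (user_id, channel_id, message, due_utc) "
            "VALUES (?, ?, ?, ?)",
            (user_id, channel_id, message, due_utc),
        )
    return cur.lastrowid


def load_pending(conn: sqlite3.Connection) -> list[tuple]:
    """Return every undelivered reminder, earliest first; call this on startup."""
    rows = conn.execute(
        "SELECT id, user_id, channel_id, message, due_utc "
        "FROM reminders WHERE delivered = 0 ORDER BY due_utc"
    ).fetchall()
    return [(r[0], r[1], r[2], r[3], datetime.fromisoformat(r[4])) for r in rows]


def mark_delivered(conn: sqlite3.Connection, reminder_id: int) -> None:
    """Flag a reminder as sent so it is not restored again."""
    with conn:
        conn.execute("UPDATE reminders SET delivered = 1 WHERE id = ?", (reminder_id,))
```

Reminders that came due while the bot was offline still show up in `load_pending`; whether to send them late or drop them is a decision the spec would also need to state.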

The same pattern appears everywhere in systems work: “this endpoint must be idempotent,” “this job must not run concurrently,” “this table will be queried by date range so the index order matters.” Human developers know to think about these things because they have been burned by them. Models generate the simplest implementation that satisfies the stated requirements. The spec’s job is to state the requirements completely enough that “simplest implementation” and “correct implementation” are the same thing.

The SWE-bench benchmark data is consistent here: agents working from well-specified intent outperform those working from implicit context even when codebase access is equal. The bottleneck in those cases is specification quality rather than model capability.

Meta-Prompting: Making the Plan Visible

This is the component the HN thread discussed least, and the one I find most technically interesting.

When you prompt a model directly with “implement the /remind command,” planning and execution collapse into a single generation pass. The model plans invisibly, then generates code based on that invisible plan. You cannot review the plan. You cannot catch a wrong assumption before it propagates into code. You cannot redirect before the code exists.

A meta-prompt separates these steps. Instead of asking for code, you ask for a plan:

Given this spec, produce an implementation plan covering: the sequence of steps, any spec ambiguities, the data model including what must persist across restarts, error handling strategy, and what tests would verify correctness. Do not generate any code.
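If you issue this request often, it may be worth assembling the prompt in code so that the required sections and the no-code instruction are never left out. This is only a sketch: `build_plan_prompt` and `PLAN_SECTIONS` are hypothetical names, and the section wording paraphrases the prompt above.

```python
PLAN_SECTIONS = (
    "the sequence of implementation steps",
    "any ambiguities in the spec",
    "the data model, including what must persist across restarts",
    "the error handling strategy",
    "the tests that would verify correctness",
)


def build_plan_prompt(spec: str) -> str:
    """Wrap a spec in a planning-only request; the model is told not to write code."""
    if not spec.strip():
        raise ValueError("spec must not be empty")
    sections = "\n".join(
        f"{number}. {section}" for number, section in enumerate(PLAN_SECTIONS, start=1)
    )
    return (
        "Given the spec below, produce an implementation plan covering:\n"
        f"{sections}\n"
        "Do not generate any code.\n\n"
        "--- SPEC ---\n"
        f"{spec.strip()}\n"
    )


if __name__ == "__main__":
    print(build_plan_prompt("## Command: /remind\n**Purpose**: ..."))
```

The returned string goes to whichever assistant you use; reviewing the plan it sends back remains a manual step.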

The output is a document you can read, check against your intent, and correct before any code exists. If the model’s plan stores reminders in memory rather than a database, you catch that before it is wired into code and tests. Once it is in the code, you are no longer reviewing a plan; you are reviewing an implementation, which takes longer and costs more to change.

GSD extends meta-prompting to context maintenance. The standard failure mode for CLAUDE.md files maintained by hand is that they accumulate notes rather than reflect the project’s current design. You add a note when something breaks, when a decision changes, when a new library is introduced. After six months the file is a timeline of decisions, not a coherent description of the project. A meta-prompt that regenerates the context anchor from the project’s current state produces a document organized for how models process information, rather than one that has grown by accretion from a developer’s incident log.

This is the piece that distinguishes GSD from simply having a CLAUDE.md. Any project can have a CLAUDE.md. GSD specifies how to write it, what to put in it, and how to keep it coherent as the project evolves, using the same meta-prompting workflow to maintain the document that you use to generate code.

The Adoption Problem

The TDD parallel that GSD makes explicitly is accurate and honest. There is strong evidence that test-first development produces better software. Adoption rates remain low because the practice adds friction at the point of writing code, with returns distributed across the project’s lifespan. Teams under deadline pressure skip tests.

GSD has the same structure with one meaningful difference: the feedback loop is shorter. A well-specified prompt produces noticeably better first-pass output in the same session. The return on writing a spec is visible immediately, not in some future sprint. That should help adoption. But “should help” is not the same as “does help.” Teams under deadline pressure will skip the spec phase and paste a vague description into the chat. The methodology’s instructions will go unused in the repository.

GSD amplifies rigor you bring to a session; it does not create rigor that is absent. The project names this honestly. Whether your team actually uses the workflow is a separate question from whether the workflow is sound. The answer to the first question depends on team culture, deadline pressure, and whether someone on the team has been burned badly enough by AI-generated code that silently violated unstated requirements.

What the HN Reception Signals

237 points and 128 comments, with most reactions being recognition rather than discovery, means the patterns are real and the formalization is the contribution. Developers working seriously with AI coding tools have been reinventing some version of context anchoring and spec-driven prompting independently, across different tools and domains.

What GSD provides is a codified, transferable methodology that can be adopted by a team rather than rediscovered by each developer individually. Patterns that live only in experienced practitioners’ heads do not transfer, do not scale, and do not survive team turnover. Putting them in a repository, linking them in an onboarding document, referencing them in a code review: that is how practices propagate.

The weakness is the one the project names honestly: nothing enforces the workflow. A spec template in the repository is not a spec culture on the team. If your team already has the discipline to write specs before implementing, GSD gives that discipline a structure optimized for the models you are working with. If the discipline is not there, GSD is a repository of unused markdown templates. That is not a criticism of the project. It is an accurate description of what any methodology can and cannot do.
