The TDD Problem in Disguise: Why GSD's Adoption Challenge Is Familiar
Source: hackernews
Software engineering has a long history of practices that work when applied with discipline and collapse when that discipline lapses. Test-driven development is the canonical case: a number of studies report that it improves code quality and reduces defect rates, experienced practitioners swear by it, and most teams practice it intermittently at best. The constraint is not belief in the method. It is the upfront cost, which comes due at exactly the moment deadlines are pushing the team to ship.
Get Shit Done (GSD), a methodology for AI-assisted development that arrived on Hacker News with 237 points, has the same structure. It combines three practices (context engineering, spec-driven development, and meta-prompting) into a coherent workflow with concrete templates. Each practice is already well known on its own. The combination is the contribution. And the adoption challenge is nearly identical to TDD’s.
What GSD Actually Is
GSD is not a tool or a library. It is a set of markdown templates and workflow conventions that work with any AI coding assistant. The system has three layers.
Context engineering is the practice of deliberately managing what information the model has access to at inference time. GSD formalizes this as a “context anchor,” a living document kept at the project root and consumed by every session. In Claude Code this is CLAUDE.md; in Cursor it is .cursorrules; in GitHub Copilot it is .github/copilot-instructions.md. The document captures project goals, key architectural decisions with their rationale, and things explicitly ruled out. It is written for token efficiency and model consumption, not human readers.
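As a rough illustration of what such an anchor holds, the sketch below models the three kinds of content named above (goals, decisions with rationale, and things ruled out) and renders them as terse text. The class names, field names, and output layout are assumptions made for this example; they are not GSD's actual template.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    choice: str
    rationale: str


@dataclass
class ContextAnchor:
    # Hypothetical structure; GSD's real template may be organized differently.
    goals: list[str]
    decisions: list[Decision] = field(default_factory=list)
    ruled_out: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render terse, model-oriented text: short headings, one fact per line."""
        lines = ["# Goals"]
        lines += [f"- {g}" for g in self.goals]
        lines.append("# Decisions")
        lines += [f"- {d.choice} (why: {d.rationale})" for d in self.decisions]
        lines.append("# Ruled out")
        lines += [f"- {r}" for r in self.ruled_out]
        return "\n".join(lines) + "\n"


anchor = ContextAnchor(
    goals=["Discord bot with a reminder command"],
    decisions=[
        Decision("Store timestamps in UTC", "display converts to the user's timezone")
    ],
    ruled_out=["In-memory scheduling (reminders must survive restarts)"],
)
anchor_text = anchor.render()
```

The rendered text could then be saved under whichever filename the assistant in use reads, such as CLAUDE.md. Keeping the rationale next to each decision is the point of the exercise: a bare rule gives the model nothing to reason from when an edge case comes up.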
The scientific basis for this practice is specific. Liu et al.’s 2023 paper “Lost in the Middle” demonstrated that transformer models attend reliably to information at the beginning and end of long contexts, with substantial degradation for content in the middle. In a multi-turn coding session where you introduce a constraint mid-conversation, that constraint may not reliably reach the model’s attention by the time it generates code many exchanges later. The context anchor addresses this directly by putting critical constraints at a position the model consistently attends to.
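For cases where you assemble the prompt yourself rather than relying on a tool to load the anchor, the positioning idea can be sketched as follows. This is a minimal illustration, assuming a plain-text prompt; the function name and section labels are invented for the example, and restating the constraints at the end is one possible reading of "a position the model consistently attends to", not something GSD prescribes.

```python
def build_prompt(constraints: list[str], history: list[str], request: str) -> str:
    """Place constraints first and restate them last, around the session history."""
    block = "\n".join(f"- {c}" for c in constraints)
    parts = [
        "Project constraints:\n" + block,
        # The long, growing part of the session sits in the middle.
        "Conversation so far:\n" + "\n".join(history),
        "Current request:\n" + request,
        "Reminder, these constraints still apply:\n" + block,
    ]
    return "\n\n".join(parts)


prompt = build_prompt(
    constraints=["Store timestamps in UTC", "Reminders must survive bot restarts"],
    history=["user: add a /remind command", "assistant: drafted the handler"],
    request="Add a per-user limit on active reminders",
)
```

The session history, which is the part that grows, ends up in the middle of the prompt, while the constraints occupy the two positions the paper found most reliable.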
Spec-driven development, GSD’s second layer, is the practice of writing precise task specifications before any code generation. A spec for a Discord bot reminder command would specify the duration format, the per-user limit, UTC storage with user-timezone display, and the requirement that reminders survive bot restarts. That last constraint is the important one. Without explicit specification, it will not appear in generated code. The model has no mechanism for inferring that a scheduled job should persist across process restarts if you do not say so. The forcing function of writing the spec is that implicit assumptions become legible before code generation rather than visible only when the system fails.
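One way to make that forcing function concrete is to treat the spec as data and refuse to render it while any decision is still open. The sketch below is an assumption-laden example, not GSD's spec template: the field names are hypothetical, and the sample values (the duration syntax and the limit of 25) are placeholders rather than figures from the methodology.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class ReminderSpec:
    # Every field defaults to None, meaning "not decided yet".
    duration_format: Optional[str] = None
    per_user_limit: Optional[int] = None
    storage_timezone: Optional[str] = None
    display_timezone: Optional[str] = None
    survives_restart: Optional[bool] = None

    def unspecified(self) -> list[str]:
        """Names of the decisions that are still implicit."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def render(self) -> str:
        """Return the spec as prompt text, or fail while decisions remain open."""
        missing = self.unspecified()
        if missing:
            raise ValueError(f"spec incomplete, decide: {', '.join(missing)}")
        return "\n".join(f"{f.name}: {getattr(self, f.name)}" for f in fields(self))


# Placeholder values for illustration only.
draft = ReminderSpec(
    duration_format="10m | 2h | 1d",
    per_user_limit=25,
    storage_timezone="UTC",
    display_timezone="user's configured timezone",
)
complete = replace(draft, survives_restart=True)
```

Here `draft.unspecified()` reports that the restart behavior was never decided, which is exactly the kind of assumption that otherwise surfaces only when the bot restarts and the reminders are gone.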
Results on SWE-bench, a benchmark built from real GitHub issues, are often read the same way: agents working from well-specified intent tend to outperform those working from implicit context, even when equal codebase access is provided. On that reading, the specification is the leverage point, more than model capability.
Meta-prompting, the third layer, uses the model itself to generate and maintain the prompts and context files that drive subsequent interactions. More specifically, it separates planning from implementation. When you send a complex implementation request, the model first constructs a plan, resolves implicit questions, and then generates code reflecting that plan. The planning is invisible. A meta-prompt that says “given this spec, produce an implementation plan; do not generate any code yet” makes the planning step a reviewable artifact before any implementation begins. Contradictions between spec and plan surface there rather than in code review, where catching them costs a full implementation cycle.
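The plan-then-implement split can be sketched as a two-call flow with a review gate in between. This is a minimal sketch under stated assumptions: `call_model` stands in for whatever function sends a prompt to your assistant and returns its text, and `approve` stands in for the human (or scripted) review step. Both are hypothetical parameters, not part of any real SDK.

```python
from typing import Callable

PLAN_INSTRUCTION = (
    "Given this spec, produce an implementation plan; "
    "do not generate any code yet."
)


def plan_then_implement(
    spec: str,
    call_model: Callable[[str], str],  # hypothetical: prompt in, model text out
    approve: Callable[[str], bool],    # hypothetical: reviewer accepts or rejects
) -> str:
    """Request a plan, gate on review, and only then request code."""
    plan = call_model(f"{PLAN_INSTRUCTION}\n\nSpec:\n{spec}")
    if not approve(plan):
        # Stop before any implementation cost is paid.
        raise RuntimeError("plan rejected; revise the spec or the plan first")
    return call_model(
        f"Implement the approved plan.\n\nSpec:\n{spec}\n\nPlan:\n{plan}"
    )
```

The design choice worth noting is that a rejected plan never triggers the second call, so a contradiction between spec and plan costs one short exchange instead of a full implementation cycle.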
Why the HN Reception Matters
The discussion GSD drew, 237 points and 128 comments, skewed heavily toward recognition rather than discovery. Experienced practitioners who work seriously with LLMs had already arrived at variations of all three practices through independent trial and error. This convergence is itself a data point. When practitioners across different tools, languages, and domains independently develop the same workflow patterns, those patterns are more likely to reflect fundamental constraints in how models work than stylistic preferences or community drift.
The constraint is not obscure. AI coding tools accelerate implementation. They do not accelerate specification. When writing code is fast and specifying intent remains slow and error-prone, the bottleneck shifts to specification. Teams that do not address this bottleneck explicitly will continue attributing output quality variance to model capability, when much of that variance comes from how systematically context and intent are communicated before generation.
The TDD Parallel and Where It Breaks
Here is where the structural parallel to TDD is useful: both practices require upfront work that generates no immediately visible output. Writing a test before the code it validates and writing a spec before the implementation both feel like overhead. Under deadline pressure, both get skipped. Both are also backed by evidence that the discipline pays off at scale.
TDD’s adoption has been studied for decades. Adoption is highest in teams with strong testing cultures, external quality mandates, or leaders who protect the practice under pressure. Left to their own devices, teams drift toward writing tests after implementation, then toward writing tests only for complex logic, then toward writing tests only when something breaks. The practice works best when it is structurally enforced rather than individually maintained.
GSD has a potential advantage here. The feedback loop is fundamentally tighter. With TDD, the payoff for writing the test first is visible later: in fewer debugging sessions, in safer refactoring, in defect rates over a sprint. The signal is delayed and diffuse. With GSD, the payoff for writing a precise spec is visible in the same prompt cycle. A well-specified prompt produces noticeably better first-pass output than a vague one. The model’s output is the immediate signal.
That tighter feedback loop may not be enough to overcome the discipline problem. “Noticeably better” is a relative term, and when you are under pressure, “good enough for now” competes favorably with “better if I spend fifteen minutes writing a spec.” But the feedback is at least in the right place. TDD asks you to trust a delayed return on investment. GSD’s return is visible in the same session.
What the Methodology Cannot Do For You
GSD’s templates and workflow do not create rigor that was not there. They amplify rigor you bring. A team that already writes specifications and reviews plans before implementation will find GSD formalizes what they do and makes it consistent across sessions. A team that does not will find the overhead unsustainable under time pressure and will drift toward using the templates only for complex features, then only occasionally, then not at all.
This is not a critique of the system. It reflects the honest structure of workflow methodologies in general. The value of GSD is that it provides concrete templates that lower the activation energy for each practice: you do not have to design the meta-prompt from scratch, you do not have to decide what goes in the context anchor, you have a starting point. Lower activation energy helps, but it does not substitute for team commitment to the practice.
The deeper implication is structural. AI coding tools have accelerated implementation to the point where specification is the primary constraint on development velocity. GSD addresses that constraint with discipline and templates. The alternative, waiting for tooling to enforce spec-first workflows the way compilers enforce types, may be closer than it appears. Several coding agents are beginning to surface planning steps as reviewable artifacts rather than hiding them inside generation. When the planning step becomes structurally required rather than optionally elicited through a meta-prompt, the adoption problem changes shape.
Until then, GSD is a clear answer to a real problem. Whether a given team sustains it is the question TDD practitioners have been answering for themselves for twenty years.