Prompts as Source Code: What Thoughtworks' SPDD Workflow Gets Right

Most teams using LLM coding assistants today have an awkward asymmetry in their workflow. The code is in Git, reviewed, tagged, and traceable. The prompts that produced that code live in a Slack thread, a tab someone forgot to close, or nowhere at all. If the generated module breaks six months from now, nobody can replay the conversation that built it.

Thoughtworks’ internal IT organization has been pushing on this gap with a workflow they call Structured Prompt-Driven Development, or SPDD. Wei Zhang and Jessie Jie Xia wrote it up on Martin Fowler’s site with a companion GitHub example. The headline idea is straightforward: treat prompts as first-class artifacts, version them with the code, and structure them so that a team, not just one developer, can collaborate around them.

The interesting part is not that prompts are useful. Everyone using Copilot, Cursor, or Claude Code already knows that. The interesting part is the claim that the prompt itself is the unit of work worth governing.

Why this is a real shift

To see why versioning prompts matters, it helps to remember what a prompt actually is in a coding context. It is a specification, expressed in natural language, against which a probabilistic interpreter generates an artifact. That makes it closer to a build script than to a chat message. A build script you do not check in is a build script that does not exist.

The ad-hoc pattern most developers fall into looks like this. You open the assistant, describe the change, accept or edit the diff, and move on. The diff lands in version control. The description does not. Three things get lost: the requirement that produced the code, the constraints you imposed (“do not touch the auth module”, “use the existing Result type”), and the negative space of things you explicitly rejected during the back-and-forth.

SPDD’s answer is to externalize the prompt into a file that lives next to the code. Their example layout, visible in the referenced repository, separates a high-level intent prompt from lower-level task prompts, and keeps both reviewable in pull requests. Reviewers look at the prompt and the diff together. That changes what a code review is.

The three skills, and what they’re really about

Zhang and Xia name three skills developers need to make SPDD work: alignment, abstraction-first thinking, and iterative review. These read as soft skills on first pass, but each one is a response to a concrete failure mode of LLM-generated code.

Alignment is about closing the gap between business intent and prompt text. The classic failure is a prompt like “add a discount calculation” that produces something technically plausible and semantically wrong, because the developer never wrote down which discounts apply, in what order, and to whom. SPDD pushes that clarification into the prompt artifact, where product and engineering can both see it. This is the same impulse behind behavior-driven development twenty years ago: make the spec executable, or at least co-located.
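To make the alignment step concrete, here is a minimal sketch. It assumes a team keeps prompts as structured data rather than free text, and it refuses to render a prompt until the business rules it depends on are spelled out. The `PromptSpec` class, its field names (`intent`, `rules`, `constraints`, `rejected`), and the discount rules are all hypothetical illustrations, not anything prescribed by the SPDD write-up.

```python
from dataclasses import dataclass, field


@dataclass
class PromptSpec:
    """Hypothetical structured prompt: intent plus the details alignment must pin down."""
    intent: str
    rules: list[str] = field(default_factory=list)        # business rules, in the order they apply
    constraints: list[str] = field(default_factory=list)  # e.g. "Do not touch the auth module"
    rejected: list[str] = field(default_factory=list)     # approaches explicitly ruled out

    def render(self) -> str:
        # A bare intent with no rules is exactly the "add a discount
        # calculation" failure mode, so refuse to render it.
        if not self.rules:
            raise ValueError("prompt has an intent but no business rules")
        lines = [f"Intent: {self.intent}", "", "Rules (apply in this order):"]
        lines += [f"{i}. {rule}" for i, rule in enumerate(self.rules, start=1)]
        if self.constraints:
            lines += ["", "Constraints:"] + [f"- {c}" for c in self.constraints]
        if self.rejected:
            lines += ["", "Do not:"] + [f"- {r}" for r in self.rejected]
        return "\n".join(lines)


spec = PromptSpec(
    intent="Add a discount calculation to checkout",
    rules=[
        "Loyalty members get 10% off the subtotal",
        "Promo codes apply after the loyalty discount",
        "Discounts never reduce the total below zero",
    ],
    constraints=["Use the existing Result type", "Do not touch the auth module"],
)
print(spec.render())
```

The rendered text is what goes into the prompt file, so product and engineering review the numbered rules rather than a one-line request.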

Abstraction-first is a defense against a known LLM weakness. Models will happily generate three hundred lines of inline logic when the right answer is a ten-line function that delegates. Asking the model to design the interface before generating the implementation flips the order. You review a signature and a contract, agree on it, and then ask for the body. The intermediate artifact is a small, cheap thing to throw away. The full implementation is not.
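Here is a minimal Python sketch of that two-step order. The first artifact is a signature plus a contract with no body. The second is the body, checked against the signature that was agreed on. The function names, the discount rules, and the use of `Decimal` are illustrative assumptions, and the `inspect.signature` check is just one cheap way to catch a generated body that quietly changed the interface.

```python
import inspect
from decimal import Decimal


# Step 1: the artifact reviewed first. A signature and a contract, no body.
# It is small, so it is cheap to argue about and cheap to throw away.
def apply_discounts_contract(subtotal: Decimal, loyalty_member: bool, promo_percent: int) -> Decimal:
    """Return the total after discounts (hypothetical rules).

    - Loyalty members get 10% off the subtotal first.
    - promo_percent applies to the already-discounted amount.
    - The result is never negative.
    """
    raise NotImplementedError


# Step 2: once the contract is agreed, ask for the body.
def apply_discounts(subtotal: Decimal, loyalty_member: bool, promo_percent: int) -> Decimal:
    total = subtotal
    if loyalty_member:
        total -= total * Decimal("0.10")
    total -= total * Decimal(promo_percent) / Decimal(100)
    return max(total, Decimal("0"))  # enforce the "never negative" clause


# Fail loudly if the generated body drifted from the reviewed signature.
assert inspect.signature(apply_discounts) == inspect.signature(apply_discounts_contract)
print(apply_discounts(Decimal("100"), loyalty_member=True, promo_percent=20))
```

If the model's body comes back with a renamed parameter or a different return type, the signature comparison fails before anyone reads three hundred lines of implementation.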

Iterative review acknowledges that one-shot generation does not work past trivial scope. The prompt becomes a document you edit, the way you edit a design doc, with each revision tightening constraints. This is where the version-control angle pays off: a prompt’s history is a record of which constraints were added and why, the same way a commit log is a record of decisions in the code.
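In a real repository, `git log -p -- <prompt file>` already shows this history. The sketch below does the same thing in miniature with Python's standard `difflib`, pulling out only the lines a revision added. The two revisions are hypothetical stand-ins for successive commits of one prompt file.

```python
import difflib

# Two hypothetical revisions of the same prompt file.
rev1 = """Add a discount calculation to checkout.
Use the existing Result type.""".splitlines()

rev2 = """Add a discount calculation to checkout.
Use the existing Result type.
Apply the loyalty discount before promo codes.
Do not touch the auth module.""".splitlines()


def added_constraints(old: list[str], new: list[str]) -> list[str]:
    """Return the lines that the new revision adds relative to the old one."""
    diff = difflib.unified_diff(old, new, fromfile="rev1", tofile="rev2", lineterm="")
    # Keep '+' lines, skipping the '+++' file header (a sketch: a prompt line
    # that itself began with "++" would also be skipped).
    return [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]


print("\n".join(added_constraints(rev1, rev2)))
```

Read this way, each revision of the prompt is a small, reviewable list of tightened constraints.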

How this compares to adjacent ideas

SPDD is not the only attempt to formalize LLM-assisted development. It sits in a small but growing field.

Spec-driven development in tools like Kiro, Amazon’s agentic IDE, takes a heavier approach: structured spec files covering requirements, design, and tasks, processed by the model in stages. SPDD is lighter; the prompt is the spec, and the structure comes from team convention rather than tooling.

GitHub’s Spec Kit, released in 2025, similarly treats /specify, /plan, and /tasks as explicit phases backed by markdown files in the repo. The phase boundaries are the main difference from SPDD, which leaves the workflow loose and the artifacts simple.

Anthropic’s Claude Code documentation recommends CLAUDE.md files at the project and directory level, a narrower version of the same instinct: persistent, versioned guidance for the model, separated from per-task prompts. Cursor’s rules files (the older .cursorrules, now project rules under .cursor/rules) play the same role.

What SPDD adds, and what the others mostly do not, is a team workflow rather than a tooling story. The prompts are not just there to make the next generation better. They are there so that another developer, or the same developer in three months, can understand what was asked for and why.

Where it strains

A few problems are visible if you sit with the idea for a while.

The first is that prompts and code drift apart. When you edit the generated code by hand, the prompt no longer describes what is there. You can re-run the prompt and get something different, because the model is non-deterministic and because the surrounding code has changed. SPDD does not solve this, and the article does not pretend it does. In practice you end up treating the prompt as a historical record of intent, not a regeneration script. That is fine, but it is a weaker claim than “the prompt is the source of truth.”
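If the prompt is a historical record rather than a regeneration script, it is still useful to know *when* it stopped describing the code. One cheap option is to record a hash of the generated file next to the prompt at generation time and flag any mismatch later. The recorded-hash convention and the function names below are hypothetical. SPDD does not prescribe this; it is a sketch of one way to make the drift visible.

```python
import hashlib


def fingerprint(text: str) -> str:
    """SHA-256 of a file's contents, used to notice hand edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_drifted(recorded_hash: str, current_text: str) -> bool:
    """True if the code no longer matches what the prompt originally produced.

    recorded_hash is a hypothetical field stored alongside the prompt file
    when the code was generated.
    """
    return fingerprint(current_text) != recorded_hash


generated = "def total(x):\n    return x\n"
recorded = fingerprint(generated)          # saved with the prompt at generation time
edited = generated + "# hand edit\n"       # someone later edits the file by hand
print(has_drifted(recorded, generated), has_drifted(recorded, edited))  # False True
```

A mismatch does not mean anything is wrong. It means the prompt should now be read as intent at a point in time, not as a description of the current file.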

The second is review fatigue. If every PR contains a prompt and a diff, reviewers now have two artifacts to read. In theory the prompt is shorter and the diff falls out from it; in practice both need scrutiny, because the model can produce a plausible diff from a prompt that nobody would actually approve in isolation. The cost of review goes up before it goes down.

The third is that the abstractions a model picks under “abstraction-first” prompting are still a model’s abstractions. They look reasonable. They are statistically average. If your codebase has strong conventions, you need to encode them in the prompt or in a persistent rules file, or the model will quietly regress toward the mean of its training data. This is solvable, but it is work, and it is the kind of work that is invisible until it is missing.

What I’d actually steal from this

For my own projects, the pieces of SPDD worth borrowing are narrow and concrete. Keep a prompts/ directory in the repo. For any non-trivial generated change, drop the prompt that produced it next to the diff in the PR description, or in a sibling markdown file. Treat the prompt as part of the change, not as scratch work.
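A lightweight way to keep that habit honest is a CI check that fails when a change touches code but carries no prompt file. The sketch below works on a list of changed paths, such as the output of `git diff --name-only main...HEAD`. The `prompts/` and `src/` layout and the `min_code_files` threshold (a crude proxy for "non-trivial") are my assumptions. It also ignores the PR-description option, so treat it as a starting point.

```python
from pathlib import PurePosixPath

# Hypothetical conventions: prompts live under prompts/, code under src/.
PROMPT_DIR = "prompts"
CODE_DIRS = ("src",)


def missing_prompt(changed_files: list[str], min_code_files: int = 1) -> bool:
    """Return True if a change touches enough code but includes no prompt file."""
    paths = [PurePosixPath(f) for f in changed_files]
    code = [p for p in paths if p.parts and p.parts[0] in CODE_DIRS]
    prompts = [p for p in paths if p.parts and p.parts[0] == PROMPT_DIR and p.suffix == ".md"]
    return len(code) >= min_code_files and not prompts


print(missing_prompt(["src/discounts.py", "tests/test_discounts.py"]))  # True
print(missing_prompt(["src/discounts.py", "prompts/discounts.md"]))     # False
```

Raising `min_code_files` lets one-line fixes through without ceremony, which matters more for a small project than for a team engagement.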

The heavier ceremony (the formal alignment and abstraction-first phases) makes sense for a Thoughtworks-sized client engagement with multiple developers on the same feature. For a solo Discord bot, it is overkill, and the friction will eat the benefit.

The deeper point Zhang and Xia are making, though, is one I think holds at any scale. The output of an LLM-assisted workflow is not just code. It is code plus the recoverable reasoning that produced it. Throwing away the second half because it lived in a chat window is the same mistake teams made twenty years ago when they threw away design documents because they lived in someone’s inbox. We figured that one out eventually. This is the same lesson, with a new artifact to mishandle.