
The Missing Retrospective: Turning Individual AI Learnings into Team Infrastructure

Source: martinfowler

Most teams that adopt AI coding assistants go through a similar arc. In the first few weeks, developers individually discover what works: which kinds of tasks benefit from detailed upfront context, which prompts produce clean diffs versus rambling prose, which aspects of the codebase the AI consistently misunderstands. Some of this gets shared in Slack threads or casual hallway conversations. Most of it stays in people’s heads.

Six months in, the team’s senior developer has become an expert at prompting Claude or Copilot. The junior developers are still re-discovering patterns the senior already knows. Onboarding a new team member means they start from zero with the AI tools, even though the team has collectively burned hundreds of hours figuring things out.

This is the structural problem that Rahul Garg addresses in the final installment of his series on reducing friction in AI-assisted development on Martin Fowler’s site. He calls the solution a feedback flywheel: a structured practice for harvesting learnings from individual AI sessions and feeding them back into shared team artifacts, so that individual experience compounds into collective improvement. The problem is not that developers are hoarding their AI knowledge; there is simply no workflow mechanism to capture and redistribute it, and the feedback flywheel is designed to fill that gap.

Why Session Learnings Don’t Transfer on Their Own

AI coding sessions produce two kinds of output: the code or artifact the AI helped create, and the invisible knowledge of how to work with the AI effectively. The first output gets committed and reviewed. The second evaporates.

This asymmetry is not unique to AI tools. The same problem existed with build systems, testing frameworks, and any new tool a team adopts. The difference is that AI tools surface a much larger space of behavioral variation. A linter has a finite configuration space. A large language model’s effective behavior depends on how you frame problems, what context you include, which caveats you add, which parts of the codebase you explicitly point to. The surface area of “how to use this well” is enormous, and it shifts as models are updated.

That enormous surface area means individual discovery is expensive and uneven. Developers who spend more time with AI tools, who are naturally inclined to experiment, or who happen to work on problem types where the AI excels will develop much stronger intuitions. Those intuitions stay personal unless something actively moves them into shared artifacts.

The comparison to pair programming is useful here. Pair programming transfers tacit coding knowledge between developers as a side effect of the collaboration itself. There is no equivalent built-in transfer mechanism for AI tool knowledge, because AI sessions are almost always solo. The feedback flywheel is an explicit substitute for the organic knowledge transfer that pairing provides.

What the Flywheel Mechanism Does

The core of Garg’s proposal is structured harvest sessions: regular team rituals where developers bring observations from their AI workflows, identify patterns, and update shared artifacts based on those patterns.

This maps directly onto retrospective practice in agile teams: reflect on what happened, extract learnings, change process. What Garg adds is specificity about the artifact types that should receive those learnings, and the regular cadence that keeps the capture habit alive.

In a project using Claude Code, the obvious candidate artifact is the CLAUDE.md file. This is the project-level instruction file that Claude reads at the start of every session, and it is exactly the kind of shared artifact that benefits from accumulated team knowledge. If your team has discovered that the AI consistently misunderstands your domain model’s naming conventions, that belongs in CLAUDE.md. If certain tasks require a specific framing to avoid hallucinated imports, that belongs there too.

A minimal CLAUDE.md section capturing discovered AI behavior might look like this:

## Known AI Interaction Patterns

When working on the `OrderProcessor` module, always specify that `Order.status`
uses the enum from `src/domain/types.ts`, not the string literals in the legacy
API layer. The AI will otherwise conflate the two and generate incorrect type assertions.

When asking for test coverage, specify "unit tests using Vitest with the
existing mock patterns in `__tests__/setup.ts`" to avoid generating
incompatible mock setups that require refactoring before they run.

The `PaymentGateway` abstraction wraps three providers. When the AI suggests
changes to this module, ask it to confirm which provider it is targeting before
writing code, as it frequently conflates provider-specific methods.

These entries are small, but they represent hours of debugging and experimentation compressed into a few sentences. Every developer who reads them gets that knowledge for free, and new team members inherit the team’s accumulated AI proficiency rather than starting from scratch.

The artifact is not limited to Claude-specific instruction files. GitHub Copilot, for example, reads repository-level custom instructions from `.github/copilot-instructions.md`, and other tools offer comparable configuration files. Prompt libraries, documented context templates for recurring task types, and updated onboarding guides all qualify as flywheel outputs. The choice of artifact depends on what the team actually uses; what matters is that the learning lands somewhere persistent and shared, not in someone’s personal notes or a Slack message that scrolls off within a week.

Running a Harvest Session

The practical question is how often to run these sessions and what to collect.

Meeting weekly or every two weeks makes sense during the early adoption phase, when the team is discovering a lot and most observations are still new to everyone. Once AI workflows stabilize, the sessions can be held only when a significant new pattern emerges, or folded into an existing sprint retrospective as a standing agenda item.

The session itself should focus on three question types: what did the AI handle surprisingly well, what did it consistently get wrong, and what context or framing changes made a noticeable difference. The first question matters because teams often under-exploit AI capabilities in areas they have not thought to try. The second feeds directly into guardrails and corrective instructions in shared prompts. The third generates reusable patterns that reduce setup time for common task types.
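One way to make the pre-populated observation document easy to work through is to tag each entry with the question it answers. The sketch below is illustrative only: the `Observation` shape, the `ObservationKind` labels, and the `groupByKind` helper are hypothetical names, not something Garg's article prescribes.

```typescript
// Hypothetical labels for the three harvest-session questions.
type ObservationKind = "handled-well" | "consistently-wrong" | "framing-change";

// Hypothetical record for one pre-populated observation.
interface Observation {
  kind: ObservationKind;
  author: string;
  area: string; // module or task type the observation concerns
  note: string;
}

// Group observations by question type so the session can take each in turn.
function groupByKind(
  observations: Observation[]
): Record<ObservationKind, Observation[]> {
  const groups: Record<ObservationKind, Observation[]> = {
    "handled-well": [],
    "consistently-wrong": [],
    "framing-change": [],
  };
  for (const obs of observations) {
    groups[obs.kind].push(obs);
  }
  return groups;
}
```

Grouping up front keeps a fifteen-minute session from being spent on sorting notes rather than discussing them.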

Keeping these sessions short is important. Fifteen to twenty minutes, with a shared document where developers pre-populate their observations before the meeting, is more sustainable than an open-ended retrospective. If the harvest session becomes a lengthy discussion, it will get deprioritized when the team is under pressure, which is exactly when consistent process matters most.

The Compounding Effect

The flywheel framing captures what happens over time. The first few harvest sessions produce modest returns: a handful of CLAUDE.md entries, maybe a prompt template or two. But as those shared artifacts improve, the team’s baseline AI effectiveness goes up. Better baseline effectiveness means more ambitious use of AI tools, which generates richer learnings, which feeds back into better artifacts.

This is the compounding dynamic that makes the practice worth sustaining even when individual sessions produce small outputs. A team that runs harvest sessions consistently for six months will have accumulated a shared understanding of AI interaction patterns that would take a new developer weeks to acquire independently. That is a real advantage in teams where AI tool proficiency translates directly into throughput.

The comparison to documentation is instructive. Developers often resist writing documentation because the immediate return is low and the effort is real. Harvest sessions face the same resistance. The answer in both cases is that compounding returns only appear after consistent investment; the value is not in any single session, and measuring the practice by the output of its first few iterations misses the point.

The Architecture Decision Record practice follows the same logic. Individual ADRs often seem like overhead when written. Collectively, after a year, they become the institutional memory that makes refactoring and onboarding tractable. AI interaction patterns are a newer category of team knowledge, but they have the same compounding property.

Failure Modes

The most common failure mode is capturing observations without updating artifacts. Teams schedule the harvest session, have a productive discussion, and then no one actually edits the CLAUDE.md or the prompt library. Assigning artifact ownership explicitly, with a named person responsible for converting observations into updates before the next session, addresses this directly.

A subtler failure mode is over-generalizing from individual experience. One developer’s observation that the AI struggles with a particular module may reflect their prompting approach rather than an inherent AI limitation. Treating that observation as a team-wide rule can make shared prompts noisy, long, and counterproductive. The harvest session should involve brief discussion before committing observations to shared artifacts, not a direct transcription of individual reports.
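As a rough pre-filter before that discussion, a team could flag which areas have been reported by more than one person. This is a minimal sketch under stated assumptions: the `Report` shape, the `corroboratedAreas` helper, and the threshold of two developers are all hypothetical choices, and the filter is meant to inform the conversation, not replace it.

```typescript
// Hypothetical record of one developer's report about a problem area.
interface Report {
  author: string;
  area: string; // module or task type where the AI struggled
  note: string;
}

// Return the areas reported by at least `minAuthors` distinct developers.
// The default of 2 is an assumed threshold, not a recommendation from the article.
function corroboratedAreas(reports: Report[], minAuthors = 2): string[] {
  const authorsByArea = new Map<string, Set<string>>();
  for (const report of reports) {
    const authors = authorsByArea.get(report.area) ?? new Set<string>();
    authors.add(report.author);
    authorsByArea.set(report.area, authors);
  }
  return Array.from(authorsByArea.entries())
    .filter(([, authors]) => authors.size >= minAuthors)
    .map(([area]) => area);
}
```

An area reported by a single developer is not necessarily wrong; it is a candidate for a quick check of whether a different prompting approach avoids the problem.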

There is also the problem of artifact staleness. Models improve, team codebases evolve, and interaction patterns that were accurate six months ago may no longer apply. Without a periodic review pass over shared AI guidance, CLAUDE.md files accumulate outdated instructions that mislead rather than help. The harvest process needs a corresponding pruning process; otherwise the artifact grows without being maintained.
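The pruning pass can be made mechanical if each piece of guidance carries a last-reviewed date. The sketch below assumes the team tracks that date somewhere alongside the guidance; CLAUDE.md itself has no such field, and the `GuidanceEntry` shape, the `staleEntries` helper, and the 180-day default are hypothetical.

```typescript
// Hypothetical record: one piece of shared AI guidance plus its last review date.
interface GuidanceEntry {
  text: string;
  lastReviewed: string; // ISO date, e.g. "2025-01-15"
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Return entries whose last review is more than `maxAgeDays` before `now`.
// The 180-day default is an assumed value; pick whatever suits the team.
function staleEntries(
  entries: GuidanceEntry[],
  now: Date,
  maxAgeDays = 180
): GuidanceEntry[] {
  return entries.filter((entry) => {
    const reviewedAt = new Date(entry.lastReviewed).getTime();
    const ageDays = (now.getTime() - reviewedAt) / MS_PER_DAY;
    return ageDays > maxAgeDays;
  });
}
```

Flagged entries are prompts for a human decision: re-test the instruction against the current model and codebase, then either refresh its review date or delete it.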

Finally, the flywheel requires that the shared artifacts actually get used. Claude Code loads CLAUDE.md on its own at the start of a session, but prompt libraries, context templates, and onboarding guides only help if developers consult them, and even an automatically loaded file drifts out of date if no one on the team reads it. The practice only works if the team has established the habit of reviewing shared AI guidance at the start of relevant work. That habit is a prerequisite; the flywheel creates and sustains the artifacts, but not the discipline to use them.

Where This Fits in the Broader Picture

Garg’s series on reducing friction in AI-assisted development covers several complementary practices. The feedback flywheel is specifically about the knowledge retention layer. It assumes teams have already established basic AI workflows and accumulated some experience; the flywheel is what prevents that experience from remaining trapped in individual heads.

Teams in the early adoption phase should not wait until they have a lot to harvest. Starting the artifact early, even with minimal entries, establishes the habit and the structure. A nearly empty CLAUDE.md with a few placeholder sections that the team knows to update is more valuable than a detailed one that exists only in the abstract.

The broader principle is familiar from any knowledge management practice: the workflow that captures knowledge at the moment it is generated is more reliable than the workflow that asks developers to recall and document retrospectively. The feedback flywheel is designed to make the capture moment explicit and regular, which is the minimum viable knowledge management structure for teams that want to get compounding returns from AI-assisted development rather than just individual productivity gains.
