Git Commits as Checkpoints: How Coding Agents Make Their Work Recoverable
Source: simonwillison
When a coding agent makes twenty file edits in a session, the hard problem is not whether any individual edit was correct. It is whether you can return to any prior state without manually undoing each change, and whether the session leaves the repository in a comprehensible condition for the developers who come after.
Simon Willison’s guide on how coding agents work covers the loop and the tools. What the loop description does not dwell on is the persistence mechanism that makes agent sessions recoverable: the git commit. Not as a formality, but as an architectural choice that changes what an agentic workflow can safely attempt.
The Problem of Accumulated State
The agent loop generates changes quickly. A session fixing a bug might read six files, edit three, run tests twice, edit one file again, and run tests once more. Done by hand, that work takes twenty minutes and produces a few deliberate changes. In an agent session it can take ninety seconds and produce the same volume of changes plus an uncertain number of corrections.
Without checkpoints, those corrections are invisible. The working tree shows the net result, but the intermediate states are gone. If the session converged on a correct fix, that is fine. If it converged on the wrong fix, or left the repository in a partially edited state after a context budget ran out, there is no clean way to revert without inspecting each changed file and manually undoing what looks wrong.
These are not hypothetical failure modes; they are predictable outcomes of any sufficiently long agent session. Agent sessions end prematurely when context windows fill. They produce subtly wrong fixes when the test suite does not cover the edge case the agent introduced. They make changes to files that were not supposed to be in scope because a search returned a false positive. In each case, the recovery path is significantly simpler if the agent left a clean trail.
The Commit-Per-Edit Pattern
Aider, by default, commits each set of edits it applies, marks the commit as its own in the git metadata, and writes a brief generated description of what changed. A session fixing a bug in an authentication module might produce a sequence like this (the format is illustrative):
f1c3a2b (aider) fix: update token expiry check in session.py
3e8d910 (aider) fix: add missing import in middleware.py
8c4ef12 (aider) fix: handle None case in verify_token
0d1b73a (aider) refactor: remove dead code path in session.py
Each is a snapshot of the repository at one step of the session, though not necessarily a step at which the tests pass. If the final result looks wrong, you can inspect the diff between any two commits to understand what the agent was doing at each step. If a specific commit introduced a regression, git bisect works exactly as it would with human-authored commits. This granularity is higher than a developer working manually would typically produce, and that is precisely the point. The agent iterates faster than a human does, and granular commits give you visibility into that iteration at native resolution rather than as a compressed summary.
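The pattern itself is small. The following is a minimal Python sketch of a per-edit checkpoint, not Aider’s actual implementation. The names checkpoint_commands and checkpoint and the "(agent)" message prefix are hypothetical, and the sketch assumes git is on the PATH and that the listed paths were just edited.

```python
import subprocess
from typing import List, Sequence


def checkpoint_commands(paths: Sequence[str], message: str) -> List[List[str]]:
    """Build the git commands that record one edit as its own commit."""
    if not paths:
        raise ValueError("a checkpoint needs at least one edited path")
    return [
        # Stage only the files this edit touched (this also picks up new files).
        ["git", "add", "--", *paths],
        # Commit only those paths; anything else already staged stays staged.
        ["git", "commit", "-m", f"(agent) {message}", "--", *paths],
    ]


def checkpoint(repo_dir: str, paths: Sequence[str], message: str) -> None:
    """Run the checkpoint commands in repo_dir, stopping at the first failure."""
    for command in checkpoint_commands(paths, message):
        subprocess.run(command, cwd=repo_dir, check=True)
```

An agent loop would call checkpoint once after each successful edit tool call. Keeping the command construction separate from execution makes the behavior easy to inspect without touching a repository.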
Aider writes a generated description of the change into the commit message, and its attribution settings can mark the commit as agent-authored, which makes the session’s decision history part of the repository’s permanent record. When you run git log --oneline on an Aider-authored branch, you get a readable summary of what the agent decided to do and in what order.
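That summary is also easy to process mechanically. As a rough sketch, the Python below parses log lines in the illustrative format shown earlier ("<sha> (<tool>) <type>: <summary>") and counts commits by type. The format, the regular expression, and the function names are assumptions for illustration; real git log --oneline output prints only the abbreviated hash and the subject line.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Matches the illustrative format: "<sha> (<tool>) <type>: <summary>".
# The "(<tool>) " part is optional so plain "<sha> <type>: <summary>" also parses.
LOG_LINE = re.compile(
    r"^(?P<sha>[0-9a-f]{7,40}) "
    r"(?:\((?P<tool>[^)]+)\) )?"
    r"(?P<type>[a-z]+): (?P<summary>.+)$"
)


def parse_log(text: str) -> List[Dict[str, Optional[str]]]:
    """Turn log text into one dict per recognized line, in log order."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:  # silently skip lines that do not follow the format
            entries.append(match.groupdict())
    return entries


def count_by_type(entries: List[Dict[str, Optional[str]]]) -> Counter:
    """Count commits per change type, e.g. how many fixes versus refactors."""
    return Counter(entry["type"] for entry in entries)
```

A session whose log is dominated by repeated fix commits to the same file is a hint that the agent was correcting itself, which is worth knowing before reading the diffs.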
Working Against Clean State
The inverse of the commit pattern is equally important: coding agents should generally start from a clean working tree. If there are existing modifications to tracked files when the agent starts, those modifications mix with the agent’s changes in the working tree. It becomes impossible to distinguish what the human did from what the agent did without inspecting each file individually.
Tools handle this differently rather than uniformly refusing to run. Aider, by default, commits any pre-existing uncommitted changes in a file before it edits that file, which keeps the human’s changes and the agent’s changes in separate commits. With Claude Code, starting from a clean working tree is a sensible practice for the same reason. This is not just courtesy; it is a design constraint that makes the agent’s work auditable. The session starts clean, the agent makes changes, and one diff shows exactly what happened: git diff HEAD if the agent left its changes uncommitted, or a diff against the session’s starting commit if it committed along the way.
For automated workflows, this translates to always running agents from a fresh branch checked out from the current main. The branch preserves context about what the agent was asked to do; the clean base makes the diff unambiguous.
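For an automated workflow, the clean-start rule can be enforced before the agent runs. The following Python sketch checks git status --porcelain output and then creates a fresh branch. The function names, the default base of main, and the decision to ignore untracked files are assumptions for illustration.

```python
import subprocess
from typing import List


def dirty_tracked_paths(porcelain: str) -> List[str]:
    """Return paths with staged or unstaged changes from `git status --porcelain`.

    Each line is two status characters, a space, then the path. Renames and
    paths with special characters have a richer format that this sketch
    does not unpack.
    """
    paths = []
    for line in porcelain.splitlines():
        if len(line) < 4 or line.startswith("??") or line.startswith("!!"):
            continue  # skip blanks, untracked files, and ignored files
        paths.append(line[3:])
    return paths


def start_agent_branch(repo_dir: str, branch: str, base: str = "main") -> None:
    """Refuse to start on a dirty tree, then create a fresh branch from base."""
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    dirty = dirty_tracked_paths(status)
    if dirty:
        raise RuntimeError(f"working tree is not clean: {', '.join(dirty)}")
    subprocess.run(["git", "checkout", "-b", branch, base], cwd=repo_dir, check=True)
```

Naming the branch after the task (for example agent/fix-token-expiry, a hypothetical name) keeps the request and the resulting diff tied together.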
Multi-File Atomicity and Incomplete Sessions
A specific failure mode that git helps surface is the multi-file edit that completes some but not all of its changes. An agent refactoring a public API might update the implementation, then run into a context budget limit before updating all the callers. The working tree is now in an inconsistent state: the function signature changed in one file, but callers elsewhere still use the old signature.
With granular commits, you can see exactly where the session stopped. The commits for the implementation files exist; the planned caller-update commits do not. The incomplete work is visible from the git log rather than only discoverable from a compilation error or a code review catching broken callers.
The recovery is a git revert of the implementation change followed by a fresh session with a more targeted task scope. Without commits, the recovery requires manually inspecting every changed file to find the boundary between complete and incomplete work. The more files the agent touched, the more expensive that inspection becomes.
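The check and the recovery can both be scripted. The sketch below assumes the agent recorded a list of files it planned to change before it started, which is a hypothetical convention rather than something any particular tool provides. It compares that plan against the files touched by commits since the session’s base commit, and reverts the session’s commits if the work is incomplete.

```python
import subprocess
from typing import Iterable, List


def missing_paths(planned: Iterable[str], name_only_output: str) -> List[str]:
    """Return planned paths that no session commit touched, in plan order."""
    committed = {
        line.strip() for line in name_only_output.splitlines() if line.strip()
    }
    return [path for path in planned if path not in committed]


def list_committed_names(repo_dir: str, base: str) -> str:
    """List files changed by commits made after base, one path per line."""
    return subprocess.run(
        ["git", "log", "--name-only", "--format=", f"{base}..HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout


def revert_session(repo_dir: str, base: str) -> None:
    """Add revert commits for everything committed after base."""
    subprocess.run(
        ["git", "revert", "--no-edit", f"{base}..HEAD"],
        cwd=repo_dir, check=True,
    )
```

Reverting adds new commits rather than rewriting history, so the record of the incomplete attempt stays available when scoping the next session.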
Review and the Squash Question
Teams using coding agents on shared repositories tend to encounter this as a pull request review problem. An agent-authored branch with a single commit is hard to review; the diff covers everything the agent did but the reasoning is opaque. An agent-authored branch with granular commits is easier to review because each commit isolates one decision and carries a short description of it.
Some teams squash agent commits before merge, producing one clean commit per logical change. Git supports this directly with git merge --squash, and most hosting platforms offer a squash-merge option on pull requests. The squash keeps history on main clean, while the granular session history remains on the branch for review as long as the branch is kept. This maps to how teams already handle incremental work-in-progress commits from human developers: keep the detail during review, consolidate before landing.
The question of whether to squash is a team preference, not a correctness issue. What matters is that the granular history exists somewhere before the decision to squash. Squashing without reviewing the individual commits is no better than not having them.
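One way to consolidate without losing the trail entirely is to carry the granular commit subjects into the squashed commit’s message. The Python sketch below builds such a message and the git commands for a squash merge. The function names, the bullet layout, and the default target of main are assumptions for illustration.

```python
from typing import List, Sequence


def squash_message(title: str, subjects: Sequence[str]) -> str:
    """Build a commit message: a title, a blank line, then one bullet per commit."""
    bullets = "\n".join(f"* {subject}" for subject in subjects)
    return f"{title}\n\n{bullets}" if bullets else title


def squash_commands(branch: str, message: str, target: str = "main") -> List[List[str]]:
    """Return the git commands that land branch on target as a single commit."""
    return [
        ["git", "checkout", target],
        # Stage the branch's combined changes without creating a merge commit.
        ["git", "merge", "--squash", branch],
        ["git", "commit", "-m", message],
    ]
```

The subjects would typically come from the branch’s own log, read after the individual commits have been reviewed.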
What This Changes About Agent Risk
The SWE-bench benchmark evaluates agents on whether the final state of the repository passes the relevant tests. It does not assess the quality of the path taken to get there. For production use, the path matters considerably more. A session that produces a correct fix through a chaotic sequence of incorrect intermediate states is harder to trust than one that progresses logically, and harder to maintain when the fix needs to be understood later.
Git commits are the simplest mechanism for making that path legible. They integrate with every existing review and deployment workflow, make the agent’s session history permanent and inspectable, and cost almost nothing to produce. The commit pattern is not about accounting; it is about creating a substrate on which the human review and oversight that comes after an agent session can actually function.
The engineering work in a coding agent workflow does not end when the tests pass. The agent’s changes have to be understood, reviewed, and merged by people who were not in the session. Granular commits are what make the agent’s work part of a sustainable development process rather than an opaque artifact that landed in the repository.