
Git Discipline for Coding Agents: Clean State, Atomic Commits, and the Auto-Commit Trade-off

Source: simonwillison

When I first started letting Claude Code write substantial chunks of code, my Git workflow got worse before it got better. Not because the code was bad, but because I had no consistent strategy for capturing what the agent had done. Sessions would run long, touch a dozen files, and leave me staring at an undifferentiated wall of changes without a clear way to accept some and reject others.

Simon Willison’s guide on using Git with coding agents crystallized something I’d been doing ad-hoc into a set of deliberate practices. The core insight is that when an agent writes code, Git stops being a journal of your decisions and becomes a control surface for reviewing the agent’s decisions. That shift requires a different set of habits.

The Clean State Rule

The first discipline is the one that unlocks everything else: always start an agent session from a clean working tree. Commit your own work-in-progress before invoking any agent.

git add -A && git commit -m "wip: checkpoint before agent session"

The reason matters more than the habit. After the agent runs, git diff HEAD shows exactly and only what the agent changed in tracked files, and git status lists any new files it created. When you’re reviewing a 400-line diff, removing ambiguity about provenance is significant. “Is this change mine or the agent’s” is a question you don’t want to be asking 20 minutes into review.

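The corollary is that git reset --hard HEAD becomes your abort mechanism. If the agent goes in the wrong direction, one command returns every tracked file to where you started. New files the agent created are untracked and survive the reset, so follow it with git clean -fd if you want no residue at all.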
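A minimal sketch of the checkpoint and abort steps as shell functions. The function names are my own invention for illustration, not part of Git or any agent, and the abort deliberately deletes untracked files, so treat it as a starting point rather than something to paste blindly.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# Succeeds only when there are no staged, unstaged, or untracked changes.
tree_is_clean() {
    [ -z "$(git status --porcelain)" ]
}

# Checkpoint: commit any work in progress so HEAD marks the pre-agent state.
agent_checkpoint() {
    if tree_is_clean; then
        echo "working tree already clean"
    else
        git add -A && git commit -q -m "wip: checkpoint before agent session"
    fi
}

# Abort: discard everything since the checkpoint.
# reset --hard restores tracked files; clean -fd deletes untracked files and
# directories the agent created (ignored files are left alone).
agent_abort() {
    git reset -q --hard HEAD && git clean -fdq
}
```

Run agent_checkpoint before starting a session and agent_abort if the session goes wrong. Both only touch the repository you are currently in.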

Atom Everything

Willison calls the second pattern “atom everything”: commit after every discrete logical unit of agent work rather than accumulating changes into a single large commit at session end. What this gives you:

  • Fine-grained rollback: git reset --hard HEAD discards only the small change in progress, and git revert <commit> undoes a single committed change, not an entire session
  • git bisect-compatible history when an agent-authored change introduces a regression weeks later
  • A readable audit trail that maps to individual tasks rather than monolithic sessions

The resulting history looks deliberate:

[AI] add exponential backoff to token refresh
[AI] fix: null check before accessing session.user
[AI] refactor: extract validateToken into shared middleware
wip: checkpoint before agent session

Each commit is independently reviewable and independently revertable. That granularity serves you better than a clean-looking linear history would, because agent work rarely fails at session granularity. It fails at task granularity, and the commit graph should reflect that.
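To make “independently revertable” concrete, here is a small sketch that assumes the [AI] subject prefix shown above. The function names are hypothetical, and the match is a plain substring search rather than an anchored pattern.

```shell
#!/bin/sh
# Hypothetical helpers; they assume agent commits carry the "[AI] " prefix.

# List commits whose message contains the literal string "[AI] ", newest first.
# -F treats the pattern as a fixed string, so the brackets need no escaping.
list_agent_commits() {
    git log --oneline -F --grep='[AI] '
}

# Undo one agent commit by adding an inverse commit; earlier history is untouched.
# Usage: revert_agent_commit <commit>
revert_agent_commit() {
    git revert --no-edit "$1"
}
```

Because each commit covers one task, reverting it removes that task's change and leaves the rest of the session in place, provided later commits did not build on the same lines.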

The Auto-Commit Spectrum

The most instructive design divergence across current coding agents is when they commit, and what that choice implies about where human review is expected to happen.

Aider auto-commits every change it makes by default with a generated message prefixed aider:. If a file it is about to edit has uncommitted changes, it first commits those separately so your edits stay distinct from its own; pass --no-auto-commits to turn the automatic commits off. The philosophy is: commit continuously, then review after the fact. The in-chat /undo command reverts the last Aider commit. The benefit is that granular commits happen automatically without requiring discipline from the user; the cost is that your review checkpoint comes after the history is already written.

Aider also builds a “repo map” from git ls-files to give the model a structural index of all tracked files without loading every file into context. This is a useful use of Git as a codebase discovery mechanism: the model gets architectural shape without token cost. The repo map feeds into which files Aider considers relevant to a task, so keeping your .gitignore accurate matters more than it used to.

Claude Code does not auto-commit by default. It reads git status, git diff, and git log autonomously to understand repository state, but treats committing as a human action. The review checkpoint is before the commit rather than after. You can configure auto-commit behavior via CLAUDE.md:

## Git Policy
After each discrete logical change:
1. Run the test suite
2. Commit with: `git commit -m "<type>: <what> [agent]"`
3. Never amend published commits or force-push shared branches

Claude Code’s safety protocol also explicitly avoids git add -A and git add . to prevent accidentally staging files outside the intended scope, and skips --no-verify to ensure hooks run regardless of who is committing.

Cursor moves the checkpoint earlier still: it shows diffs before writing changes to disk. You click Apply to accept. Git interaction is entirely in your hands after that. The review point is before the working tree changes, not before the commit.

GitHub Copilot Workspace moves the session to a cloud environment. The output is a pull request, not a local branch diff. Isolation is handled at infrastructure level rather than local Git primitives.

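The spectrum, roughly from latest to earliest review checkpoint (Copilot Workspace sits apart, because its review happens on a pull request rather than in a local tree):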

Tool               Human reviews at
Aider              After commit (can undo)
Claude Code        Before commit
Cursor             Before disk write
Copilot Workspace  Before PR merge

None of these is strictly better. Earlier review gives you more control and catches problems before they enter history; later review gives you less interrupt-driven workflow and keeps the agent moving. The choice should match how much you trust the agent for the specific task, and how consequential a wrong output would be.

Worktrees for Parallel Sessions

Once you want to run multiple agent sessions concurrently on the same repository, a complication emerges: agents running in the same working directory clobber each other’s changes and read stale diffs. Cloning the repository twice wastes disk and loses the shared object graph.

git worktree, available since Git 2.5 (July 2015), solves this. Each worktree has its own directory, its own HEAD, and its own staging area, but shares the single .git object store:

git worktree add ../myproject-auth feature/auth-refactor
git worktree add ../myproject-api feature/new-endpoints
git worktree add ../myproject-fix fix/memory-leak

git worktree list
# /home/user/myproject          abc1234  [main]
# /home/user/myproject-auth     def5678  [feature/auth-refactor]
# /home/user/myproject-api      ghi9012  [feature/new-endpoints]
# /home/user/myproject-fix      jkl3456  [fix/memory-leak]

For a repository with 2 GB of history and 50 MB of source files, each full clone costs about 2 GB for the object store plus its own 50 MB checkout. A worktree costs roughly 50 MB, because only the working tree files are duplicated, not the history.

The hard constraint is that a branch can only be checked out in one worktree at a time. Git refuses to let two worktrees share a branch. Plan your branch names accordingly when launching parallel sessions.
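One way to sidestep branch collisions is to create a fresh branch together with each worktree. The sketch below is an assumption on my part rather than anything from Willison's guide: the function name, the agent/ branch prefix, and the sibling-directory layout are all illustrative.

```shell
#!/bin/sh
# Hypothetical helper: one new branch and one sibling worktree per agent session.
# Usage: new_agent_worktree <session-name>
new_agent_worktree() {
    name=$1
    root=$(git rev-parse --show-toplevel) || return 1
    dir="$(dirname "$root")/$(basename "$root")-$name"
    # -b creates a new branch, so two sessions never try to share one.
    # Git's own progress output goes to stderr; only the path is printed on stdout.
    git worktree add -b "agent/$name" "$dir" >&2 && echo "$dir"
}
```

Calling new_agent_worktree auth from /home/user/myproject would create /home/user/myproject-auth on branch agent/auth. A second call with the same name fails because the branch already exists, which is the behavior you want.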

Combined with direnv, each worktree can have its own environment variables, database URLs, and API keys, which matters when agents are making database writes or calling external services. Claude Code includes an EnterWorktree built-in tool, which reflects how commonly this pattern comes up in practice.

When a session doesn’t pan out, cleanup is cheap:

git worktree remove /tmp/agent-session
git branch -D agent/session

Commit Messages as Future Context

One underappreciated consequence of agent-heavy workflows is that commit messages become inputs to future agent sessions, not just records for human readers. Tools like Aider and Claude Code read recent git log as context when starting a session. A message like fix bug tells a future agent nothing useful. A message like fix null pointer in auth middleware when session token is expired via third-party refresh gives a future agent enough to avoid retreading the same ground.

The same principle applies to encoding the original task in the commit body:

git commit -m "Add token refresh with backoff

Task: the auth token refresh logic fails silently on 429 responses
from the token endpoint. Add exponential backoff with jitter, cap
at 32 seconds, abort after 5 attempts."

This is more work at commit time, but when you run a session weeks later and the agent asks why a function does exponential backoff, the answer is in the history.

For auditability, particularly on teams, labelling agent commits helps:

git commit --trailer "Co-Authored-By: Claude <noreply@anthropic.com>" -m "implement rate limiting"

This makes agent-authored commits filterable with git log --grep and supports any compliance process that needs to distinguish human-reviewed code from agent-generated code.
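A short sketch of that filtering, assuming the exact trailer text used above. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; they assume the "Co-Authored-By: Claude" trailer shown above.

# Print short hash and subject for every commit carrying the trailer.
# -F makes --grep match the text literally instead of as a regular expression.
agent_commits() {
    git log --format='%h %s' -F --grep='Co-Authored-By: Claude'
}

# Report agent-labelled commits versus all commits reachable from HEAD, as "N/M".
agent_commit_ratio() {
    echo "$(agent_commits | wc -l | tr -d ' ')/$(git rev-list --count HEAD)"
}
```

The ratio is only as reliable as the labelling: a commit made without the trailer counts as human-authored.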

Pre-Commit Hooks as the Non-Negotiable Layer

Instructions in CLAUDE.md or a system prompt are advisory. Pre-commit hooks run on every commit regardless of whether a human or agent triggered it, making them the appropriate place for non-negotiable enforcement.

#!/bin/bash
set -e                   # abort the commit as soon as any check fails
git secrets --scan       # block secret leakage
npx tsc --noEmit         # type errors as a commit gate
npm run lint --silent    # lint errors as a commit gate

Claude Code’s hooks system operates at a tighter level than pre-commit. PreToolUse hooks fire before tool calls; PostToolUse hooks fire after. A PostToolUse hook can run linting after every file write and inject the results back into the agent’s context, meaning the agent sees lint errors before it attempts to commit. That feedback loop is meaningfully tighter than catching failures at commit time.

When a pre-commit hook fails, no commit is created. The correct response is to fix the issue and commit again, not to amend: with nothing new to amend, git commit --amend would rewrite the previous, successful commit.

The Practical Starting Point

If you are doing none of this: start with clean state before invocation and git diff HEAD as the primary review step. Those two habits change the experience meaningfully and cost almost nothing to adopt. Atomic commits, worktrees, commit message discipline, and hook enforcement can be layered in incrementally as sessions get longer and the stakes get higher.

The broader pattern Willison points to is that Git’s existing primitives map well onto agentic workflows because Git was designed for distributed, asynchronous collaboration between parties who do not share a working directory. Agents have exactly that collaboration model with their human operators. The tooling fits, which means the main work is building the habits to use it deliberately.
