
Git Is the Safety Contract That Makes Coding Agents Trustworthy

Source: simonwillison

There is a consistent pattern in how experienced developers talk about working with AI coding agents. They almost never talk about the model. They talk about the workflow around it. Specifically, they talk about Git.

Simon Willison’s agentic engineering patterns guide covers this directly. The advice is simple in principle: always run agents inside a clean git repository, commit before you start, and review the diff before you do anything else. Simple, but the implications run deep.

The reason Git matters so much here is not that agents are unreliable, though they can be. It’s that agents are fast and thorough in ways that make their mistakes difficult to spot without structural help. A skilled human developer touching ten files will leave traces: the problem statement in their head, the files they had open, the order in which they made changes. An agent hands you none of that context. It produces a finished state, not a process. Git is what gives you the process back.

Start Clean, Every Time

The most important single discipline is to establish a clean baseline before running an agent. That means:

git checkout -b agent/task-description
git status  # must be clean

If there are uncommitted changes, stash them or commit them first. This is not optional. A dirty working tree when an agent starts means the diff afterward is uninterpretable: you cannot tell what the agent changed versus what you had sitting around.
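One way to make the clean-baseline rule hard to skip is a small preflight check. The sketch below is illustrative only: the function name require_clean_tree is hypothetical, and it assumes you run it from inside the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: refuse to start an agent run on a dirty working tree.
require_clean_tree() {
  # `git status --porcelain` prints one line per modified, staged, or
  # untracked path, and prints nothing when the tree is clean.
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree is dirty; commit or stash first" >&2
    return 1
  fi
}
```

Used as a gate, it might look like require_clean_tree && git checkout -b agent/task-description, so the branch is only created when the baseline is clean.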

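Aider, the command-line AI coding assistant, builds in a version of this by default. With its --dirty-commits option on, which is the default, it commits any uncommitted changes already present in a file before editing it, so your work and the agent’s edits land in separate commits. This is good design. It makes the discipline structural rather than voluntary.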

With a clean start and a dedicated branch, the review afterward becomes straightforward:

git diff main..agent/task-description         # everything the agent committed
git diff --stat main..agent/task-description  # file-level summary of the same range

You are not trying to remember what changed. The diff tells you.

Aider’s Auto-Commit Philosophy

Aider’s default behavior is to commit after every successful file edit with an AI-generated commit message. This creates a granular audit log of every agent action. If an agent touches twelve files across seven commits, you can inspect each one individually, understand what it was doing, and reject specific commits without discarding the whole run.

The commit messages look like:

aider: Add input validation to the registration handler
aider: Update unit tests to cover the new validation cases
aider: Fix import ordering after linter complaint

You can configure this behavior:

# Turn off auto-commits and review manually
aider --no-auto-commits

# Or in .aider.conf.yml:
auto-commits: false

The built-in /undo command inside Aider’s REPL reverts the most recent commit it made. For anything further back, standard git operations apply:

git reset --hard HEAD~3   # discard last three agent commits
git revert <sha>          # revert a specific commit while preserving history
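Before resetting or reverting anything, it helps to see the run one commit at a time. The following is a minimal sketch, assuming the agent worked on a branch cut from main; the function name review_agent_commits is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every commit on the current branch that is not
# on the base branch, oldest first, followed by the files each one touched.
review_agent_commits() {
  local base="${1:-main}"
  # --reverse gives chronological order; --stat adds a per-file summary.
  git log --reverse --stat --format='--- %h %s' "$base"..HEAD
}
```

Running review_agent_commits main on the agent branch prints one "---" header per commit, which gives you the short hashes to pass to git show or git revert.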

Claude Code takes the opposite stance: it produces changes but does not auto-commit, leaving the commit decision to you. This means you get more control by default, at the cost of needing to explicitly commit after reviewing. Neither approach is wrong. Auto-committing builds the audit trail automatically; manual committing forces a review gate before anything is recorded.

Git Worktrees for Running Agents in Parallel

The most underused git feature in this context is worktrees. The standard git worktree command lets you check out a second (or third, or fourth) branch into a separate directory on disk, sharing the same object store but maintaining independent working trees and indexes.

For agents, this solves a specific problem: you want to explore two different approaches to the same task simultaneously without them interfering with each other. Without worktrees, you need full repo clones. With worktrees, you get isolation without duplication:

# Create two isolated agent environments
git worktree add ../myrepo-approach-a -b agent/approach-a
git worktree add ../myrepo-approach-b -b agent/approach-b

# Run agents in each, concurrently in separate terminals.
# Aider treats bare positional arguments as file names, so the
# instruction is passed with --message.
cd ../myrepo-approach-a
aider --model claude-3-5-sonnet-20241022 --message "Implement auth using JWT"

cd ../myrepo-approach-b
aider --model claude-3-5-sonnet-20241022 --message "Implement auth using session cookies"

# Compare results
git diff agent/approach-a..agent/approach-b

# Keep the better one, discard the other
git worktree remove ../myrepo-approach-b
git branch -D agent/approach-b

Git enforces that a given branch can only be checked out in one worktree at a time, which prevents accidental conflicts. Running git worktree list shows you all active worktrees and their associated branches.

The shared object store means no duplication of your git history. Only the working files differ between worktrees. For large repos, this is significantly cheaper than maintaining separate clones.
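If you set up worktrees often, the two steps (new directory, new branch) can be wrapped in one function. This is a sketch under assumptions: the name new_agent_worktree and the sibling-directory naming scheme are made up for illustration, not a git convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a sibling directory with its own branch for one
# agent run, and print the directory path on stdout.
new_agent_worktree() {
  local name="$1"
  local root dir
  root="$(git rev-parse --show-toplevel)" || return 1
  dir="${root}-${name}"
  # Send git's own messages to stderr so stdout carries only the path.
  git worktree add "$dir" -b "agent/${name}" >&2 && echo "$dir"
}
```

A call such as cd "$(new_agent_worktree approach-a)" then drops you into the isolated checkout, already on agent/approach-a.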

Reviewing Agent Diffs Without Burning Out

The constraint when working with agents is not the agent’s speed. It is your review bandwidth. An agent can produce 500 lines of changes in 30 seconds. Reading 500 lines of changes carefully takes much longer than 30 seconds. If you do not pace this, you will stop reviewing carefully, and you will eventually ship something you did not intend to ship.

Git’s interactive staging is the most reliable forcing function for careful review:

git add -p

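This steps you through each hunk one at a time, asking whether to stage it. Nothing is staged in bulk unless you explicitly ask for it, so each change passes in front of you first. Files the agent created are untracked and are skipped until you register them with git add -N. It is slower than git add ., which is exactly the point.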

When reviewing agent diffs, specific things warrant scrutiny:

  • Files outside the stated scope of the task. Agent scope creep is real.
  • Deleted tests or assertions. Agents sometimes remove a failing test rather than fixing the underlying issue.
  • New imports or dependencies. These expand your attack surface and may introduce license obligations.
  • Hardcoded values where a variable or config reference should be.
  • Changes to CI configuration, deployment scripts, or anything in an infrastructure directory that was not mentioned in the task.
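Part of this checklist can be pre-screened mechanically before you read the diff. The sketch below is only a starting point: the function name flag_risky_changes is hypothetical, and the path patterns (src/ as the task scope, the manifest and CI file names) are assumptions to adapt to your repository. It does not replace reading the hunks.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list changes between a base branch and HEAD that
# deserve a closer look. All path patterns here are illustrative assumptions.
flag_risky_changes() {
  local base="${1:-main}"
  local scope="${2:-src/}"
  local tab
  tab="$(printf '\t')"
  # --name-status prints "<status><TAB><path>"; --no-renames reports a rename
  # as a delete plus an add, so every line has exactly two fields.
  git diff --no-renames --name-status "$base"...HEAD |
  while IFS="$tab" read -r status path; do
    case "$path" in
      "$scope"*) ;;                                  # inside the stated scope
      *) echo "out-of-scope: $path" ;;
    esac
    case "$status:$path" in
      D:*test*) echo "deleted-test: $path" ;;        # a test file was removed
    esac
    case "$path" in
      package.json|requirements.txt|go.mod|Cargo.toml)
        echo "dependency-manifest: $path" ;;
      .github/workflows/*|Dockerfile)
        echo "ci-or-deploy: $path" ;;
    esac
  done
}
```

For example, flag_risky_changes main src/ prints one line per flagged file, which tells you where to slow down during the hunk-by-hunk review. Hardcoded values and removed assertions inside surviving files still need a human read.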

The --word-diff flag in git is useful for reviewing prose-adjacent changes like documentation, error messages, or configuration files:

git diff --word-diff main..agent/branch

For large changesets, pushing to a remote branch and reviewing in the GitHub pull request UI can be more practical than terminal diffs:

git push -u origin agent/task-description
gh pr create --draft --title "agent: task description"

Draft PRs are good here because they make the changeset visible to your team without signaling it is ready to merge. If your CI is set up to run on pull requests, it catches the class of errors agents introduce most often: type errors, missing imports, broken tests.

Git as Undo Log, Not Just History

The mental model worth building is that, when agents are involved, git is not just a record of intentional decisions. It is an undo log. The git reflog records every HEAD movement, not just commits. If an agent run goes badly and you reset hard, you can still find the pre-reset commits, though not changes that were never committed:

git reflog          # see every HEAD position
git checkout HEAD@{7}  # recover state from seven moves ago

For longer agent runs, tagging checkpoints explicitly before each major instruction gives you named recovery points:

git tag checkpoint-before-db-migration
# run agent for the migration task
# if bad:
git reset --hard checkpoint-before-db-migration

git bisect applies to agent-introduced regressions in the same way it applies to any other regression. If the agent produced twelve commits and something broke, bisect identifies which commit:

git bisect start
git bisect bad HEAD
git bisect good main
# let git check out midpoints and run the tests at each one
git bisect run npm test
git bisect reset    # return to the branch you started on
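The same bisect sequence can be wrapped so it takes any test command and prints only the offending commit. This is a sketch, assuming a linear run of agent commits between a known-good and a known-bad revision; the function name find_breaking_commit is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bisect between a good and a bad revision using the
# given test command, print the first bad commit, then restore the checkout.
find_breaking_commit() {
  local good="$1" bad="$2"
  shift 2
  git bisect start "$bad" "$good" >/dev/null || return 1
  # The test command must exit 0 for good commits and 1-127 (except 125)
  # for bad ones; that is the contract `git bisect run` expects.
  if ! git bisect run "$@" >/dev/null; then
    git bisect reset >/dev/null 2>&1
    return 1
  fi
  # After a completed run, refs/bisect/bad points at the first bad commit.
  git rev-parse --short refs/bisect/bad
  git bisect reset >/dev/null 2>&1
}
```

With the setup from this section, the call would be find_breaking_commit main HEAD npm test.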

The Principle Behind the Workflow

None of these techniques are specific to AI coding agents. They are standard git workflow. What changes when agents are involved is not the tools but the frequency and the stakes.

Agents move fast enough that the gap between “started a task” and “significant changes to the codebase” is very short. The git practices that careful human developers maintain mostly out of habit become load-bearing infrastructure when agents are in the loop.

Willison’s framing is useful here: the question is not whether to trust the agent. The question is whether your workflow makes the agent’s output inspectable and reversible regardless of whether you trust it. A clean branch, a readable diff, and a disciplined review process answer that question affirmatively without requiring you to adjudicate trust for every run.

The agent generates; git preserves the ability to decide what to keep.
