
Git Is Your Control Surface When Coding Agents Do the Work

Source: simonwillison

Simon Willison recently published a guide on agentic engineering patterns that covers using Git with coding agents. The advice there is solid, but the topic deserves more concrete exploration of the mechanics involved, because most developers who reach for a coding agent do so without thinking through how Git fits into the loop at all. They end up either over-trusting the agent (no staging, no review) or under-trusting it (constant manual intervention that defeats the purpose).

The right mental model is that Git becomes your control surface. Not just version control in the passive sense, but the active interface through which you understand, approve, and selectively accept what the agent produced.

The stash-run-verify-commit lifecycle

When an autonomous agent is about to touch your codebase, the first thing to do is preserve whatever you were already working on. A simple stash before the agent run means you can always get back to your last known state:

git stash push -u -m "pre-agent-run-backup"

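The -u flag includes untracked files. That matters because the revert step later runs git clean, which deletes every untracked file in the working tree. Without -u, your own untracked work would be wiped out along with whatever new files the agent created.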

Once the agent finishes, you verify before you commit. This is the step most people skip. In my own bot’s autonomous improver, the sequence starts with a stash helper that reports whether anything was actually stashed:

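import { execSync } from "node:child_process";

// PROJECT_ROOT is assumed to be defined elsewhere as the repository path.
// Returns true if a stash entry was created, so the caller knows to pop it later.
function stashWorkingTree(): boolean {
  const status = execSync("git status --porcelain", {
    cwd: PROJECT_ROOT, encoding: "utf-8"
  });
  // Empty porcelain output means a clean tree: nothing to stash.
  if (!status.trim()) return false;
  execSync('git stash push -u -m "autonomous-improver-backup"', {
    cwd: PROJECT_ROOT
  });
  return true;
}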

After the agent runs, you inspect what it touched:

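// Tracked files with unstaged modifications (the agent is told not to stage or commit).
const tracked = execSync("git diff --name-only", {
  cwd: PROJECT_ROOT, encoding: "utf-8"
});
// New files the agent created, excluding anything matched by .gitignore.
const untracked = execSync("git ls-files --others --exclude-standard", {
  cwd: PROJECT_ROOT, encoding: "utf-8"
});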

That gives you the complete list of files changed or created. You run your build and tests against those changes. If they pass, you commit with a descriptive message referencing the task. If they fail, you revert hard:

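function revertAllChanges(): void {
  // Restore tracked files from the index, which is clean after the stash.
  execSync("git checkout .", { cwd: PROJECT_ROOT });
  // Delete untracked files and directories; ignored files are left alone.
  execSync("git clean -fd", { cwd: PROJECT_ROOT });
}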

Then, if the stash step actually saved something, you pop the stash to restore your prior working state. The agent run leaves no trace except in your attempt log.
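The whole lifecycle can be sketched as one function. This is a minimal illustration, not the improver's actual code: the Git runner type, the RunHooks shape, and the function names are all assumptions introduced here, and the runner is injected so the flow can be followed (and tested) without a real repository.

```typescript
// A git runner takes an argument array and returns stdout. Injecting it is an
// assumption made for this sketch; it is not part of the original improver.
type Git = (args: string[]) => string;

interface RunHooks {
  runAgent: () => void;   // edits files only; must not commit
  verify: () => boolean;  // build and tests; true means they passed
  commitMessage: string;
}

type Outcome = "committed" | "reverted" | "no-changes";

function isDirty(git: Git): boolean {
  return git(["status", "--porcelain"]).trim() !== "";
}

function applyAgentRun(git: Git, hooks: RunHooks): Outcome {
  let verified = false;
  try {
    hooks.runAgent();
    if (!isDirty(git)) return "no-changes";
    verified = hooks.verify();
  } catch {
    verified = false; // a crashed agent or test run counts as a failure
  }
  if (verified) {
    git(["add", "-A"]);
    git(["commit", "-m", hooks.commitMessage]);
    return "committed";
  }
  // Same two commands as the revert helper shown earlier.
  git(["checkout", "."]);
  git(["clean", "-fd"]);
  return "reverted";
}

function runAgentTask(git: Git, hooks: RunHooks): Outcome {
  // Stash only when there is something to protect, and remember that we did.
  const stashed = isDirty(git);
  if (stashed) git(["stash", "push", "-u", "-m", "pre-agent-run-backup"]);
  try {
    return applyAgentRun(git, hooks);
  } finally {
    // Runs on every path, so earlier work comes back even after a failure.
    if (stashed) git(["stash", "pop"]);
  }
}
```

In a real script the runner could wrap execFileSync("git", args, { cwd, encoding: "utf-8" }) from node:child_process. One thing the sketch does not handle: after a successful commit, git stash pop can conflict if the agent edited the same files you had stashed, and that case needs manual resolution.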

Why you should not let the agent commit

One of the most important constraints in any agent prompt is explicit: do not commit. The system prompt for my autonomous improver includes the line "5. Do NOT commit or push — only make file edits". This sounds obvious, but it matters for several reasons.

First, the agent does not know whether its changes actually work until after the build and tests run. If it commits before verification, you now have a broken commit in history, and cleaning that up is more work than just reverting the working tree.

Second, commit authorship and message quality matter. An agent writing its own commit messages will do so without full context of what the commit should say for a human reading the log six months later. You want the commit message to reflect the task intent, not just a description of the diff. Writing that message yourself, or parameterizing it from the task description, produces a cleaner history.

Third, keeping the commit step outside the agent means you retain the final say. The agent produces a candidate diff; you decide whether to accept it. This is the equivalent of staging changes before committing in a normal workflow, except the “author” is an AI system rather than your own keystrokes.
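Parameterizing the message from the task description might look like the sketch below. The AgentTask shape and its field names are assumptions for illustration; the "agent:" subject prefix simply follows the commit example used later in this post.

```typescript
// Hypothetical task shape; the field names are assumptions for illustration.
interface AgentTask {
  id: string;
  title: string;
  description: string;
}

function buildCommitMessage(task: AgentTask, changedFiles: string[]): string {
  // Subject line: states the intent and is capped at 72 characters.
  const rawSubject = `agent: ${task.title.trim()}`;
  const subject =
    rawSubject.length > 72 ? `${rawSubject.slice(0, 69)}...` : rawSubject;

  // Body: the why (task description) first, then the what (files touched).
  const fileList = changedFiles.map((f) => `- ${f}`).join("\n");
  return [
    subject,
    "",
    task.description.trim(),
    "",
    `Task: ${task.id}`,
    "Files:",
    fileList,
  ].join("\n");
}
```

Because the message is built outside the agent, it reflects what was asked for and does not depend on the agent's own summary of its diff.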

Git diff is your code review interface

With an agent doing the implementation, git diff replaces the mental overhead of reading every file the agent touched. Before you run the build, before you run tests, run a quick git diff --stat to understand the scope:

 src/commands/push.ts     |  12 +++
 src/utils/gitWatcher.ts  |  34 ++++++----
 src/mcp/tools/git.ts     | 187 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 221 insertions(+), 12 deletions(-)

If that stat shows 800 lines changed across 40 files for a task that should have been a small feature, something went wrong. You do not need to read all 800 lines to know the agent overreached. Revert immediately and re-prompt with tighter scope constraints.
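The same scope check can be automated. git diff --numstat prints the counts from --stat in a machine-readable form (added, deleted, and path, separated by tabs). A minimal sketch follows; the function names and the default thresholds are arbitrary placeholders, not values from the improver.

```typescript
interface DiffScope {
  files: number;
  insertions: number;
  deletions: number;
}

// Parses the output of `git diff --numstat`: "added<TAB>deleted<TAB>path" per line.
function summarizeNumstat(numstat: string): DiffScope {
  const scope: DiffScope = { files: 0, insertions: 0, deletions: 0 };
  for (const line of numstat.split("\n")) {
    if (!line.trim()) continue;
    const [added, deleted] = line.split("\t");
    scope.files += 1;
    // Binary files report "-" for both counts; count the file but no lines.
    if (added !== "-") scope.insertions += Number(added);
    if (deleted !== "-") scope.deletions += Number(deleted);
  }
  return scope;
}

// Thresholds are illustrative defaults; tune them to the size of task you hand out.
function exceedsScope(scope: DiffScope, maxFiles = 10, maxLines = 300): boolean {
  return scope.files > maxFiles || scope.insertions + scope.deletions > maxLines;
}
```

Fed the three files from the stat above, summarizeNumstat reports 3 files, 221 insertions, and 12 deletions, which stays under the placeholder limits. A 40-file, 800-line diff would trip both.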

For changes you do proceed with, a quick git diff before committing is the actual code review step. You are not reviewing for correctness in the sense of “does this logic do what I wanted” (that is what tests are for), but for scope, style, and accidental damage. Agents occasionally fix things that were not broken, refactor adjacent code that was out of scope, or leave debug instrumentation in place. A diff pass catches most of this in seconds. One caveat: plain git diff ignores untracked files, so run git add -N . first (or check git status) when the agent has created new files.
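Part of that diff pass can be mechanized. The sketch below scans unified diff text for added lines that look like leftover debug instrumentation. The pattern list is an assumption and deliberately short; it supplements reading the diff and does not replace it.

```typescript
interface Finding {
  file: string;
  text: string;
}

// Illustrative patterns only; extend these for your own codebase.
const SUSPICIOUS: RegExp[] = [/\bconsole\.(log|debug)\(/, /\bdebugger\b/];

// Scans the text output of `git diff` and reports suspicious added lines.
function findDebugLeftovers(diff: string): Finding[] {
  const findings: Finding[] = [];
  let file = "(unknown)";
  for (const line of diff.split("\n")) {
    if (line.startsWith("+++ ")) {
      // "+++ b/src/x.ts" names the new version of the file.
      file = line.slice(4).replace(/^b\//, "");
    } else if (line.startsWith("+")) {
      // Added line: strip the leading "+" and test the content.
      const text = line.slice(1);
      if (SUSPICIOUS.some((re) => re.test(text))) {
        findings.push({ file, text: text.trim() });
      }
    }
  }
  return findings;
}
```

Removed and context lines are skipped, so a console.log that already existed in the file does not get flagged.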

git diff’s three-dot syntax is useful here too. If you created a branch before the agent run, git diff main...HEAD shows everything the agent produced relative to the base, which is the cleaner view when you are preparing a PR.

Worktrees for parallel agent work

Git worktrees are underused in the context of coding agents. A worktree lets you check out a branch into a separate directory without cloning the repo, sharing the same .git object store:

git worktree add ../ralph-agent-feature feature/new-command

Now you have C:\ralph on master and C:\ralph-agent-feature on feature/new-command, both pointing to the same underlying repository. You can run an agent in each directory at the same time without them overwriting each other's files, because each worktree has its own working directory, index, and HEAD. Git also refuses, by default, to check out the same branch in two worktrees at once. Objects are shared, so the history is not duplicated on disk.

This is how Claude Code’s own isolation mode works under the hood. The worktree parameter in Claude Code’s Agent tool creates a temporary worktree, runs the subagent in that isolated copy, and then either merges the result or discards it. The parent agent continues in the original working directory uninterrupted.

For a Discord bot that does autonomous work like mine, this pattern would allow running two improvement tasks in parallel: one agent working on a user-facing command in a feature branch, another running tests or refactoring infrastructure in a different worktree, with neither blocking the other.
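A small planning helper shows the shape of this: one directory per branch, each with its own add and remove commands. The naming scheme (repository path plus a sanitized branch name) and the planWorktree function are assumptions made for illustration.

```typescript
interface WorktreePlan {
  branch: string;
  path: string;
  addArgs: string[];     // arguments for `git worktree add`
  removeArgs: string[];  // arguments for `git worktree remove`
}

function planWorktree(repoDir: string, branch: string): WorktreePlan {
  // "feature/new-command" becomes "feature-new-command" so the path is one directory name.
  const suffix = branch.replace(/[^A-Za-z0-9._-]+/g, "-");
  const path = `${repoDir}-${suffix}`;
  return {
    branch,
    path,
    addArgs: ["worktree", "add", path, branch],
    removeArgs: ["worktree", "remove", path],
  };
}

// Two independent tasks, each in its own directory and on its own branch.
const plans = ["feature/new-command", "chore/refactor-infra"].map((b) =>
  planWorktree("../ralph", b)
);
```

Each agent process would then be started with its plan's path as the working directory. The add command as written assumes the branch already exists; git worktree add -b creates a new one.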

List and clean up worktrees with:

git worktree list
git worktree remove ../ralph-agent-feature

The commit-everything principle

Willison’s article emphasizes committing everything into the repository, including the prompts and context files that guide the agent. This is worth taking seriously. When an agent produces a bad result and you need to debug why, having the exact prompt that produced that result in the git history alongside the diff is invaluable.

For the autonomous improver, this means the task description from the kanban board is part of the commit message. The agent’s instructions live in a version-controlled file. The learning history for each task (what errors were produced on previous attempts) is stored in JSON files that are also in the repo. If something goes wrong, git log tells you what the agent was trying to do, what it was told, and what files it changed, in one place.
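The per-task learning history could be as simple as the sketch below. The record shape is hypothetical (the post only says the history is stored as JSON), and the function names are illustrative.

```typescript
// Hypothetical record shapes; only "stored as JSON in the repo" comes from the post.
interface Attempt {
  at: string;                          // ISO timestamp of the run
  outcome: "committed" | "reverted";
  errors: string[];                    // build or test errors from this attempt
}

interface TaskHistory {
  taskId: string;
  prompt: string;                      // the exact instructions the agent received
  attempts: Attempt[];
}

function recordAttempt(history: TaskHistory, attempt: Attempt): TaskHistory {
  // Return a new object and leave the input untouched.
  return { ...history, attempts: [...history.attempts, attempt] };
}

function serializeHistory(history: TaskHistory): string {
  // Fixed two-space indentation keeps git diffs of the JSON file readable.
  return JSON.stringify(history, null, 2) + "\n";
}
```

Writing the serialized history to a tracked file means a later git log -p on that file shows what the agent was told alongside how each attempt ended.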

This is different from how most developers use version control. Normally commits document what changed; with agent-assisted development, commits need to document why, because the agent does not have opinions about the why in the way a human author does.

When rollback is not enough

Sometimes an agent run produces changes that partially work: some files are correct, others are wrong. A full git checkout . loses everything. Here the selective staging workflow becomes important:

# Stage only the files you want to keep
git add src/commands/push.ts
git add src/utils/gitWatcher.ts

# Revert the problematic file (works for tracked files only;
# delete a newly created file with rm or git clean instead)
git checkout -- src/mcp/tools/git.ts

# Commit only the good parts
git commit -m "agent: add push command (partial implementation)"

This is normal git usage, but it is easy to forget in the moment when you are dealing with an agent result. The instinct is usually to accept or reject the whole thing. Selective acceptance is often the right call, especially when the agent correctly solved part of a problem and introduced unnecessary changes elsewhere.

For interactive staging, git add -p lets you review hunks within a single file, which is useful when the agent made one correct fix inside a file that also contains scope-creep changes.
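File-level selective acceptance can also be planned in code before anything is executed. In this sketch the ChangedFile shape and the planSelectiveAccept name are assumptions. Rejected files are handled differently depending on whether git already tracks them, because git checkout cannot restore a file that was never committed.

```typescript
interface ChangedFile {
  path: string;
  untracked: boolean; // true for files the agent created
}

// Returns git argument lists: stage accepted files, undo everything else.
function planSelectiveAccept(changed: ChangedFile[], accept: string[]): string[][] {
  const commands: string[][] = [];
  for (const file of changed) {
    if (accept.indexOf(file.path) !== -1) {
      commands.push(["add", "--", file.path]);
    } else if (file.untracked) {
      // New files have no committed version to restore; remove them instead.
      commands.push(["clean", "-f", "--", file.path]);
    } else {
      commands.push(["checkout", "--", file.path]);
    }
  }
  return commands;
}
```

Printing the planned commands before running them gives one more review point, which is useful because git clean cannot be undone.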

What this changes about the development loop

The practical shift when working with coding agents is that the inner loop changes. Instead of write-compile-test-commit, the loop becomes stash-prompt-verify-review-commit or stash-prompt-verify-revert. Git is not passive bookkeeping at the end of a session; it is the mechanism through which every agent interaction begins and ends.

The constraint this imposes is also the thing that makes agentic development feel controllable rather than chaotic. The agent can do whatever it wants to the working tree. You always have a clean escape path. git checkout . followed by git clean -fd undoes any amount of damage to the working tree in a few seconds, and git stash pop restores your last working state on top of that. Knowing this makes it easier to let the agent work without micromanaging it mid-run, because the cost of a bad run is bounded, as long as the agent's side effects stay inside the repository.

The developers who get the most out of coding agents tend to be the ones who already had disciplined git habits. The branching, the small atomic commits, the frequent diffs, the meaningful messages: all of that pays off when the author of your next commit is an AI system rather than yourself.
