Btrfs Snapshots Are the Safety Net That LLM System Configuration Needs

Source: lobsters

Will Morrison’s experiment in letting Claude Code configure his Arch Linux install has generated a lot of discussion about training cutoffs, mental model ownership, and the fundamental gap between text manipulation and system state. These are real concerns worth engaging with seriously. There is one more concern that receives less attention: what the recovery path actually looks like when an LLM-generated configuration goes wrong, and whether that path has to be as painful as it currently is.

The risk profile of agentic system configuration is not symmetric. When Claude Code produces a correct GPU driver setup or a working audio stack, you get a functional system faster than you would have otherwise. When it produces an incorrect one, you may get a system that fails to boot, renders nothing, or silently misbehaves until the next kernel update exposes the underlying issue. On most Arch Linux installs, the recovery path from a bad configuration is fully manual: boot from installation media, mount your filesystems, arch-chroot in, diagnose from scratch, and reconstruct whatever the agent changed. That process is not prohibitively hard if you already understand your system well. It is exactly the kind of work that the user who asked Claude Code to configure things in the first place was hoping to avoid.

What Btrfs Copy-on-Write Snapshots Actually Give You

Btrfs snapshots are cheap in a way that matters for this use case. Snapshots are copy-on-write: creating one takes a fraction of a second regardless of how much data is on the filesystem, because it records a reference to the current state rather than copying data. A snapshot of a 50 GB Arch root partition consumes nearly zero additional space at creation time. Space consumption accumulates only as the active filesystem and the snapshot diverge through subsequent writes. Taking a snapshot before running a risky operation costs almost nothing.

Snapper is the management layer that makes this operationally useful. It creates snapshots automatically on configurable schedules, applies retention policies to clean up old ones, and provides a clean interface for listing snapshots, comparing them, and undoing the changes between a pair. A basic snapper configuration for a root subvolume looks like this:

# Create a snapper config for the root subvolume
snapper -c root create-config /

# Edit /etc/snapper/configs/root to set retention:
# TIMELINE_MIN_AGE="1800"
# TIMELINE_LIMIT_HOURLY="5"
# TIMELINE_LIMIT_DAILY="7"
# TIMELINE_LIMIT_MONTHLY="0"
# TIMELINE_LIMIT_YEARLY="0"

systemctl enable --now snapper-timeline.timer
systemctl enable --now snapper-cleanup.timer
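
The retention values listed in the comments above can also be applied without opening an editor. The following is a minimal sketch, assuming the config file consists of plain KEY="value" lines; the helper names set_snapper_option and apply_retention are hypothetical and not part of snapper. Snapper's own snapper -c root set-config KEY=value subcommand covers the same ground and is probably the better choice on a live system.

```shell
# Hypothetical helper: set KEY="VALUE" in a snapper-style config file.
# Usage: set_snapper_option FILE KEY VALUE
# Assumes VALUE contains no "|" character (used as the sed delimiter).
set_snapper_option() {
    local file="$1" key="$2" value="$3"
    if grep -q "^${key}=" "$file"; then
        # Key already present: rewrite the assignment in place (GNU sed).
        sed -i "s|^${key}=.*|${key}=\"${value}\"|" "$file"
    else
        # Key missing: append a new assignment.
        printf '%s="%s"\n' "$key" "$value" >> "$file"
    fi
}

# Apply the retention policy shown in the article to a config file.
# The default path is the one snapper uses for a config named "root".
apply_retention() {
    local conf="${1:-/etc/snapper/configs/root}"
    set_snapper_option "$conf" TIMELINE_MIN_AGE 1800 &&
    set_snapper_option "$conf" TIMELINE_LIMIT_HOURLY 5 &&
    set_snapper_option "$conf" TIMELINE_LIMIT_DAILY 7 &&
    set_snapper_option "$conf" TIMELINE_LIMIT_MONTHLY 0 &&
    set_snapper_option "$conf" TIMELINE_LIMIT_YEARLY 0
}
```

Running apply_retention as root with no argument would edit the root config; passing a path to a copy lets you inspect the result first.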

Timeline snapshots are useful for the general case. The more targeted feature for LLM-assisted configuration is the snap-pac integration.

snap-pac: Automatic Checkpoints Around Every pacman Operation

snap-pac installs pacman hooks that create paired pre and post snapshots around every package operation. Before pacman -S, pacman -R, or pacman -Syu executes, the current filesystem state is snapshotted. After the operation completes, a second snapshot captures the result. The pair is labeled and linked:

pacman -S snap-pac

# After running: pacman -S nvidia nvidia-utils lib32-nvidia-utils

snapper -c root list
# (columns abridged)
#  # | Type | Pre # | Date                     | Description
# 42 | pre  |       | Mon 16 Mar 2026 14:23:01 | pacman -S nvidia nvidia-utils lib32-nvidia-utils
# 43 | post | 42    | Mon 16 Mar 2026 14:23:47 | pacman -S nvidia nvidia-utils lib32-nvidia-utils

If the resulting configuration misbehaves but the system still boots, the pre-snapshot is there. Undoing the package installation at the filesystem level is:

snapper -c root undochange 42..43

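This does not run pacman -R; it reverts the files that differ between the two snapshots to their pre-operation contents. That covers the installed package files and anything else that changed on the root subvolume between the pre and post snapshots. Configuration edits Claude Code made before the pre-snapshot or after the post-snapshot fall outside the pair and are not reverted by this command. Provided the pacman database lives on the snapshotted subvolume, which is the default, it returns to its pre-operation state together with the rest of the filesystem.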
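
Before undoing a pair it is usually worth looking at what the pair actually contains. The sketch below is one way to do that, assuming the root config from earlier; the function names run and review_and_undo and the DRY_RUN variable are hypothetical. It relies on snapper's status subcommand to list changed files and on undochange accepting optional file paths to narrow the revert.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Usage: review_and_undo PRE POST [FILE...]
review_and_undo() {
    local pre="$1" post="$2"
    shift 2
    # List files created, modified, or deleted between the two snapshots.
    run snapper -c root status "${pre}..${post}" || return
    # Revert those changes; extra arguments limit the undo to the named files.
    run snapper -c root undochange "${pre}..${post}" "$@"
}
```

For example, DRY_RUN=1 review_and_undo 42 43 /etc/mkinitcpio.conf prints the two commands without touching the system, which makes it easy to check the snapshot numbers before running them for real.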

For changes Claude Code makes without going through pacman, you can create a manual snapshot before starting an agent session:

snapper -c root create --description "pre-claude-session" --cleanup-algorithm number

You can then reference that snapshot's number when you want to review or revert what the session changed.
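
Keeping track of the snapshot number by hand is error-prone, so a small wrapper helps. This is a sketch under the assumption that the root config exists; begin_session and session_changes are hypothetical names. It uses snapper's --print-number option to capture the new snapshot's number, and the fact that snapshot 0 refers to the live filesystem.

```shell
# Create a manual snapshot before an agent session and print its number.
# Usage: begin_session [DESCRIPTION]
begin_session() {
    snapper -c root create --print-number \
        --description "${1:-pre-claude-session}" \
        --cleanup-algorithm number
}

# List files that differ between snapshot N and the live system (snapshot 0).
# Usage: session_changes N
session_changes() {
    snapper -c root status "$1..0"
}
```

A typical use would be n=$(begin_session) before starting the agent and session_changes "$n" afterwards; the resulting file list is also a reasonable starting point for understanding what the agent decided on your behalf.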

Booting Into a Snapshot When the System Won’t Start

The gap in the above approach is the failure mode where the bad configuration prevents booting at all. If Claude Code generates a broken mkinitcpio.conf and runs mkinitcpio -P, or installs a conflicting kernel module that produces a black screen after the GRUB handoff, the snapper undochange command is not accessible from a non-booting system.

grub-btrfs addresses this by adding all available Btrfs snapshots to the GRUB boot menu automatically. Installing it and enabling its daemon means that every snapshot snapper creates (including the snap-pac pre-snapshots) appears as a bootable entry in GRUB:

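# grub-btrfsd watches the snapshot directory via inotify-tools
pacman -S grub-btrfs inotify-tools
# Regenerate once so that existing snapshots show up in the menu
grub-mkconfig -o /boot/grub/grub.cfg
systemctl enable --now grub-btrfsd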

The daemon watches for new snapshots and regenerates the GRUB configuration when they appear. If your system fails to boot after a Claude Code session, you restart, select the pre-session snapshot from GRUB’s snapshot submenu, boot into the known-good state, and either investigate or make the rollback permanent, with snapper rollback if your subvolume layout supports it.

This setup has one prerequisite that catches people: your root subvolume needs to be mounted by name in /etc/fstab (e.g., subvol=@) rather than by subvolume ID. Mounting by ID means that after a rollback the wrong subvolume remains active. The ArchWiki snapper article covers the correct subvolume layout, with @ for root and @home for home, mounted explicitly.
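
A quick check of how root is selected can catch this before it matters. The following sketch parses an fstab-style file with awk; the function name root_subvol_mode and its three result strings are hypothetical. It treats an entry carrying both subvolid= and subvol= as pinned to an ID, on the assumption that a stale ID is a problem even when a name is also given (generated fstab files commonly contain both options).

```shell
# Report how the root filesystem is selected in an fstab-style file.
# Prints "id" (subvolid= present), "name" (only subvol=), or "default"
# (neither option). Returns non-zero if no entry for / is found.
# Usage: root_subvol_mode [FSTAB_PATH]
root_subvol_mode() {
    awk '
        $1 !~ /^#/ && $2 == "/" {
            has_id   = ($4 ~ /(^|,)subvolid=/)
            has_name = ($4 ~ /(^|,)subvol=/)
            if (has_id)        print "id"
            else if (has_name) print "name"
            else               print "default"
            found = 1
            exit
        }
        END { if (!found) exit 1 }
    ' "${1:-/etc/fstab}"
}
```

If this prints id, removing the subvolid= option from the root entry and keeping subvol=@ (or whatever your root subvolume is called) is the change the paragraph above describes.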

The NixOS Comparison From This Angle

NixOS handles the same problem through a mechanism baked into the operating system model rather than layered on top of it. Every successful nixos-rebuild switch creates a new system generation. The boot menu lists all generations. nixos-rebuild switch --rollback reverts to the previous one. The mechanism is automatic, the generations are complete reproducible derivations, and a configuration that fails to build never modifies the running system at all.

Btrfs plus snapper plus grub-btrfs produces something functionally similar on Arch. The rollback is not as clean conceptually, because Arch configurations are imperative rather than declarative, and a Btrfs snapshot captures filesystem state rather than a reproducible build specification. But the operational safety property is comparable: changes are reversible, and reverting does not require live installation media or manual filesystem surgery.

This is plausibly also why NixOS has attracted more serious experimentation with autonomous LLM configuration than equivalent Arch tooling. The failure mode on NixOS is a failed build with a diagnostic, and the running system is unchanged. The failure mode on Arch without snapshots is a broken system and a manual recovery. The same LLM configuration attempt carries meaningfully different consequences depending on which substrate it runs against.

For Non-Btrfs Systems

Timeshift provides comparable functionality for Arch installs on ext4 or XFS, using rsync for full system snapshots. The tradeoffs are different: rsync snapshots are slower to create and consume more space than Btrfs copy-on-write snapshots, but the functional guarantee is similar. A known-good state exists before a risky operation, and reverting to it is a structured process rather than manual reconstruction.

Timeshift also supports Btrfs if you are running it, using the same underlying snapshot mechanism as snapper but primarily through a graphical interface. For headless or server installs, snapper is generally the more appropriate tool.
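
Timeshift can also be driven from the command line, which is what matters when the checkpoint is taken right before an agent session. The sketch below assumes Timeshift is already configured with a snapshot device; timeshift_checkpoint and timeshift_snapshots are hypothetical wrapper names. The O tag marks the snapshot as on-demand.

```shell
# Create an on-demand Timeshift snapshot with a descriptive comment.
# Usage: timeshift_checkpoint [COMMENT]
timeshift_checkpoint() {
    timeshift --create --comments "${1:-pre-claude-session}" --tags O
}

# List existing snapshots so one can be picked for a later restore.
timeshift_snapshots() {
    timeshift --list
}
```

Both wrappers need root privileges when run against a real system. Restoring is a separate, interactive step (timeshift --restore) and is deliberately left out of the sketch.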

What Changes About the Risk Calculation

Setting up snapshot infrastructure before starting an LLM-assisted configuration session changes the nature of the requests you can reasonably make. Without rollback, asking Claude Code to configure an NVIDIA card for Wayland is a high-stakes operation. A mistake in the mkinitcpio hooks, the bootloader kernel parameters, or the DRM modesetting flags produces a system that may not boot, with no clean path back to a working state. With snap-pac in place and grub-btrfs configured, the same operation becomes an experiment with a defined exit: if it works, the pre-snapshot is obsolete and gets cleaned up by the retention policy; if it fails to boot, you select the pre-snapshot from GRUB and the system is restored.

This does not resolve the other problems that LLM system configuration raises. The mental model problem persists: when Claude Code makes dozens of configuration decisions on your behalf and the session ends, the reasoning behind those decisions exists nowhere except as side effects on disk, and you still need to understand your system well enough to maintain it through future updates. The training cutoff problem persists: the ArchWiki article on NVIDIA, WirePlumber, or Hyprland that Claude Code draws on may describe configuration that was accurate when the model was trained and has since been superseded.

What snapshot infrastructure addresses is the specific asymmetry where easy mistakes are hard to reverse. That asymmetry is not intrinsic to the problem. It is a consequence of running on a filesystem without copy-on-write semantics and without the tooling to leverage them. Btrfs with snapper, snap-pac, and grub-btrfs exists precisely to close that gap, and it was designed for exactly the scenario where package operations and manual configuration changes need to be reversible without live media. Pairing it with LLM-assisted system configuration is not a stretch; it is the obvious mitigation for the failure mode that makes autonomous system configuration feel riskier than it needs to be.

The more interesting implication is about what this suggests for agentic tooling design. An agent with explicit awareness of snapshot state as part of its observation loop, that checkpoints before destructive operations and can roll back on verification failure, is a substantially more capable tool than one that issues commands and reads stdout. The infrastructure for that already exists in the Arch ecosystem, assembled from snapper, snap-pac, and grub-btrfs. A tool designed for this use case from the start would just integrate that loop directly rather than leaving it as a prerequisite the user has to set up themselves.
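
The checkpoint, apply, verify, roll back loop described here is small enough to sketch in shell. This is an illustration under stated assumptions, not how any existing agent works: with_checkpoint is a hypothetical name, the apply and verify steps are passed in as command strings, and the rollback uses undochange against snapshot 0 (the live system), which restores files but does not reboot or re-run anything the apply step triggered.

```shell
# Take a checkpoint, run APPLY, then VERIFY; revert if either fails.
# Usage: with_checkpoint DESCRIPTION APPLY_CMD VERIFY_CMD
# Returns 0 if changes were kept, 1 if rolled back, 2 if no checkpoint
# could be created (in which case APPLY is never run).
with_checkpoint() {
    local desc="$1" apply="$2" verify="$3" n
    n=$(snapper -c root create --print-number --description "$desc" \
        --cleanup-algorithm number) || return 2
    if sh -c "$apply" && sh -c "$verify"; then
        echo "kept changes (checkpoint $n)"
    else
        # Undo everything that differs between the checkpoint and now.
        snapper -c root undochange "$n..0"
        echo "rolled back to checkpoint $n"
        return 1
    fi
}
```

A call might look like with_checkpoint "mkinitcpio-hooks" "sh ./apply-change.sh" "sh ./verify-change.sh", where both scripts are placeholders. The verify step is the hard part: a check that runs on the live system cannot prove that the next boot succeeds, which is why the bootable snapshots from grub-btrfs remain the backstop.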