
A Quiet Bug in SQLite's WAL Reset Logic

Source: lobsters

SQLite is one of those pieces of software you eventually trust completely, which makes it all the more striking when its own documentation describes a corruption-class bug. The SQLite team recently documented a WAL-reset bug that can corrupt a database under specific conditions tied to how the write-ahead log is reset after a checkpoint.

How WAL Mode Works

In WAL (Write-Ahead Logging) mode, SQLite does not modify the database file directly when writes occur. Instead, changes are appended to a separate WAL file. Readers look at the WAL for newer page versions; writers never block readers. Periodically, a checkpoint operation copies those WAL frames back into the main database file.
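The write-then-checkpoint cycle described above can be observed from Python's standard `sqlite3` module. This is a minimal sketch, not a reproduction of the bug: the scratch path and the table `t` are made up for illustration, and the only SQLite features used are `PRAGMA journal_mode` and `PRAGMA wal_checkpoint`.

```python
import os
import sqlite3
import tempfile

# Hypothetical scratch database; the path and table name are illustrative only.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None selects autocommit mode, so no implicit transaction
# is left open when the checkpoint runs.
conn = sqlite3.connect(db_path, isolation_level=None)
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.execute("INSERT INTO t (v) VALUES ('hello')")

# Committed changes now sit in the -wal file next to the database.
wal_exists = os.path.exists(db_path + "-wal")

# wal_checkpoint returns one row:
# (busy flag, frames in the WAL, frames copied into the database file).
busy, wal_frames, checkpointed = conn.execute(
    "PRAGMA wal_checkpoint(FULL)"
).fetchone()

print(mode, wal_exists, busy, wal_frames, checkpointed)
conn.close()
```

With a single connection and no readers in the way, the checkpoint should report that every frame in the WAL was copied back, which is the precondition for the reset discussed next.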

After a full checkpoint, the WAL file can be reset. Reset here means the WAL is effectively emptied, and new writes start from the beginning of the file. The reset is coordinated through the WAL index, a shared-memory file that all connections use to track which frames are valid and where the write position is.

The Bug

The corruption arises in the reset path. When the WAL is reset, its header is rewritten with new salt values. Readers use those salts to verify that the frames they read belong to the current generation of the WAL, not a previous one. The bug is a race: under certain timing conditions involving multiple connections, a reader can see a partially reset WAL state and interpret stale frames as valid current data.
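To make the salt mechanism concrete, the sketch below reads the 32-byte WAL header before and after a reset. It assumes the header layout given in the SQLite file format documentation (eight big-endian 32-bit fields, with the two salts in the fifth and sixth positions); `read_wal_header` and the scratch database are hypothetical names for illustration. It shows only the normal, single-connection reset, not the race.

```python
import os
import sqlite3
import struct
import tempfile

def read_wal_header(wal_path):
    """Return selected fields of the 32-byte WAL header (all big-endian)."""
    with open(wal_path, "rb") as f:
        raw = f.read(32)
    magic, _version, page_size, ckpt_seq, salt1, salt2, _c1, _c2 = struct.unpack(
        ">8I", raw
    )
    return {
        "magic": magic,
        "page_size": page_size,
        "ckpt_seq": ckpt_seq,
        "salt1": salt1,
        "salt2": salt2,
    }

# Hypothetical scratch database, autocommit mode so no transaction lingers.
db_path = os.path.join(tempfile.mkdtemp(), "salts.db")
conn = sqlite3.connect(db_path, isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL").fetchall()
conn.execute("CREATE TABLE t (v TEXT)")
conn.execute("INSERT INTO t VALUES ('first generation')")
before = read_wal_header(db_path + "-wal")

# Checkpoint everything, then write again: with no readers holding the old
# WAL, the next writer restarts the log from the beginning of the file.
conn.execute("PRAGMA wal_checkpoint(FULL)").fetchall()
conn.execute("INSERT INTO t VALUES ('second generation')")
after = read_wal_header(db_path + "-wal")
conn.close()

print("salt1:", before["salt1"], "->", after["salt1"])
print("salt2:", before["salt2"], "->", after["salt2"])
```

Per the file format documentation, the first salt is incremented and the second is randomized on each reset. A frame left over from the previous generation carries the old salts, so it no longer matches the header and a reader should reject it.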

The specific scenario requires concurrent database access where one connection is checkpointing and resetting the WAL while another is actively reading. The window is narrow, but the consequence is not: if a reader pulls in stale page data and acts on it, the resulting state can diverge from what is actually on disk.

Why This Is Easy to Miss

Many SQLite deployments either use WAL mode with a single writer and occasional readers, or run in environments where the OS serializes enough that the race never fires. Test suites rarely cover the exact interleaving required. The affected code is likely running in production systems that have never triggered the race, which is its own uncomfortable thought.

SQLite’s codebase is extraordinarily well-tested, both by its own test suite and by broad deployment, so when a bug like this surfaces and gets documented, it reflects both the difficulty of concurrency bugs and the team’s commitment to transparency. The fact that they document it openly on the WAL reference page rather than burying it in a changelog is worth noting.

Mitigation

The immediate practical advice is to update to the patched version of SQLite once a fix is available, and to check whether your application uses WAL mode with concurrent readers during checkpointing. If you manage a database with high read concurrency in WAL mode, this is worth reviewing.

For most embedded use cases, the risk is low. For server-side applications that open SQLite databases from multiple threads or processes in WAL mode, it deserves attention. The SQLite team documents the full conditions on the WAL documentation page, and reading through it is a good exercise in understanding the checkpoint lifecycle regardless of whether you’re directly affected.
