SQLite WAL Mode in Docker: What the Shared Memory Contract Actually Requires
Source: simonwillison
Simon Willison recently explored what happens when you run multiple Docker containers pointing at the same SQLite database file through a shared volume, with WAL mode enabled. The short answer is that it can work, but understanding why requires looking at what SQLite’s WAL mode actually needs from the operating system.
How WAL Mode Works at the File Level
SQLite introduced Write-Ahead Logging in version 3.7.0, released in July 2010. Before that, the only durability mechanism was the rollback journal: before modifying any page, SQLite copied the original to a -journal file. WAL flips this around. Writes go into a separate -wal file first. The original database file stays untouched until a checkpoint moves WAL frames back into it.
This design gives WAL mode its signature concurrency property: readers never block writers, and writers never block readers. Multiple readers can scan the original database file simultaneously while a single writer appends to the WAL. To make this work safely, every connection needs a consistent view of which WAL frames have been committed and which haven’t. That coordination lives in a third file: the -shm file.
For a database at /data/app.db, WAL mode produces:
/data/app.db # main database
/data/app.db-wal # write-ahead log
/data/app.db-shm # wal-index (shared memory)
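The three files can be observed directly. The following is a minimal sketch, assuming a scratch directory rather than the /data volume; the helper name wal_sidecar_files is made up for illustration. It checks for the files while the connection is still open, because SQLite normally removes the -wal and -shm files when the last connection closes cleanly.

```python
import os
import sqlite3
import tempfile


def wal_sidecar_files(db_path):
    """Switch a database to WAL mode and report which of its files exist.

    Hypothetical helper for illustration; not part of any library.
    """
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode actually in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # Check while the connection is open: the sidecar files usually
        # disappear once the last connection closes.
        present = {
            suffix or "(main)": os.path.exists(db_path + suffix)
            for suffix in ("", "-wal", "-shm")
        }
        return mode, present
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(wal_sidecar_files(os.path.join(scratch, "app.db")))
```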
The -shm File Is Not Ordinary Storage
The -shm file starts at 32KB and grows in 32KB increments as the WAL gets larger. It contains the wal-index: a data structure tracking which WAL frames exist, which are committed, the read marks used by active readers, and checkpoint state. SQLite does not treat this file like normal storage. Every connection opens it and immediately memory-maps it with mmap().
The contract this creates is strict: all connections to the same database must be mapping the same physical memory. The wal-index is not read off disk on each access; it is read directly from the mapped memory region. Writes to the wal-index by one connection must be immediately visible to all other connections without any disk I/O. This is only possible if those connections share the same physical memory pages.
On a single Linux host, this is guaranteed by the kernel page cache. When two processes call mmap() with MAP_SHARED on the same file and the same offset, they receive virtual addresses that point to the same physical pages. Any write by one process is immediately visible to the other through those shared pages, with no filesystem round-trip involved.
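The shared-page behavior is easy to see in miniature. The sketch below is an illustration under stated assumptions, not SQLite's own code: it maps a scratch file twice through two independent file descriptors inside one process, on a Unix system where Python's mmap defaults to MAP_SHARED. Two separate processes, or two containers on one host, rely on the same kernel mechanism.

```python
import mmap
import os
import tempfile


def shared_mapping_demo():
    """Write through one mapping of a file and read it back through another."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "scratch-shm")  # stand-in, not a real -shm file
        with open(path, "wb") as f:
            f.write(b"\x00" * mmap.PAGESIZE)

        fd_a = os.open(path, os.O_RDWR)
        fd_b = os.open(path, os.O_RDWR)
        try:
            # On Unix, mmap.mmap() uses MAP_SHARED unless told otherwise.
            map_a = mmap.mmap(fd_a, mmap.PAGESIZE)
            map_b = mmap.mmap(fd_b, mmap.PAGESIZE)
            try:
                map_a[0:5] = b"frame"     # write via the first mapping
                return bytes(map_b[0:5])  # read via the second; no flush, no read() call
            finally:
                map_a.close()
                map_b.close()
        finally:
            os.close(fd_a)
            os.close(fd_b)


if __name__ == "__main__":
    print(shared_mapping_demo())
```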
SQLite also uses POSIX advisory byte-range locks (fcntl with F_SETLK) on the -shm file to coordinate readers, writers, and checkpointers. The kernel tracks these locks per process and refuses conflicting requests from other processes. The lock bytes sit at offsets 120 through 127: a writer takes an exclusive lock on byte 120, a checkpointer takes one on byte 121, and each reader holds a shared lock on one of the five read-mark bytes at 123 through 127.
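Because POSIX locks belong to a process, showing a conflict takes two processes. The following sketch assumes a Unix system with os.fork() and uses a scratch file, not a live -shm file. The parent takes an exclusive lock on a single byte at offset 120 (chosen only to mirror the offsets mentioned above), then a forked child tries to lock the same byte without blocking.

```python
import fcntl
import os
import tempfile

LOCK_OFFSET = 120  # illustrative; mirrors the start of SQLite's lock bytes


def child_can_lock_while_parent_holds():
    """Return True if a child process obtains a byte lock the parent already holds."""
    with tempfile.TemporaryDirectory() as scratch:
        path = os.path.join(scratch, "scratch-shm")
        with open(path, "wb") as f:
            f.write(b"\x00" * 256)

        fd = os.open(path, os.O_RDWR)
        try:
            # Exclusive, non-blocking lock on one byte at LOCK_OFFSET.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, LOCK_OFFSET)

            pid = os.fork()
            if pid == 0:
                # Child: fcntl locks are not inherited across fork(), so this
                # is a genuinely competing request from another process.
                status = 0
                try:
                    child_fd = os.open(path, os.O_RDWR)
                    fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, LOCK_OFFSET)
                    status = 1  # lock acquired (not expected)
                except OSError:
                    status = 0  # lock refused by the kernel
                os._exit(status)

            _, wait_status = os.waitpid(pid, 0)
            return os.WEXITSTATUS(wait_status) == 1
        finally:
            os.close(fd)  # closing the descriptor releases the parent's lock


if __name__ == "__main__":
    print("child got the lock:", child_can_lock_while_parent_holds())
```

The child is refused for as long as the parent holds the byte, which is the same mutual exclusion a second SQLite writer runs into.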
What This Means for Docker
Docker containers running on the same Linux host share a kernel. They have separate filesystem namespaces, separate process namespaces, and separate network namespaces, but the kernel page cache is unified. When container A and container B both mount the same volume and mmap the same -shm file, they are mapping into the same kernel page cache entries. The same physical memory. The mmap contract holds.
The same is true for fcntl byte-range locks. The kernel tracks these locks against the underlying file’s inode, not against a container-local view of the file. Two containers accessing the same inode will contend on the same lock table entries.
This means that, on a single host with local volume storage, multiple Docker containers sharing an SQLite database in WAL mode should behave correctly. The kernel provides the shared memory semantics SQLite depends on.
A minimal Docker Compose setup that exercises this:
services:
  writer:
    image: python:3.12
    volumes:
      - dbdata:/data
    command: python /scripts/writer.py
  reader:
    image: python:3.12
    volumes:
      - dbdata:/data
    command: python /scripts/reader.py
volumes:
  dbdata:
With WAL mode enabled in both containers:
import sqlite3
conn = sqlite3.connect('/data/app.db')
conn.execute('PRAGMA journal_mode=WAL')
conn.execute('PRAGMA synchronous=NORMAL')
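The synchronous=NORMAL setting is worth noting. In WAL mode it protects against corruption: after an OS crash or power loss, SQLite recovers a consistent database from whatever part of the WAL reached disk. What it does not guarantee is durability of the most recent commits, because at NORMAL the WAL is synced before each checkpoint rather than on every commit, so the last few transactions can be rolled back after a power failure. If every commit must survive one, use synchronous=FULL, which syncs the WAL on each commit. Either way, only the WAL has to be synced at commit time, never the entire database file.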
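The contents of writer.py and reader.py are not shown here, so the following is only a hypothetical sketch of what the two scripts might share. The events table, the helper names, and the five-second busy timeout are all assumptions made for illustration. The timeout matters once there is more than one writer, since WAL mode still allows only one write transaction at a time.

```python
import os
import sqlite3
import tempfile


def open_db(path, busy_timeout_s=5.0):
    """Open a connection with the pragmas used above (hypothetical helper)."""
    # timeout= makes a blocked writer wait instead of failing immediately.
    conn = sqlite3.connect(path, timeout=busy_timeout_s)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def write_event(conn, payload):
    """What a writer container might do on each iteration."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)"
        )
        conn.execute("INSERT INTO events (payload) VALUES (?)", (payload,))


def count_events(conn):
    """What a reader container might poll."""
    return conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]


if __name__ == "__main__":
    # Two connections in one process stand in for the two containers.
    with tempfile.TemporaryDirectory() as scratch:
        db = os.path.join(scratch, "app.db")
        writer, reader = open_db(db), open_db(db)
        write_event(writer, "hello")
        print(count_events(reader))
        writer.close()
        reader.close()
```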
Where This Breaks
The shared-memory contract has hard limits. SQLite’s own documentation explicitly warns that WAL mode cannot safely be used over network filesystems. NFS, CIFS, SMB, and similar protocols do not guarantee cache coherence for mmap. A write on one machine does not invalidate the mapped pages on another. The wal-index can become inconsistent, and database corruption follows.
This is the critical failure mode in container-heavy infrastructure:
- Docker Swarm or Kubernetes with shared NFS storage: containers run on different hosts, but they mount the same NFS-backed volume. WAL mode across these containers is unsafe. The kernel page caches on host A and host B are separate, and NFS does not bridge them for mmap.
- EFS on AWS, Azure Files, or Google Filestore: all network-backed filesystems. Same problem.
- GlusterFS, CephFS: distributed filesystems with varying mmap semantics. The safe assumption is that WAL mode will not work.
For single-host deployments using named volumes or bind mounts backed by local storage (ext4, xfs, btrfs), the kernel guarantees hold and WAL mode works.
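A container can check at startup which kind of filesystem backs its database path. The sketch below assumes Linux, where /proc/self/mounts lists mount points and filesystem types. The set of network filesystem names is an assumed, non-exhaustive list, and the parser ignores the octal escapes the kernel uses for mount points containing spaces. Treat it as a guard rail, not a guarantee.

```python
import os

# Assumed, non-exhaustive list of type names as they appear in /proc/self/mounts.
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "ceph", "fuse.glusterfs"}


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        # >= lets a later mount on the same mount point win.
        if inside and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def looks_like_network_fs(db_path):
    """Linux-only convenience wrapper around the real mount table."""
    with open("/proc/self/mounts") as f:
        return fs_type_for(os.path.realpath(db_path), f.read()) in NETWORK_FS_TYPES


SAMPLE_MOUNTS = """overlay / overlay rw 0 0
/dev/sda1 /data ext4 rw 0 0
10.0.0.5:/export /mnt/shared nfs4 rw 0 0
"""

if __name__ == "__main__":
    print(fs_type_for("/data/app.db", SAMPLE_MOUNTS))
    print(fs_type_for("/mnt/shared/app.db", SAMPLE_MOUNTS))
```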
There is also a subtler issue around permissions. The -wal and -shm files are created by the first connection to open the database in WAL mode. SQLite gives them the same permission bits as the main database file, but they are owned by the user that process runs as. If your containers run as different UIDs, the second container may be unable to open the -shm file for writing, or to create files in the data directory at all, and SQLite will then report the database as read-only or fail to open it. The fix is consistent UID assignment across containers, or a shared group with group write permission on the data directory and the database file.
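When a second container reports a read-only or unopenable database, the first thing to compare is ownership and mode on each file. Here is a small diagnostic sketch; the helper name and the shape of the report are made up for illustration. Run it inside each container and compare the results with the UID the container runs as.

```python
import os
import stat
import tempfile


def describe_db_files(db_path):
    """Report owner, group, mode and writability for a database and its WAL files."""
    report = {}
    for suffix in ("", "-wal", "-shm"):
        path = db_path + suffix
        try:
            st = os.stat(path)
        except FileNotFoundError:
            report[path] = None  # e.g. no connection currently has the database open
            continue
        report[path] = {
            "uid": st.st_uid,
            "gid": st.st_gid,
            "mode": stat.filemode(st.st_mode),      # e.g. "-rw-rw----"
            "writable": os.access(path, os.W_OK),   # for the current process
        }
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        db = os.path.join(scratch, "app.db")
        open(db, "wb").close()
        os.chmod(db, 0o660)
        for path, info in describe_db_files(db).items():
            print(path, info)
```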
Checkpoint Behavior With Multiple Writers
WAL mode has a default autocheckpoint threshold of 1000 pages. When a connection commits a transaction that leaves the WAL at or above this threshold, SQLite automatically attempts a passive checkpoint, copying WAL frames back into the main database file. In a multi-container setup, each container’s connections will independently try to checkpoint. This is fine: only one checkpoint runs at a time, SQLite serializes them with a dedicated checkpoint lock, and a connection that finds a checkpoint already in progress simply skips its own.
You can tune or disable autocheckpointing:
# Disable autocheckpoint, manage manually
conn.execute('PRAGMA wal_autocheckpoint=0')
# Run a checkpoint explicitly
conn.execute('PRAGMA wal_checkpoint(TRUNCATE)')
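TRUNCATE mode resets the WAL file to zero length after a successful checkpoint, which keeps the WAL from growing indefinitely. Unlike the automatic passive checkpoint, it has to wait for active readers and writers, so it can report busy if they do not finish within the busy timeout. In a multi-container setup, only one container needs to run periodic TRUNCATE checkpoints. wal_autocheckpoint is a per-connection setting, so the other containers can leave it at the default.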
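The checkpoint pragma returns one row of three integers: a busy flag, the number of frames in the WAL, and the number of frames checkpointed. A minimal sketch of reading that row follows, using a scratch database and hypothetical helper names. After a successful TRUNCATE both counts are zero, because the log has just been emptied.

```python
import os
import sqlite3
import tempfile


def checkpoint_truncate(conn):
    """Run a TRUNCATE checkpoint and label the three-column result row."""
    busy, wal_frames, checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    return {"busy": bool(busy), "wal_frames": wal_frames, "checkpointed": checkpointed}


def checkpoint_demo(db_path):
    """Fill the WAL, checkpoint it, and return (wal size before, result, wal size after)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("PRAGMA wal_autocheckpoint=0")  # manual checkpoints only
        with conn:
            conn.execute("CREATE TABLE t (x INTEGER)")
            conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])
        before = os.path.getsize(db_path + "-wal")
        result = checkpoint_truncate(conn)
        after = os.path.getsize(db_path + "-wal")
        return before, result, after
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        print(checkpoint_demo(os.path.join(scratch, "app.db")))
```

A busy result is not an error. It means another connection held a lock for longer than the busy timeout, and the checkpoint can simply be retried on the next cycle.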
The Pragmatic Boundary
For small-to-medium write loads, a single SQLite database in WAL mode accessed by two or three containers on the same host is a legitimate architecture. SQLite handles this correctly. The constraints are a single shared kernel (so that every connection maps the same physical memory), local (non-network) storage, and consistent permissions.
Once the deployment spans multiple physical hosts, SQLite’s single-file architecture stops being a fit for write coordination. Litestream can replicate the database off-host, which covers backup and recovery; for write coordination across hosts, a database server is the right answer.
The interesting thing about the Docker case is how it exposes that SQLite’s WAL mode is fundamentally a kernel feature, not just a file format. It works across containers because the Linux kernel provides the abstraction that makes it work, and it fails across hosts because no network filesystem reliably provides that same abstraction. The Docker volume is just a path; what matters is whether the kernel underneath all the readers and writers is the same one.