Why SQLite WAL Mode Works Across Docker Containers (and When It Doesn't)
Source: simonwillison
Simon Willison recently wrote up his findings on running SQLite in WAL mode across multiple Docker containers that share a volume. The short answer is that it works on a single host, and the reason why is worth understanding in detail, because the explanation also tells you exactly where it breaks down.
The Setup
The scenario is common enough: you have two or more services, each running in its own container, and all of them need access to the same SQLite database, so you mount a named Docker volume at the same path in each container. Maybe one container runs your application and another runs a background worker, or you’re deploying a sidecar that reads analytics data. SQLite’s WAL mode is attractive here because it offers better concurrency than the default rollback-journal mode: readers do not block the writer, and the writer does not block readers. But WAL mode relies on shared memory coordination between processes, and “shared memory” sounds incompatible with container isolation.
The reason it actually works comes down to how Linux handles memory-mapped files.
What WAL Mode Actually Creates
When you enable WAL mode on a SQLite database:
PRAGMA journal_mode=WAL;
SQLite creates two auxiliary files alongside your .db file. If your database is app.db, you get:
app.db-wal: the write-ahead log itself, where new writes are appended before being checkpointed into the main database file
app.db-shm: the shared-memory index (the wal-index), which starts at 32,768 bytes and grows in 32 KiB increments if the WAL becomes large
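To see both files appear, you can enable WAL mode from Python’s standard-library sqlite3 module and inspect the directory while the connection is still open. This is a minimal sketch: the helper `wal_files_demo` and the `events` table are hypothetical, and the size check assumes SQLite’s default Unix VFS, which sizes the -shm file in 32 KiB regions.

```python
import os
import sqlite3
import tempfile


def wal_files_demo(directory: str) -> dict:
    """Enable WAL on app.db, commit one row, and report the auxiliary files.

    Hypothetical helper. Sizes are read while the connection is still open,
    because a clean close of the last connection checkpoints and removes them.
    """
    db_path = os.path.join(directory, "app.db")
    conn = sqlite3.connect(db_path)
    try:
        # journal_mode=WAL is persistent: it is recorded in the database file.
        mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
        conn.execute("INSERT INTO events (body) VALUES (?)", ("hello",))
        conn.commit()
        sizes = {
            suffix: os.path.getsize(db_path + suffix)
            for suffix in ("-wal", "-shm")
            if os.path.exists(db_path + suffix)
        }
    finally:
        conn.close()
    return {"mode": mode, "sizes": sizes}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_files_demo(tmp))
```

The -wal size depends on how many pages the commit touched; the -shm size should be a whole number of 32 KiB regions.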
The -shm file is the key piece. With SQLite’s default Unix VFS, every process that opens the database in WAL mode maps this file into its address space using mmap(). It acts as a coordination structure, the wal-index: it records the last committed frame in the -wal file, a hash table mapping database pages to the WAL frames that hold their newest versions, the checksum state, and the read marks that pin each reader’s snapshot. Without a coherent view of this file across all processes, readers would not know how far into the WAL to read or where to find a page, and writers could not safely append new frames.
In a traditional single-process scenario this is obvious. In a multi-container scenario it raises a real question: does mmap() give two processes in different containers a coherent shared view of that file?
The Linux Page Cache Is the Answer
Linux maintains a unified page cache: each cached file has a single set of in-memory pages, keyed by its inode and page offset. When two processes call mmap() with MAP_SHARED (as SQLite does for the -shm file) on the same file at the same offset, the kernel maps the same physical pages into both processes’ virtual address spaces. A write from one process is immediately visible to the other through those shared pages.
Docker containers on the same host do not have separate kernels. They share the host Linux kernel, and therefore they share the same page cache. Container isolation via namespaces and cgroups does not extend to separating the page cache. When both containers mount the same Docker volume (which is backed by a directory on the host filesystem), opening app.db-shm in either container refers to the same inode on the same filesystem. The mmap() calls in each container land on the same physical pages.
This is not an accident or a loophole. It is the same mechanism that allows two uncontainerized processes on the same host to share a SQLite WAL database, and containers do not change it.
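The page-cache behavior can be observed without SQLite at all. The sketch below uses only the Python standard library and assumes a Unix host, where Python’s mmap defaults to MAP_SHARED. It maps a file, then lets a separate process write through its own mapping of the same file; the parent sees the new bytes through its existing mapping without re-reading the file. The helper `shared_mapping_demo` and the file name are hypothetical.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Code run in a separate process: map the same file and write into it.
WRITER = """
import mmap, sys
with open(sys.argv[1], "r+b") as f:
    m = mmap.mmap(f.fileno(), 0)  # MAP_SHARED by default on Unix
    m[0:5] = b"hello"
    m.close()
"""


def shared_mapping_demo(path: str, size: int = 32768) -> tuple:
    """Return our mapping's first five bytes before and after another process writes."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
    with open(path, "r+b") as f:
        view = mmap.mmap(f.fileno(), 0)
        try:
            before = view[0:5]
            # A different process, with its own file descriptor and its own mapping.
            subprocess.run([sys.executable, "-c", WRITER, path], check=True)
            after = view[0:5]  # no read() call: the change arrives via shared pages
        finally:
            view.close()
    return before, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(shared_mapping_demo(os.path.join(tmp, "demo.shm")))
```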
The file-locking side of the equation works the same way. SQLite’s Unix VFS uses POSIX fcntl() advisory locks, on byte ranges of the database file and, in WAL mode, of the -shm file, to coordinate readers and writers. These locks are owned by the process and tracked by the kernel against the file’s inode, not against anything in a container’s namespaces. Two containers on the same host share one kernel and therefore one lock table, so an fcntl lock taken in one container shows up as a conflict in the other.
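The lock side can be checked the same way. In this sketch (Python standard library; `cross_process_lock_demo` and the `jobs` table are hypothetical), a child process stands in for the second container: it opens the shared database, takes the write lock with BEGIN IMMEDIATE, and holds it. A write from the parent, with its busy timeout set to zero, is refused with “database is locked”; once the child commits, the parent can read the child’s row. At the kernel level, two processes on one host are exactly what two containers on one host amount to.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# The child process plays "the other container": separate process, same kernel.
CHILD = """
import sqlite3, sys
conn = sqlite3.connect(sys.argv[1], isolation_level=None)
conn.execute("BEGIN IMMEDIATE")  # take the write lock now
conn.execute("INSERT INTO jobs (note) VALUES ('from child')")
print("locked", flush=True)      # tell the parent the lock is held
sys.stdin.readline()             # hold it until the parent says go
conn.execute("COMMIT")
conn.close()
"""


def cross_process_lock_demo(db_path: str) -> dict:
    setup = sqlite3.connect(db_path, isolation_level=None)
    setup.execute("PRAGMA journal_mode=WAL")
    setup.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, note TEXT)")
    setup.close()

    child = subprocess.Popen(
        [sys.executable, "-c", CHILD, db_path],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    try:
        if child.stdout.readline().strip() != "locked":
            raise RuntimeError("child did not acquire the write lock")
        # timeout=0 disables the busy handler, so the conflict surfaces at once.
        writer = sqlite3.connect(db_path, timeout=0, isolation_level=None)
        try:
            writer.execute("INSERT INTO jobs (note) VALUES ('from parent')")
            blocked = False
        except sqlite3.OperationalError as exc:
            blocked = "locked" in str(exc)
        finally:
            writer.close()
        child.communicate("go\n", timeout=10)  # let the child commit and exit
    finally:
        if child.poll() is None:
            child.kill()
            child.wait()

    reader = sqlite3.connect(db_path)
    notes = [row[0] for row in reader.execute("SELECT note FROM jobs ORDER BY id")]
    reader.close()
    return {"blocked": blocked, "notes": notes}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(cross_process_lock_demo(os.path.join(tmp, "app.db")))
```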
What a Docker Compose Setup Looks Like
services:
  app:
    image: myapp:latest
    volumes:
      - db_data:/data
  worker:
    image: myworker:latest
    volumes:
      - db_data:/data
volumes:
  db_data:
Both containers mount db_data at /data. If /data/app.db is opened in WAL mode by app, the app.db-shm file appears there too. When worker opens the same database file, it maps the same -shm inode and participates in the same WAL coordination. From SQLite’s perspective, this is indistinguishable from two processes on the same host outside containers, because at the kernel level it is exactly that.
You can verify WAL mode is active from either container (assuming the image includes the sqlite3 command-line shell):
sqlite3 /data/app.db 'PRAGMA journal_mode;'
# wal
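And, while at least one connection to the database is open, confirm the auxiliary files are present (when the last connection closes cleanly, SQLite checkpoints and deletes both of them):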
ls -la /data/
# app.db
# app.db-shm
# app.db-wal
The Failure Modes
Understanding why this works on a single host makes the failure modes obvious.
Multiple writers. SQLite’s WAL mode allows one writer at a time. If two containers try to write concurrently, the second does not queue behind the first: unless a busy timeout is set, it gets SQLITE_BUSY straight away. The SQLite C library’s default busy timeout is zero (some language bindings pick their own; Python’s sqlite3 module defaults to 5 seconds). Set it explicitly on every connection in each container:
PRAGMA busy_timeout = 5000;
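This tells SQLite to retry for up to 5 seconds before returning SQLITE_BUSY, which absorbs most transient contention without application-level retry logic. One gap remains: a deferred transaction (plain BEGIN) that has already read and then tries to write does not wait on the busy timeout; it fails with SQLITE_BUSY at once if another connection holds the write lock or has committed since its snapshot was taken. Start write transactions with BEGIN IMMEDIATE so the write lock is acquired, and waited for, up front.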
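A sketch of that per-connection setup using Python’s standard-library sqlite3 module. The helpers `connect`, `write`, and `contention_demo` are hypothetical names; the point is that the busy timeout is set on every connection and that writes start with BEGIN IMMEDIATE. In the demo, a second connection in a thread holds the write lock briefly, standing in for another container, and the main write waits instead of failing.

```python
import os
import sqlite3
import tempfile
import threading
import time


def connect(db_path: str, busy_timeout_ms: int = 5000) -> sqlite3.Connection:
    """Open a connection to a database shared between containers (hypothetical helper)."""
    # isolation_level=None: autocommit unless we issue BEGIN ourselves.
    conn = sqlite3.connect(db_path, isolation_level=None)
    # Per-connection setting; Python's connect(timeout=...) sets the same thing.
    conn.execute(f"PRAGMA busy_timeout = {int(busy_timeout_ms)}")
    return conn


def write(conn: sqlite3.Connection, sql: str, params=()) -> None:
    """Run one write inside a transaction that takes the write lock up front."""
    # BEGIN IMMEDIATE waits (up to busy_timeout) for the write lock before
    # anything is read, so the transaction cannot start from a stale snapshot.
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute(sql, params)
        conn.execute("COMMIT")
    except BaseException:
        conn.execute("ROLLBACK")
        raise


def contention_demo(db_path: str, hold_s: float = 0.3) -> list:
    """Another connection holds the write lock briefly; our write waits for it."""
    main = connect(db_path)
    main.execute("PRAGMA journal_mode=WAL")
    main.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    locked = threading.Event()

    def hold_lock() -> None:
        other = connect(db_path)  # created and used only in this thread
        other.execute("BEGIN IMMEDIATE")
        other.execute("INSERT INTO t VALUES (1)")
        locked.set()
        time.sleep(hold_s)
        other.execute("COMMIT")
        other.close()

    holder = threading.Thread(target=hold_lock)
    holder.start()
    if not locked.wait(timeout=10):
        raise RuntimeError("lock holder never started")
    write(main, "INSERT INTO t VALUES (?)", (2,))  # waits until the COMMIT above
    holder.join()
    rows = [r[0] for r in main.execute("SELECT x FROM t ORDER BY x")]
    main.close()
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(contention_demo(os.path.join(tmp, "app.db")))
```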
Container crash during a write. If a container is killed mid-write, the -wal file may end with frames from an unfinished transaction, possibly only partly written. Those frames are never treated as valid: readers only see frames up to the last commit recorded in the wal-index, and when SQLite rebuilds the wal-index from the -wal file it checks each frame’s checksum and salt values and stops at the last valid commit frame. The result is correct; the uncommitted transaction is lost, which is expected behavior. The -shm file is less fragile than it might appear. A process-level crash does not lose its contents, because the dirty pages live in the shared page cache, not in the dead process. If the crash interrupted an update to the wal-index header, other connections detect it (the header is stored in two copies with a checksum) and rebuild the index from the -wal file. A kernel or host crash is different, since page-cache data not yet written to disk is gone; recovery then depends on what reached the -wal file, which is governed by the synchronous setting.
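To watch this happen, the sketch below (Python standard library; `crash_recovery_demo` and the table `t` are hypothetical) runs a child process that commits one row, then starts a large transaction with a deliberately tiny page cache, so some uncommitted pages may be spilled into the -wal file, and exits with os._exit() before committing. The next connection should find an intact database containing only the committed row.

```python
import os
import sqlite3
import subprocess
import sys
import tempfile

# Child process: commit one row, then die halfway through a large transaction.
CRASHER = """
import os, sqlite3, sys
conn = sqlite3.connect(sys.argv[1], isolation_level=None)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, payload BLOB)")
conn.execute("INSERT INTO t (payload) VALUES (zeroblob(100))")  # committed
conn.execute("PRAGMA cache_size = 10")  # tiny cache: pages may spill into the WAL early
conn.execute("BEGIN")
for _ in range(2000):
    conn.execute("INSERT INTO t (payload) VALUES (zeroblob(1000))")
os._exit(1)  # no COMMIT, no close: like a container killed mid-transaction
"""


def crash_recovery_demo(db_path: str) -> dict:
    proc = subprocess.run([sys.executable, "-c", CRASHER, db_path])
    conn = sqlite3.connect(db_path)
    try:
        integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]
        rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    finally:
        conn.close()
    return {"exit_code": proc.returncode, "integrity": integrity, "rows": rows}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(crash_recovery_demo(os.path.join(tmp, "app.db")))
```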
WAL checkpoint accumulation. By default SQLite automatically checkpoints the WAL back into the main database file once the WAL reaches 1000 pages. But a checkpoint cannot copy frames newer than the snapshot held by the oldest open read transaction, and it cannot restart the WAL from the beginning while any reader is still using it. If a long-running container always has a read transaction open while another container keeps writing, checkpoints never complete and the WAL can grow without bound. The primary fix is to keep read transactions short. You can also make automatic checkpoints more frequent:
PRAGMA wal_autocheckpoint = 100;
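Or run PRAGMA wal_checkpoint(TRUNCATE) from a maintenance process to force a full checkpoint and truncate the WAL file to zero bytes. It still has to wait for open readers (up to the busy timeout) and reports busy if they do not finish in time.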
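The sketch below (Python standard library; the helper name and the table `t` are hypothetical) reproduces this on one host: a reader opens a transaction and keeps it open while a writer commits more rows. A TRUNCATE checkpoint run with no busy timeout reports busy=1 and the WAL keeps its contents; once the reader commits, the same checkpoint succeeds and the -wal file is cut to zero bytes.

```python
import os
import sqlite3
import tempfile


def checkpoint_starvation_demo(db_path: str) -> dict:
    """Show a TRUNCATE checkpoint blocked by an open reader, then succeeding."""
    # timeout=0: report "busy" immediately instead of waiting for the reader.
    writer = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    writer.execute("PRAGMA journal_mode=WAL")
    writer.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    writer.execute("INSERT INTO t VALUES (0)")

    # A long-lived reader (think: another container). BEGIN plus one SELECT
    # pins its snapshot until the transaction ends.
    reader = sqlite3.connect(db_path, isolation_level=None)
    reader.execute("BEGIN")
    reader.execute("SELECT count(*) FROM t").fetchone()

    for i in range(1, 50):
        writer.execute("INSERT INTO t VALUES (?)", (i,))

    # The pragma returns (busy, log, checkpointed); busy=1 means it could not finish.
    busy_during = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
    wal_bytes_during = os.path.getsize(db_path + "-wal")

    reader.execute("COMMIT")  # the reader releases its snapshot
    busy_after = writer.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
    wal_bytes_after = os.path.getsize(db_path + "-wal")

    reader.close()
    writer.close()
    return {"busy_during": busy_during, "wal_bytes_during": wal_bytes_during,
            "busy_after": busy_after, "wal_bytes_after": wal_bytes_after}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(checkpoint_starvation_demo(os.path.join(tmp, "app.db")))
```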
Startup ordering. If container worker starts before container app has created and initialized the database, worker will either fail to open a nonexistent file or create an empty one. Add a health check or a startup dependency in your Compose file, or make both containers tolerant of a missing database at startup with proper retry logic.
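On the application side, one way to make a secondary container tolerant of a missing database is to open it with SQLite’s `mode=rw` URI parameter, which refuses to create the file, and retry until the expected schema exists. This is a sketch under assumptions: `wait_for_database`, the `jobs` table, and the timing defaults are hypothetical, and a Compose health check or `depends_on` condition would complement it rather than replace it.

```python
import os
import sqlite3
import tempfile
import time
from pathlib import Path


def wait_for_database(db_path: str, required_table: str,
                      timeout_s: float = 30.0, interval_s: float = 0.5) -> sqlite3.Connection:
    """Retry until db_path exists and contains required_table (hypothetical helper).

    mode=rw makes SQLite fail instead of creating the file, so a worker that
    starts first cannot leave an empty database behind.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=rw"
    deadline = time.monotonic() + timeout_s
    while True:
        conn = None
        try:
            conn = sqlite3.connect(uri, uri=True)
            found = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
                (required_table,),
            ).fetchone()
            if found:
                return conn
        except sqlite3.OperationalError:
            pass  # file missing, or locked while the app initializes it
        if conn is not None:
            conn.close()
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{db_path} not ready after {timeout_s}s")
        time.sleep(interval_s)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.db")
        # Stand-in for the app container creating the schema.
        init = sqlite3.connect(path)
        init.execute("CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY)")
        init.commit()
        init.close()
        wait_for_database(path, "jobs", timeout_s=5).close()
```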
Overlay filesystems. Docker’s default storage driver uses overlayfs for the container’s writable layer, but named volumes (with the default local driver) and host bind mounts bypass it and live directly on the host filesystem. That is where the mmap and fcntl behavior described above applies. A database kept in a container’s writable layer rather than in a volume is not shared at all: each container gets its own copy-on-write copy of the file, so two containers never see each other’s writes. Named volumes are the safe path here.
Where It Breaks: Network Filesystems
Everything above depends on the shared page cache, which requires a shared kernel. Move to a distributed setup and the guarantee disappears.
If your Docker volume is backed by NFS, EFS, or any other network filesystem, two containers on different hosts accessing the same volume do not share a page cache. Each host maintains its own local cache of the remote file’s contents. mmap() writes in one host’s cache are not immediately visible on the other host. fcntl locks on NFS have historically been unreliable and are not coherent across hosts on all NFS implementations.
Running SQLite in WAL mode on an NFS-backed volume with connections from more than one host is a data corruption path, even with a single writer, because readers on the other host depend on the same -shm file. The SQLite documentation warns that file locking on network filesystems is often buggy, and the WAL documentation is explicit: all processes using a database must be on the same host computer, and WAL does not work over a network filesystem.
For multi-host setups, Litestream provides streaming replication of SQLite to S3-compatible storage, treating SQLite as a single-writer database and replicating asynchronously. LiteFS takes a different approach, using FUSE to intercept SQLite’s WAL transactions and replicate them to other nodes with a designated primary writer. Neither approach involves sharing a volume between multiple hosts; they both replicate the data at the transaction level.
Practical Takeaways
For a single-host Docker deployment with a named volume, WAL mode across containers is sound. The kernel makes it work. The things you need to manage are: a busy timeout on every connection, short read transactions (with scheduled checkpoints as a backstop) so the WAL can be checkpointed and reset, and startup ordering so the database exists before secondary containers try to open it.
One writer and multiple readers is the easiest configuration to reason about. If your architecture allows it, designate one container as the sole writer and configure all others with PRAGMA query_only = ON. This eliminates write contention entirely and keeps WAL behavior straightforward.
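A sketch of the reader side in Python (standard library; `open_reader`, `try_write`, and the `events` table are hypothetical). query_only is a per-connection setting, not something stored in the database, so it has to be applied every time a reader container opens a connection; writes then fail with SQLITE_READONLY instead of competing for the write lock.

```python
import os
import sqlite3
import tempfile


def open_reader(db_path: str) -> sqlite3.Connection:
    """Connection for a container that must never write (hypothetical helper)."""
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("PRAGMA busy_timeout = 5000")
    # query_only rejects any statement that would change the database.
    conn.execute("PRAGMA query_only = ON")
    return conn


def try_write(conn: sqlite3.Connection) -> str:
    """Attempt an INSERT and return the outcome as text."""
    try:
        conn.execute("INSERT INTO events (body) VALUES ('not allowed')")
        return "written"
    except sqlite3.OperationalError as exc:
        return str(exc)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "app.db")
        writer = sqlite3.connect(path, isolation_level=None)
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, body TEXT)")
        reader = open_reader(path)
        print(try_write(reader))
        reader.close()
        writer.close()
```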
For anything beyond a single host, treat SQLite as a single-node database and reach for a replication layer rather than a shared volume.