Shared Files Are Not Shared Memory: The SQLite WAL Problem Across Docker Containers
Source: simonwillison
Running two Docker containers against the same SQLite database over a shared volume looks reasonable on paper. You mount the same directory into both, both containers open the database, and because SQLite is designed for concurrent access, you expect things to work. With WAL mode enabled, things will appear to work, right up until they don’t.
Simon Willison documented this failure case after running into it in practice. The root cause is not a Docker bug or a SQLite bug in isolation. It is a contract mismatch: SQLite’s WAL mode requires shared memory between cooperating processes, and Docker volumes provide shared filesystem access but not shared memory.
What WAL Mode Actually Creates
When you run PRAGMA journal_mode=WAL on a SQLite database, you are opting into a three-file model. The main database file (database.db) holds committed data. The write-ahead log (database.db-wal) accumulates new writes as sequentially appended frames. The shared memory file (database.db-shm) acts as a WAL index that all readers and writers use to coordinate without blocking each other.
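The three-file model is easy to observe directly. The following is a minimal sketch using Python's standard sqlite3 module; the helper name wal_sidecar_files and the table t are made up for illustration. It assumes a local filesystem, and it checks for the files while the connection is still open, because the last connection to close normally checkpoints and removes the -wal and -shm files.

```python
import os
import sqlite3
import tempfile


def wal_sidecar_files(db_path):
    """Enable WAL on db_path, write once, and report which files exist."""
    conn = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode that is now in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        # Inspect the directory before closing the connection.
        files = {
            suffix: os.path.exists(db_path + suffix)
            for suffix in ("", "-wal", "-shm")
        }
        return mode, files
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_sidecar_files(os.path.join(tmp, "database.db")))
```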
The .db-shm file is the piece most people ignore and the piece that matters most here. Its name is accurate: it exists to be memory-mapped by every process that opens the database. The WAL index lives in this shared memory and contains a header with metadata about the WAL file, a set of hash tables that map database page numbers to WAL frame positions, and a lock region with 8 byte-range locks that coordinate readers, writers, checkpointers, and recovery processes.
SQLite’s WAL documentation is explicit about the design: the SHM file is intended to be mapped into each process’s address space so that all cooperating processes share the same in-memory data structure. A writer updates the hash tables in shared memory after appending frames to the WAL file. Readers consult those same in-memory hash tables to find the most recent version of any given page. The whole system works because mmap on Linux maps the same physical memory pages into multiple virtual address spaces when all processes open the same file on the same filesystem.
Where Docker Breaks the Contract
On a single host, two processes opening the same file get the same inode. When both mmap that file, the kernel backs both mappings with the same physical page frames. Writes from one process are immediately visible to the other without any disk I/O. This is the shared memory that SQLite WAL depends on.
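The page-sharing behavior can be sketched without SQLite at all. The example below is an illustration only, for Unix-like systems: it opens the same file twice inside one process, standing in for two separate processes, and maps each descriptor with MAP_SHARED. The function name shared_mapping_demo is hypothetical.

```python
import mmap
import os
import tempfile


def shared_mapping_demo(path):
    """Map one file twice with MAP_SHARED; write via one map, read via the other."""
    with open(path, "wb") as f:
        f.write(b"\x00" * mmap.PAGESIZE)  # mmap needs a non-empty file

    # Two independent opens of the same file, standing in for two processes.
    fd_a = os.open(path, os.O_RDWR)
    fd_b = os.open(path, os.O_RDWR)
    try:
        map_a = mmap.mmap(fd_a, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        map_b = mmap.mmap(fd_b, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        try:
            map_a[0:5] = b"hello"      # write through the first mapping
            return bytes(map_b[0:5])   # read through the second, no flush
        finally:
            map_b.close()
            map_a.close()
    finally:
        os.close(fd_b)
        os.close(fd_a)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(shared_mapping_demo(os.path.join(tmp, "demo.shm")))
```

The second mapping sees the bytes without any explicit flush or read from disk, which is the property the WAL index relies on.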
Docker containers on the same host share the host kernel. A bind-mounted volume in two different containers points to the same underlying filesystem path. So far, so good: both containers open the same inode, and in theory, mmap should give them the same physical pages.
The problem is that Docker’s container isolation includes separate mount namespaces, and depending on how the volume is configured, the filesystem seen inside each container may go through different kernel paths. More critically, containers often run as different UIDs, with different cgroup constraints, and with different expectations about file ownership of the SHM file. The SHM file gets created by whichever container first opens the database in WAL mode. If the second container runs as a different user, it may not be able to open the existing SHM file with write access, causing SQLite to fall back to private anonymous memory for its WAL index.
This is the silent failure mode. SQLite does not error out. It continues operating, but each container now has its own private WAL index that does not reflect what the other container is writing. Container A appends WAL frames that container B cannot see in its index. Container B reads stale pages from the main database, unaware that newer committed versions exist in the WAL. Both containers believe they are reading consistent data.
The Locking Side of the Problem
SQLite uses fcntl byte-range locks on the SHM file to coordinate concurrent access. These are POSIX advisory locks, which the kernel associates with a (process, inode, byte-range) tuple. Unlike flock locks, fcntl locks are released as soon as the owning process closes any file descriptor that refers to the locked file, and a child created by fork does not inherit them.
Across containers on the same host, fcntl locks on a shared-volume file do propagate correctly because they are kernel-level constructs keyed on the inode. Two processes in different containers can coordinate via these locks. This might make the situation seem safer than it is: the locking works, but the shared memory mapping may not, giving you correct mutual exclusion on top of incorrect data.
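The locking behavior on a single kernel can be sketched in a few lines. This is an illustration rather than SQLite's actual locking code: it assumes a Unix-like system with os.fork, uses two processes on one host rather than two containers, and the function name is hypothetical. The parent takes an exclusive lock on one byte, and a forked child then tries to take the same lock without blocking.

```python
import fcntl
import os
import tempfile


def child_can_lock_while_parent_holds(path):
    """Return True if a forked child can take a lock the parent already holds."""
    with open(path, "wb") as f:
        f.write(b"\x00" * 16)

    parent_fd = os.open(path, os.O_RDWR)
    try:
        # Exclusive POSIX lock on byte 0 only (length 1, offset 0).
        fcntl.lockf(parent_fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)

        pid = os.fork()
        if pid == 0:
            # Child: the parent's lock is not inherited, so try to take our own.
            status = 0
            try:
                child_fd = os.open(path, os.O_RDWR)
                fcntl.lockf(child_fd, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 0, os.SEEK_SET)
            except OSError:
                status = 1  # the kernel refused: the parent holds the lock
            os._exit(status)

        _, wait_status = os.waitpid(pid, 0)
        return os.WEXITSTATUS(wait_status) == 0
    finally:
        os.close(parent_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(child_can_lock_while_parent_holds(os.path.join(tmp, "lockfile")))
```

On a local filesystem the child is expected to be refused, because the kernel tracks the lock against the inode regardless of which process asks.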
On network-backed volumes, the situation is worse. NFS and many networked filesystems do not implement fcntl locking reliably. SQLite’s own documentation warns that WAL mode should not be used on network filesystems. If you are using Docker volumes backed by EFS, NFS, or similar, you lose both the shared memory guarantee and the locking guarantee simultaneously.
Concrete Failure Scenarios
Consider a web application container that writes user session data and a background job container that reads and processes those sessions. Both run against the same SQLite file in WAL mode on a local volume. The web app writes a new session record. The WAL frame is appended to the WAL file and the WAL index in the web app’s mmap’d SHM is updated. The background container reads the sessions table, but its private WAL index does not know about the new frame. It reads from the main database, which does not yet contain the committed write because the WAL has not been checkpointed. The background job processes a stale snapshot.
Checkpointing makes the failure non-deterministic. When the web app container eventually checkpoints (folding WAL frames back into the main database file), the background container suddenly starts seeing the committed writes on its next read. From the outside, this looks like intermittent data visibility delays, which are among the hardest classes of bugs to reproduce and diagnose.
Practical Mitigations
The cleanest solution is to avoid WAL mode when multiple Docker containers access the same database file. The default rollback journal mode (PRAGMA journal_mode=DELETE) uses a single journal file and coarser locking semantics that do not depend on shared memory. Concurrent read performance is worse because readers block during writes, but the correctness guarantees hold across process boundaries without requiring shared memory.
PRAGMA journal_mode=DELETE;
PRAGMA synchronous=NORMAL;
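Because the journal mode pragma reports the mode that is in effect after the call, application code can verify that the switch happened. The following is a sketch under assumptions: the helper name use_rollback_journal and the 5-second timeout are illustrative choices, not part of any documented recipe.

```python
import os
import sqlite3
import tempfile


def use_rollback_journal(db_path):
    """Open db_path, request DELETE journal mode, and verify it took effect."""
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        # The pragma returns the mode in effect after the call. If SQLite
        # could not leave WAL mode, this may still be "wal", or the call
        # may raise sqlite3.OperationalError instead.
        mode = conn.execute("PRAGMA journal_mode=DELETE").fetchone()[0]
        if mode.lower() != "delete":
            raise RuntimeError(f"journal_mode is still {mode!r}")
        conn.execute("PRAGMA synchronous=NORMAL")
    except Exception:
        conn.close()
        raise
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = use_rollback_journal(os.path.join(tmp, "database.db"))
        print(conn.execute("PRAGMA journal_mode").fetchone()[0])
        conn.close()
```

Failing loudly at startup is a deliberate choice here: a container that silently stays in WAL mode would bring back the problem described above.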
If you need WAL mode’s concurrency properties, the safest pattern is a single-writer architecture: one container owns the database with read-write access, and all other containers access it through an application-layer API rather than opening the file directly. This is what tools like Datasette enable when run as a read-only query layer in front of a database managed by a separate write process.
For workloads that genuinely need both WAL mode and cross-container durability, Litestream runs as a sidecar that streams WAL frames to an object store like S3. This gives you point-in-time recovery and replication without running multiple writers against the same file. Each database still has a single primary writer; replication happens asynchronously through the sidecar.
rqlite and Turso/libSQL represent more aggressive alternatives for distributed workloads. rqlite uses Raft consensus to replicate SQLite state across nodes. libSQL forks SQLite to add network replication primitives. Both are substantially more complex to operate than a vanilla SQLite file, but they are designed from the ground up to serve clients on multiple nodes.
When WAL Mode Is Still Worth Using
None of this means WAL mode is wrong to use in Docker environments. It is wrong to use across containers that share the same database file simultaneously. Single-container deployments with WAL mode are fine and beneficial: you get concurrent readers alongside a writer within the same container, reduced write latency compared to the rollback journal, and more predictable checkpoint timing when you control it explicitly with PRAGMA wal_checkpoint(TRUNCATE).
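An explicit checkpoint from application code might look like the sketch below. The helper name checkpoint_truncate is hypothetical; the pragma itself returns one row of three integers, the first of which is non-zero when the checkpoint could not run to completion.

```python
import os
import sqlite3
import tempfile


def checkpoint_truncate(conn):
    """Run a TRUNCATE checkpoint; return the (busy, log, checkpointed) row."""
    # Call this outside an open transaction. busy == 0 means the checkpoint
    # completed, in which case the WAL file is truncated to zero bytes.
    return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "database.db")
        conn = sqlite3.connect(path)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        print("WAL bytes before:", os.path.getsize(path + "-wal"))
        print("checkpoint row:", checkpoint_truncate(conn))
        print("WAL bytes after:", os.path.getsize(path + "-wal"))
        conn.close()
```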
The broader lesson is that SQLite’s design encodes a specific set of assumptions about process locality. It was built to be the storage layer for a single application on a single device, and it does that job remarkably well. WAL mode pushes the concurrency envelope within that constraint. Docker volumes create an illusion of colocation that breaks the assumption without announcing that it has done so.
Shared files are not shared memory. That distinction is easy to forget when both look like the same path on disk.