What the Kernel Actually Guarantees When You Run SQLite WAL Across Docker Containers
Source: simonwillison
Simon Willison recently documented his experience getting SQLite WAL mode to work correctly across multiple Docker containers sharing a volume. It works, under specific conditions, and the reason why comes down to some non-obvious behavior in how Linux handles memory-mapped files.
Most SQLite documentation talks about WAL mode in the context of a single process with multiple threads, or multiple processes on the same machine in a straightforward sense. The Docker case is interesting precisely because containers look like separate machines but are not. Understanding the distinction matters if you want to know when this setup is safe and when it will corrupt your database.
What WAL Mode Creates on Disk
When you enable WAL mode with PRAGMA journal_mode=WAL;, SQLite creates two additional files alongside your database. If your database is app.db, you get app.db-wal and app.db-shm.
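To see those files appear, here is a minimal sketch using Python's standard sqlite3 module. The function name, table, and temporary directory are illustrative assumptions, not part of the original write-up.

```python
import os
import sqlite3
import tempfile


def wal_sidecar_files(directory: str) -> dict:
    """Enable WAL on a fresh database and list the files SQLite created."""
    db_path = os.path.join(directory, "app.db")
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # so each statement below commits on its own.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # The pragma returns the journal mode that is now in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
        conn.execute("INSERT INTO t (v) VALUES ('hello')")
        # List the directory while the connection is still open.
        files = sorted(os.listdir(directory))
    finally:
        conn.close()
    return {"mode": mode, "files": files}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_sidecar_files(tmp))
```

The listing is taken while the connection is open because, by default, SQLite checkpoints and removes the -wal and -shm files when the last connection closes cleanly.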
The -wal file is the write-ahead log itself. Instead of modifying the main database file directly, SQLite appends change records (called frames) to the WAL. Readers check the WAL for recent frames that supersede what is in the main file. A checkpoint periodically transfers WAL frames back into the main database. There is no separate background process for this: by default, the connection that commits a transaction runs a checkpoint automatically once the WAL reaches 1000 pages.
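A checkpoint can also be requested explicitly. The sketch below is an illustration under assumed names (checkpoint_demo, table t): it writes some rows, then runs PRAGMA wal_checkpoint(TRUNCATE) and compares the size of the -wal file before and after.

```python
import os
import sqlite3
import tempfile


def checkpoint_demo(directory: str) -> tuple:
    """Return (wal bytes before, busy flag, wal bytes after, row count)."""
    db_path = os.path.join(directory, "app.db")
    conn = sqlite3.connect(db_path, isolation_level=None)  # autocommit mode
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE t (v TEXT)")
        conn.executemany(
            "INSERT INTO t (v) VALUES (?)",
            [(f"row {i}",) for i in range(100)],
        )
        wal_before = os.path.getsize(db_path + "-wal")
        # TRUNCATE copies all frames into the main file, then truncates
        # the WAL to zero bytes. The first result column is nonzero if
        # the checkpoint could not run to completion.
        busy = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()[0]
        wal_after = os.path.getsize(db_path + "-wal")
        rows = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    finally:
        conn.close()
    return wal_before, busy, wal_after, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(checkpoint_demo(tmp))
```

The rows remain readable after the WAL is emptied because they now live in the main database file.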
The -shm file is the WAL index, also called the shared memory file. It is a compact data structure that tells readers which frames in the WAL are valid and committed, and maps database page numbers to their most recent WAL frame. Without this index, each reader would have to scan the entire WAL on every read, which would be unusably slow on a large WAL.
The name “-shm” is the clue to why this gets complicated. SQLite accesses this file by memory-mapping it. Every process that opens the database maps the same file into its address space, and they all read and write it directly through memory operations rather than through read()/write() syscalls. This is how SQLite achieves the low-overhead coordination that makes WAL mode fast.
Why Containers Are Not Separate Machines
Docker containers on the same host share one Linux kernel. That is the critical fact here. PID namespaces, network namespaces, mount namespaces: these isolate what processes can see, but they do not change how the kernel manages memory or file I/O.
When two processes in different containers both call mmap() on the same file on a local filesystem, the kernel maps the same physical pages into both virtual address spaces. This is not a coincidence or an implementation detail; it is the defined behavior of mmap() with MAP_SHARED on a file. Both processes are writing to and reading from the same physical memory, mediated by the kernel’s page cache. Changes made by one process are immediately visible to the other because they are literally the same memory pages.
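The MAP_SHARED behavior can be observed with Python's mmap module on Linux. This sketch is a simplification: it creates two independent mappings of the same file inside one process, where the real case involves separate processes in separate containers. The file name and function name are made up for the example.

```python
import mmap
import os
import tempfile


def shared_mapping_roundtrip(directory: str) -> bytes:
    """Write through one MAP_SHARED mapping and read through another."""
    path = os.path.join(directory, "demo-shm")
    with open(path, "wb") as f:
        f.write(b"\x00" * mmap.PAGESIZE)  # mmap needs a non-empty file

    # Two separate descriptors and two separate mappings of the same file.
    fd_a = os.open(path, os.O_RDWR)
    fd_b = os.open(path, os.O_RDWR)
    prot = mmap.PROT_READ | mmap.PROT_WRITE
    map_a = mmap.mmap(fd_a, mmap.PAGESIZE, flags=mmap.MAP_SHARED, prot=prot)
    map_b = mmap.mmap(fd_b, mmap.PAGESIZE, flags=mmap.MAP_SHARED, prot=prot)
    try:
        map_a[0:4] = b"WAL!"       # plain memory write, no write() syscall
        return bytes(map_b[0:4])   # plain memory read, no read() syscall
    finally:
        map_a.close()
        map_b.close()
        os.close(fd_a)
        os.close(fd_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(shared_mapping_roundtrip(tmp))
```

Both mappings are backed by the same page-cache pages, so the second mapping sees the bytes without any flush or sync call in between.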
This is precisely what SQLite needs for the -shm file to function correctly. The WAL index updates that one process writes are visible to other processes immediately, without any explicit synchronization call. The consistency guarantee comes from the unified page cache, not from any SQLite-specific mechanism.
The Locking Side
Beyond shared memory, WAL mode also uses file locks to coordinate access. SQLite uses POSIX byte-range locks (fcntl with F_SETLK) on the -shm file to implement a set of named locks: the write lock, the checkpoint lock, the recovery lock, and a small fixed set of reader locks, one per read-mark slot. Readers take these in shared mode, so several readers can use the same slot.
POSIX advisory locks on Linux work at the kernel level and are scoped to the host. Two processes in different containers on the same host see each other’s locks correctly because they are going through the same kernel’s lock table. A lock taken by a process in container A is visible and respected by a process in container B.
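Byte-range locking between processes can be sketched with Python's fcntl module. POSIX locks are owned by a process, so the example launches a second Python process to contend for the lock. The byte offsets, file name, and helper names are arbitrary choices for illustration and are not claimed to match SQLite's layout.

```python
import fcntl
import os
import subprocess
import sys
import tempfile

# Code run in a separate process: try a non-blocking exclusive lock
# on a single byte and report the outcome on stdout.
CHILD = """
import fcntl, sys
with open(sys.argv[1], "r+b") as f:
    try:
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, int(sys.argv[2]))
        print("acquired")
    except OSError:
        print("blocked")
"""


def try_lock_elsewhere(path: str, offset: int) -> str:
    """Attempt the lock from another process and return its report."""
    done = subprocess.run(
        [sys.executable, "-c", CHILD, path, str(offset)],
        capture_output=True, text=True, check=True,
    )
    return done.stdout.strip()


def lock_demo(directory: str) -> tuple:
    path = os.path.join(directory, "demo-shm")
    with open(path, "wb") as f:
        f.write(b"\x00" * 256)
    with open(path, "r+b") as f:
        # Exclusive lock on one byte at offset 100 (length 1, start 100).
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB, 1, 100)
        same_byte = try_lock_elsewhere(path, 100)
        other_byte = try_lock_elsewhere(path, 101)
        fcntl.lockf(f, fcntl.LOCK_UN, 1, 100)
        after_unlock = try_lock_elsewhere(path, 100)
    return same_byte, other_byte, after_unlock


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(lock_demo(tmp))
```

The second process is refused only on the byte that is actually held, which is what lets a single file carry several independent named locks.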
So both halves of the coordination mechanism, the mmap’d shared memory and the byte-range locks, work correctly across Docker containers on the same host as long as the underlying filesystem is a local filesystem.
The Cases Where It Breaks
The guarantee evaporates the moment the shared volume is not a local filesystem. NFS is the most common failure case.
NFS does not guarantee coherent mmap() semantics across clients. Two clients that both map the same NFS-backed file may have separate, incoherent views of that memory. The SQLite documentation is explicit about this: WAL mode requires that all processes using the database be on the same host, and it does not work over a network filesystem. The one documented way to run WAL without shared memory is PRAGMA locking_mode=EXCLUSIVE, set before the first WAL-mode access. SQLite then keeps the WAL index in ordinary heap memory, but only one connection can use the database at a time, which removes the concurrency benefit.
CIFS and SMB-mounted volumes have the same problem. Docker volumes that use remote storage backends may or may not provide coherent mmap semantics depending on the driver and the storage backend. If you are using a cloud-native volume plugin that presents network-attached storage, you need to check whether that storage supports coherent shared memory before trusting WAL mode.
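One rough pre-flight check is to look up which filesystem type backs the database path. The sketch below parses text in the format of /proc/self/mounts on Linux. The function name and the NETWORK_FS set are assumptions for illustration, the set is not exhaustive, and a local-looking type does not prove that a volume driver provides coherent mmap semantics.

```python
import os

# Hypothetical, non-exhaustive set of filesystem types to treat as suspect.
NETWORK_FS = {"nfs", "nfs4", "cifs", "smb3", "9p", "fuse.sshfs"}


def fs_type_for(path: str, mounts_text: str) -> str:
    """Return the filesystem type of the longest mount point containing path."""
    real = os.path.realpath(path)
    best_mount, best_type = "", ""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Octal escapes such as \040 in mount points are ignored for brevity.
        prefix = mount_point.rstrip("/") + "/"
        if real == mount_point or real.startswith(prefix):
            if len(mount_point) > len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def looks_like_network_fs(path: str, mounts_text: str) -> bool:
    return fs_type_for(path, mounts_text) in NETWORK_FS


if __name__ == "__main__":
    with open("/proc/self/mounts") as f:
        print(fs_type_for(os.getcwd(), f.read()))
```

Taking the mounts table as a string keeps the lookup easy to exercise with sample data. In a container it would be fed the contents of /proc/self/mounts as seen from inside that container.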
Windows is a separate problem. Docker Desktop on Windows runs a Linux VM, and volumes are typically shared through a filesystem layer that introduces its own locking and coherency semantics. The behavior is less predictable, and WAL mode on shared Docker volumes on Windows is not something to trust without explicit testing.
A Concrete Setup That Works
For a standard Linux Docker deployment using named volumes or bind mounts to a local ext4 or xfs filesystem, the setup is straightforward:
-- Run once after connecting, in each container
PRAGMA journal_mode=WAL;
PRAGMA synchronous=NORMAL;
PRAGMA busy_timeout=5000;
The busy_timeout pragma is important here. With multiple containers potentially writing concurrently, you will occasionally hit lock contention. Without a busy timeout, SQLite returns SQLITE_BUSY immediately when it cannot acquire a lock. A timeout of 5000 milliseconds tells SQLite to retry for up to five seconds before giving up, which handles most real-world contention gracefully.
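The effect of a missing timeout is easy to reproduce with two connections standing in for two containers. This is a sketch with assumed helper names (open_db, second_writer_outcome): the second connection uses a busy timeout of zero to show the immediate failure.

```python
import os
import sqlite3
import tempfile


def open_db(path: str, busy_ms: int) -> sqlite3.Connection:
    """Open a connection with the three pragmas from the setup above."""
    # timeout=0 turns off the sqlite3 module's own default wait so that
    # the busy_timeout pragma below is the only setting in play.
    conn = sqlite3.connect(path, timeout=0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute(f"PRAGMA busy_timeout={int(busy_ms)}")
    return conn


def second_writer_outcome(directory: str) -> tuple:
    path = os.path.join(directory, "app.db")
    first = open_db(path, busy_ms=5000)
    second = open_db(path, busy_ms=0)  # give up immediately on contention
    try:
        first.execute("CREATE TABLE t (v TEXT)")
        first.execute("BEGIN IMMEDIATE")  # take the write lock and hold it
        try:
            second.execute("BEGIN IMMEDIATE")
            while_held = "acquired"
        except sqlite3.OperationalError as exc:
            while_held = str(exc)  # SQLITE_BUSY surfaces as this exception
        first.execute("COMMIT")
        second.execute("BEGIN IMMEDIATE")  # the lock is free now
        second.execute("COMMIT")
        after_release = "acquired"
    finally:
        second.close()
        first.close()
    return while_held, after_release


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(second_writer_outcome(tmp))
```

With a nonzero busy_timeout on the second connection, the same BEGIN IMMEDIATE would wait for the first writer to commit rather than fail at once.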
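synchronous=NORMAL is a reasonable trade-off in WAL mode. With the default FULL setting, SQLite syncs the WAL to disk on every transaction commit; with NORMAL it syncs only around checkpoints. In WAL mode NORMAL cannot corrupt the database, but it does give up some durability: transactions committed since the last checkpoint may be rolled back after a power loss or OS crash, although an application crash alone loses nothing. In rollback journal mode, by contrast, NORMAL carries a small risk of corrupting the database if power fails at the wrong moment.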
What WAL Mode Does Not Solve
WAL mode makes concurrent reads and writes possible without the readers blocking writers or vice versa, and it does this correctly across Docker containers on a local filesystem. What it does not do is make SQLite a horizontally scalable database.
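The reader and writer independence can be checked directly. In this sketch, with illustrative names throughout, one connection holds an open write transaction while another reads. The reader is not blocked and sees the last committed state.

```python
import os
import sqlite3
import tempfile


def reader_during_write(directory: str) -> tuple:
    """Return the row counts a reader sees during and after a write."""
    path = os.path.join(directory, "app.db")
    writer = sqlite3.connect(path, isolation_level=None)
    reader = sqlite3.connect(path, isolation_level=None)
    try:
        writer.execute("PRAGMA journal_mode=WAL")
        writer.execute("CREATE TABLE t (v TEXT)")
        writer.execute("BEGIN IMMEDIATE")
        writer.execute("INSERT INTO t (v) VALUES ('pending')")
        # The write transaction is still open. The reader proceeds anyway
        # and does not see the uncommitted row.
        during = reader.execute("SELECT count(*) FROM t").fetchone()[0]
        writer.execute("COMMIT")
        # A new read started after the commit sees the new row.
        after = reader.execute("SELECT count(*) FROM t").fetchone()[0]
    finally:
        reader.close()
        writer.close()
    return during, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(reader_during_write(tmp))
```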
All writes still serialize. WAL mode allows one writer at a time; a second writer trying to start a write transaction will either wait (if you set a busy timeout) or get SQLITE_BUSY. If your workload involves multiple containers writing frequently to the same database, you will hit contention, and no amount of WAL configuration will fix that. The right answer at that point is a client-server database, or routing all writes through a single writer and using a tool like Litestream to replicate that SQLite database to object storage, with read replicas set up separately.
Litestream itself uses WAL mode and relies on the same filesystem semantics described here. It watches the WAL file for new frames and streams them to object storage before a checkpoint can discard them, enabling point-in-time recovery and replication. It works well in single-container setups; running it alongside another container writing to the same database requires careful coordination.
The libSQL and Turso Angle
If you find yourself wanting the SQLite programming model but with genuine multi-writer, multi-host concurrent access, libSQL, the fork of SQLite maintained by Turso, is worth examining. It extends the WAL mechanism with its own replication protocol, separating the WAL into a local and a remote layer. The remote WAL is coordinated through a dedicated server process rather than through shared files, which removes the host-colocation requirement.
This is a substantially different architecture from vanilla SQLite WAL mode, and it comes with its own trade-offs around latency and operational complexity. But it points to the direction the ecosystem is moving: SQLite’s embedded simplicity at the application level, with network-transparent durability underneath.
The Takeaway
SQLite WAL mode works across Docker containers sharing a local volume because Linux’s mmap coherency and POSIX lock semantics extend to container boundaries, which are namespace boundaries, not kernel boundaries. The setup is safe when the volume is backed by a local filesystem on the same host. It is unsafe on network filesystems, and you need to test explicitly on any storage that is not straightforwardly local disk.
The practical recommendation from Simon Willison’s exploration holds: for small to medium workloads where all your containers are on the same host, WAL mode on a shared volume is a reasonable and operationally simple choice. Know what guarantees you are relying on, know where those guarantees stop, and set a busy timeout so lock contention does not surface as opaque errors to your users.