The SQLite Shared Memory File That Breaks Docker Volume Sharing
Source: simonwillison
SQLite in WAL mode uses three files, not one. Most people know this in a vague way. What trips them up in Docker environments is not the WAL file itself but the third one, the -shm file, which behaves less like a database file and more like a raw chunk of shared memory that happens to live on disk.
Simon Willison wrote about this recently in the context of running multiple containers that share a single Docker volume. The short version of the problem: containers that mount the same volume and both open the same SQLite database in WAL mode can silently diverge or corrupt the database, depending on how the underlying filesystem handles memory-mapped I/O. The long version is worth understanding, because the fix that actually works is not obvious and the reason it works requires knowing what the -shm file is doing in the first place.
What WAL Mode Changes
The default SQLite journaling mode, sometimes called DELETE mode, writes a rollback journal before modifying any page. If the transaction fails, the journal replays the original pages. Writers block readers and readers can block writers. It works, it is simple, and it uses only POSIX advisory locks on the main database file to coordinate between processes.
WAL mode flips the write model. Instead of modifying pages in-place, writers append changed pages to a separate write-ahead log file. The main database file stays untouched during a transaction. Each reader works from a snapshot: it takes the newest committed version of a page from the WAL if one existed when its read transaction began, falls back to the main file otherwise, and ignores anything appended to the WAL after that point. A checkpoint operation eventually writes WAL pages back into the main database file.
This gives you non-blocking concurrent reads during writes, which is why people reach for it. For applications with high read load and occasional writes, the difference is significant.
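The snapshot behavior is easy to observe from a single process. The following is a minimal sketch using Python's standard sqlite3 module; the function name and the notes table are made up for illustration. It opens a reader and a writer on the same file, commits a write in the middle of an open read transaction, and reports what the reader sees at each step.

```python
import os
import sqlite3
import tempfile


def wal_snapshot_demo(db_path):
    """Return the row counts a reader sees before, during, and after a concurrent write."""
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # so transactions start only where BEGIN is issued explicitly.
    writer = sqlite3.connect(db_path, isolation_level=None)
    reader = sqlite3.connect(db_path, isolation_level=None)
    try:
        writer.execute("PRAGMA journal_mode = WAL")
        writer.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")

        # The reader's snapshot is fixed by the first read inside the transaction.
        reader.execute("BEGIN")
        before = reader.execute("SELECT COUNT(*) FROM notes").fetchone()[0]

        # The writer commits while the read transaction is still open.
        writer.execute("INSERT INTO notes (body) VALUES ('hello')")

        # Same read transaction: frames appended after the snapshot are ignored.
        during = reader.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
        reader.execute("COMMIT")

        # A fresh read sees the committed row.
        after = reader.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
        return before, during, after
    finally:
        reader.close()
        writer.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(wal_snapshot_demo(os.path.join(tmp, "mydb.sqlite")))
```

On a local filesystem the writer is not blocked by the open read transaction, and the reader's count only changes once it starts a new transaction.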
PRAGMA journal_mode = WAL;
Once set, this persists in the database header. Every subsequent connection will open in WAL mode until you explicitly change it back.
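A quick way to confirm the persistence is to set the mode on one connection, close it, and ask a brand new connection what mode it sees. This is a small sketch with Python's sqlite3 module; the function name and table are illustrative only.

```python
import os
import sqlite3
import tempfile


def journal_mode_after_reopen(db_path):
    """Enable WAL on one connection, then report the mode seen by a new one."""
    first = sqlite3.connect(db_path, isolation_level=None)
    # The pragma returns the journal mode that is actually in effect.
    mode_set = first.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    first.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    first.close()

    second = sqlite3.connect(db_path, isolation_level=None)
    # This only queries the mode; nothing here asks for WAL again.
    mode_seen = second.execute("PRAGMA journal_mode").fetchone()[0]
    second.close()
    return mode_set, mode_seen


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(journal_mode_after_reopen(os.path.join(tmp, "mydb.sqlite")))
```

Checking the value the pragma returns is worthwhile in general: if SQLite cannot switch modes, it reports the mode that is still in effect rather than the one requested.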
The Three Files
When mydb.sqlite is opened in WAL mode, two additional files appear on disk:
mydb.sqlite-wal: The write-ahead log. Changed pages get appended here. This file grows until a checkpoint writes it back to the main database.
mydb.sqlite-shm: The shared memory index. This is a 32 KB file containing a WalIndex structure that all database connections use to coordinate which frames in the WAL are valid and visible to each reader.
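To see the extra files for yourself, list the directory while a WAL-mode connection is still open. The sketch below assumes a local filesystem and uses an illustrative function name; the listing has to happen before the connection closes, because SQLite normally removes both sidecar files when the last connection shuts down cleanly.

```python
import os
import sqlite3
import tempfile


def wal_sidecar_files(directory):
    """Create a WAL-mode database and list the files present while it is open."""
    db_path = os.path.join(directory, "mydb.sqlite")
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode = WAL")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        # Listed while the connection is open, so the -wal and -shm files still exist.
        return sorted(os.listdir(directory))
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in wal_sidecar_files(tmp):
            print(name)
```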
The -wal file is straightforward: it is a log of page writes, and SQLite can reconstruct correct state from it during recovery. The -shm file is different. SQLite memory-maps it using mmap() with MAP_SHARED. Every process that opens the database calls mmap() on the same file, and the expectation is that all those processes share the same physical memory pages. Writes by one process are immediately visible to every other process. No explicit IPC, no disk reads between writes and reads. It is shared memory that happens to be backed by a file.
The SQLite documentation on the WAL file format says as much: the shm file contains no persistent content, and its only purpose is to provide a block of shared memory for processes that access the same database in WAL mode. SQLite may zero the file on open and rebuild it from scratch, which tells you everything about how it treats the file: the bytes on disk are not the point.
Why Linux Local Bind Mounts Work
When two processes on the same Linux host both call mmap(MAP_SHARED) on the same file, the kernel maps both virtual address ranges to the same physical page frames in the page cache. A write by process A at virtual address 0x7f000000 modifies a physical page, and process B’s mapping at its own virtual address sees that modification immediately. There is no disk I/O involved. The kernel guarantees coherence.
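The coherence guarantee can be illustrated without SQLite at all. This sketch is Unix-only and uses two independent file descriptors and two mappings inside one process as a stand-in for two processes; the function name is hypothetical. It writes through one MAP_SHARED mapping and reads through the other, with no flush and no read() call in between.

```python
import mmap
import os
import tempfile


def shared_mapping_roundtrip(path, size=4096):
    """Write through one MAP_SHARED mapping and read the bytes through another."""
    with open(path, "wb") as f:
        f.write(b"\x00" * size)

    # Two independent descriptors, standing in for two separate processes.
    fd_a = os.open(path, os.O_RDWR)
    fd_b = os.open(path, os.O_RDWR)
    try:
        prot = mmap.PROT_READ | mmap.PROT_WRITE
        with mmap.mmap(fd_a, size, flags=mmap.MAP_SHARED, prot=prot) as map_a, \
                mmap.mmap(fd_b, size, flags=mmap.MAP_SHARED, prot=prot) as map_b:
            map_a[0:5] = b"frame"
            # No flush and no file read: both mappings share the same page.
            return bytes(map_b[0:5])
    finally:
        os.close(fd_a)
        os.close(fd_b)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(shared_mapping_roundtrip(os.path.join(tmp, "demo-shm")))
```

On a local Linux filesystem the second mapping returns the bytes written through the first. That property is exactly what SQLite relies on for the -shm file, and it is what stops holding once the two mappings are no longer backed by the same page cache.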
Docker containers on Linux with local bind mounts work the same way. Each container is a namespace, but they share the host kernel’s page cache. When container A and container B both bind-mount /host/data:/data and both open /data/mydb.sqlite-shm with mmap(MAP_SHARED), the kernel maps both to the same physical pages. The WAL coordination works correctly.
This is why you can run the same Docker setup on a Linux host without hitting problems, even though the configuration is technically unsupported by SQLite’s own documentation. The kernel makes it work despite the lack of explicit guarantees.
Why macOS Docker Desktop Breaks
Docker Desktop on macOS runs all containers inside a Linux virtual machine. Your host’s filesystem is exposed to that VM through a filesystem layer: historically osxfs, now gRPC-FUSE or VirtioFS depending on your Docker Desktop version.
These are not local POSIX filesystems from the VM’s perspective. They are FUSE-based or paravirtualized filesystems that tunnel file operations from the VM to the macOS host. When container A calls mmap(MAP_SHARED) on the -shm file, it is not mapping physical page frames that container B can see. Each container gets its own private copy of the file’s contents in the VM’s memory. Writes by one container are not visible to the other without going through the filesystem layer, which does not behave like a coherent shared memory segment.
The result is that both containers build independent views of the WAL index. They disagree on which WAL frames are valid. You get SQLITE_BUSY errors, SQLITE_PROTOCOL errors, or silent data corruption depending on the timing and access pattern. This is not a Docker bug and it is not a macOS bug. It is WAL mode’s fundamental requirement for mmap(MAP_SHARED) coherence colliding with a filesystem that cannot provide it.
Why NFS and Network Volumes Also Break
The SQLite documentation states this plainly: WAL mode does not work on a network filesystem. NFS does not guarantee mmap(MAP_SHARED) coherence across separate hosts. The NFS client cache can serve stale memory contents without invalidating the other client’s mapped view. AWS EFS, being NFS-backed, has the same problem. Any Docker volume plugin that sits between the container and physical storage via a network protocol should be treated as incompatible with WAL mode.
For NFS and EFS, journal_mode=DELETE is the correct fallback. It uses file-level advisory locks instead of shared memory, and those locks work across network filesystems with far fewer caveats.
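One way to apply this rule at startup is to look up the filesystem type behind the database path and pick the journal mode from that. The sketch below is a heuristic under stated assumptions, not a definitive check: the helper names are invented, the list of suspect filesystem types is illustrative and incomplete, and the input is expected to be text in the format of Linux's /proc/mounts.

```python
import os

# Assumption: an illustrative, incomplete list of filesystem type prefixes
# where mmap(MAP_SHARED) coherence between clients should not be relied on.
SUSPECT_FS_PREFIXES = ("nfs", "cifs", "smb", "9p", "virtiofs", "fuse")


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the longest mount point containing path."""
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # /proc/mounts columns: device, mount point, filesystem type, options...
        # (Mount points containing spaces are escaped there; not handled here.)
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def suggested_journal_mode(path, mounts_text):
    """Suggest DELETE for unknown or suspect filesystems, WAL otherwise."""
    fs_type = fs_type_for(path, mounts_text)
    if fs_type is None or fs_type.startswith(SUSPECT_FS_PREFIXES):
        return "DELETE"
    return "WAL"


if __name__ == "__main__":
    db_path = os.path.abspath("mydb.sqlite")
    with open("/proc/mounts") as f:  # Linux only
        print(suggested_journal_mode(db_path, f.read()))
```

The function deliberately falls back to DELETE when it cannot identify the filesystem, since a wrong guess in that direction costs some concurrency while a wrong guess toward WAL risks the database.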
The Fixes
locking_mode = EXCLUSIVE
This is the cleanest fix when you can architect around single-writer access:
PRAGMA locking_mode = EXCLUSIVE;
PRAGMA journal_mode = WAL;
In exclusive locking mode, SQLite takes its file lock the first time the connection reads or writes the database and then holds it until the connection closes. Because no other process can read or write the database in the meantime, there is no need for cross-process shared memory coordination. Provided exclusive mode is set before the first WAL-mode access, which is why the locking_mode pragma comes first, SQLite skips the -shm file entirely and uses in-process memory for the WAL index instead. The three-file WAL system collapses to a two-file system where the coordination problem disappears.
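Here is a minimal sketch of that setup from Python's sqlite3 module, with an illustrative function name and table. It issues the two pragmas in the order shown above, performs a write, and then checks whether a -shm file exists next to the database.

```python
import os
import sqlite3
import tempfile


def open_exclusive_wal(db_path):
    """Open a database in exclusive-locking WAL mode and report what happened."""
    conn = sqlite3.connect(db_path, isolation_level=None)
    # Exclusive locking is requested before the first WAL-mode access,
    # so SQLite can keep the WAL index in heap memory instead of a -shm file.
    locking = conn.execute("PRAGMA locking_mode = EXCLUSIVE").fetchone()[0]
    journal = conn.execute("PRAGMA journal_mode = WAL").fetchone()[0]
    conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")
    shm_exists = os.path.exists(db_path + "-shm")
    # The caller owns the connection; the lock is held until it is closed.
    return conn, locking, journal, shm_exists


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn, locking, journal, shm_exists = open_exclusive_wal(
            os.path.join(tmp, "mydb.sqlite")
        )
        print(locking, journal, "shm file present:", shm_exists)
        conn.close()
```

The connection is returned rather than closed because the lock lives exactly as long as the connection does. A second process that tries to use the same file during that time should expect a "database is locked" error instead of shared access.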
The tradeoff is that only one container can hold the database open at a time. For many small-scale deployments this is acceptable: one container owns the database and exposes it over an API or a Unix socket. Other containers talk to that interface rather than touching the file directly.
journal_mode = DELETE
Switching back to rollback journal mode eliminates the -shm file entirely:
PRAGMA journal_mode = DELETE;
This uses POSIX advisory locks on the main database file for coordination. Those locks work across processes, across Docker containers on the same host, and across many NFS configurations, though NFS locking has its own reliability history. The performance characteristics change: concurrent reads no longer proceed during writes, which may or may not matter for your workload. For low-write-volume applications, the difference is often negligible.
Changing from WAL back to DELETE only succeeds when no other connections have the database open. SQLite needs exclusive access to checkpoint the WAL, remove the -wal and -shm files, and update the database header, which is where the mode is stored.
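The switch can be sketched as follows, again with Python's sqlite3 module and an illustrative function name. It assumes this is the only connection to the database; with another connection open, the change would not go through.

```python
import os
import sqlite3
import tempfile


def switch_to_delete(directory):
    """Put a database in WAL mode, switch it back to DELETE, and list the files left."""
    db_path = os.path.join(directory, "mydb.sqlite")
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode = WAL")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        # As the only open connection, this one can get the exclusive access
        # the switch needs. The pragma returns the mode now in effect.
        mode = conn.execute("PRAGMA journal_mode = DELETE").fetchone()[0]
        files = sorted(os.listdir(directory))
    finally:
        conn.close()
    return mode, files


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(switch_to_delete(tmp))
```

As with enabling WAL, the value returned by the pragma is the thing to check: it tells you which mode the database ended up in, not which one you asked for.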
LiteFS
For cases where you need true multi-node SQLite replication across containers or hosts, LiteFS by Fly.io is the purpose-built solution. It is a FUSE-based distributed filesystem that intercepts SQLite transactions at the filesystem layer and replicates them. It handles WAL semantics correctly by design, because it understands the SQLite file format rather than treating the files as opaque byte streams. LiteFS is more infrastructure than most small projects need, but it is the right tool when you are running SQLite across actual distributed nodes rather than multiple containers on one host.
The Underlying Pattern
The WAL mode -shm file stays invisible until your deployment topology changes. On a developer’s Linux machine, multiple processes sharing a SQLite database in WAL mode work correctly. In Docker on macOS, the same setup silently fails or corrupts. The difference is not in the SQLite code or the Docker code; it is in whether the filesystem layer supports the mmap(MAP_SHARED) coherence that the WAL coordination mechanism requires.
Willison’s writeup covers this from the practical angle of someone who runs Datasette in containers and hit the problem firsthand. The fix he settles on is sensible: constrain the architecture so only one process touches the database file directly, either by switching to exclusive locking mode or by routing all database access through a single container.
Understanding why the fix works makes it easier to evaluate whether it applies to your situation. The -shm file is not a cache or an optimization that can be degraded gracefully. It is the entire inter-process coordination mechanism for WAL mode, and if your filesystem cannot provide coherent shared memory, WAL mode cannot function correctly regardless of how the filesystem appears to behave for ordinary reads and writes.