· 6 min read ·

The Storage Layer You're Probably Overbuilding

Source: hackernews

The reflexive move when starting a new project is to add a database. Not just any database, but usually a full client-server system: PostgreSQL in a container, a connection pool, a migration runner, and a half-dozen environment variables before you’ve written a single domain model. This is the default stack, and for a lot of projects it’s the wrong one. The original article puts the question plainly, but the interesting part isn’t the “no” answer. It’s understanding the spectrum between a flat JSON file and a replicated Postgres cluster, and knowing precisely where on that spectrum your application’s access patterns actually land.

The cost isn’t storage, it’s the operational model

A client-server database like PostgreSQL imposes a particular operational model. You have a server process that must be running and reachable before your application can do anything useful. You have connection limits, so you need a pooler like PgBouncer once you have more than a handful of concurrent clients. You have a migration workflow that must be coordinated with deployments. You have backups that require either a managed service or careful configuration of pg_dump, WAL archiving, or both. None of this is insurmountable, and for applications that genuinely need what Postgres offers, it’s worth every bit of that overhead.

But the overhead exists regardless of whether you’re storing 500 rows or 500 million. Spinning up Postgres for a personal tool, a small internal dashboard, or a Discord bot that tracks a few hundred users means you’re paying the full operational cost for something that could live in a single file.

What flat files are actually good for

Before dismissing flat files as naïve, it’s worth being precise about where they work well. Configuration is the obvious case: JSON, TOML, and YAML files are a storage layer. They’re human-readable, diffable, version-controlled alongside your code, and require zero infrastructure. If your “data” is configuration that changes rarely and gets read frequently, a file is strictly better than a database row.

Content is another case. This blog is built on Markdown files in a directory. There’s no table of blog posts, no slug column, no published_at timestamp stored in rows. The filesystem is the database, and mtime is the timestamp. Static site generators have proven this model at scale: Cloudflare’s own blog, GitHub’s documentation, and thousands of production sites run this way. The access pattern, reading a bounded set of files at build time, fits the storage layer.

Small lookup tables deserve the same consideration. A list of supported languages, a mapping of country codes to names, a set of feature flags for a small team: these can live in a JSON file loaded at startup and kept in memory. The operational cost is zero, the latency is zero, and the data rarely changes.

The failure mode for flat files is concurrent writes. Two processes writing to the same file without coordination will corrupt it. If you need to update data from multiple concurrent writers, you’ve immediately outgrown this layer.

SQLite is production infrastructure, not a toy

The most underused option in the spectrum is SQLite, and underused is an understatement. SQLite is the most widely deployed database engine in existence: it runs in every Android device, every iPhone, every Firefox and Chrome installation, in the firmware of aircraft, and in countless embedded systems. It is not a toy. What it lacks is a server process, which is exactly what makes it interesting.

SQLite is a library that your application links against. The database is a single file on disk. There’s no connection string, no port, no pooler. You open the file and query it. The operational model is the same as reading any other file your application owns.

For write performance, SQLite in WAL (Write-Ahead Log) mode allows concurrent readers while a single writer is active. On NVMe storage, it can sustain tens of thousands of write transactions per second. For the overwhelming majority of applications, this is more than sufficient.

What SQLite doesn’t do: it doesn’t support multiple writers from separate processes without coordination, it doesn’t scale horizontally across machines, and its full-text search is functional but not optimized for the same workloads as PostgreSQL’s tsvector or Elasticsearch. If you need any of those things, you’ve genuinely outgrown it.

The ecosystem around SQLite has grown significantly. Litestream provides continuous streaming replication to S3-compatible storage, solving the backup problem with a separate process that shadows all writes. Turso is a distributed SQLite service built on libSQL, a fork maintained by the Turso team, which adds replication and a serverless access model while keeping the SQLite query interface. LiteFS from Fly.io provides FUSE-based replication for SQLite in containerized environments. The “SQLite isn’t suitable for production” argument has less ground to stand on than it did even three years ago.

For embedded analytics, DuckDB is worth mentioning separately. It’s an embedded OLAP database, column-oriented and optimized for aggregation queries over large datasets. Where SQLite is a row store suited to transactional workloads, DuckDB can scan millions of rows for analytical queries in milliseconds on a laptop. The two are complementary: SQLite for your application’s operational data, DuckDB for ad-hoc analysis over exports or log files.

The key-value middle ground

Between a document-store flat file and a full SQL database sits the key-value store. For a Discord bot, this is often the natural fit: per-user settings indexed by user ID, per-guild configuration keyed by guild ID, command cooldowns keyed by a composite of user and command. These access patterns are point lookups. You rarely need to join, aggregate, or query by anything other than the primary key.

Libraries like bbolt (the Go key-value store underlying early versions of etcd), LevelDB, and its more write-optimized descendant RocksDB provide embedded key-value storage with ordered iteration and atomic writes, without the SQL layer. For JavaScript and TypeScript, packages like level wrap LevelDB with a clean async API. The operational model is the same as SQLite: a file on disk, no server process.

For more structured data that doesn’t need full relational querying, SQLite with a single table and a JSON column often provides the best of both worlds: the flexibility of a document store with the tooling and reliability of SQLite.

When you actually need Postgres

None of this means Postgres is the wrong answer. There are conditions that genuinely require it.

Multiple writers from separate processes or machines is the clearest one. If your application runs more than one instance, they need to write to a shared data store. SQLite’s single-writer model doesn’t fit, and flat files certainly don’t.

Complex relational queries are another. When you’re joining across three tables, filtering on indexed columns, and running aggregations in a single query, SQLite handles this, but Postgres’s query planner is more sophisticated, its index types are more varied (partial indexes, GIN indexes for full-text and JSONB, BRIN indexes for time-series), and its statistics are more detailed. For complex workloads, the planner quality matters.

Full-text search at scale tilts toward Postgres or a dedicated search engine. SQLite’s FTS5 extension is capable, but Postgres’s tsvector with ranked retrieval, or purpose-built systems like Meilisearch or Typesense, are meaningfully better for search-heavy applications.

And replication requirements: if you need read replicas, point-in-time recovery with granular retention, or logical replication for change data capture, the Postgres ecosystem is mature and well-understood. Litestream covers many SQLite replication needs, but the Postgres tooling is deeper.

Making the call

The decision reduces to a few concrete questions. How many concurrent writers will there be? If the answer is one process, SQLite covers you. What are your query patterns? Point lookups and simple filters are the same cost everywhere; complex joins and aggregations reward a more capable query planner. What’s your deployment model? A single server or container is different from a horizontally scaled fleet. And what’s the maintenance burden you’re willing to accept?

For a significant portion of software, the right answer is SQLite. For configuration and content, flat files often win outright. Reaching for Postgres is justified when the application’s requirements genuinely match what Postgres provides. The problem isn’t Postgres, it’s using it by default before checking whether the default is warranted.

Was this interesting?