· 6 min read ·

Building a SQLite Language Server That Gets the Dialect Right

Source: lobsters

SQLite runs in more places than almost any other piece of software. It is in every Android and iOS device, in Firefox, Chrome, Python’s standard library, Node.js, and Electron, and the SQLite project estimates it is deployed on over one trillion devices. Its developer tooling has not kept pace with its reach: the primary interface is the sqlite3 CLI, which is functional but syntax-blind, and the third-party GUIs largely treat SQL as text to execute rather than code to analyze.

Syntaqlite is a project by Lalit Maganti, who works on Perfetto, Google’s open-source system profiling tool, which uses SQLite’s virtual table interface as its query engine. The project’s goal is to give SQLite the same quality of editor integration that PostgreSQL users have had for years: schema-aware completions, inline diagnostics, and query plan visualization. The interesting question is not what it does but how it approaches the problem, because the architectural decision at the core is the same lesson the C++ language server ecosystem learned after years of failed reimplementations.

Why Generic SQL Tools Fail at SQLite

The problem with generic SQL tooling for SQLite is that SQLite’s SQL dialect is not generic SQL. It has specific behaviors that a tool unaware of SQLite’s internals will model incorrectly.

Type affinity is the most subtle example. SQLite has five type affinities: TEXT, NUMERIC, INTEGER, REAL, and BLOB. A column declared INTEGER does not enforce integer storage in non-STRICT tables; you can insert the string 'hello' and SQLite will accept it, storing it as text. A column declared TEXT will coerce numeric literals to text on the way in. The full coercion ruleset is documented in the SQLite specification and is both consistent and counterintuitive. STRICT tables, added in SQLite 3.37.0, enforce actual type constraints and change which queries are valid. A tool that does not know whether a given CREATE TABLE used the STRICT keyword will give wrong diagnostics in both cases.

Virtual tables add another layer. FTS5 full-text search, the rtree spatial index, and user-defined virtual table implementations all present column type information that is not recoverable from a standard schema query; it is determined by the module implementation at runtime. The upsert syntax uses ON CONFLICT DO UPDATE SET ... = excluded.column_name, which differs from PostgreSQL’s and MySQL’s equivalents in ways that matter for autocompletion. Window functions, added in 3.25.0, follow the SQL standard but with SQLite-specific grammar limits. The json_each() and json_tree() table-valued functions behave differently from how other databases handle JSON path queries.

Existing generic SQL language servers such as sqls or sqlfluff approximate these behaviors at best. They can provide table name completions and basic syntax checking, but they have no model of SQLite’s type system, no understanding of which tables are STRICT, and no ability to resolve the column types exposed by a virtual table. The approximations look reasonable on simple queries and fail silently on anything that exercises the dialect’s edges.

Schema Awareness Is Simpler Than It Looks

One place where SQLite’s design makes the devtools problem easier than expected is schema introspection. Every SQLite database stores its schema in sqlite_schema (formerly sqlite_master), a table with columns for type, name, tbl_name, rootpage, and sql. The sql column contains the original CREATE statement verbatim.

SELECT name, sql FROM sqlite_schema WHERE type IN ('table', 'view');

This query gives a devtools layer everything it needs to seed completions for table and column names, derive column types, and understand constraint structure. There is no connection string to configure, no separate schema export, no schema registry to synchronize. The database file is self-describing. A language server that maintains a live connection to the database file gets schema updates automatically as migrations run.

This is a meaningful difference from PostgreSQL, where schema-aware tooling requires establishing a connection, querying pg_catalog, and handling connection permissions. SQLite’s schema access is a local file read, which simplifies the setup path considerably for editor integration.

EXPLAIN QUERY PLAN After 3.31.0

SQLite has had EXPLAIN QUERY PLAN since the 3.0 era, but its output was a single text column with an ad-hoc format that required string parsing to extract meaningful structure. Starting with SQLite 3.31.0, the output became a proper virtual table with columns id, parent, notused, and detail, where id and parent encode a tree structure.

EXPLAIN QUERY PLAN
SELECT u.name, COUNT(p.id)
FROM users u
JOIN posts p ON p.user_id = u.id
WHERE u.created_at > '2025-01-01'
GROUP BY u.id;

The parent column lets you reconstruct the full plan tree, and each node’s detail describes the operation: scan type, index name when one is used, filter conditions. Building a visualization on top of this is tree traversal, not regex parsing. PostgreSQL users have had tools like PEV2 for years, which parse EXPLAIN (FORMAT JSON) output and render annotated query plan trees with cost estimates and scan type annotations. SQLite’s equivalent machinery has been in place since 3.31.0, but almost no existing SQLite GUI exposes it as a navigable visualization. A high-fidelity devtools layer can surface this as a rendered tree inside the editor, flagging full table scans or missing indexes on frequently-filtered columns before a query ever runs in production.

The Case for Using the Real Parser

The architectural decision that distinguishes syntaqlite from a reimplemented parser is that it calls into the actual SQLite C library rather than maintaining an independent dialect model. The SQLite amalgamation, the single-file distribution used for embedding, is roughly 200,000 lines of C and exposes enough internal API surface to support this approach.

The analogy is clangd, the C and C++ language server maintained by the LLVM project. Earlier C++ language servers, including cquery and rtags, maintained independent parsers that understood enough C++ to provide completions and diagnostics. They worked until they did not; every new language standard, every compiler extension, every edge case in template instantiation could produce results that the LSP model did not handle correctly. clangd solved this by running the actual Clang frontend, inheriting the compiler’s full understanding of the language rather than maintaining an approximation.

SQLite’s test suite runs approximately 92 million test cases. A devtools layer built on the real amalgamation inherits that correctness for parsing and type resolution. A reimplemented parser has to track every SQLite release that adds new syntax, and will lag. Using the real library also means the tool cannot silently accept syntax that SQLite itself would reject, which is exactly the failure mode that makes low-fidelity tooling counterproductive: diagnostics that pass in the editor but fail at runtime train developers to distrust the tool.

The Production Context That Makes This Urgent

The case for high-fidelity SQLite devtools depends on SQLite being taken seriously as a production runtime, and that shift is well underway. Litestream provides streaming replication to S3-compatible storage, adding point-in-time recovery. Turso, built on libSQL, extends SQLite with server mode and HTTP access for multi-tenant workloads. Cloudflare D1 runs SQLite at the edge. Electric SQL uses SQLite as the local-first database in sync-based architectures. When production infrastructure runs on SQLite, the developer tooling gap has real costs.

Maganti’s starting point is Perfetto, where writing complex analytical SQL against large trace datasets representing hours of system activity across multiple processes is routine work. The absence of schema-aware completions and query plan visualization is a daily problem that compounds with query complexity. That same use case is now common across the industry, in applications that choose SQLite for its embedded simplicity and then discover they need the same quality of tooling they had with Postgres.

The broader PostgreSQL LSP ecosystem, particularly postgres_lsp, provides a reference point for what schema-aware SQL tooling looks like when it works well. SQLite’s dialect complexity and the production ecosystem growing around it make an equivalent investment worthwhile. Syntaqlite’s approach, building on the real SQLite parser rather than approximating it, is the right foundation for that work.

Was this interesting?