· 6 min read ·

What Building a Nix Type Checker From Scratch Actually Requires

Source: lobsters

The Nix expression language has resisted static typing for two decades, and the resistance is not primarily theoretical. There are genuine type-theoretic challenges, well-documented by now: the with statement that injects entire attribute sets into scope, the callPackage pattern that introspects function argument names at runtime, dynamic attribute access via set.${key}. But the deeper obstacle for a practical type checker is not solving those cases perfectly. It is building a system that can check Nix code incrementally, interactively, and correctly enough to be useful in an editor, where files are always half-written and evaluation contexts are perpetually incomplete.

Johns.codes has a post walking through the experience of building both a type checker and a Language Server Protocol implementation for Nix from scratch. The two components are technically separable, but building them together forces the kind of architectural discipline that a standalone type checker can defer indefinitely.

What an LSP Actually Demands

The Language Server Protocol standardizes the communication between editors and language analysis tools. An editor sends JSON-RPC notifications when files open, change, and close. The server responds with diagnostics, completions, hover information, go-to-definition targets, and more. The protocol itself is straightforward. What is hard is satisfying its latency expectations.

A user typing in VS Code expects hover information to appear in under 100 milliseconds. Completion candidates should arrive faster than that. Diagnostics can be deferred a few hundred milliseconds, but not seconds. These constraints mean a Nix LSP cannot afford to re-parse and re-type-check the entire file, let alone the entire import graph, on every keystroke.

The solution used by virtually every production language server is incremental computation: track which parts of the analysis depend on which inputs, and recompute only what changed. Rust Analyzer uses Salsa, a demand-driven incremental computation framework. The Kotlin compiler has its own incremental system. Clangd uses a preamble precompilation scheme for headers. Each approach is different, but the structural requirement is the same: computation must be decomposed into cacheable units with explicit dependency tracking.

For a Nix type checker, the dependency graph is non-trivial. A Nix file can import other Nix files with import, producing a dependency between the types in one file and the type results in another. Nixpkgs itself has a deep import graph, though most practical Nix projects are far shallower. An incremental system needs to handle at minimum: re-parsing on change, re-inferring only the changed scope, and re-checking importers when their dependencies’ types change.

The Parser Layer Has to Work on Broken Input

Before type checking can begin, parsing must succeed on input that is continuously broken during editing. When a user types let x = and pauses, the file is syntactically incomplete. A parser built for batch processing will fail and return nothing useful. An LSP-grade parser needs to produce a partial, error-recovering syntax tree that preserves as much structure as possible around the error site.

Tree-sitter has become the standard solution for this layer. It produces concrete syntax trees with error nodes for malformed regions, handles incremental re-parsing efficiently, and has a mature Nix grammar maintained by the nix-community organization. An LSP built on tree-sitter gets error recovery and incremental re-parsing for little integration cost.

The difference between a concrete syntax tree from tree-sitter and an abstract syntax tree suitable for type checking is significant. The CST preserves whitespace, comments, and syntactic trivia. Type checking works on the AST. The LSP needs both: the CST for mapping cursor positions back to source ranges (necessary for hover and completion spans), and the AST for analysis. Keeping these two representations in sync through the full edit-parse-lower-check pipeline is one of the less glamorous but time-consuming parts of building any language server.

Type Inference Choices for Nix

Once you have a parse tree and an incremental computation layer, the type inference algorithm becomes the central design question. Hindley-Milner is the natural starting point for a functional language, but its failure modes on Nix are well-known. The with statement requires reasoning about the shape of values at the point of scope injection. The callPackage pattern requires introspecting function argument names. Neither fits cleanly into HM without extensions that add significant complexity, and an academic treatment of HM applied to Nix confirms that these three features, with, dynamic attribute access, and builtins.functionArgs, are the consistent failure points.

A more pragmatic approach is bidirectional type checking combined with structural record types. Bidirectional typing, developed formally by Dunfield and Krishnaswami, separates inference mode from checking mode. When the expected type is known from context, say, a function argument with an annotation, the checker uses that information directly. When no expected type is available, it infers. This reduces the need for full global inference and degrades gracefully when parts of the program are unannotated.

Row polymorphism handles open attribute sets more directly than HM. A function that accepts { stdenv, lib, ... } in Nix is polymorphic over the remaining fields. In a row-polymorphic type system, this function has a type roughly like { stdenv: Stdenv, lib: Lib | ρ } -> Drv where ρ is a row variable ranging over additional fields. Nickel, Tweag’s Nix-adjacent configuration language, uses row polymorphism for exactly this reason. Adopting row polymorphism in a Nix type checker means the core inference engine is more complex than HM, but it handles the dominant nixpkgs idiom correctly.

Dynamic attribute access, set.${expr}, is a dependent type problem: the type of the result depends on the runtime value of expr. No sound, practical type system handles this fully without programmer annotation. The pragmatic choices are to return a conservative any-typed result, reject the expression statically, or require a refinement on the key expression. Most production type checkers choose the first and document the limitation. A Nix type checker that rejected dynamic attribute access would fail on substantial portions of real nixpkgs code.

How This Differs From Existing Tools

The Nix ecosystem already has two main LSP implementations. nil, by oxalica, is a lightweight server that provides go-to-definition, basic completion, and simple local type inference without claiming soundness. It is fast and widely deployed. nixd takes a different approach, integrating with the actual Nix evaluator to provide evaluation-driven intelligence: because it can partially evaluate Nix expressions, it delivers more accurate completion for nixpkgs attribute paths and module option values.

Both tools make a deliberate trade-off: they do not attempt to build a sound type system. nil focuses on editor integration with lightweight analysis. nixd focuses on evaluation-driven accuracy for the common case of nixpkgs-heavy NixOS configuration. Neither is trying to be a type checker in the sense of reporting type errors at call sites.

A ground-up type checker, the kind described at johns.codes, occupies a different point in the design space: actual type inference with error reporting at the type level. This matters for catching errors that the Nix evaluator only surfaces lazily, long after the mismatched value was constructed. A function that expects { pname, version, src } but receives a set missing src will produce an error when mkDerivation eventually touches src, not when the wrong set was passed. A type checker catches this at the call site, where the fix is obvious.

The Value of Building Both Together

Building the type checker and the LSP together, rather than building a standalone type checker and wrapping it afterward, forces certain good architectural decisions early. Source positions must be tracked through every transformation from source text to CST to AST to typed IR, because hover requires mapping cursor positions back through all those layers. Incremental invalidation must be correct from the start, because a server that incorrectly caches stale types will produce silently wrong hover information.

A standalone type checker is the simpler project. You parse a file, infer types, report errors, exit. The LSP forces you to solve the harder problems: correctness under partial input, correctness under incremental update, and correctness of source location mapping through a pipeline of transformations. These are the problems that determine whether a type checker is practically useful or only useful in batch CI pipelines.

The JavaScript and Python ecosystems both went through this arc. TypeScript was not just a type checker bolted onto a JS parser; it was a full language service from the beginning, which is why it integrated smoothly into editors from early on. Mypy started as a batch checker and retrofitted the daemon mode and incremental support over several major versions. The lesson is that designing for the LSP use case from the start produces a different, usually better, architecture than adding it later.

Nix tooling has been functional but incomplete for years. nil and nixd cover the common cases well. A type checker designed from the ground up for incremental, sound-ish analysis would fill a gap neither tool addresses, and the engineering decisions involved in building it are at least as interesting as the type theory.

Was this interesting?