What It Actually Takes to Type-Check Nix

Nix has two mature language servers and zero type checkers. This is not an oversight; it reflects genuine structural difficulty. John Sterrett’s writeup on building both a type checker and an LSP for Nix is worth reading precisely because it confronts that difficulty honestly, mapping out what bidirectional inference and row polymorphism can cover and where the approach hits a hard ceiling.

The Existing Landscape

The current tooling splits into two tools with opposite trade-offs. nil, written in Rust by oxalica, does fast scope-level analysis: goto-definition, find-references, rename across files, dead code detection on unused let bindings. It uses demand-driven incremental computation modeled on rust-analyzer’s Salsa framework. It works without a Nix installation, stays well under 100 MB of memory, and starts instantaneously. It does not do type inference. nixd takes the opposite approach, embedding libnixexpr and running the actual Nix evaluator to provide accurate, evaluation-backed completions for NixOS options and home-manager modules. When configured against a nixpkgs path, it knows which packages exist because it evaluated them. Initial startup can take minutes and multiple gigabytes of RAM.

Neither tool tells you that you passed a string where a derivation was expected. That gap exists for reasons that go beyond implementation effort into type-theoretic limits specific to how Nix is designed.

Why Nix Resists Static Typing

Nix’s expression language was designed by Eelco Dolstra for correctness-by-construction in package management, not for static analyzability. Several of its core idioms resist type inference in ways that illuminate broader truths about type system design.

Lazy evaluation is the first complication. Nix is call-by-need: every binding creates a thunk evaluated only on demand. A type checker operating without evaluation must reason structurally about what thunks will produce when forced, tracking types through arbitrary chains of function application across files without running anything.

The with expression is harder. with pkgs; [ git vim curl ] injects the entire attribute set of pkgs into lexical scope. Standard Hindley-Milner inference depends on well-defined lexical scope, where every name resolves to a specific binding site. With with, that guarantee disappears. A name in scope might come from the with target or from an outer binding. Statically resolving which requires knowing the complete shape of the with target, which for pkgs means knowing the complete shape of nixpkgs, tens of thousands of attributes. TypeScript solved JavaScript’s with by banning it in strict mode. That option does not exist for Nix, because with lib; is idiomatic in nearly every NixOS module written.

The Sterrett project handles with through a taint system: identifiers that might originate from a with-injected scope are marked with uncertain provenance rather than failing analysis entirely. When the with target is a statically knowable attrset, resolution proceeds normally; otherwise the identifier’s type falls back to Unknown. The consequence is that NixOS module code, which is saturated with with lib;, produces a lot of Unknown in the type inference output.

The Type Theory Choices

Row polymorphism is the decision that makes the tractable fragment tractable. Nix’s attribute sets are open by default. A function signature { stdenv, lib, ... }: ... accepts any attrset that has at least stdenv and lib fields. This cannot be expressed in plain Hindley-Milner; you need row variables. The type checker assigns signatures like { x : Int, y : Int | ρ } -> Drv where ρ is a row variable ranging over any additional fields. The // merge operator becomes a type-level operation on row variables. The theory works cleanly for statically known shapes; it becomes imprecise when operands are function arguments with unknown shapes.

Bidirectional type checking, following Dunfield and Krishnaswami’s work on sound and complete bidirectional typing, separates inference mode from checking mode. When the type checker already knows the expected type from context, it uses that information directly rather than inferring from scratch. This reduces the need for full global inference and makes the algorithm’s behavior predictable. For Nix it matters because function bodies often have expected return types derivable from how they are used in callPackage patterns, even when the function signature carries no annotation.

The builtins are handled as a bootstrap layer. All roughly one hundred Nix builtins are assigned typed signatures explicitly, for example builtins.map : (a -> b) -> [a] -> [b] and builtins.attrValues : { ..r } -> [v]. These serve the same role as TypeScript’s lib.d.ts, giving the inference engine a typed foundation to build on.

The Hard Ceiling: callPackage

builtins.functionArgs ({ stdenv, fetchurl, lib }: stdenv.mkDerivation { ... }) returns { stdenv = false; fetchurl = false; lib = false; }. It extracts parameter names as runtime strings. callPackage uses those strings to look up values in the package set.

This is dependency injection through runtime structural reflection on a function’s argument names. No mainstream static type system models this correctly. TypeScript can extract argument types via Parameters<typeof f>, but it cannot produce the string literal "stdenv" from a parameter named stdenv. Hindley-Milner and row polymorphism share the same ceiling. Modeling this correctly requires dependent types, where the result type of callPackage depends on a runtime-extracted string set. The Sterrett type checker acknowledges the coverage gap explicitly rather than emitting incorrect types, which is the right call.

The same ceiling applies to set.${dynamicKey}: the result type depends on the runtime value of the key expression, which is a dependent type problem with no practical sound solution in current mainstream type theory.

The LSP Engineering

The LSP wrapper has its own requirements independent of the type theory. The LSP specification defines JSON-RPC over stdin/stdout, with the server sending diagnostics via textDocument/publishDiagnostics and responding to hover and completion requests. Hover must arrive in under 100 milliseconds; anything slower breaks the feel of an editor. This latency constraint forces the analysis to be incremental.

The project uses a Salsa-style computation graph that tracks which analysis results depend on which inputs, recomputing only what a change invalidates. For Nix, cross-file import statements and with expressions create non-local dependencies that make incremental invalidation non-trivial to scope correctly, since a change to a library file can affect type inference results in arbitrarily many downstream files.

Parsing is handled by tree-sitter with the tree-sitter-nix grammar, which produces a concrete syntax tree with explicit error nodes for invalid regions. Error recovery is necessary because the user is always mid-edit; the parser must produce a useful partial tree for incomplete files. Both the CST and a derived AST are maintained: the CST provides source range information needed for hover responses, and the AST is used for type analysis. Lowering from CST to AST while tracking source positions through every transformation is more work than it sounds but cannot be skipped, since hover requires mapping a cursor position back through every analysis layer to find the inferred type.

Type annotations are not part of standard Nix syntax. The project proposes .d.nix declaration files, analogous to TypeScript’s .d.ts, as external type stubs for nixpkgs exports and builtins. The NixOS module system’s lib.types declarations already encode runtime type information in module options; a generator that translates these into static .d.nix stubs would immediately provide type coverage over the NixOS configuration surface without modifying any Nix source.

The Broader Picture

This project is not a replacement for nil or nixd. Scope-level analysis and evaluation-backed completions cover different use cases that static type inference cannot replace. The goal is a third position: type errors reported before evaluation, for code that might never successfully evaluate, without requiring a working Nix installation.

Nickel, developed by Tweag, took the alternative path, designing a typed configuration language with gradual typing and row polymorphism from scratch rather than retrofitting types onto Nix. Nickel’s 1.x series is practically usable. It cannot interoperate with nixpkgs because it is a different language. typenix encodes Nix types in TypeScript’s structural type system, getting the TypeScript language server for free but hitting the same callPackage ceiling along with TypeScript’s recursive instantiation depth limits.

Adding types to Nix is not a solved problem. What the Sterrett project shows is how far bidirectional inference combined with row polymorphism takes you on the tractable fragment, where the Unknown escape hatch is a type-theoretic necessity rather than a sign of incomplete work, and why the hard limits are hard. That kind of accounting is useful whether the project ships a production tool or not.