Retrofitting Types onto Nix: How Far Static Analysis Can Actually Go

Nix has a tooling problem that sits at an unusual intersection. The language is purely functional, lazily evaluated, and structurally simple enough that the core lambda calculus fits on a whiteboard. Yet writing large Nix configurations without a type checker means passing strings where derivations are expected, misspelling attribute names, and receiving errors at evaluation time that a static checker could have caught seconds after you typed them. The reason a type checker has taken this long to arrive is that Nix, despite its functional nature, contains a handful of idioms that actively defeat static type inference.

A recent project builds both a static type checker and a Language Server Protocol implementation for Nix from scratch, navigating these difficulties in detail. The write-up is worth reading for the implementation specifics. What makes it more interesting is what the project reveals about where the hard walls are, and why they exist where they do.

The Existing Landscape

Two LSPs already exist for Nix. nil, written in Rust, is the community recommendation for most users. It uses a demand-driven incremental analysis graph architecturally similar to rust-analyzer, provides go-to-definition, find-references, rename refactoring, and completion, and starts instantaneously. It has no model of types. It knows where pkgs.git is defined; it cannot tell you that you passed a string where a derivation was expected.

nixd, written in C++, takes the opposite approach: it embeds libnixexpr, the actual Nix evaluator, as a library. When pointed at a nixpkgs path, it evaluates the full tree in the background and uses live evaluation state to answer completion queries. It knows the real attribute names on pkgs because it evaluated pkgs. The cost is an initial evaluation that can take minutes and multiple gigabytes of RAM. It also cannot catch errors in unevaluated code paths; if a function is never called during the warm-up evaluation, its body is invisible to nixd’s analysis.

Neither system produces type errors. The project under discussion occupies a third position: static type inference with error reporting, occupying the space between nil’s structural-but-untyped analysis and nixd’s evaluation-backed-but-slow approach.

What the Type Checker Actually Does

The implementation uses tree-sitter-nix for error-recovering, incremental parsing. Tree-sitter is the right foundation for LSP work: it produces partial, error-annotated concrete syntax trees even for malformed input, re-parses only affected portions on each keystroke, and provides the source position information needed to map cursor positions to hover spans and completion anchors.

The type inference algorithm is Hindley-Milner extended with row polymorphism for attribute sets. Plain HM handles the lambda calculus cleanly: it can infer the type of x: x + 1 as Int -> Int without annotations. Row polymorphism extends HM to handle open records, where { foo, bar, ... }: means “an attribute set with at least these fields, plus any others.” This is the direct encoding of Nix’s structural attribute set patterns.

Bidirectional type checking, in the style of Dunfield and Krishnaswami, reduces the annotation burden. Rather than inferring purely bottom-up or checking purely top-down, bidirectional systems propagate known types downward into subexpressions when context is available, falling back to inference when it is not. The result is that a function whose return type can be determined from context does not require an explicit annotation, while a function where inference would be ambiguous can be constrained by the call site.

Builtins are handled via a hand-written prelude, analogous to TypeScript’s lib.d.ts. An Unknown escape hatch covers expressions that defeat static analysis.

The Three Patterns That Resist Analysis

The interesting part of the project is not what it handles well. Pure functions over attribute sets with known shapes are tractable; Hindley-Milner was designed for exactly this. The interesting part is what it cannot handle, and why.

with expressions. with pkgs; [ git vim ] imports the entire attribute space of pkgs into lexical scope. Every type inference algorithm depends on lexical scope being enumerable at compile time. If pkgs has a known static type, its attribute names are known, and with pkgs; is analyzable. In practice, pkgs is typically derived from a large lazy import whose type is complex or imprecise. TypeScript resolved the equivalent problem by disallowing with in strict mode. Nix cannot make that choice: with lib; appears at the top of nearly every NixOS module in the ecosystem.

callPackage and builtins.functionArgs. The central nixpkgs pattern works like this: builtins.functionArgs ({ foo, bar }: foo + bar) returns { foo = false; bar = false; }, an attribute set whose keys are the parameter names of the function and whose boolean values indicate whether each has a default. callPackage uses this to know which packages to supply from the package set, looking up each parameter name as a string in pkgs.

The correctness of this pattern depends on a computation that takes a function value, extracts its parameter names as strings, and uses those strings as attribute lookups. This is a dependent type: the result type depends not on the type of the function argument but on a structural property of its internal representation. The names of the parameters matter, not just their types. Hindley-Milner cannot express this. TypeScript’s conditional types can extract argument types (F extends (arg: infer A) => any ? A : never) but cannot extract argument names as string literals; parameter names are erased at the type level. The pragmatic response is to treat callPackage as a black box and return Unknown, which is what any sound type checker for Nix must do.

Dynamic attribute access. set.${someExpr} returns a value whose type depends on the runtime value of someExpr. There is no sound practical type system that handles this without annotation. When someExpr is a string literal known at compile time, a type checker can resolve it; when it is computed, the result type is Unknown.

These three patterns are not edge cases. They are idiomatic Nix. Nearly every non-trivial nixpkgs package definition uses callPackage. with lib; is in the majority of NixOS modules. Dynamic attribute access appears throughout overlays and module composition. A Nix type checker will return Unknown more often than a Python type checker returns Any, and for structurally deeper reasons.

How Other Languages Handled the Same Problem

The comparison with TypeScript is instructive. TypeScript chose deliberate unsoundness: it accepted that some type-correct programs would still throw at runtime, and in exchange it could type real-world JavaScript patterns that a fully sound system could not handle. The escape hatches (any, // @ts-ignore, declaration files) are load-bearing. DefinitelyTyped exists because JavaScript libraries existed before TypeScript did; the annotation layer had to be built separately from the code.

Pyright, Microsoft’s Python type checker and the backend for Pylance, uses bidirectional checking and falls back to Unknown when inference fails. Python has analogous problem patterns: __getattr__, locals(), runtime class construction, and getattr(obj, computed_name) are to Python what with, callPackage, and set.${expr} are to Nix. The Python ecosystem’s response, codified in PEP 484 and subsequent typing PEPs, was gradual typing: unannotated code is treated as returning Any, compatible with everything, with no errors unless explicitly opted into stricter checking.

Ruby’s Sorbet added per-file strictness levels via # typed: false/true/strict comments, letting teams annotate incrementally. It explicitly gave up on some dynamic dispatch patterns rather than hiding those gaps. The practical framing was not whether the type system was sound across the entire language, but whether it caught enough real mistakes to justify annotation overhead.

For Nix, gradual typing with Unknown as the escape hatch is the only credible path. The coverage will be lower than TypeScript’s because the idiomatic patterns are harder to type, but partial type checking is still substantially better than none.

The Nickel Contrast

Nickel, developed by Tweag, is a configuration language designed as a typed alternative to Nix targeting the same domain. It uses gradual typing with a contract system: static where annotations exist, runtime-enforced where they do not. Its record types use row polymorphism natively. Its merge operator was designed with conflict detection in mind, making it statically analyzable in ways Nix’s // operator is not. It bans with and constrains import precisely because those constructs defeat type inference.

Nickel demonstrates that the problem is solvable if you start fresh. Retrofitting types onto Nix is harder precisely because Nix’s design made different trade-offs, and those trade-offs are now embedded in millions of lines of nixpkgs code. Migrating to Nickel means leaving the entire existing ecosystem behind.

The project building a Nix type checker is not an argument against Nickel. It is filling a different gap: analyzing Nix code as written, today, without asking anyone to change languages.

What This Means for Nix Tooling

One underexplored path is the NixOS module system’s existing lib.types DSL. NixOS options are already declared with runtime type information: types.str, types.int, types.listOf types.package, types.submodule, and so on. A tool that reads lib.mkOption declarations and emits static type stubs would provide type checking for the NixOS configuration surface with minimal annotation work, because the type information already exists at the runtime level and needs only to be lifted into the static layer.

Building the type checker and LSP together, rather than building a batch checker first and retrofitting LSP support, turns out to be architecturally important for a reason beyond convenience. LSP-first design forces correct source position tracking from day one and requires incremental invalidation to be built into the core data model, not added on top. The rust-analyzer project established this pattern for Rust; the Nix project under discussion applies the same principle.

The realistic expectation for a Nix type checker is not the coverage TypeScript provides for JavaScript. It is closer to what mypy provides for Python at moderate strictness levels: most pure functions are checked correctly, framework-level magic returns Unknown, and the gap between what is checked and what is not is explicit rather than hidden. For a language where a mistyped attribute name currently produces a runtime error at the end of a multi-minute evaluation, even partial coverage is a meaningful improvement in the development feedback loop.