Nix Gets a Type Checker: What It Actually Takes to Build Static Analysis for a Lazy Language
Source: lobsters
The Nix expression language has been around for over two decades. It powers NixOS, drives the package manager of the same name, and increasingly manages people’s entire system configurations through flakes. For all that maturity, the developer tooling has lagged badly. You get syntax highlighting if you’re lucky, and completion that guesses more than it knows. The reason for this gap isn’t neglect; it’s that Nix was designed with evaluation semantics in mind, not static analysis, and those two design goals pull hard in opposite directions.
A recent post on johns.codes walks through building both a type checker and a Language Server Protocol implementation for Nix from scratch. It’s the kind of project that appears tractable until you start and then reveals layer after layer of difficulty specific to the language. Understanding why requires understanding what makes Nix unusual as a target for static analysis.
What Makes Nix Resistant to Type Checking
Nix is a purely functional, lazily evaluated, dynamically typed language. Each of those properties is workable in isolation. Together, they create a surface that fights static analysis at almost every level.
The laziness is the most structurally disruptive property. In an eagerly evaluated language, you trace value flow by following call order. In Nix, values are not computed until demanded, and “demanded” is defined by what the evaluator’s consumer asks for. A function might receive an attribute set with fifty fields and use only two of them; the other forty-eight are never evaluated at all. A type checker trying to determine what fields are present on a value has to either evaluate the expression, which defeats the purpose of static analysis, or reason conservatively about all possible shapes a value might take.
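To make the laziness point concrete, here is a minimal Python model of a lazy attribute set, assuming each field is stored as an unevaluated thunk. The names Thunk, forced, and consumer are hypothetical and are not taken from the Nix evaluator; this is only a sketch of the behavior described above.

```python
from typing import Any, Callable

forced: list[str] = []  # records which fields were actually evaluated


class Thunk:
    """A delayed computation that runs at most once, on first demand."""

    def __init__(self, name: str, compute: Callable[[], Any]) -> None:
        self.name = name
        self._compute = compute
        self._done = False
        self._value: Any = None

    def force(self) -> Any:
        if not self._done:
            forced.append(self.name)
            self._value = self._compute()
            self._done = True
        return self._value


def boom() -> Any:
    raise RuntimeError("this field would fail if anything demanded it")


# Roughly `{ x = 1 + 1; y = 40; broken = throw "..."; }` in Nix terms.
attrs = {
    "x": Thunk("x", lambda: 1 + 1),
    "y": Thunk("y", lambda: 40),
    "broken": Thunk("broken", boom),
}


def consumer(s: dict[str, Thunk]) -> int:
    # Only x and y are demanded; `broken` is never forced.
    return s["x"].force() + s["y"].force()


result = consumer(attrs)
```

Here result is 42 and forced contains only x and y, so the failing field goes unnoticed at run time. A static checker has no such demand trace to lean on, which is why it has to reason about every field the set might carry.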
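Attribute sets, Nix’s primary data structure, compound this problem. Unlike records in OCaml or structs in Rust, Nix attribute sets have no declared shape: any expression can build a new set with fields added or replaced, and which fields exist is only settled at evaluation time. The // merge operator combines two sets, with right-hand side fields winning on conflicts. The merge is shallow, so a nested set on the right replaces the one on the left rather than being merged into it: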
let
base = { x = 1; y = 2; };
override = { y = 99; z = 3; };
in base // override
# { x = 1; y = 99; z = 3; }
Tracking the shape of a set through a chain of merges is a constraint propagation problem. It becomes harder still when with enters the picture.
The with expression is one of Nix’s more consequential design choices. It brings all keys from an attribute set into the current scope:
with pkgs; [ vim git curl openssh ]
A type checker encountering vim inside a with pkgs; block has to know what fields pkgs contains to determine whether this is a valid reference or a typo. Nixpkgs contains tens of thousands of package attributes. Representing that corpus fully in a static type system, and keeping that representation current, is a significant engineering undertaking on its own.
The Current LSP Landscape
Three tools have made serious attempts at language server support for Nix, each with a distinct approach and distinct limitations.
rnix-lsp was the first widely used option, built on top of the rnix-parser Rust library. It provides syntax-level features: error highlighting, basic formatting via nixpkgs-fmt, and rudimentary completion. It never attempted type inference. The project is no longer actively developed, having been superseded by tools with deeper analysis capabilities.
nil is the current community-recommended choice for most users. Written in Rust, it uses an incremental analysis approach that operates primarily at the scope-graph level. It handles goto-definition for local bindings, hover information for builtins, and completion that’s more context-aware than rnix-lsp managed. The maintainer has put real work into Nix’s tricky scoping rules around let, rec, and with. But nil explicitly avoids full type inference, which lets it remain fast and correct within its domain while leaving the harder inference problems unaddressed.
nixd takes the opposite approach. Rather than purely static analysis, nixd invokes the actual Nix evaluator to derive completion information. This gives it something the other LSPs cannot have without evaluation: real knowledge of evaluated attribute sets. If your flake imports nixpkgs, nixd can tell you which packages are available in pkgs because it actually evaluated pkgs. The trade-off is latency and resource usage. The Nix evaluator is not fast on large inputs, so nixd has to be careful about caching and incremental re-evaluation to remain usable during editing.
None of these tools provide what a developer coming from TypeScript or Rust would recognize as a type system: propagated constraints, function signature inference, and detection of field-access errors at analysis time rather than evaluation time.
What a Type System for Nix Needs
A practical type checker for Nix has to solve several interconnected problems that don’t arise in most language implementations.
Row polymorphism for attribute sets is probably the core requirement. Rather than requiring exact set shapes, the type system needs to express partial knowledge: “this function requires a set that has at least these fields, and may have others.” OCaml’s object types address the same problem with row variables, and TypeScript handles a similar one through structural typing. A row-polymorphic type for a function that needs x and y from its argument might look like:
{ x : Int, y : Int, ..r } -> Int
where r is a row variable capturing any additional fields. The // merge operator then becomes a type-level operation on row variables, which is well-understood theoretically but requires careful implementation to stay efficient.
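As a rough illustration of the idea, the sketch below reduces the row variable to a single boolean ("may have other fields") instead of implementing real row unification. The names Row, accepts, and merge are hypothetical, and the type names are plain strings chosen for this example.

```python
from dataclasses import dataclass

UNKNOWN = "Unknown"


@dataclass
class Row:
    """Known fields plus a flag standing in for the `..r` row variable."""
    fields: dict[str, str]
    open: bool = False


def accepts(param: Row, arg: Row) -> list[str]:
    """Return the errors from passing `arg` where `param` is expected."""
    errors = []
    for name, want in param.fields.items():
        got = arg.fields.get(name)
        if got is None:
            # An open argument may still carry the field; stay silent then.
            if not arg.open:
                errors.append(f"missing field {name}")
        elif got != want and UNKNOWN not in (got, want):
            errors.append(f"field {name}: expected {want}, got {got}")
    if not param.open:
        extra = sorted(set(arg.fields) - set(param.fields))
        errors.extend(f"unexpected field {name}" for name in extra)
    return errors


def merge(left: Row, right: Row) -> Row:
    """Type of `left // right`: right-hand fields win on conflict."""
    merged = dict(left.fields)
    if right.open:
        # Unknown right-hand fields could overwrite any left-hand field.
        merged = {name: UNKNOWN for name in merged}
    merged.update(right.fields)
    return Row(merged, open=left.open or right.open)


needs_xy = Row({"x": "Int", "y": "Int"}, open=True)  # { x : Int, y : Int, ..r }
base = Row({"x": "Int", "y": "Int"})
override = Row({"y": "String", "z": "Int"})

combined = merge(base, override)
errors = accepts(needs_xy, combined)
```

With these inputs, base alone satisfies needs_xy, while the merged set is rejected because override changed y to a String. The open-row branch in merge is the part that makes chains of merges expensive to track: once the right-hand side may contain unknown fields, nothing on the left can be trusted any more.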
Handling with conservatively is the pragmatic call most type checkers would make. Rather than requiring full evaluation of the with target, the type checker can treat names introduced by with as having an unknown or any type, then refine that if the target’s type can be resolved statically. For nixpkgs usage, pre-computed type stubs for the package set, similar to how TypeScript’s DefinitelyTyped project provides type definitions for JavaScript libraries, would give the type checker enough information to flag obvious errors without requiring live evaluation.
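One way this conservative policy could look is sketched below. It assumes lexical bindings take precedence over names introduced by with, and that the innermost with wins among nested ones, which matches Nix’s scoping. The resolve function and the pkgs_stub table are hypothetical stand-ins for a real scope analysis and a real stub file.

```python
from typing import Optional

UNKNOWN = "Unknown"

# Hypothetical pre-computed stub for a tiny slice of a package set.
pkgs_stub = {"vim": "Derivation", "git": "Derivation", "curl": "Derivation"}


def resolve(
    name: str,
    lexical: dict[str, str],
    with_stack: list[Optional[dict[str, str]]],
) -> Optional[str]:
    """Resolve `name` to a type, or None if it is definitely unbound.

    `with_stack` lists the enclosing `with` targets, innermost last.
    A target of None means its fields could not be determined statically.
    """
    if name in lexical:
        return lexical[name]
    saw_unknown_target = False
    for target in reversed(with_stack):
        if target is None:
            # This `with` might or might not supply the name.
            saw_unknown_target = True
            continue
        if name in target:
            # An unresolved inner `with` could shadow this binding.
            return UNKNOWN if saw_unknown_target else target[name]
    return UNKNOWN if saw_unknown_target else None


known = resolve("vim", {}, [pkgs_stub])
typo = resolve("vmi", {}, [pkgs_stub])
unresolved = resolve("vmi", {}, [None])
```

With a stub available, the typo vmi resolves to None and can be reported. Without one, the same name degrades to Unknown instead of producing a false error, which is the trade the paragraph above describes.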
Builtins as the bootstrap layer gives the type checker a solid foundation. Nix’s built-in functions have well-defined signatures that can be hard-coded: builtins.map takes a function and a list and returns a list; builtins.attrValues takes an attribute set and returns a list of its values. Once these are typed, inference can propagate those types outward through the expressions that use them.
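A hard-coded signature table might be sketched as follows. The type encoding (strings and tagged tuples) and the helper names are assumptions made for this example; only the three builtin names come from Nix itself.

```python
from typing import Union

# Types are plain strings ("Int", "Unknown") or tagged tuples:
#   ("List", elem), ("Fun", param, result), ("Attrs", {field: type})
Type = Union[str, tuple]


def kind(t: Type) -> str:
    return t[0] if isinstance(t, tuple) else t


def sig_length(args: list) -> Type:
    (xs,) = args
    if kind(xs) != "List":
        raise TypeError("length expects a list")
    return "Int"


def sig_map(args: list) -> Type:
    fn, xs = args
    if kind(fn) != "Fun" or kind(xs) != "List":
        raise TypeError("map expects a function and a list")
    if fn[1] != xs[1] and "Unknown" not in (fn[1], xs[1]):
        raise TypeError(f"map: function takes {fn[1]}, list holds {xs[1]}")
    return ("List", fn[2])


def sig_attr_values(args: list) -> Type:
    (attrs,) = args
    if kind(attrs) != "Attrs":
        raise TypeError("attrValues expects an attribute set")
    values = list(attrs[1].values())
    # Only claim an element type when every field agrees on it.
    uniform = bool(values) and all(v == values[0] for v in values)
    return ("List", values[0] if uniform else "Unknown")


# name -> (arity, signature function)
BUILTINS = {
    "length": (1, sig_length),
    "map": (2, sig_map),
    "attrValues": (1, sig_attr_values),
}


def infer_call(name: str, args: list) -> Type:
    arity, sig = BUILTINS[name]
    if len(args) != arity:
        raise TypeError(f"{name} expects {arity} argument(s)")
    return sig(args)


ports = ("Attrs", {"http": "Int", "https": "Int"})
to_string = ("Fun", "Int", "String")

values_type = infer_call("attrValues", [ports])
mapped_type = infer_call("map", [to_string, values_type])
count_type = infer_call("length", [mapped_type])
```

Each result feeds the next call, so the only facts supplied by hand are the builtin signatures and the shape of ports; everything else is derived.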
Incremental re-checking is a hard requirement for anything used as an LSP. The approach pioneered by Roslyn and refined in rust-analyzer is a query-based incremental computation graph, where only the analysis nodes affected by a change are invalidated and recomputed. For Nix, this is complicated by import and with, both of which create non-local dependencies that can invalidate analysis of syntactically distant code.
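The sketch below shows the dependency-tracking idea in its coarsest form: each cached result remembers which files it read, and editing a file drops exactly the results that read it. It is not the algorithm used by Roslyn or rust-analyzer, it assumes imports are acyclic, and the one-statement-per-line file format is a toy invented for this example.

```python
class Db:
    """Caches per-file analysis and invalidates it by recorded dependencies."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        # path -> (names visible from the file, files read to compute them)
        self.cache: dict[str, tuple[frozenset, frozenset]] = {}
        self.recomputed: list[str] = []  # log of queries that actually ran

    def set_file(self, path: str, text: str) -> None:
        self.files[path] = text
        # Drop every cached result that read this file, directly or not.
        self.cache = {p: e for p, e in self.cache.items() if path not in e[1]}

    def names(self, path: str) -> frozenset:
        return self._query(path)[0]

    def _query(self, path: str) -> tuple[frozenset, frozenset]:
        if path in self.cache:
            return self.cache[path]
        self.recomputed.append(path)
        names, deps = set(), {path}
        for line in self.files[path].splitlines():
            head, _, rest = line.partition(" ")
            if head == "import":
                # Non-local dependency: this file now depends on another one.
                sub_names, sub_deps = self._query(rest.strip())
                names |= sub_names
                deps |= sub_deps
            elif head:
                names.add(head)
        entry = (frozenset(names), frozenset(deps))
        self.cache[path] = entry
        return entry


db = Db()
db.set_file("lib.nix", "helper = 1")
db.set_file("main.nix", "import lib.nix\napp = 2")
db.set_file("other.nix", "tool = 3")
db.names("main.nix")
db.names("other.nix")
first_pass = list(db.recomputed)

db.recomputed.clear()
db.set_file("lib.nix", "helper = 1\nextra = 2")
main_names = db.names("main.nix")
db.names("other.nix")
second_pass = list(db.recomputed)
```

After the edit to lib.nix, only main.nix and lib.nix are recomputed; other.nix is served from the cache. The import line is what makes an edit in one file invalidate analysis of another, which is the non-local dependency the paragraph above refers to.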
Lessons from Gradual Typing in Other Languages
The Nix tooling situation echoes where Python was before mypy and where JavaScript was before TypeScript. Both communities arrived at gradual typing: a system that doesn’t require full annotation coverage, infers what it can, and accepts any or unknown where inference runs out.
Python’s mypy and Microsoft’s pyright illustrate two different implementation strategies. Mypy prioritized soundness and started as an academic project; pyright prioritized IDE responsiveness and ships as the backend for Pylance in VS Code. Both use bidirectional type checking, propagating type information both downward from known types into expressions and upward from the types of subexpressions to the expressions that enclose them. For Nix, bidirectional checking makes sense, but the lazy evaluation model means propagation rules have to account for thunks. A value that’s unevaluated isn’t typed as unknown but as “the type of whatever the thunk produces when forced,” which may be knowable from the thunk’s body without forcing it.
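The two directions can be shown on a toy expression language. This sketch is not how mypy or pyright are implemented; the tuple-based AST, the type encoding, and the function names are all assumptions made for the example. Note that neither function evaluates anything: a lambda body is typed from its syntax alone, much as a thunk’s type could be read from its body without forcing it.

```python
# Expressions: ("int", 1), ("str", "a"), ("var", name),
#              ("lam", param, body), ("app", fn, arg)
# Types: "Int", "String", or ("Fun", param_type, result_type)


def infer(expr, env):
    """Synthesize a type from the expression itself (information flows up)."""
    tag = expr[0]
    if tag == "int":
        return "Int"
    if tag == "str":
        return "String"
    if tag == "var":
        return env[expr[1]]
    if tag == "app":
        fn_type = infer(expr[1], env)
        if not (isinstance(fn_type, tuple) and fn_type[0] == "Fun"):
            raise TypeError(f"cannot call a value of type {fn_type}")
        # The known parameter type is pushed down into the argument.
        check(expr[2], fn_type[1], env)
        return fn_type[2]
    raise TypeError(f"cannot infer a type for {tag} without context")


def check(expr, expected, env):
    """Check against a type known from context (information flows down)."""
    is_fun = isinstance(expected, tuple) and expected[0] == "Fun"
    if expr[0] == "lam" and is_fun:
        _, param, body = expr
        # The expected type tells us the parameter type; no annotation needed.
        check(body, expected[2], {**env, param: expected[1]})
        return
    actual = infer(expr, env)
    if actual != expected:
        raise TypeError(f"expected {expected}, got {actual}")


def try_infer(expr, env):
    try:
        return infer(expr, env)
    except TypeError as err:
        return f"error: {err}"


env = {"apply": ("Fun", ("Fun", "Int", "Int"), "Int")}
good = ("app", ("var", "apply"), ("lam", "x", ("var", "x")))
bad = ("app", ("var", "apply"), ("lam", "x", ("str", "oops")))

good_result = try_infer(good, env)
bad_result = try_infer(bad, env)
bare_result = try_infer(("lam", "x", ("var", "x")), env)
```

The unannotated lambda cannot be typed on its own, but it checks cleanly once apply supplies the expected function type. That is the practical benefit for an unannotated language: known signatures, such as those of builtins, carry type information into code that has none of its own.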
Nickel, developed by Tweag, is worth examining as a point of comparison. It’s a configuration language designed as a typed alternative to Nix, using gradual typing with explicit annotations and blame tracking. When a dynamically typed value crosses a type boundary into a statically typed context, Nickel can attribute the resulting runtime error to the specific contract that was violated. This is a cleaner model than retrofitting types onto an existing language. But Nickel is a separate language, so it doesn’t help the existing corpus of Nix code or the NixOS ecosystem that depends on it.
Why This Work Matters
The practical argument for better Nix tooling is straightforward. NixOS configurations and flakes are real codebases that developers maintain over years. A typo in an attribute set field name that a type checker could catch instantly instead surfaces as a runtime evaluation error, often deep in a build graph and far from where the mistake was made. Functions in nixpkgs have implicit contracts about what their argument sets must contain, and violating those contracts produces error messages that are genuinely difficult to trace back to their source.
Better tooling also matters for adoption. The Nix learning curve is steep, and a significant portion of that steepness comes from the feedback loop. You write something, run nix build or nix eval, wait several seconds for the evaluator, receive a cryptic error, and try to reconstruct what went wrong. An LSP that flags obvious mistakes as you type shortens each round of that loop from a full evaluate-and-debug cycle to near-instant feedback in the editor.
Building a type checker and LSP for Nix is not simple work. The language’s design prioritizes evaluation flexibility over analysis tractability. But that’s what makes the engineering interesting, and given how central Nix has become to a certain style of system configuration and reproducible builds, the tooling has to catch up eventually. Projects like the one documented on johns.codes are how that happens.