· 7 min read ·

Nix Gets a Type Checker: The Hard Problem Behind a Long-Overdue Tool

Source: lobsters

The Nix expression language has been around since Eelco Dolstra’s 2003 thesis, and for most of that time, the tooling story has been austere. You write Nix, you nix eval, you discover your mistake. If something goes wrong, the error message arrives at the worst possible time: buried inside a derivation build, or mid-evaluation of a 400,000-line nixpkgs tree.

Recently, John Sterrett documented his work building a type checker and Language Server Protocol implementation for Nix. It’s an ambitious project, and it comes at a moment when the Nix tooling ecosystem is finally maturing. But the deeper story here isn’t about one project. It’s about why Nix has resisted static analysis for two decades, and what it means to try to retrofit type inference onto a language that was never designed for it.

The Existing Landscape

Two LSP servers dominate current Nix editor tooling. nil, written in Rust by oxalica, uses an incremental demand-driven computation model inspired by rust-analyzer’s salsa architecture. It handles name resolution, completions based on static attribute paths, go-to-definition, and find-references. It does not evaluate Nix code. Startup is essentially instantaneous and memory usage stays well under 100 MB. What nil cannot do is tell you what type a function argument is, or whether you’re passing the wrong value to mkDerivation.

nixd, written in C++, takes the opposite approach. It embeds libnixexpr, the actual Nix evaluator, as a library. When configured against a nixpkgs path, it evaluates the full nixpkgs tree in the background and uses that live evaluation state to answer completion queries. The result is accurate, context-aware completions for NixOS options, home-manager modules, and nix-darwin configuration. The cost is steep: initial evaluation can take minutes and multiple gigabytes of RAM. nixd also ships nixf, a fault-tolerant Nix parser and diagnostic tool that has been spun out as a separate component.

Neither project attempts type inference. The gap that Sterrett’s project addresses is real, and it’s not simply a matter of effort. Nix is genuinely difficult to type-check.

Why Nix Resists Static Analysis

Start with lazy evaluation. Nix is a purely lazy language: values are not computed until forced. This means the type of any expression is, in a strict sense, only knowable once evaluation proceeds far enough to force it. A type checker must approximate this, either by tracking laziness explicitly through the type system or by making conservative assumptions that will occasionally be wrong.

The attribute set merge operator // is a specific pain point. If you have:

let
  base = { a = 1; b = 2; };
  override = { b = "hello"; c = true; };
in base // override

The result is { a = 1; b = "hello"; c = true; }. A sound type checker needs to model both operands as record types and compute the merged output type, including field overrides. This is tractable for small literal records. It becomes intractable when either operand is a function argument whose shape is not statically known, which describes most real nixpkgs code.

Then there is with. The expression with pkgs; [ git vim curl ] introduces every attribute of pkgs into scope without any declaration. Name resolution without evaluation is impossible in the general case, because pkgs might be constructed dynamically or extended with overlays. nil handles with by tracking the expression and attempting best-effort resolution; a type checker would need to do the same, accepting that completeness is out of reach.

Finally, the callPackage pattern that underlies all of nixpkgs uses builtins.functionArgs to introspect a function’s argument names at runtime, then automatically passes the matching attributes from the package set. This is dependency injection through dynamic reflection. There is no static declaration that says “this derivation depends on stdenv, fetchurl, and openssl”; that information lives in the function signature and is extracted at evaluation time. Any type system that wants to model nixpkgs accurately must either evaluate the function args introspection statically, or accept that callPackage-based code is a type-checking black box.

What Gradual Typing Did for Other Languages

The history of type checkers for dynamic languages is instructive. Python had no type syntax until PEP 484 in 2014 introduced the typing module and function annotation syntax, building on the runtime annotation mechanism that PEP 3107 had added in 2006. mypy, Pyright, and now Meta’s Pyrefly all benefit from explicit syntax that programmers write into their code. The annotations are optional and ignored at runtime, but they give the type checker solid footholds in an otherwise dynamic codebase.

TypeScript took a more structural approach: rather than adding annotations to existing JavaScript syntax, it defined a new superset language. The structural type system was designed specifically to match JavaScript’s duck-typed patterns. interface declarations, union types, and conditional types exist precisely because JavaScript code passes objects around based on shape, not class hierarchy. TypeScript’s optional any type and // @ts-ignore escapes are not lazy design choices; they are deliberate safety valves for code that cannot be typed without breaking the programmer’s workflow.

Nix has neither of these handholds. There is no annotation syntax. There is no community convention equivalent to JSDoc type comments. The language was designed to be evaluated, not analyzed. A type checker for Nix must either propose language extensions (a non-starter for production Nix code that needs to stay compatible), infer types entirely from unannotated code, or adopt a hybrid approach that infers what it can and falls back gracefully.

Nickel Chose a Different Path

Nickel, developed by Tweag, approached this by designing a new language rather than retrofitting an old one. Nickel targets the same domain as Nix (configuration generation, infrastructure-as-code) but built typing in from the start. It uses gradual typing with a contract system: types can be checked statically where annotations exist, and enforced at runtime via contracts where they don’t. Row polymorphism handles the record merging problem: instead of tracking exact field sets, Nickel’s type system can express “a record with at least these fields, possibly more.”

Nickel’s merge operator was designed with conflict detection in mind, which makes it statically analyzable in ways that Nix’s // is not. The tradeoff is that Nickel does not do package management. It has no derivations, no store, no nixpkgs. It is not a replacement for Nix; it is a proof of concept that the configuration domain is typeable if you’re willing to start from scratch.

The existence of Nickel shows that the problem is solvable. Building a type checker for existing Nix code is harder, because it must be compatible with code that was written without types in mind.

The LSP Architecture Problem

Type checking is half the challenge. The other half is making it useful inside an editor, which means implementing the Language Server Protocol on top of whatever type inference engine you build.

LSP imposes hard latency requirements. Hover responses, completions, and diagnostics need to arrive in well under 200 milliseconds for the experience to feel acceptable. Type inference can be arbitrarily slow for complex programs; the architecture has to decouple background analysis from foreground query response. Rust-analyzer, widely regarded as the best LSP implementation in the ecosystem, achieves this through its salsa incremental computation framework: queries cache their results and are only recomputed when their inputs change, at the granularity of individual source files or even individual functions.

For a Nix type checker, this means the incremental model has to be designed around Nix’s specific dependency structure. A change to a let binding in one file should not force re-evaluation of unrelated expressions. A change to a package in nixpkgs should not invalidate every type in every configuration file that depends on it. Getting the incrementality right is at least as much work as getting the type inference right.

The existing nil project demonstrates that incremental, demand-driven analysis is achievable for Nix. What Sterrett’s project attempts is to add a semantic layer on top of that foundation: not just name resolution, but type inference and type-error reporting. That is the step the community has not had before.

What Success Looks Like

A useful Nix type checker probably doesn’t need to be complete. It needs to be correct where it makes claims, and silent where it can’t be certain. The TypeScript lesson is relevant here: a type checker that fires false positives on legitimate code will be disabled by users within a week.

The high-value targets are the simple cases that catch real bugs: passing a string where an integer is expected, accessing an attribute that doesn’t exist on a record, calling a function with the wrong number of arguments. These are the errors that currently surface only at eval time, sometimes inside a derivation build that takes twenty minutes to reach the error site. Getting those into the editor as red squiggles would be a genuine improvement for the majority of Nix users.

The hard cases, callPackage patterns, module system fixpoints, dynamically-constructed attribute sets, can be left as any-equivalent unknowns in the initial implementation. A Nix type checker that handles 60% of user code well is more valuable than one that attempts to handle 100% and gets 30% wrong.

The project joins a moment where Nix tooling is finally getting serious attention. The combination of nil’s incremental architecture, nixd’s evaluation-backed completions, and now a type inference layer suggests that the tooling gap that has defined the Nix experience for twenty years might, slowly, be closing.

Was this interesting?