· 6 min read ·

Nix Tooling's Missing Layer: What a Real Type Checker Requires

Source: lobsters

The Nix expression language has a problem that grows more painful as your configuration grows: the tooling does not know what anything is. You can write pkgs.hello.version and your editor will not tell you the type, will not complete version as a field, and will silently accept pkgs.hello.versoin until the evaluator catches it at build time. A recent post at johns.codes documents a serious attempt to build a type checker and Language Server Protocol implementation for Nix from scratch, and it surfaces a set of language design problems worth examining in their own right.

The Nix community has not been idle on tooling. Three LSP implementations have existed at different points: rnix-lsp (now archived), nil (actively maintained, written in Rust), and nixd (actively maintained, written in C++). Each represents a distinct philosophy about what editor tooling for a language like Nix should do.

rnix-lsp was the first serious attempt and established that a Rust-based Nix LSP was buildable. It was purely syntactic, meaning it understood the parse tree but not the semantics. Completions came from whatever bindings happened to be in the current file’s let expressions. with pkgs; told it nothing. Cross-file analysis did not exist. It was eventually archived, having established the basic infrastructure without solving the hard problems.

nil is the current recommendation for most users. It uses a demand-driven, incremental analysis graph architecturally similar to rust-analyzer, handles cross-file import resolution, performs dead code detection on unused let bindings, and supports rename refactoring across files. It is fast and works without a Nix installation. What it does not do is type checking. It understands name binding and scope but has no model of what type any expression produces.

nixd took a different approach entirely: it invokes the actual Nix evaluator in the background and extracts information from the evaluation state. When you type pkgs.python3.pkgs. in a NixOS configuration, nixd evaluates pkgs.python3.pkgs and returns the real attribute names. For NixOS module options, it knows that services.nginx.enable is a boolean and services.nginx.virtualHosts is an attribute set of submodules, because it evaluated those option declarations. The tradeoff is that nixd requires a working Nix installation, can consume hundreds of megabytes of memory, and cannot help with code that does not evaluate successfully.

This is the gap the johns.codes article addresses: neither approach gives you static types. nil knows the structure of your code without knowing what any of it means. nixd knows what things evaluate to, but only after they succeed at evaluating. A real type checker would occupy a third position: inferring types structurally, flagging type errors before evaluation, and doing so for code that might never successfully evaluate.

Why Nix Is Genuinely Hard to Type

Several properties of the Nix language create problems that do not exist in Python, Ruby, or JavaScript to the same degree.

Lazy evaluation is the first. Nix is call-by-need: no expression evaluates until forced. A binding like x = f y; creates a thunk, not a value. This means a type checker cannot use evaluation order as a guide, and it cannot evaluate to discover types. It must reason purely structurally, which requires a type theory expressive enough to describe what the values will be without running the code.

The with expression is the second problem. with pkgs; stdenv.mkDerivation { ... } dumps every attribute of pkgs into lexical scope, making stdenv available as if it were a local binding. Statically enumerating what names this introduces requires knowing the type of pkgs, which requires type-checking pkgs, which is often an import of a large attribute set derived from complex computation. The article explores a taint system for identifiers that originate from with scopes, marking them as having uncertain provenance rather than refusing to analyze the expression at all.

The merge operator (//) creates a related problem. base // override produces an attribute set combining both, with override winning on conflicts. Typing this precisely requires knowing the shapes of both operands. If either contains a row variable representing an unknown set of additional fields, the result type becomes a constraint the solver must reason about. Full row-typed record concatenation is known to be complex; some formulations are undecidable in the general case. Languages with row-typed records, like PureScript and OCaml’s object system, impose restrictions that keep the system tractable.

The callPackage pattern adds a third layer of difficulty. Standard Nixpkgs packages are written as functions:

{ stdenv, lib, fetchurl }:
stdenv.mkDerivation {
  pname = "hello";
  version = "2.12";
  src = fetchurl { url = "..."; sha256 = "..."; };
}

The caller (callPackage) fills in the arguments by matching the function’s parameter names against the package set. Typing this correctly requires recognizing that the function’s argument names correspond to attributes of a known set, which is a form of dependent typing or at minimum a very precise row constraint. Without this, the type of every callPackage-based package function degrades to “takes an attribute set with unknown fields.”

Recursive attribute sets introduce a fourth wrinkle. rec { a = 1; b = a + 1; } allows self-reference, and overlays use fixed-point combinators of the form self: super: { ... }. Typing these patterns requires the inference algorithm to handle recursive type constraints without looping, which means cycle detection and, in some cases, equirecursive types.

What a Type System for Nix Needs

The cross-language precedents are instructive. TypeScript typed JavaScript by committing to structural typing, accepting any as an escape hatch, and adding flow-sensitive narrowing so that typeof x === "number" branches actually narrow the type of x. Pyright infers return types from function bodies without requiring annotations, falls back to Unknown when inference fails, and uses Protocol types for structural typing directly analogous to Nix attribute sets. Sorbet introduced per-file strictness levels for Ruby, letting teams annotate incrementally rather than all at once.

The lessons from all three converge on the same decisions: embrace gradualism, represent the hard cases as Unknown rather than refusing to analyze them, make structural typing the default, and add flow-sensitive narrowing for runtime type tests. For Nix, builtins.typeOf x == "int" should narrow the type of x to integer in the true branch, the same way typeof narrows in TypeScript.

The builtins also need first-class treatment. builtins.map, builtins.filter, builtins.foldl', and the rest are implemented in C++ inside the Nix evaluator. Any external type checker must ship a typed prelude that declares their signatures. This is not unusual; TypeScript ships lib.d.ts declaring the entire browser and Node.js API surface, and Pyright ships type stubs for the Python standard library.

Set-theoretic types, as used in CDuce and as a conceptual model for TypeScript’s union and intersection types, map onto Nix’s runtime dispatch patterns naturally. A function that behaves differently depending on whether its argument is a string or a list has type String -> A | [a] -> B, and after a builtins.typeOf check, the type narrows accordingly.

Architecture

A viable architecture separates concerns into distinct layers. The parser layer needs to produce a lossless, error-resilient concrete syntax tree; errors in one part of the file should not prevent analysis of the rest, which is essential for LSP work where the user is always mid-edit. The name resolution layer handles let, rec {}, function parameters, inherit, and as much of with as can be resolved statically. The type inference layer applies Hindley-Milner style inference extended with row polymorphism for attribute sets and an Unknown type as the escape hatch for expressions that cannot be typed without evaluation.

The incremental computation model is a practical requirement. An LSP server that re-analyzes the entire Nixpkgs tree on every keystroke is unusable. nil solved this with a demand-driven graph where changing a file only invalidates analyses that depend on it. Any type checker with the same performance requirements needs the same approach from the start; retrofitting incrementality onto a batch analysis pipeline is considerably harder than designing for it initially.

What Remains Hard

Getting completions right for with pkgs; [ <TAB> ] probably requires either accepting nixd’s evaluation-based approach for that specific case, or maintaining a type database for Nixpkgs that maps package names to their attribute shapes. Neither is trivial. The fixed-point patterns in overlays require the type checker to handle recursive type constraints without looping. Dynamic attribute names ({ ${someExpr} = val; }) are simply unanalyzable without evaluation, and the Unknown escape hatch is the only honest response.

The project at johns.codes is an honest account of how far a serious implementation effort gets before hitting these fundamental difficulties. The design space between nil (fast, structural, no types) and nixd (evaluation-based, complete for evaluatable code, heavy) has room for a third tool that infers types for the statically analyzable majority of Nix code while gracefully degrading on the dynamic parts. Getting there requires confronting problems that the existing tools have deliberately sidestepped, and the article makes a credible case for what that confrontation looks like.

Was this interesting?