
Nix, Static Types, and the Architecture That Makes Them Tractable

Source: lobsters

The Nix expression language has a persistent tooling gap. Most editors treat Nix files as little more than text, and while the two dominant language servers, nil and nixd, provide useful features, neither performs type inference. A recent post on johns.codes documents building a Nix type checker with inference from scratch. The architecture decisions it requires reveal specific properties of the language, and the approach generalizes to a broader class of problems: dynamically typed languages whose tooling authors want to give users more than a syntax highlighter.

What nil and nixd actually do

Before getting into why type inference is hard for Nix, it helps to be precise about where the existing tools stop.

nil is written in Rust and uses a Salsa-style incremental computation database, the same pattern rust-analyzer uses. Salsa memoizes query results and reruns only the parts of the computation whose inputs have changed, which is essential for staying within LSP latency budgets. nil does scope analysis: it resolves variable definitions, flags undefined names, warns about unused bindings, and supports go-to-definition. What it does not do is reason about the types of values. It will not tell you that you are passing an integer where a derivation attribute set is expected.
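To make the scope-analysis half concrete, here is a minimal Rust sketch of those checks over a hypothetical toy AST (the Expr and Diagnostics types are illustrative assumptions, not nil's actual data model). It resolves each variable against a stack of scopes, innermost first, and reports names that are never defined and let bindings that are never referenced. Note what it cannot say: nothing about the types of the values involved.

```rust
/// A toy subset of Nix expressions (hypothetical; not nil's data model).
#[derive(Debug)]
enum Expr {
    Int(i64),
    Var(String),
    /// `let name = value; ... in body` (recursive, as in Nix)
    Let(Vec<(String, Expr)>, Box<Expr>),
    /// `arg: body`
    Lambda(String, Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Default)]
struct Diagnostics {
    undefined: Vec<String>,
    unused: Vec<String>,
}

/// One lexical frame: binding name plus a "has been referenced" flag.
type Frame = Vec<(String, bool)>;

fn walk(expr: &Expr, scopes: &mut Vec<Frame>, diags: &mut Diagnostics) {
    match expr {
        Expr::Int(_) => {}
        Expr::Var(name) => {
            // Search the innermost frame first, so inner bindings shadow outer ones.
            match scopes.iter_mut().rev().flatten().find(|b| b.0 == *name) {
                Some(binding) => binding.1 = true,
                None => diags.undefined.push(name.clone()),
            }
        }
        Expr::Let(bindings, body) => {
            // Nix `let` is recursive: every binding is visible in every value and the body.
            scopes.push(bindings.iter().map(|(n, _)| (n.clone(), false)).collect());
            for (_, value) in bindings {
                walk(value, scopes, diags);
            }
            walk(body, scopes, diags);
            let frame = scopes.pop().expect("frame pushed above");
            diags.unused.extend(frame.into_iter().filter(|(_, used)| !*used).map(|(n, _)| n));
        }
        Expr::Lambda(arg, body) => {
            // Unused lambda arguments are deliberately not reported in this sketch.
            scopes.push(vec![(arg.clone(), false)]);
            walk(body, scopes, diags);
            scopes.pop();
        }
        Expr::Add(a, b) => {
            walk(a, scopes, diags);
            walk(b, scopes, diags);
        }
    }
}

/// Run scope analysis on a closed expression.
fn analyze(expr: &Expr) -> Diagnostics {
    let mut diags = Diagnostics::default();
    walk(expr, &mut Vec::new(), &mut diags);
    diags
}
```

For let a = 1; b = 2; in a + c, analyze reports c as undefined and b as unused, and it would say nothing at all about a + "text".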

nixd takes a fundamentally different approach. Rather than static analysis alone, it spawns a Nix evaluator subprocess and actually evaluates your file to gather semantic information. This gives it something nil cannot have: access to real computed values. For NixOS module users this matters considerably, because nixd can evaluate the entire nixpkgs option tree and offer accurate completions for services.nginx.enable and thousands of similar options. The tradeoff is cost. Evaluation is expensive, large configurations are very expensive, and the evaluator state at any point is a mixture of forced and unforced thunks that does not map cleanly to an incremental architecture designed for sub-100ms hover responses.

Both tools leave a gap. Static scope analysis catches undefined variables but misses type mismatches. Eval-based tooling catches far more but is slow and coupled to an evaluator designed for reproducibility, not interactive latency.

The specific obstacles

Nix is a purely functional, lazily evaluated language whose central data structure is the attribute set: a key-value map where keys are strings and values can be any expression. That combination creates concrete obstacles for a type checker.

Dynamic keys. Nix allows attribute set keys to be computed at runtime:

{ ${builtins.toString n} = value; }

Row-polymorphic type systems track known field names at the type level, so { a: Int, b: String } is a structurally distinct type from { a: Int }. If the key is an arbitrary runtime expression, there is no field name to track. The type checker must either evaluate the key expression, which requires an evaluator and defeats the purpose, or treat dynamically keyed attribute sets as opaque maps with no known field structure.
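A hedged sketch of the second option, assuming a hypothetical Ty enum: an attribute set literal whose keys are all literals gets a record type with named fields, while a single dynamic key downgrades the whole set to an opaque map that tracks only a value type. Field selection is then precise on records and deliberately coarse on maps.

```rust
use std::collections::BTreeMap;

/// Hypothetical type representation (not taken from any real checker).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
    /// No static information, e.g. values of differing types behind dynamic keys.
    Any,
    /// Every key was a literal, so field names live in the type.
    Record(BTreeMap<String, Ty>),
    /// At least one key was computed: only a value type is known.
    Map(Box<Ty>),
}

/// A key in an attribute set literal.
enum Key {
    Static(String),
    /// e.g. `${builtins.toString n}`: the name is unknown without evaluation
    Dynamic,
}

/// Type an attribute set literal whose values have already been typed.
fn type_attrset(entries: Vec<(Key, Ty)>) -> Ty {
    if entries.iter().all(|(k, _)| matches!(k, Key::Static(_))) {
        let mut fields = BTreeMap::new();
        for (key, ty) in entries {
            if let Key::Static(name) = key {
                fields.insert(name, ty);
            }
        }
        return Ty::Record(fields);
    }
    // Some key is dynamic: keep only a single value type for the whole set.
    let mut types = entries.into_iter().map(|(_, t)| t);
    let first = types.next().expect("a dynamic entry exists");
    let value = if types.all(|t| t == first) { first } else { Ty::Any };
    Ty::Map(Box::new(value))
}

/// Static field access `set.name`: precise for records, coarse for maps.
fn select(set: &Ty, name: &str) -> Result<Ty, String> {
    match set {
        Ty::Record(fields) => fields
            .get(name)
            .cloned()
            .ok_or_else(|| format!("attribute `{name}` missing")),
        // Any name might exist; a missing key only shows up at evaluation time.
        Ty::Map(value) => Ok((**value).clone()),
        Ty::Any => Ok(Ty::Any),
        other => Err(format!("cannot select `{name}` from {other:?}")),
    }
}
```

A typo such as set.c is caught on a record type, but the same typo on a map type passes the checker and fails only when evaluated.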

The with expression. with pkgs; [ hello curl ] injects all keys of pkgs into the surrounding scope. Nix gives lexical bindings (from let or function arguments) priority over with, so a checker can still resolve those; but any other name might come from pkgs, and knowing the keys of pkgs requires evaluation. Without them, the type checker cannot resolve what hello and curl refer to. This is not a corner case: nixpkgs uses with extensively, and many user configurations open with with pkgs; to avoid repetitive prefixes.
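The resolution order can be sketched as follows, assuming a hypothetical Scope type that records let/lambda frames and the attribute sets of enclosing with expressions. Lexical frames are searched first; with sets are then searched innermost first; and an opaque with set, such as pkgs before evaluation, forces the answer to Unknown rather than a guess.

```rust
use std::collections::HashMap;

/// What the checker knows about the set passed to `with` (hypothetical model).
enum WithEnv {
    /// Statically known keys, each mapped to a type (written as a string here).
    Known(HashMap<String, String>),
    /// Keys unknown without evaluation, e.g. `with pkgs;`
    Opaque,
}

#[derive(Debug, PartialEq)]
enum Resolution {
    Lexical(String),
    FromWith(String),
    /// Some enclosing `with` might provide the name: the checker cannot decide.
    Unknown,
    Undefined,
}

struct Scope {
    /// let / lambda frames, innermost last
    lexical: Vec<HashMap<String, String>>,
    /// enclosing `with` sets, innermost last
    withs: Vec<WithEnv>,
}

impl Scope {
    fn resolve(&self, name: &str) -> Resolution {
        // Lexical bindings take priority over `with`, however the two are nested.
        for frame in self.lexical.iter().rev() {
            if let Some(ty) = frame.get(name) {
                return Resolution::Lexical(ty.clone());
            }
        }
        // Otherwise the innermost `with` that has the name wins.
        for env in self.withs.iter().rev() {
            match env {
                WithEnv::Known(fields) => {
                    if let Some(ty) = fields.get(name) {
                        return Resolution::FromWith(ty.clone());
                    }
                }
                // This set might contain `name`, which would shadow any outer `with`.
                WithEnv::Opaque => return Resolution::Unknown,
            }
        }
        Resolution::Undefined
    }
}
```

With only lexical frames and a with over a set of known keys in scope, every name resolves; a single opaque with turns every otherwise-unbound name into Unknown.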

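Recursive attribute sets. In rec { a = 1; b = a + 1; }, bindings can reference each other, so type checking them requires solving recursive type equations. Hindley-Milner handles recursive let bindings by giving each binding in the group a fresh type variable, inferring every body against those variables, and generalizing only after the whole group has been solved. Recursive attribute sets add further complexity: the solved bindings must also be exposed as the fields of an attribute set type, and any dynamically computed keys inside the set are unknown statically.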

The module system. NixOS’s module system (lib.mkOption, lib.mkMerge, lib.mkIf) is implemented entirely as Nix library code built on attribute set conventions; there is no language-level primitive for modules. Typing module system usage therefore requires the type checker to understand what these functions do at the library level, which means either hand-encoding the semantics of dozens of functions or inferring them from the library source and supplying hand-written type stubs where inference cannot see through the conventions.

Lazy evaluation cuts across all of these. In a strict, dynamically typed language, a type error causes a failure as soon as the offending expression is evaluated. In Nix, an expression is only forced when its value is needed, so a type error in a package definition might never surface if nothing ever forces that package. A type checker must reason eagerly over lazily evaluated code, treating every unevaluated thunk as a typed expression and checking its body for consistency without waiting for evaluation to force it.
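The difference between lazy evaluation and eager checking fits in a small Rust sketch over a hypothetical toy expression type. The evaluator keeps attribute values unevaluated and evaluates only the one that is selected, so { ok = 1; broken = 1 + "a"; }.ok evaluates without error; the checker visits every attribute and rejects the same program.

```rust
/// Toy expressions (hypothetical): just enough to show a lazy attribute set.
#[derive(Debug, PartialEq)]
enum Expr {
    Int(i64),
    Str(String),
    Add(Box<Expr>, Box<Expr>),
    /// `{ name = value; ... }`
    Attrs(Vec<(String, Expr)>),
    /// `set.name`
    Select(Box<Expr>, String),
}

#[derive(Debug, PartialEq)]
enum Value<'a> {
    Int(i64),
    Str(String),
    /// Attribute values stay unevaluated, like thunks.
    Attrs(&'a [(String, Expr)]),
}

#[derive(Debug, PartialEq)]
enum Ty {
    Int,
    Str,
    Attrs(Vec<(String, Ty)>),
}

/// Lazy evaluator: an attribute is evaluated only when something selects it.
fn eval(e: &Expr) -> Result<Value<'_>, String> {
    match e {
        Expr::Int(n) => Ok(Value::Int(*n)),
        Expr::Str(s) => Ok(Value::Str(s.clone())),
        Expr::Add(a, b) => match (eval(a)?, eval(b)?) {
            (Value::Int(x), Value::Int(y)) => Ok(Value::Int(x + y)),
            (l, r) => Err(format!("cannot add {l:?} and {r:?}")),
        },
        Expr::Attrs(fields) => Ok(Value::Attrs(fields.as_slice())),
        Expr::Select(set, name) => match eval(set)? {
            Value::Attrs(fields) => {
                let (_, value) = fields
                    .iter()
                    .find(|(n, _)| n == name)
                    .ok_or_else(|| format!("missing attribute `{name}`"))?;
                eval(value) // forcing happens here, and only here
            }
            other => Err(format!("cannot select from {other:?}")),
        },
    }
}

/// Eager checker: every attribute is checked, whether or not it is ever selected.
fn check(e: &Expr) -> Result<Ty, String> {
    match e {
        Expr::Int(_) => Ok(Ty::Int),
        Expr::Str(_) => Ok(Ty::Str),
        Expr::Add(a, b) => match (check(a)?, check(b)?) {
            (Ty::Int, Ty::Int) => Ok(Ty::Int),
            (l, r) => Err(format!("cannot add {l:?} and {r:?}")),
        },
        Expr::Attrs(fields) => fields
            .iter()
            .map(|(n, v)| -> Result<(String, Ty), String> { Ok((n.clone(), check(v)?)) })
            .collect::<Result<Vec<_>, String>>()
            .map(Ty::Attrs),
        Expr::Select(set, name) => match check(set)? {
            Ty::Attrs(fields) => fields
                .into_iter()
                .find(|(n, _)| n == name)
                .map(|(_, t)| t)
                .ok_or_else(|| format!("missing attribute `{name}`")),
            other => Err(format!("cannot select from {other:?}")),
        },
    }
}
```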

Hindley-Milner and row polymorphism

The approach the johns.codes implementation takes is Hindley-Milner constraint generation with unification, extended with row polymorphism for attribute sets.

Hindley-Milner infers types without annotations by generating constraints and solving them through unification. Given f x = x + 1, it infers that x must be an integer and that f has type Int -> Int. Extended to Nix, the same mechanism can infer that x: x.foo + 1 has type { foo: Int | r } -> Int, where r is a row variable representing whatever other fields the attribute set might carry. (This treats + with an integer operand as integer addition; Nix's + is also defined on floats, strings, and paths, so a real checker needs some way to resolve that overloading.)
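The constraint-and-unification core can be sketched in Rust as below. This is an illustrative assumption, not the johns.codes implementation: it covers lambdas, application, integers, and an Int-only +, and it omits let-generalization, so it shows only the unification half of Hindley-Milner. Each lambda parameter gets a fresh type variable, each + constrains its operands to Int, and the substitution built during unification resolves the parameter's variable.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Type {
    Int,
    Var(usize),                // unification variable
    Fun(Box<Type>, Box<Type>), // argument -> result
}

#[derive(Debug)]
enum Expr {
    Int(i64),
    Var(String),
    Lam(String, Box<Expr>), // `x: body`
    App(Box<Expr>, Box<Expr>),
    Add(Box<Expr>, Box<Expr>), // treated as Int + Int only in this sketch
}

#[derive(Default)]
struct Infer {
    subst: HashMap<usize, Type>,
    next: usize,
}

impl Infer {
    fn fresh(&mut self) -> Type {
        self.next += 1;
        Type::Var(self.next - 1)
    }

    /// Apply the current substitution all the way down.
    fn resolve(&self, t: &Type) -> Type {
        match t {
            Type::Var(v) => match self.subst.get(v) {
                Some(bound) => self.resolve(bound),
                None => t.clone(),
            },
            Type::Fun(a, b) => Type::Fun(Box::new(self.resolve(a)), Box::new(self.resolve(b))),
            Type::Int => Type::Int,
        }
    }

    /// Occurs check: binding `v` to a type containing `v` would be infinite.
    fn occurs(&self, v: usize, t: &Type) -> bool {
        match self.resolve(t) {
            Type::Var(w) => v == w,
            Type::Fun(a, b) => self.occurs(v, &a) || self.occurs(v, &b),
            Type::Int => false,
        }
    }

    fn unify(&mut self, a: &Type, b: &Type) -> Result<(), String> {
        match (self.resolve(a), self.resolve(b)) {
            (Type::Int, Type::Int) => Ok(()),
            (Type::Var(x), Type::Var(y)) if x == y => Ok(()),
            (Type::Var(x), t) | (t, Type::Var(x)) => {
                if self.occurs(x, &t) {
                    return Err(format!("infinite type: t{x} occurs in {t:?}"));
                }
                self.subst.insert(x, t);
                Ok(())
            }
            (Type::Fun(a1, r1), Type::Fun(a2, r2)) => {
                self.unify(&a1, &a2)?;
                self.unify(&r1, &r2)
            }
            (l, r) => Err(format!("type mismatch: {l:?} vs {r:?}")),
        }
    }

    /// Generate constraints for `e` and solve them as they arise.
    fn infer(&mut self, env: &HashMap<String, Type>, e: &Expr) -> Result<Type, String> {
        match e {
            Expr::Int(_) => Ok(Type::Int),
            Expr::Var(x) => env.get(x).cloned().ok_or_else(|| format!("undefined `{x}`")),
            Expr::Lam(x, body) => {
                let arg = self.fresh();
                let mut inner = env.clone();
                inner.insert(x.clone(), arg.clone());
                let ret = self.infer(&inner, body)?;
                Ok(Type::Fun(Box::new(arg), Box::new(ret)))
            }
            Expr::App(f, a) => {
                let tf = self.infer(env, f)?;
                let ta = self.infer(env, a)?;
                let ret = self.fresh();
                self.unify(&tf, &Type::Fun(Box::new(ta), Box::new(ret.clone())))?;
                Ok(ret)
            }
            Expr::Add(a, b) => {
                let ta = self.infer(env, a)?;
                self.unify(&ta, &Type::Int)?;
                let tb = self.infer(env, b)?;
                self.unify(&tb, &Type::Int)?;
                Ok(Type::Int)
            }
        }
    }
}

/// Infer and fully resolve the type of a closed expression.
fn type_of(e: &Expr) -> Result<Type, String> {
    let mut inf = Infer::default();
    let t = inf.infer(&HashMap::new(), e)?;
    Ok(inf.resolve(&t))
}
```

Inferring x: x + 1 yields Int -> Int. Applying that function to another function fails with a type mismatch, because unification cannot equate Int with a function type.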

Row polymorphism is what makes attribute set types structural. Instead of a closed record type requiring exactly the specified fields, a row-polymorphic type describes a minimum required structure and allows additional fields. This maps well to how Nix functions use attribute sets in practice: a function that needs .name and .version does not care what other fields the argument contains.
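Row unification can be sketched as below. The Row, Tail, and RowSolver types are hypothetical, and field types are kept ground (Int or Str) so the sketch can focus on the rows themselves. Fields present on both sides must agree, fields present on only one side must be absorbed by the other side's row variable, and a closed record, which has no row variable, cannot absorb anything.

```rust
use std::collections::{BTreeMap, HashMap};

/// Field types are kept ground so the sketch stays about rows.
#[derive(Debug, Clone, PartialEq)]
enum Field {
    Int,
    Str,
}

/// The tail of a row: closed, or an open row variable `r`.
#[derive(Debug, Clone, PartialEq)]
enum Tail {
    Closed,
    Open(usize),
}

/// `{ name: Field, ... | tail }`
#[derive(Debug, Clone, PartialEq)]
struct Row {
    fields: BTreeMap<String, Field>,
    tail: Tail,
}

#[derive(Default)]
struct RowSolver {
    subst: HashMap<usize, Row>,
    next: usize,
}

/// Fields of `x` that `y` does not mention.
fn missing_from(x: &Row, y: &Row) -> BTreeMap<String, Field> {
    x.fields
        .iter()
        .filter(|(k, _)| !y.fields.contains_key(*k))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

impl RowSolver {
    fn fresh(&mut self) -> Tail {
        self.next += 1;
        Tail::Open(self.next - 1)
    }

    /// Expand a row by following bound row variables in its tail.
    fn resolve(&self, row: &Row) -> Row {
        let mut out = row.clone();
        while let Tail::Open(v) = out.tail {
            match self.subst.get(&v) {
                Some(rest) => {
                    out.fields.extend(rest.fields.clone());
                    out.tail = rest.tail.clone();
                }
                None => break,
            }
        }
        out
    }

    fn unify(&mut self, a: &Row, b: &Row) -> Result<(), String> {
        let (a, b) = (self.resolve(a), self.resolve(b));
        // Fields present on both sides must agree.
        for (name, ta) in &a.fields {
            if let Some(tb) = b.fields.get(name) {
                if ta != tb {
                    return Err(format!("field `{name}`: {ta:?} vs {tb:?}"));
                }
            }
        }
        let (only_a, only_b) = (missing_from(&a, &b), missing_from(&b, &a));
        match (a.tail.clone(), b.tail.clone()) {
            (Tail::Closed, Tail::Closed) if only_a.is_empty() && only_b.is_empty() => Ok(()),
            (Tail::Closed, Tail::Closed) => Err("closed records have different fields".into()),
            // The closed side must have every field the open side requires;
            // the open side's row variable absorbs the closed side's extras.
            (Tail::Open(ra), Tail::Closed) => {
                if let Some(name) = only_a.keys().next() {
                    return Err(format!("closed record lacks field `{name}`"));
                }
                self.subst.insert(ra, Row { fields: only_b, tail: Tail::Closed });
                Ok(())
            }
            (Tail::Closed, Tail::Open(_)) => self.unify(&b, &a),
            (Tail::Open(ra), Tail::Open(rb)) if ra == rb => {
                if only_a.is_empty() && only_b.is_empty() {
                    Ok(())
                } else {
                    Err("one row variable cannot stand for two field sets".into())
                }
            }
            // Two distinct open rows absorb each other's extras and share a fresh tail.
            (Tail::Open(ra), Tail::Open(rb)) => {
                let rest = self.fresh();
                self.subst.insert(ra, Row { fields: only_b, tail: rest.clone() });
                self.subst.insert(rb, Row { fields: only_a, tail: rest });
                Ok(())
            }
        }
    }
}
```

Unifying the parameter type { foo: Int | r } with the type of { foo = 1; bar = "x"; } binds r to { bar: String }, while a closed record lacking foo is rejected. A full implementation would also unify field types and interleave this with the ordinary type unifier.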

The practical limitation is that dynamic keys punch holes in this model. The implementation gives dynamically keyed attribute sets an opaque map type. That is the honest answer, since nothing more specific can be said without evaluation, but it means that a common Nix pattern, constructing attribute sets programmatically, falls outside the typed region.

The incremental architecture LSP demands

Language servers work under tight latency expectations. The protocol itself sets no deadline, but a hover response that takes much longer than roughly 100 milliseconds feels sluggish, and completions need to feel instantaneous. That rules out any approach that re-typechecks an entire file on every keystroke.

The standard solution is a query-based incremental computation architecture. The Salsa library in Rust, used by rust-analyzer and nil, expresses all computations as queries over an input database. Query results are memoized, and when an input changes, only the queries whose transitive inputs have changed are re-executed. Editing a single line invalidates the chain that depends on it: its parse, then the lowered syntax tree, then type inference for the affected expressions. If a re-executed query produces the same result as before, Salsa stops the change from propagating further, so downstream results stay cached.
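The memoize-and-verify idea, including the early-cutoff step, can be sketched with a toy database. This is a hypothetical model, not Salsa's API: dependencies are hard-coded rather than recorded automatically, and each memo stores two revision stamps, the revision at which its value last changed and the revision at which it was last verified. A query re-runs only if something it depends on changed after its last verification, and a re-run that produces an identical value keeps its old changed_at, which lets dependents skip re-execution.

```rust
use std::collections::HashMap;

/// A memoized result plus the revision stamps used to verify it.
struct Memo<T> {
    value: T,
    /// Revision at which `value` last actually changed (enables early cutoff).
    changed_at: u64,
    /// Revision at which `value` was last confirmed up to date.
    verified_at: u64,
}

/// Hypothetical, heavily simplified query database; this is not Salsa's API.
#[derive(Default)]
struct Db {
    revision: u64,
    /// Input: file name -> (text, revision at which it was last set).
    sources: HashMap<String, (String, u64)>,
    parse_memo: HashMap<String, Memo<Vec<String>>>,
    count_memo: HashMap<String, Memo<usize>>,
    /// Which derived queries actually executed, in order.
    executions: Vec<&'static str>,
}

impl Db {
    fn set_source(&mut self, file: &str, text: &str) {
        self.revision += 1;
        self.sources.insert(file.to_string(), (text.to_string(), self.revision));
    }

    fn source(&self, file: &str) -> (String, u64) {
        self.sources.get(file).cloned().unwrap_or_default()
    }

    /// Derived query: "parse" a file into whitespace-separated tokens.
    fn parse(&mut self, file: &str) -> (Vec<String>, u64) {
        let (text, text_changed) = self.source(file);
        if let Some(memo) = self.parse_memo.get_mut(file) {
            if text_changed <= memo.verified_at {
                memo.verified_at = self.revision;
                return (memo.value.clone(), memo.changed_at);
            }
        }
        self.executions.push("parse");
        let value: Vec<String> = text.split_whitespace().map(String::from).collect();
        let changed_at = match self.parse_memo.get(file) {
            // Early cutoff: an identical result keeps its old `changed_at`.
            Some(old) if old.value == value => old.changed_at,
            _ => self.revision,
        };
        let memo = Memo { value: value.clone(), changed_at, verified_at: self.revision };
        self.parse_memo.insert(file.to_string(), memo);
        (value, changed_at)
    }

    /// Derived query that depends only on `parse`.
    fn word_count(&mut self, file: &str) -> usize {
        // Bring the dependency up to date first (re-running it only if needed).
        let (words, parse_changed) = self.parse(file);
        if let Some(memo) = self.count_memo.get_mut(file) {
            if parse_changed <= memo.verified_at {
                memo.verified_at = self.revision;
                return memo.value;
            }
        }
        self.executions.push("word_count");
        let value = words.len();
        let memo = Memo { value, changed_at: self.revision, verified_at: self.revision };
        self.count_memo.insert(file.to_string(), memo);
        value
    }
}
```

After a whitespace-only edit, parse re-runs but returns the same tokens, so word_count is served from its memo; an edit that changes the tokens re-runs both.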

For an LSP server, the ideal pipeline looks like this: text changes flow into a parsing layer, the changed subtrees of the concrete syntax tree are reparsed, the affected portions of the HIR (the high-level intermediate representation the checker works on) are lowered again, type constraints for the changed region are regenerated, and unification runs over the changed constraints. Hover and completions then query the type of the expression at the cursor position against the memoized results.

For parsing, a natural foundation is tree-sitter-nix, which has matured considerably and now handles edge cases in string interpolation, multiline strings, and path literals reliably. Because tree-sitter reparses incrementally after edits, building on it gives a type checker a correct, incrementally updated parse tree without maintaining a custom parser.

The Nickel comparison

Nickel, developed by Tweag, approaches the same problem from the opposite direction. Rather than retrofitting types onto Nix, it is a new functional configuration language with gradual typing built in from the start. Its blame calculus tracks contract violations at runtime boundaries between typed and untyped code. Nickel is a reasonable choice for new projects that want a typed configuration language.

It is not a replacement for Nix. The entire nixpkgs ecosystem, NixOS module system, and flake infrastructure are written in Nix. A type checker for Nix itself, even one covering only the statically typeable subset, is useful precisely because it applies to existing code without requiring rewrites. TypeScript succeeded in a comparable situation: it added types to JavaScript by accepting that some code would fall into the any-typed region, and by providing coverage over the statically analyzable subset, which turned out to be most of the code that mattered.

The Nix case has the same structure. Dynamic keys and with expressions define the boundary of what can be typed without evaluation. Within that boundary, Hindley-Milner with row polymorphism covers the majority of typical code: function argument types, attribute set field access, arithmetic, string operations, and list operations. The errors it catches in that region are exactly the errors that currently surface only at evaluation time, often only when a specific package is built: wrong-arity calls (in Nix, usually a missing or unexpected attribute in a function's argument pattern), accessing nonexistent fields in statically known attribute sets, and type mismatches in arithmetic.

The LSP integration converts that coverage into something interactive. Hover shows inferred types. Completions use the inferred attribute set type to suggest valid field names. Diagnostics mark type mismatches inline. For a language used as heavily as Nix is for system configuration, that changes the feedback loop considerably, even without complete coverage.
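The completion and hover side can be sketched as a pair of functions over an inferred type, assuming a hypothetical Ty enum in which a record carries its known fields and a flag saying whether its row is open. Completion filters the known fields by the typed prefix; hover renders the type, marking open rows with ..; an opaque set offers no completions, which is the honest behaviour when its keys are unknown.

```rust
use std::collections::BTreeMap;

/// Hypothetical inferred types; a record carries its known fields and
/// whether its row is open (more fields may exist).
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
    Record(BTreeMap<String, Ty>, bool),
    /// Attribute set with unknown keys (e.g. built with dynamic keys).
    Opaque,
}

/// A completion item: the field name plus its rendered type.
#[derive(Debug, PartialEq)]
struct Completion {
    label: String,
    detail: String,
}

/// Render a type for a hover tooltip.
fn render(ty: &Ty) -> String {
    match ty {
        Ty::Int => "int".to_string(),
        Ty::Str => "string".to_string(),
        Ty::Opaque => "attrset (unknown keys)".to_string(),
        Ty::Record(fields, open) => {
            let mut parts: Vec<String> =
                fields.iter().map(|(k, t)| format!("{k}: {}", render(t))).collect();
            if *open {
                parts.push("..".to_string());
            }
            if parts.is_empty() {
                "{ }".to_string()
            } else {
                format!("{{ {} }}", parts.join(", "))
            }
        }
    }
}

/// Suggest fields for `expr.<prefix>` given the inferred type of `expr`.
fn complete_fields(ty: &Ty, prefix: &str) -> Vec<Completion> {
    match ty {
        Ty::Record(fields, _) => fields
            .iter()
            .filter(|(name, _)| name.starts_with(prefix))
            .map(|(name, t)| Completion { label: name.clone(), detail: render(t) })
            .collect(),
        // Opaque sets and non-sets: nothing can honestly be suggested.
        _ => Vec::new(),
    }
}
```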

The architecture needed to build it reflects the language’s specific constraints: row polymorphism for structural attribute sets, explicit opaque types where dynamic keys make structure unknowable, Salsa-style incremental computation for LSP latency, and tree-sitter for incremental parsing. None of those choices are accidental. Each one follows directly from a property of the language, and the chain of reasoning from language feature to architecture decision is what makes the johns.codes writeup worth studying beyond its specific implementation.
