Looking back at Andrew Kelley’s December 2022 announcement that the C++ Zig compiler was leaving the repository for good, the headline reads like a standard self-hosting milestone. Language matures, rewrites itself, retires the bootstrap compiler. That arc is familiar. Zig’s story is more interesting, and more instructive, because the C++ implementation had to be replaced not primarily for maintenance reasons but because it was structurally incapable of correctly implementing the language it compiled.
Stage1’s Fundamental Problem
Most accounts of Zig’s self-hosting journey focus on the milestone itself. The more useful framing is that stage1, as the C++ compiler was known, had documented, unfixable bugs. Not the kind of bugs that accumulate in any large codebase and get addressed in subsequent releases. Architectural defects baked into the implementation, particularly around comptime and the type system, that would have required a complete rewrite to fix.
This matters because comptime is not a peripheral feature in Zig. It is the foundation of the language’s generics story. Where other languages have templates, macros, or generic syntax, Zig uses compile-time execution for nearly everything. ArrayList(T) is a function called at compile time that returns a struct type. The compiler therefore needs a complete, correct interpreter for Zig itself. Stage1’s comptime implementation was neither, which meant there was a meaningful gap between the language as designed and documented and what stage1 could actually compile correctly.
So the self-hosted compiler was not a rewrite pursued for contributor ergonomics or maintenance hygiene, though both improved substantially. It was the only path to a compiler that correctly implemented the language as designed.
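To make the comptime point concrete, here is a minimal sketch of the "a generic is just a function that returns a type" idea. It is written in Python, so the function runs at run time rather than compile time, and the names and the memoization via lru_cache are assumptions of this illustration, not Zig's implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def ArrayList(T):
    """Return a list type specialised for element type T.

    Loose analogue of a Zig function that is evaluated at compile time
    and returns a struct type. Here the call happens at run time.
    """

    class List:
        item_type = T

        def __init__(self):
            self.items = []

        def append(self, value):
            # Reject values of the wrong element type, as a typed list would.
            if not isinstance(value, T):
                raise TypeError(
                    f"expected {T.__name__}, got {type(value).__name__}"
                )
            self.items.append(value)

    List.__name__ = f"ArrayList({T.__name__})"
    return List


IntList = ArrayList(int)
numbers = IntList()
numbers.append(1)
numbers.append(2)

# Asking for the same instantiation again yields the very same type object,
# while a different element type yields a different type.
same_type = ArrayList(int) is IntList
different_type = ArrayList(str) is IntList
```

The point of the sketch is that producing a generic type requires executing ordinary code. A compiler that does this at compile time needs an interpreter that handles the whole language correctly, which is the requirement stage1 could not meet.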
A Multi-Pass Architecture Built Around Zig’s Semantics
The self-hosted compiler, developed under the informal name stage2, split compilation into distinct stages, each with its own representation: source to AST, AST to ZIR, ZIR to AIR, and AIR to machine code.
Source code passes through a tokenizer and parser to produce an AST. The AST does not copy text out of the file; its tokens stay tied to their positions in the original source. This design decision directly supports zig fmt, the code formatter, which walks the tree, recovers comments from the source text between tokens, and re-emits the file with normalized whitespace.
From the AST, an AstGen pass produces ZIR (Zig Intermediate Representation). ZIR is deliberately untyped. It captures the structure of the code without resolving types, and it is cached per source file. If a file hasn’t changed, its ZIR doesn’t need to be regenerated. Lazy analysis builds on this: the later stages only analyze code reachable from the compilation’s entry points, so ZIR for unused code never gets processed further.
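The two ideas in that paragraph, a per-file cache and reachability-driven analysis, can be sketched in a few lines of Python. Everything here is hypothetical: the class and function names, the SHA-256 change check, and the "one instruction per line" lowering are stand-ins chosen for the illustration, not how the Zig compiler decides a file is unchanged or what ZIR looks like.

```python
import hashlib


def lower_to_untyped_ir(source):
    """Stand-in for AstGen: one 'instruction' per non-blank line, no types."""
    return [line.strip() for line in source.splitlines() if line.strip()]


class ZirCache:
    """Hypothetical per-file cache mapping a source file to its untyped IR."""

    def __init__(self):
        self._entries = {}     # path -> (digest, ir)
        self.regenerated = []  # paths that had to be lowered, in order

    def get(self, path, source):
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged file: reuse the cached IR
        ir = lower_to_untyped_ir(source)
        self._entries[path] = (digest, ir)
        self.regenerated.append(path)
        return ir


def reachable(entry, references):
    """Return the declarations reachable from entry, in visit order."""
    seen = []
    worklist = [entry]
    while worklist:
        decl = worklist.pop()
        if decl in seen:
            continue
        seen.append(decl)
        worklist.extend(references.get(decl, []))
    return seen


cache = ZirCache()
cache.get("main.zig", "const x = 1;\n")
cache.get("main.zig", "const x = 1;\n")  # unchanged: served from the cache
cache.get("main.zig", "const x = 2;\n")  # changed: lowered again

# Hypothetical reference graph; "unused" is never reached from "main".
references = {"main": ["parse", "print"], "parse": ["print"], "unused": ["parse"]}
analyzed = reachable("main", references)
```

In this sketch the file is lowered twice across three requests, and the declaration named unused never enters the analyzed set even though IR for it would exist.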
The bulk of the work happens in Sema, the semantic analysis pass, which takes ZIR and produces AIR (Analyzed Intermediate Representation). Type checking, comptime evaluation, and generic monomorphization all happen in Sema. AIR is typed and closer to machine semantics.
A data structure called the InternPool underpins the whole type system. It is a global deduplicated table of types, values, and compile-time constructs. The practical benefit is that type comparison becomes identity comparison: two types are the same exactly when they occupy the same slot in the pool, so equality is a comparison of two small indices rather than a structural walk. For a language that creates large numbers of generic instantiations at compile time, this makes a substantial difference. The InternPool was designed specifically for Zig’s comptime-heavy patterns, where the same type might be demanded repeatedly through different paths in the analysis.
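A toy version of the interning idea, assuming types can be described by hashable tuples. The class name mirrors the text, but the key encoding and the methods are invented for this sketch and say nothing about the real data layout.

```python
class InternPool:
    """Toy deduplicating table: structurally equal keys share one index."""

    def __init__(self):
        self._index_of = {}  # key -> index
        self._keys = []      # index -> key

    def intern(self, key):
        index = self._index_of.get(key)
        if index is None:
            index = len(self._keys)
            self._keys.append(key)
            self._index_of[key] = index
        return index

    def key(self, index):
        return self._keys[index]

    def __len__(self):
        return len(self._keys)


pool = InternPool()
u8 = pool.intern(("int", "unsigned", 8))
slice_a = pool.intern(("slice", u8))

# The same types requested again through a different analysis path.
u8_again = pool.intern(("int", "unsigned", 8))
slice_b = pool.intern(("slice", u8_again))

# A composite type refers to its element type by index, not by structure.
list_of_slices = pool.intern(("struct", "ArrayList", slice_a))
```

Because slice_a and slice_b come back as the same integer, checking whether two types are equal never has to look inside them, and composite types built on top of them deduplicate for free.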
The Bootstrap Problem, and a Novel Solution
Self-hosting creates an obvious circular dependency: the compiler is written in the language it compiles. Building from source on a machine with no prior Zig installation requires solving this.
Compare how other language ecosystems handle it. Go switched its compiler from C to Go in version 1.5 (2015). The bootstrap solution was to preserve Go 1.4, the last C-based release, as a required dependency for building later versions. It works, but it means the from-source build chain reaches back to a C-era release from 2014, and anyone building from scratch must first obtain a bootstrap toolchain (originally Go 1.4; recent releases require a newer Go, which lengthens the chain). Rust takes a different approach: bootstrapping requires downloading a binary of the previous Rust release from the internet. This is convenient for most users but requires network access and places trust in binaries fetched from an external server.
Zig’s approach is architecturally distinct. The repository contains a file called zig1.wasm, a WebAssembly build of an earlier revision of the Zig compiler. Alongside it lives a small C program, a few thousand lines, whose only job is to make that module runnable on the host (the version that shipped translates the WebAssembly to C rather than interpreting it). The bootstrap sequence: compile the small C tool with any available C compiler, use it to turn zig1.wasm into a native executable, use that executable to translate the current compiler source to C, compile the result, and let that binary build the full compiler.
This gives the bootstrap a set of useful properties. The WASM binary is committed to the repository and can be audited directly. The C tool is small enough to read in an afternoon. No internet access is required. The same zig1.wasm works on x86_64, ARM, RISC-V, or any other architecture, because WebAssembly is a stable portable instruction set that doesn’t depend on the host’s native ABI. When the compiler makes breaking changes to its own interface, zig1.wasm is updated in the repository with a new commit.
The self-hosted compiler’s C backend is what closes the loop. The compiler can translate Zig source to portable C, which any C compiler can then build. This means a native Zig binary can be reached from nothing but a C compiler and the files in the repository.
For comparison: Zig’s approach is more auditable than Rust’s (no download trust chain, the bootstrap artifact is in the repository) and cleaner than Go’s (no need to maintain a release from a different implementation era).
Bypassing LLVM for Debug Builds
Alongside the architecture improvements, the self-hosted compiler introduced multiple code generation backends. Stage1 used LLVM exclusively for code generation. The self-hosted compiler uses LLVM for optimized release builds, but also includes a C backend, an x86_64 backend, an aarch64 backend, and others targeting RISC-V, ARM, and WebAssembly.
The practical consequence for daily development is significant. LLVM produces excellent optimized machine code, but it is slow. For debug builds during active development, LLVM’s optimization machinery is not just unnecessary; it is a tax on every compile cycle. The x86_64 backend generates code directly without invoking LLVM, which produced debug compilation times roughly five to fifteen times faster for simple programs in the Zig 0.10.0 era. The x86_64 backend was also built from the start with incremental compilation in mind, where only changed functions and their dependents recompile rather than the entire program.
This workflow, where LLVM handles final release builds and a fast native backend handles day-to-day iteration, was not achievable with stage1’s single-backend design. It is the kind of improvement that shows up in every compile invocation rather than in occasional benchmarks.
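The incremental idea mentioned above, recompiling only changed functions and their dependents, reduces to a graph walk. The sketch below assumes a precomputed map from each function to the functions that depend on it; the function names and the map are hypothetical, and what actually counts as a dependent is the compiler's decision, not something this example models.

```python
from collections import deque


def functions_to_recompile(changed, dependents):
    """Return the changed functions plus all transitive dependents.

    dependents maps a function name to the functions that depend on it.
    """
    dirty = set(changed)
    queue = deque(changed)
    while queue:
        name = queue.popleft()
        for user in dependents.get(name, ()):
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty


# Hypothetical program: main depends on parse and render,
# and parse depends on tokenize.
dependents = {
    "tokenize": ["parse"],
    "parse": ["main"],
    "render": ["main"],
}
all_functions = {"main", "parse", "render", "tokenize"}

dirty = functions_to_recompile(["tokenize"], dependents)
untouched = all_functions - dirty
```

Editing tokenize marks tokenize, parse, and main as dirty and leaves render alone. The saving grows with program size, since most edits touch a small corner of the dependency graph.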
The C backend serves different purposes. It makes Zig portable to targets LLVM does not support: if a platform has a C compiler, it can build Zig programs. It also underpins the C-compiler-only bootstrap route described earlier. Both use cases are about coverage and correctness rather than speed.
What the Transition Established
The most durable consequence of the transition is that the Zig compiler is now the primary real-world test of the Zig language itself. The compiler team works in Zig daily on their most critical project, which means any language feature that is awkward in practice surfaces in the team’s own workflow rather than in user reports months later. This feedback loop is structurally different from maintaining a compiler written in another language.
The removal of stage1 also made the language’s definition authoritative in a way it could not be while a second, divergent implementation existed. Stage1’s deviations from the intended semantics were documented but not fixable. With a single implementation designed to implement the language as specified, discrepancies are bugs to be addressed rather than known limitations to be worked around.
Looking back at the December 2022 announcement from a few years’ distance, the headline undersells the change. Describing it as Zig becoming self-hosted makes it sound like an ideological milestone, the language proving it has reached maturity. The more accurate description is that Zig shipped a correctly designed compiler to replace one that could not correctly implement the language. The self-hosting was a consequence of that necessity, not the goal itself.