Looking back at Andrew Kelley’s December 2022 announcement that Zig’s C++ compiler implementation was being retired, the obvious headline was “Zig is now self-hosted.” The more interesting story is how Zig solved the bootstrapping problem, and why that solution compares favorably to what Go and Rust built before it.
Every programming language that wants to write its own compiler faces the same paradox: you need the compiler to build the compiler. The solutions different language communities found for this problem reveal something about each project’s values and constraints.
What Stage1 Was, and Why It Had to Go
Zig’s original compiler, internally called “stage1,” was written in C++. It used LLVM as a backend and required a full C++ toolchain to build. For a language whose design philosophy centers on having explicit control over dependencies and rejecting unnecessary complexity, maintaining a roughly 100,000-line C++ codebase was an ongoing source of friction.
The problem was not just philosophical. Stage1 had genuine architectural limitations. Zig’s comptime system, which allows arbitrary Zig code to execute at compile time, was partially implemented in stage1 through a series of heuristics and workarounds rather than a proper interpreter. Certain valid Zig programs would fail to compile at comptime, and edge cases in the type system produced incorrect results. The Zig team was spending significant effort maintaining a codebase in a language they did not want to work in, to produce a compiler that could not fully implement the language they were designing.
The self-hosted compiler, called “stage2,” was built from scratch in Zig. It introduced a cleaner IR pipeline: the parsed AST is lowered to ZIR (Zig Intermediate Representation), and semantic analysis turns ZIR into AIR (Analyzed Intermediate Representation), which is what the backends consume for code generation. It includes a proper comptime interpreter, making the language specification and the implementation consistent. It also introduced multiple backends: the LLVM backend for optimized release builds, a C backend for portability to platforms LLVM does not support, and a self-contained x86_64 backend that skips LLVM entirely for debug builds. That last backend is where the most dramatic compile-time improvements come from, with debug compilation often being several times faster than the equivalent LLVM path.
The Bootstrap Mechanism
The self-hosting transition creates an obvious problem: you cannot compile a Zig compiler with a Zig compiler if you do not have one. Zig’s solution to this is zig1.wasm, and it is worth examining carefully.
zig1.wasm is a WebAssembly binary of an older version of the Zig compiler, committed directly to the Zig source repository. Alongside it is a small WebAssembly interpreter written in C, approximately 2,000 lines. The bootstrap process works like this:
- Compile the small C WASM interpreter with any available C compiler (gcc, clang, or anything compatible).
- Use the resulting interpreter to run zig1.wasm, which contains an older Zig compiler.
- That older compiler compiles the current Zig source into a working binary.
- The resulting binary then recompiles itself for final optimizations.
The entire chain requires only a C compiler and the files in the repository, with no network access, no pre-installed Zig, and no C++ toolchain.
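The claim that the chain needs nothing beyond a C compiler and the repository can be checked mechanically. The sketch below is a toy model, not Zig's actual build script: apart from zig1.wasm, every artifact and stage name is a hypothetical label chosen for illustration, and the second and third steps of the list are folded into one stage. It walks the stages in order and reports any input that is neither in the starting set nor produced by an earlier stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    needs: frozenset  # artifacts this stage consumes
    produces: str     # artifact this stage emits


# Hypothetical artifact names; only "zig1.wasm" comes from the text above.
STARTING_POINT = frozenset(
    {"c-compiler", "wasm-interpreter.c", "zig1.wasm", "zig-source"}
)

CHAIN = [
    Stage("build interpreter",
          frozenset({"c-compiler", "wasm-interpreter.c"}),
          "wasm-interpreter"),
    Stage("run old compiler",
          frozenset({"wasm-interpreter", "zig1.wasm", "zig-source"}),
          "zig-first-build"),
    Stage("rebuild with itself",
          frozenset({"zig-first-build", "zig-source"}),
          "zig-final"),
]


def missing_inputs(chain, available):
    """Return {stage name: sorted missing artifacts} for stages that cannot run."""
    have = set(available)
    problems = {}
    for stage in chain:
        missing = stage.needs - have
        if missing:
            problems[stage.name] = sorted(missing)
        else:
            # Only a stage that actually ran contributes its output.
            have.add(stage.produces)
    return problems


if __name__ == "__main__":
    print(missing_inputs(CHAIN, STARTING_POINT))
    print(missing_inputs(CHAIN, STARTING_POINT - {"zig1.wasm"}))
```

An empty result for the full starting set means the chain is closed. Removing zig1.wasm from the starting set makes the second stage fail, and the third with it, which is the dependency the committed binary exists to satisfy.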
Compare this to how other languages handled the same problem.
Go moved from a compiler written in C to a self-hosted Go compiler in Go 1.5 (2015). The approach was mechanical translation: the team wrote a tool that automatically converted the C source code into equivalent Go code. The resulting compiler was not particularly idiomatic Go, but it compiled correctly. To build any modern version of Go from source, you still need a prior Go toolchain, which the build scripts expect to find already installed on the machine (for example through the GOROOT_BOOTSTRAP environment variable). The chain works, but it reaches outside the repository.
Rust started with a compiler written in OCaml. When the self-hosted Rust compiler (written in Rust) was ready, the OCaml implementation was dropped. To build Rust from source today, the build system downloads a prior version of rustc from the internet, automated through the x.py build script. This is convenient, but the bootstrap chain has an external dependency that cannot be fully audited from the repository alone.
Zig’s approach differs from both. The zig1.wasm binary is in the repository. The WASM interpreter is small enough to read and audit in a few hours. There is no network access required. The zig1.wasm file is updated when the compiler’s own source changes incompatibly, but each update is a discrete, reviewable artifact committed to version control.
This has concrete implications for reproducible builds and supply chain security. Ken Thompson’s 1984 lecture “Reflections on Trusting Trust” established a foundational problem: a compiler can contain hidden backdoors that reproduce themselves in every binary it compiles, undetectable by reading the source code. The practical defense is a short, auditable bootstrap chain. A 2,000-line C WASM interpreter is auditable. A full C++ toolchain with LLVM is not. Zig’s approach does not fully solve Thompson’s problem (no bootstrap chain does), but it minimizes the trusted computing base in a way the Go and Rust approaches do not.
What Self-Hosting Actually Enabled
Beyond the bootstrap story, the transition unlocked concrete improvements.
Memory usage dropped substantially. Stage1 was known for high peak memory consumption during compilation of large programs, sometimes reaching several gigabytes. Stage2’s architecture, combined with work toward incremental compilation, reduced this significantly. The incremental compilation system, which tracks which parts of a program changed and recompiles only what is necessary, was architecturally incompatible with stage1 but is a core design goal of stage2.
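The core idea of incremental compilation, tracking what changed and redoing only the affected work, can be sketched in a few lines. This is a generic illustration and not a description of how Zig's compiler implements it: the unit names, the use of SHA-256 digests for change detection, and the dependency map are all assumptions made for the example.

```python
import hashlib


def digest(text):
    """Fingerprint a unit's source text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def units_to_rebuild(sources, deps, previous_digests):
    """Return units whose source changed, plus everything depending on them.

    sources: {unit name: source text}
    deps: {unit name: set of unit names it depends on}
    previous_digests: {unit name: digest recorded by the last build}
    """
    dirty = {
        name for name, text in sources.items()
        if previous_digests.get(name) != digest(text)
    }
    # Propagate dirtiness to dependents until a pass adds nothing new.
    changed = True
    while changed:
        changed = False
        for name in sources:
            if name not in dirty and deps.get(name, set()) & dirty:
                dirty.add(name)
                changed = True
    return dirty


if __name__ == "__main__":
    sources = {"a": "fn a", "b": "fn b uses a", "c": "fn c"}
    deps = {"b": {"a"}}
    last_build = {name: digest(text) for name, text in sources.items()}
    edited = dict(sources, a="fn a, edited")
    print(sorted(units_to_rebuild(edited, deps, last_build)))
```

In this toy setup, editing unit a marks a and its dependent b for rebuilding and leaves c alone. A real compiler tracks much finer-grained dependencies, but the shape of the problem is the same.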
Comptime became a proper interpreter, running during semantic analysis over ZIR (the untyped IR that precedes AIR), rather than a collection of special cases bolted onto the C++ implementation. This matters because Zig uses comptime extensively, not just for constant folding but for generics (through comptime parameters), interface dispatch (through anytype), and compile-time code generation. Having a correct implementation means the language specification and the compiler now agree, which was not always true with stage1.
The team also now works entirely in Zig. The Zig Software Foundation contributors are people who chose Zig as their primary language; maintaining a large C++ codebase was a constant friction point. Complex bugs in stage1 were difficult to fix because they required C++ expertise and the architecture did not lend itself to clean patches. In stage2, the compiler itself is among the most complex Zig programs in existence, which means compiler development and language development inform each other directly. Bugs in the language or standard library surface in the compiler, and fixes improve both.
The multi-backend design also matters for portability. The C backend in stage2 means Zig can target platforms that LLVM does not support, by emitting valid C code that a system C compiler can then compile. This is a different kind of portability than LLVM provides, and it is only possible because the self-hosted compiler has a clean, pluggable backend interface.
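To make the idea of a pluggable backend interface concrete, here is a deliberately tiny sketch. It does not mirror the Zig compiler's real interfaces: the Backend protocol, both backend classes, and the "function that returns a constant" input are hypothetical stand-ins. The point is only that the front end calls one interface, and each backend decides what to emit.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface that every backend in this sketch implements."""

    def emit(self, function_name: str, return_value: int) -> str: ...


class CSourceBackend:
    """Emits C source text, leaving machine code to a system C compiler."""

    def emit(self, function_name: str, return_value: int) -> str:
        return f"int {function_name}(void) {{ return {return_value}; }}"


class PseudoAsmBackend:
    """Emits assembly-like text; a stand-in for a native backend."""

    def emit(self, function_name: str, return_value: int) -> str:
        return f"{function_name}:\n  mov eax, {return_value}\n  ret"


def compile_function(backend: Backend, function_name: str, return_value: int) -> str:
    """Front-end side: unaware of which backend it was handed."""
    return backend.emit(function_name, return_value)


if __name__ == "__main__":
    for backend in (CSourceBackend(), PseudoAsmBackend()):
        print(compile_function(backend, "answer", 42))
```

Adding a new target in this model means writing one more class with an emit method. The C-emitting variant reaches any platform that has a C compiler, which is the property the paragraph above describes.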
The Retrospective View
The December 2022 announcement is worth examining in retrospect not because self-hosting is unusual, but because of how Zig solved the specific problem of getting there. Most languages either require an existing installation of themselves (Go, Rust) or start from a completely different host language and leave the transition path implicit. Zig committed a WebAssembly binary and a small interpreter, updated it when necessary, and built an auditable chain from C compiler to finished Zig binary.
The technical details of the transition, especially the memory improvements and the correct comptime implementation, matter for anyone using Zig in production. The self-hosted x86_64 backend has meaningfully improved iteration speed on debug builds. Comptime is now reliable in a way it was not before.
The bootstrap mechanism is the part that reflects something specific about Zig’s design philosophy: explicit dependencies, minimal trust requirements, and preferring systems that can be understood end to end. Other languages made pragmatic choices to get their compilers shipped. Zig made a choice that is slightly harder to explain but easier to audit, and in 2022 that choice became part of the default build path.