
How Zig's Bootstrap Strategy Solves a Problem Other Languages Ignored

Source: zig

Looking back at the announcement on ziglang.org from December 2022, it’s easy to read the “Goodbye to the C++ Implementation of Zig” post as a feel-good milestone story: Andrew Kelley spent years building a compiler in C++, then spent more years replacing it with one written in Zig, and now the C++ code is gone. But the actual technical substance of what changed, and how the new compiler bootstraps itself into existence, is worth examining closely. The method reveals deliberate thinking about a problem most language projects treat as an afterthought.

The C++ Compiler Was Not Just Technically Inferior

Stage1, as the C++ compiler came to be called, was functional for years. It compiled real Zig programs and powered the language through several significant version releases. But it had structural limitations that made certain directions impossible without a full rewrite.

The most important of these was incremental compilation. Stage1’s data structures were not designed around the concept of invalidation, the ability to say “this declaration changed, so recompute only what depended on it.” Bolting incremental compilation onto stage1 would have required rebuilding most of its internals anyway. The same problem applied to comptime evaluation: stage1’s implementation had known edge cases where it would produce incorrect results, and fixing them properly required rethinking the evaluation model at an architectural level.

Error message quality was another casualty of stage1’s design. Producing precise, structured diagnostics with source spans requires threading location information throughout the compiler from the start. Stage1 didn’t do this consistently, which meant error messages were often less useful than they could have been. This matters more than it might seem for a language whose compiler is supposed to replace C in systems programming contexts, where clear diagnostics directly affect debugging time.

Kelley also noted the build system issue: stage1 was built with CMake, which sits outside Zig’s own ecosystem. That was both philosophically awkward and practically limiting, since it meant the compiler’s own build couldn’t use any of the features Zig was supposed to offer for builds.

What Stage2 Actually Looks Like

The self-hosted compiler introduces a two-IR pipeline. Source files are first converted into ZIR (Zig Intermediate Representation), which is untyped and preserves the generic structure of the program. Generic functions, for instance, are stored as ZIR and only instantiated when called with concrete types. ZIR is then processed by Sema, the semantic analysis phase, which resolves types, evaluates comptime expressions, and produces AIR (Analyzed IR), a typed and fully monomorphized representation ready for code generation.
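The "store untyped, instantiate on demand" idea can be modeled in a few lines. This is only a toy sketch in Python, not the compiler's actual data structures: `GenericFn`, `Sema.instantiate`, and the string "instances" are hypothetical stand-ins for a ZIR function body, the analysis pass, and a monomorphized AIR body.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GenericFn:
    """Untyped template, standing in for a function kept as ZIR."""
    name: str
    params: tuple  # parameter names only; no types attached yet


@dataclass
class Sema:
    """Toy analysis pass that instantiates templates on demand."""
    instances: dict = field(default_factory=dict)

    def instantiate(self, fn, arg_types):
        # One typed instance per distinct tuple of concrete types,
        # analogous to one monomorphized body per instantiation.
        key = (fn.name, arg_types)
        if key not in self.instances:
            sig = ", ".join(f"{p}: {t}" for p, t in zip(fn.params, arg_types))
            self.instances[key] = f"{fn.name}({sig})"
        return self.instances[key]


max_fn = GenericFn("max", ("a", "b"))
sema = Sema()
sema.instantiate(max_fn, ("i32", "i32"))
sema.instantiate(max_fn, ("f64", "f64"))
sema.instantiate(max_fn, ("i32", "i32"))  # reuses the cached instance
```

Nothing typed exists until the first call with concrete types, and a repeated call with the same types reuses the earlier result rather than analyzing the template again.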

This split matters for incremental compilation. ZIR can be cached per file. When a file changes, only the affected declarations need to go through Sema again. Downstream dependents (declarations in other files that reference the changed ones) can be identified and reanalyzed selectively. The full incremental compilation system was still being developed after the 0.10.0 release, but the architecture was designed for it from the beginning in a way stage1 never was.
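Selective reanalysis boils down to a graph walk over "who depends on whom." The sketch below is a simplified Python model under stated assumptions: the declaration names and the `dependents` map are made up for illustration, and the real compiler tracks dependencies at a finer granularity than this.

```python
from collections import deque


def declarations_to_reanalyze(changed, dependents):
    """Return every declaration reachable from `changed` through the
    reverse-dependency map `dependents` (decl -> decls that use it)."""
    dirty = set(changed)
    queue = deque(changed)
    while queue:
        decl = queue.popleft()
        for user in dependents.get(decl, ()):
            if user not in dirty:
                dirty.add(user)
                queue.append(user)
    return dirty


# Hypothetical declarations spread over three files.
dependents = {
    "math.zig:add": ["math.zig:sum", "main.zig:main"],
    "math.zig:sum": ["main.zig:report"],
    "util.zig:log": ["main.zig:report"],
}

dirty = declarations_to_reanalyze(["math.zig:add"], dependents)
```

Here a change to `add` marks `sum`, `main`, and `report` for reanalysis, while `util.zig:log` is left alone because nothing it depends on changed.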

Stage2 also replaced stage1’s single LLVM backend with multiple backends. The LLVM backend remains for release builds, where optimization passes matter. But for debug builds, stage2 offers a self-contained x86_64 backend that emits machine code directly without going through LLVM at all. Skipping LLVM entirely lets debug builds finish roughly five to ten times faster, which makes the development feedback loop substantially tighter. An aarch64 backend and a C backend round out the options; the C backend is useful for platforms without an LLVM target and as a portability fallback.

The Bootstrap Problem and Why WASM Is the Right Answer

Self-hosting a compiler creates a circular dependency: you need the compiler to build the compiler. Every language solves this with some kind of seed artifact, a pre-existing binary capable of compiling the first version of the self-hosted code. The choices made here reveal a lot about the project’s priorities.

Go’s approach, adopted when Go 1.5 shipped in 2015, is to require an earlier Go release as the bootstrap compiler: Go 1.5 needed Go 1.4, and Go 1.22 needs Go 1.20 or later. In practice, this means Go’s bootstrap chain starts from a platform-specific binary for every architecture you want to build on. It works, but it introduces a dependency on pre-built native binaries that can’t be inspected easily.

Rust’s bootstrap process is similar but heavier. Its build script (x.py) downloads a prebuilt rustc from an earlier release to act as the stage0 compiler. This can be well over a hundred megabytes for each target platform, and it requires an internet connection and pre-built binaries distributed from Rust’s infrastructure. Rust has been self-hosted since 2011, when rustc replaced the original OCaml compiler, so it never had a “goodbye C++” moment, but the bootstrap cost is real.

Zig’s approach is different and cleaner. The seed artifact is zig1.wasm, a WebAssembly binary of a previous Zig compiler version committed directly to the repository. Alongside it is a small amount of C: a translator that converts the WASM binary into C source (wasm2c.c), a minimal implementation of the WASI calls it needs (wasi.c), and a driver program (bootstrap.c) that runs the steps. The entire bootstrap process requires only a C compiler, which is available on essentially every platform Zig targets.

The WASM binary serves as a single, platform-agnostic seed. There are no platform-specific binaries for different architectures. The same zig1.wasm bootstraps Zig on Linux x86_64, macOS ARM, Windows, or any other platform with a C compiler. The artifact is inspectable as a binary, which matters for the class of supply chain concerns Ken Thompson raised in his “Reflections on Trusting Trust” paper: you can, in principle, audit the WASM binary’s behavior in a way that platform-native binaries make much harder.

The zig1.wasm file is intentionally minimal. It’s a build of a previous Zig compiler version for a WebAssembly target with LLVM support left out, so it only needs to be capable enough to compile the next stage. It doesn’t need optimization passes or the full feature set of a production compiler. This keeps the artifact small and the verification burden lower.

The Build System as a Self-Hosting Story

One aspect of the transition that gets less attention is the build system. Stage1 was built with CMake. The self-hosted compiler is built with build.zig, Zig’s own build system, which existed before the self-hosted compiler but could only take over the compiler’s own build once the C++ code was out of the picture.

The Zig build system works by compiling and executing a build.zig file at the root of each project, which describes the build graph as Zig code. This means the build system is itself a Zig program, compiled by the Zig compiler that it’s helping to build. For the compiler’s own build, this closes a loop: the tool, the language it’s written in, and the build system that produces it are all the same thing.
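To make "the build graph as code" concrete, here is a minimal sketch of the general idea in Python. It is not the `std.Build` API: the `Step` class, `run_order` function, and step names are hypothetical, chosen only to show a graph of steps declared in ordinary code and walked so that dependencies run first.

```python
class Step:
    """A named build step plus the steps it depends on."""

    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)


def run_order(step, done=None):
    """Return step names in an order that runs dependencies first."""
    done = [] if done is None else done
    for dep in step.deps:
        run_order(dep, done)
    if step.name not in done:
        done.append(step.name)
    return done


# The "build script": plain code that wires steps together.
compile_lib = Step("compile libfoo")
compile_exe = Step("compile app", [compile_lib])
install = Step("install", [compile_exe])
tests = Step("test", [compile_lib])
```

Asking for `install` pulls in both compile steps, while asking for `test` only pulls in the library. Because the graph is built by running code, a build script can use loops, conditionals, and helper functions to construct it.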

As of Zig 0.11.0, the build system also integrated with the package manager, where build.zig.zon files specify dependencies and zig build handles fetching and building them. The C++ stage1 compiler could not participate in any of this because it was, by definition, outside the Zig ecosystem.

What the Transition Actually Cost and What It Bought

The stage2 rewrite took roughly three and a half years and produced a codebase of around 200,000 lines of Zig, replacing approximately 100,000 lines of C++. The increase in line count reflects not just translation overhead but the additional capability: multiple codegen backends, a more complete comptime evaluator, better error diagnostics, and the incremental compilation infrastructure.

The -fstage1 fallback flag was available briefly in Zig 0.10.0 for the transition period. By Zig 0.11.0, released in August 2023, the C++ compiler was gone entirely. At that point, the project fully committed to the self-hosted implementation as its only supported path.

The most concrete benefit for anyone writing Zig today is compile time. Debug builds using the x86_64 backend skip LLVM entirely and produce binaries quickly enough that iterative development feels genuinely different from working with a compiler that always routes through LLVM’s optimization passes. Release builds still use LLVM and take correspondingly longer, but the split between debug and release build speed is now intentional and architectural rather than an accident of the optimization settings.

The comptime improvements are harder to quantify but meaningful: stage2 evaluates comptime code inside Sema, directly on ZIR, using one evaluation model designed for the purpose. That lets it handle a broader range of cases correctly and with more predictable behavior than stage1 did.

Zig’s self-hosting milestone is worth revisiting not because self-hosting itself is rare or surprising, but because the specific choices made (the WASM bootstrap chain, the dual-IR architecture, the multiple codegen backends) reveal a project thinking carefully about long-term correctness and portability rather than just checking a box. The C++ compiler was always meant to be temporary; what makes the transition interesting is that the replacement was clearly designed to last.
