Zig's Bootstrap Gambit: What the Self-Hosted Compiler Reveals About the Language's Philosophy

Back in December 2022, a few weeks after version 0.10.0 had made the self-hosted compiler the default, the Zig team published a post they called “Goodbye to the C++ Implementation of Zig”. The announcement was straightforward: the compiler was now written in Zig instead of C++, and the old C++ codebase was being deleted. That framing undersells what actually happened. The interesting part is not that Zig joined the self-hosting club, but how it got there, and what architectural choices the team made that distinguish this transition from similar ones in Rust, Go, and D.

The Bootstrap Problem Is Not New

Every language that wants to implement its own compiler runs into the same fundamental constraint: you need the compiler to build the compiler. There is no getting around this, and the compiler community has been solving it in various ways since the 1970s. Ken Thompson’s 1984 Turing Award lecture made the problem famous by showing how a self-hosting compiler could hide a backdoor that survives even if you recompile from clean source, because the trust has to start somewhere.

The conventional solution is the staged bootstrap. You start from a trusted seed binary and use it to compile the compiler source, producing stage1. Stage1 compiles the same source to produce stage2, and stage2 compiles it once more to produce stage3. If stage2 and stage3 are byte-for-byte identical, the compiler has reached a fixed point, and you have some confidence it is consistent.
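
The fixed-point check can be sketched with a toy model. This is only an illustration, not any real toolchain: the `Binary` type, `run_compiler`, and the version strings are all hypothetical. The model assumes a deterministic compiler whose output bytes depend on two things, the source being compiled and the code generator of the compiler doing the work.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    """Toy stand-in for a compiler executable (hypothetical model)."""
    behavior: str    # which compiler source this binary implements
    encoded_by: str  # which compiler's code generator produced its bytes

    def to_bytes(self) -> bytes:
        return f"{self.behavior}|{self.encoded_by}".encode()


def run_compiler(compiler: Binary, source: str) -> Binary:
    # The output behaves as the source dictates, but its bytes are shaped
    # by the code generator of the compiler that was actually running.
    return Binary(behavior=source, encoded_by=compiler.behavior)


def sha256(binary: Binary) -> str:
    return hashlib.sha256(binary.to_bytes()).hexdigest()


def bootstrap(seed: Binary, source: str):
    stage1 = run_compiler(seed, source)    # built by the seed
    stage2 = run_compiler(stage1, source)  # built by stage1
    stage3 = run_compiler(stage2, source)  # built by stage2
    return stage1, stage2, stage3


seed = Binary(behavior="compiler-v0", encoded_by="vendor-toolchain")
stage1, stage2, stage3 = bootstrap(seed, "compiler-v1")

# stage1 may differ from stage2 because the seed generates code differently.
# stage2 and stage3 were both produced by the new code generator, so a
# deterministic compiler should emit identical bytes for them.
reached_fixed_point = sha256(stage2) == sha256(stage3)
print("fixed point reached:", reached_fixed_point)
```

Note what the check does not establish: matching hashes show that the compiler reproduces itself, not that the seed was trustworthy, which is exactly the gap Thompson's lecture points at.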

The question is where that seed binary comes from, and this is where languages have made very different choices.

Rust originally had a compiler written in OCaml. The OCaml compiler compiled the first Rust compiler, and eventually Rust compiled itself before 1.0 shipped in 2015. Today, building Rust from source requires a prior version of rustc, typically the previous stable release. The Rust bootstrap documentation describes a three-stage process: stage0 is a downloaded pre-compiled rustc, stage1 is compiled by stage0, and stage2 is compiled by stage1. This works well, but it means the dependency chain for building Rust is “Rust.”

Go went through a similar arc. Go 1.0 through 1.4 had a compiler written in C. For Go 1.5, the team mechanically translated that C compiler to Go using a tool, giving them a self-hosted compiler without a complete rewrite. Building a recent Go release from source requires an earlier Go toolchain; Go 1.22, for example, needs Go 1.20 or later. Again: you need Go to build Go.

Zig took a different path, and the choice reflects something fundamental about what the language is trying to be.

The WebAssembly Seed

Instead of requiring a pre-existing Zig installation to build Zig, the team compiled the self-hosted Zig compiler to WebAssembly and checked that wasm binary into the repository as zig1.wasm. To build Zig from source, you need a C compiler and a way to execute that wasm file. Rather than interpreting it, the repository carries a small wasm-to-C translator called wasm2c, written in plain C with no dependencies beyond a C compiler.

The sequence is: the C compiler builds wasm2c, wasm2c translates zig1.wasm into C source, and the C compiler builds that source into a native zig1 executable. zig1 then compiles the full Zig compiler source through its C backend, and the C compiler turns that output into a native Zig binary. That binary can then compile itself for stage3 verification.

This approach means the bootstrap dependency is “a C compiler,” not “a prior version of Zig.” The wasm blob is architecture-neutral, so the same file bootstraps Zig on x86_64, ARM, RISC-V, or anywhere else you can compile a C program. That is a much weaker prerequisite than requiring a language’s own runtime, and it aligns with Zig’s stated goal of being usable as a replacement for C in environments where you cannot assume much about the host toolchain.

The tradeoff is that the zig1.wasm file has to be maintained and regenerated whenever the compiler’s source starts relying on language changes that the checked-in blob cannot compile. But the team clearly judged that maintenance cost worth paying in exchange for the reduced bootstrap dependency.

What the Self-Hosted Compiler Actually Looks Like

The new compiler’s internal architecture is a significant departure from the C++ implementation, not just a translation.

The compiler works through two intermediate representations. Source code is parsed into an AST, then lowered to ZIR (Zig Intermediate Representation), which is a flat, dense, array-based format rather than a pointer-heavy tree. From there, semantic analysis and type checking produce AIR (Analyzed Intermediate Representation), which captures the fully resolved program. AIR is then handed off to whichever backend is generating the output.
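
To make the “flat, array-based” idea concrete, here is a minimal sketch of the general technique. It is not the actual ZIR encoding: the `FlatIR` class, its `tags`/`lhs`/`rhs` columns, and the tiny arithmetic language are all assumptions made for illustration. The point is that instructions live in parallel arrays and refer to each other by integer index rather than by pointer.

```python
from dataclasses import dataclass, field


@dataclass
class FlatIR:
    """Instructions stored as parallel arrays; operands are indices."""
    tags: list[str] = field(default_factory=list)
    lhs: list[int] = field(default_factory=list)
    rhs: list[int] = field(default_factory=list)

    def emit(self, tag: str, lhs: int = 0, rhs: int = 0) -> int:
        self.tags.append(tag)
        self.lhs.append(lhs)
        self.rhs.append(rhs)
        return len(self.tags) - 1  # index of the new instruction


def lower(node, ir: FlatIR) -> int:
    """Lower a nested-tuple expression tree into the flat form."""
    if isinstance(node, int):
        return ir.emit("int", lhs=node)  # literal value stored in lhs
    op, left, right = node
    left_index = lower(left, ir)
    right_index = lower(right, ir)
    return ir.emit(op, left_index, right_index)


def evaluate(ir: FlatIR) -> int:
    """Walk the arrays once, front to back; operands are always earlier."""
    values = []
    for tag, a, b in zip(ir.tags, ir.lhs, ir.rhs):
        if tag == "int":
            values.append(a)
        elif tag == "add":
            values.append(values[a] + values[b])
        elif tag == "mul":
            values.append(values[a] * values[b])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return values[-1]


tree = ("mul", ("add", 1, 2), 3)  # pointer-style tree for (1 + 2) * 3
ir = FlatIR()
root = lower(tree, ir)
result = evaluate(ir)
print(ir.tags, result)
```

Because every operand is an index into the same arrays, the whole structure can be traversed linearly or written to disk without chasing pointers, which is the property that makes this layout attractive for a compiler that wants low memory overhead and cacheable results.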

The multiple backends are probably the most consequential architectural decision. The C++ compiler supported only LLVM. The self-hosted compiler supports LLVM for optimized release builds, a hand-written native x86_64 backend for fast debug builds, a WebAssembly backend, and a C backend that transpiles Zig to C. The native x86_64 backend bypasses LLVM entirely; it emits machine code directly and produces binaries in a fraction of the time that LLVM needs. For the edit-compile-debug cycle, this matters more than release-mode throughput.

Incremental compilation was designed in from the start, which the C++ compiler never had. The compiler tracks dependencies between declarations and only re-analyzes what has changed. Combined with the native backend, this means recompiling a Zig program after a small change is fast in a way that felt out of reach with the old architecture.
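
The dependency tracking can be pictured with a small sketch. This is a simplified model under stated assumptions, not the compiler’s real data structures: `DeclGraph` and the declaration names are hypothetical, and the sketch conservatively marks every transitive dependent as stale, whereas a real implementation can be more selective.

```python
from collections import defaultdict


class DeclGraph:
    """Tracks which declarations reference which (hypothetical model)."""

    def __init__(self):
        # Maps a declaration to the set of declarations that use it.
        self.dependents = defaultdict(set)

    def add_dependency(self, user: str, used: str) -> None:
        self.dependents[used].add(user)

    def invalidated_by(self, changed: str) -> set:
        """Return the changed declaration plus everything that depends on it."""
        stale = set()
        stack = [changed]
        while stack:
            decl = stack.pop()
            if decl in stale:
                continue
            stale.add(decl)
            stack.extend(self.dependents[decl])
        return stale


graph = DeclGraph()
graph.add_dependency("main", "parse")
graph.add_dependency("main", "render")
graph.add_dependency("parse", "Token")
graph.add_dependency("render", "Color")

# Editing Token forces re-analysis of parse and main, but not render or Color.
stale = graph.invalidated_by("Token")
print(sorted(stale))
```

Everything outside the stale set can be reused from the previous compilation, which is where the time savings come from.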

The Memory Problem Was Real

If you used the C++ Zig compiler for any non-trivial project before 0.10.0, you encountered the memory usage. Compiling moderately-sized programs routinely consumed 1 to 4 gigabytes of RAM. The root causes were structural: the compiler held its entire AST and all intermediate state in memory simultaneously, used per-object heap allocation with all the fragmentation that entails, and performed no lazy analysis.

The self-hosted compiler addresses this through arena allocation (large allocations freed in bulk rather than individually), lazy analysis (only declarations that are actually referenced get analyzed), and the incremental compilation design. The 0.10.0 release notes reported that building the compiler itself dropped from about 9.6 GB of memory with the C++ implementation to about 2.8 GB with the self-hosted one, a reduction of roughly 3x.
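
Lazy analysis is the easiest of these to sketch. The following is a minimal illustration assuming a program is just a map from declaration names to the names they reference; `analyze_lazily` and the declarations are hypothetical, and real semantic analysis does far more per declaration than recording a name.

```python
def analyze_lazily(decls: dict, entry: str) -> list:
    """Analyze only declarations reachable from the entry point."""
    analyzed = []
    seen = set()
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        if name in seen:
            continue
        seen.add(name)
        analyzed.append(name)          # stand-in for real semantic analysis
        worklist.extend(decls[name])   # queue whatever this declaration uses
    return analyzed


decls = {
    "main": ["print"],
    "print": ["write"],
    "write": [],
    "unused_helper": ["write"],
    # References a name that does not exist, but is never reached.
    "broken_experiment": ["missing_symbol"],
}

analyzed = analyze_lazily(decls, "main")
print(analyzed)
```

Declarations that nothing references cost neither time nor memory, and in this model they are never checked at all, so the dangling reference in `broken_experiment` goes unnoticed. Zig users see the same effect when an error in an unreferenced function is not reported.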

For a systems language, memory consumption of the toolchain itself matters. If the compiler for your embedded systems language requires several gigabytes of RAM to run, that creates friction for CI environments, remote servers, and development on modest hardware.

Self-Hosting as Language Validation

There is a reason compiler developers treat self-hosting as a milestone worth announcing. Writing a compiler is one of the more demanding exercises in any programming language. A self-hosted compiler exercises the type system, the memory model, the standard library, the build system, and performance characteristics simultaneously. If the language is missing something important, compiler development will find it.

For Zig specifically, the self-hosted compiler is one of the largest and most complex Zig programs in existence. The team uses Zig’s comptime, error handling, and allocator conventions extensively throughout the compiler itself. Every time a Zig feature is awkward or buggy, the team experiences it directly. That feedback loop has practical value.

Andrew Kelley has said that Zig will not reach 1.0 until the language and toolchain are sufficiently stable. Self-hosting was a necessary step toward that, not because of any philosophical requirement, but because the C++ compiler had become an obstacle. Kelley described the C++ codebase as unmaintainable and the source of bugs that were nearly impossible to diagnose. Deleting it removed that entire category of problems.

Looking Back at the Transition

Reviewing this from early 2026, the 0.10.0 release looks like it delivered what the team promised. The memory situation improved substantially. Debug builds using the native backend are genuinely fast. The language server situation (through ZLS) also benefited, because tooling can reuse parts of the self-hosted compiler’s front end rather than reimplementing all of Zig’s semantics separately.

The C backend also opened up platforms that LLVM does not target well, which matters for a language that explicitly wants to be a viable C replacement in embedded and niche environments.

The most distinctive thing about Zig’s path to self-hosting is not the technical achievement, which it shares with many other languages, but the bootstrap strategy. Choosing WebAssembly as a portable seed format and a C compiler as the only external dependency reflects the same values the language tries to embody in user code: minimal dependencies, portability, explicitness about what you’re relying on. The wasm blob is an unusual choice. It is also, given Zig’s goals, the most consistent one.