From Egg to Production: What Cranelift's Aegraph Had to Sacrifice to Ship
Source: lobsters
The egg library landed in 2021 with a POPL paper by Willsey, Nandi, Wang, Flatt, Tatlock, and Panchekha that made a genuine case for equality saturation as a practical compiler optimization technique. The idea was elegant: instead of committing to a sequence of rewrite rules that might interfere with each other, accumulate all equivalent forms of every expression in a single data structure, defer the question of which form to use until the end, and let a separate extraction pass choose optimally. The technique produced compelling results across superoptimization, tensor graph rewriting, and regular expression equivalence checking.
Cranelift’s aegraph, documented by Chris Fallin, applies equality saturation to a production JIT compiler. The design is principled and the results are real. But getting from egg’s model to something deployable in Wasmtime required reconsidering almost every structural choice that makes egg appealing in theory. Tracing those changes reveals what production deployment actually demands from compiler infrastructure.
What Equality Saturation Promises
An e-graph represents a set of terms and a declared equivalence relation over them. Each e-node is an operator applied to child equivalence classes (e-classes). Each e-class groups all the e-nodes the system considers equivalent, with a union-find structure for efficient merging.
The rewrite model is the key distinction. Given a rule like (mul a 2) => (shl a 1), you do not discard the original. You insert both forms as e-nodes in the same e-class and let them coexist. Rules are applied until fixpoint (equality saturation), and only then does an extraction pass walk the e-graph, selecting the cheapest e-node from each e-class recursively. Because no information is destroyed during rewriting, the order in which rules fire does not affect what equivalences are discovered. Phase ordering problems, a persistent headache in traditional pass-based compilers, largely disappear.
This is genuinely better than what LLVM’s approach offers. LLVM’s mid-end is a pipeline of interdependent passes: InstCombine fires algebraic simplifications, GVN handles value numbering, LICM moves loop-invariant code, then SimplifyCFG cleans up the control flow the other passes disturb. Each pass runs to completion, then the next begins. If GVN discovers a new equivalence that would enable an InstCombine simplification, that simplification does not happen until InstCombine runs again in the next pipeline iteration. The optimizer has to be run multiple times at increasing cost to approximate convergence, and it may never converge.
The Extraction Problem That Blocked Deployment
The elegance of equality saturation collides with a hard computational problem at extraction time. If e-class A’s cheapest node depends on e-class B, and B’s cheapest node depends on A, the extraction algorithm faces a system of mutually recursive cost equations. In the general case, optimal extraction from a cyclic e-graph is NP-hard. Real implementations handle this with heuristics, iteration limits, and restrictions on cost functions, but there is no clean solution.
For research applications, this is manageable. Superoptimization and tensor rewriting are offline processes where compilation time is not a first-order constraint. A few seconds of extraction time on a complex graph is acceptable.
Cranelift compiles WebAssembly modules inside Wasmtime, which backs Fastly’s edge compute platform and serves as the default compiler for Firefox’s WebAssembly engine. Compilation happens before requests are served. An optimizer that can exhibit superlinear behavior on adversarial inputs is not something you can deploy at scale. Worst-case bounds matter more than average-case performance in this setting.
egg was not built for this environment, and it shows in the design. Its extraction phase is documented as potentially expensive; the library provides hooks for custom extractors precisely because the default approach does not handle all real-world cases efficiently.
The SSA Insight
The key observation behind Cranelift’s aegraph is that the inputs to the optimizer are not arbitrary terms. They are programs in SSA form.
SSA (Static Single Assignment) form requires that every value be defined exactly once, and every use of a value be dominated by its definition. Within a basic block, values flow strictly forward: each instruction can only reference values defined in earlier instructions or in function parameters. There are no backward edges within a basic block. Phi nodes at block boundaries handle loop-carried values, but within an acyclic region, the program’s data flow is already a directed acyclic graph.
The aegraph enforces this property structurally. Every e-node’s children must reference e-classes created earlier in a strict topological ordering. Forward references only. The graph is always a DAG.
With this constraint, extraction is no longer NP-hard. Processing each e-class in topological order, selecting the cheapest e-node, and recursing into its children (which are all already resolved) is a single linear pass. No cycles, no mutual recursion, no fixed-point equations. The computational complexity that blocks general e-graph extraction simply does not arise because the pathological inputs do not exist in well-formed SSA programs.
This is not a lucky coincidence. SSA’s dominance property encodes precisely the acyclicity that the aegraph needs, and Cranelift’s IR (CLIF, Cranelift Intermediate Format) is SSA-based. The constraint is not an artificial restriction imposed on the optimizer; it is a reflection of what the optimizer’s inputs actually look like.
Loop-carried values and phi nodes remain outside the aegraph. The optimizer handles acyclic regions and delegates control flow to separate machinery. The mid-end optimizations that matter most (constant folding, algebraic simplification, common subexpression elimination, strength reduction) are predominantly local: they inspect a node and its immediate children, not distant parts of the graph across loop boundaries. Restricting the aegraph to acyclic regions loses little in practice.
Trading Saturation for Bounded Time
The second major departure from egg is replacing global equality saturation with eager rewriting.
In egg’s model, rules are applied iteratively to the entire e-graph until no new equivalences can be discovered. This global view allows one rewrite in part of the graph to unlock rewrites elsewhere, and the full saturation process may require many passes. The benefit is completeness within the rule set: if a combination of rewrites leads to a simpler form, global saturation will find it.
The aegraph fires rules immediately when a new e-node is inserted. Because children are already fully processed (the acyclicity guarantee), there is no later state change in a child that could retroactively unlock a rule on the current node. Eager rewriting is a complete strategy for the DAG case; nothing is lost by not waiting for global saturation.
This is not a compromise in the case where inputs are acyclic. It is the correct algorithm for the domain, and it eliminates the variable compilation time that global saturation introduces.
ISLE as the Integration Layer
Cranelift uses ISLE (Instruction Selection Language Expressions) as a rule DSL for both instruction selection in the back-end and optimization rules in the mid-end. The aegraph optimizer compiles ISLE rules at build time into Rust pattern-matching code that fires on e-node insertion.
A rule to eliminate addition of zero:
(rule (simplify (iadd ty x (iconst ty 0)))
x)
When an iadd node is inserted into the aegraph, the generated code checks this pattern. On a match, the new e-node is immediately merged into the e-class for x. The redundant addition never materializes in output.
Having one rule language for optimization and instruction selection reduces the surface for inconsistency between the two phases and means that people writing new optimization rules use the same tooling as people writing instruction lowering patterns. The infrastructure is shared; the cognitive context is shared.
This is a meaningful contrast with LLVM, where InstCombine optimizations are hand-written C++ in lib/Transforms/InstCombine/, instruction selection patterns live in .td TableGen files, and the two systems have entirely different mechanisms, contributors, and review paths. The gap between them has historically been a source of missed optimization opportunities.
What the Aegraph Actually Resolved
Before the aegraph, Cranelift ran GVN, constant folding, and algebraic simplification as separate passes. Information discovered by one pass was not visible to the others until the next pipeline iteration. The aegraph consolidates them into a single graph construction and extraction phase, where every equivalence discovered during one rule application is immediately available to subsequent rules processing the same e-node.
The design is a case study in how academic ideas get industrialized. egg showed that equality saturation is a coherent framework. Cranelift’s aegraph shows what you have to change to use it in a JIT compiler: replace global saturation with eager rewriting, restrict the graph to the acyclic structure that SSA already enforces, and integrate the rule system with existing compiler tooling rather than treating it as a separate subsystem. Each change resolves a specific mismatch between the theoretical model and the production requirements. None of them are arbitrary, and none of them are free.