Equality Saturation Without the Cycle Problem: Inside Cranelift's Acyclic E-Graph

The phase ordering problem is one of the oldest frustrations in compiler middle-end design. Constant folding reveals opportunities for dead code elimination. Dead code elimination reveals opportunities for GVN. GVN reveals opportunities for loop-invariant code motion. Run them in the wrong order and you leave optimizations on the table; run them all repeatedly until convergence and you pay a quadratic compile-time bill. This is the situation most production compilers live in.

Cranelift, the code generator at the heart of Wasmtime and used by Fastly’s edge compute infrastructure, took a different path. Its mid-end optimizer is built around an acyclic e-graph, a structure that can represent multiple equivalent forms of an expression simultaneously and apply all rewrites in a single saturation pass. Chris Fallin’s detailed writeup explains the design in depth. What I want to do here is give that design its broader context: where e-graphs come from, why cycles are the hard part, and why the acyclicity insight is the thing that makes this work in a production compiler rather than a research prototype.

E-Graphs and Equality Saturation

An e-graph is a data structure originally developed for automated theorem provers and term rewriting systems. The core idea is that instead of committing to a single representation of an expression, you maintain a set of equivalent representations. Each expression node (an “e-node”) belongs to an equivalence class (an “e-class”), and when you discover that two expressions are equivalent, you merge their e-classes. The e-graph then compactly represents the entire space of equivalent terms.

Equality saturation is the optimization strategy built on top of this: apply every applicable rewrite rule exhaustively until no new equalities can be discovered, then extract the best term from each e-class. The critical property is that because you never destroy information (you only add equivalences), you avoid the phase ordering trap. You don’t have to choose between applying rule A before rule B or B before A; both get applied, and their interactions are represented structurally.

The egg library (“e-graphs good”), developed by Max Willsey and colleagues at the University of Washington, demonstrated that equality saturation is practical for real optimization tasks. The key innovation in egg was “rebuilding,” a batched invariant-restoration step that amortizes the cost of maintaining the e-graph’s internal consistency across rule applications. Egg enabled work like Herbie (floating-point expression improvement) and Tensat (tensor program optimization) to use e-graphs as a practical optimization backend.

But egg’s e-graphs are general-purpose. They allow cycles. And cycles are where the integration into a production compiler falls apart.

Why Cycles Are the Hard Part

In a general e-graph, it is entirely legal to have an e-class that, through a chain of child e-classes, eventually refers back to itself. This can arise naturally: if you have expressions x + 0 and x, they are equivalent, and you might represent them in the same e-class with the addition node pointing to an e-class for x and an e-class for 0. That’s acyclic. But consider loop bodies and feedback edges in a control flow graph. A naive translation of looping code into e-nodes creates cycles.

Cycles create two concrete problems. The first is extraction. Once you have saturated the e-graph and want to extract the optimal expression, you need to pick one e-node from each e-class and recursively pick optimal representatives for all child e-classes. This is a minimum-cost problem over the e-graph structure. When the graph is acyclic, this is a straightforward dynamic programming problem solved in topological order. When cycles exist, the problem becomes much harder, equivalent to finding minimum-cost solutions in recursive grammars, and the general case is NP-hard.

The second problem is the rebuilding cost itself. Egg’s rebuilding maintains the invariant that parent pointers in the union-find structure are consistent after merges. The cost of this step is bounded, but cycles in the e-class graph complicate the analysis and implementation. In practice, production compilers deal with large functions and need guaranteed compile-time performance, not just amortized bounds.

This is why, despite the elegance of equality saturation, almost no production compilers used it before Cranelift’s aegraph. The standard middle-end is a pipeline of named passes: LLVM has InstCombine, GVN, LICM, SCCP, and dozens more, each encoding a narrow slice of optimization wisdom, each run in a carefully tuned order that the compiler team has empirically found works well for typical code. It is not principled; it is archaeology.

The Acyclicity Insight

Cranelift’s key observation is that SSA form is inherently acyclic in its value dependencies. In SSA, every variable has exactly one definition, and that definition dominates all uses. If you draw a graph where each value definition points to the values it consumes, this graph is a directed acyclic graph (DAG), with one exception: phi nodes at loop headers, which take values from predecessors including back-edges.

The aegraph exploits this by restricting the e-graph to the value-dependency DAG and handling phi nodes specially. Loop-carried values are not represented as e-graph edges; instead, they are treated as opaque inputs to the loop body. This means the e-graph itself remains acyclic, and all the hard problems of general e-graphs become tractable:

Extraction is a topological traversal of the DAG. For each e-class, visit children first, then pick the cheapest e-node in the class given the costs already computed for its children.
There is no rebuilding step in the egg sense, because acyclicity means the e-class structure can be maintained incrementally without cycles complicating the parent-pointer invariants.
Memory usage is bounded by the size of the program’s value graph, not by the explosion of equivalent terms that a general e-graph might generate.

This is documented in the original aegraph design post from November 2023, and the new writeup expands on the implementation and performance story. The acyclicity is not a limitation imposed from outside; it is a structural property of the problem domain that, once recognized, makes the whole approach snap into place.

The Structure in Practice

Cranelift’s intermediate representation is called CLIF (Cranelift Intermediate Format). It is a traditional SSA IR with explicit basic blocks and phi nodes (called “block parameters” in CLIF terminology). The aegraph operates on the value graph portion of this IR.

Each CLIF value maps to an e-class in the aegraph. When the optimizer runs a rewrite rule and discovers that some expression e is equivalent to some other expression e', it unions the e-classes of e and e'. New e-nodes are added to existing e-classes. At the end of the pass, extraction walks the e-class DAG in topological order and selects the cheapest representative.

The rewrite rules themselves are specified in ISLE (Instruction Selection and Lowering Engine), Cranelift’s domain-specific language for pattern matching on IR terms. ISLE rules look roughly like:

(rule (simplify (iadd ty x (iconst ty 0)))
      x)

(rule (simplify (imul ty x (iconst ty 1)))
      x)

(rule (simplify (iadd ty (iconst ty k1) (iconst ty k2)))
      (iconst ty (add_overflow ty k1 k2)))

ISLE compiles these rules into Rust code that pattern-matches on e-nodes and fires when applicable. The match-and-apply loop runs until saturation, i.e., until no rule fires and adds a new equivalence. Because the e-graph is acyclic and the rule set is finite and productive (rules reduce in some well-founded ordering), saturation is guaranteed to terminate.

This subsumes what LLVM spreads across InstCombine (algebraic identities), SCCP (sparse conditional constant propagation), and GVN (global value numbering) into a single framework. New optimizations are added by writing new ISLE rules, and they automatically interact with all existing rules without any ordering concerns.

Comparing with Other Mid-End Approaches

LLVM’s middle-end is the most widely studied alternative. It consists of dozens of passes, each with hand-tuned ordering in the standard -O2 and -O3 pipelines. The new pass manager, introduced in LLVM 13 as the default, added infrastructure for composing and ordering passes more flexibly, but it did not change the fundamental model: passes are sequential, and phase ordering matters.

GCC uses a similar sequential-pass model, augmented by its GIMPLE and RTL intermediate representations. The mid-end has become increasingly complex over decades of tuning.

A different line of research led to the RVSDG (Regionalized Value State Dependence Graph), which also uses a DAG-based representation for program values and explicitly represents control flow as a hierarchy of regions. RVSDG has attractive theoretical properties and has been used in the Jlm compiler. The aegraph and RVSDG share the observation that SSA’s value DAG is the right level at which to apply optimizations, though they arrive at different representations and different optimization strategies.

Where Cranelift’s aegraph stands out is in its pragmatism. It does not require abandoning a conventional SSA IR; it operates as a mid-end pass over CLIF and hands results back to the rest of the pipeline. The acyclicity constraint is not a global architectural commitment; it is a scoped invariant maintained within the optimizer pass.

Performance Characteristics

One of the recurring concerns about equality saturation is compile-time cost. Saturation until a true fixpoint sounds expensive. In practice, the aegraph keeps this manageable through two mechanisms.

First, acyclicity bounds the size of the e-graph. In a general e-graph with cycles, the closure under rewriting can grow combinatorially. In the acyclic case, the value graph has a fixed topology, and rules can only add new e-nodes to existing e-classes, not create new topology. The e-graph grows linearly in the number of rules fired.

Second, rules are applied in a single forward pass over the topological order of the value DAG, then iterated until no new equivalences appear. In practice, most functions saturate in very few iterations because the most productive rules (constant folding, identity simplifications, reassociation) fire early.

Cranelift measures compile time against Wasmtime’s workloads, and the aegraph pass has shown consistent improvements over the previous sequential-pass mid-end it replaced. The gains come not just from better optimized output but from eliminating redundant passes: when algebraic simplification is no longer a separate pass from GVN, you stop paying for the same IR traversals twice.

Why This Matters Beyond Cranelift

The aegraph design demonstrates something that compiler research has struggled to translate into practice for years: that equality saturation is not inherently a research-only technique. The blocker was never theoretical; it was the cycle problem and the resulting extraction hardness. By recognizing that a conventional SSA value graph is already acyclic, Cranelift sidesteps the hard problem entirely.

This opens a path for other production compilers. A compiler that already uses SSA form (which is most of them, at this point) can add an aegraph-style mid-end optimizer without redesigning its IR or its backend. The ISLE rule language further decouples optimization logic from the optimizer infrastructure, making the rule set incrementally extensible.

The deeper lesson is about the relationship between data structure choice and problem tractability. Traditional e-graphs are expressive but the general case is hard. By identifying a structural property of the domain (acyclicity of SSA value dependencies) and embedding it into the data structure, Cranelift converts a hard problem into a tractable one. This is a pattern that shows up repeatedly in systems work: the right constraint, chosen for the right reason, does not limit what you can express. It limits what you have to worry about.