· 6 min read ·

Equality Saturation Without the Phase Ordering Problem: Cranelift's Aegraph

Source: lobsters

Traditional compiler middle-end optimizers have a sequencing problem. LLVM runs dozens of passes in a specific order: GVN before instcombine, LICM after loop canonicalization, dead code elimination scattered throughout. The ordering is not arbitrary. Optimization A often creates opportunities for B, and if B already ran, those opportunities are missed. This is the phase ordering problem, and it has shaped compiler architecture for decades.

E-graphs, and specifically the equality saturation technique built on top of them, offer a principled escape. Instead of applying rewrites sequentially and hoping the order is right, you apply all rewrites simultaneously to a compact representation of equivalent programs. Chris Fallin’s post describes how Cranelift implements this idea in a form that is both practical and efficient: the acyclic e-graph, or aegraph.

What an E-Graph Actually Is

An e-graph stores a set of expressions and equivalences between them. The core data structure consists of e-nodes (individual expressions, like add(x, y)) grouped into e-classes (sets of e-nodes known to be equivalent). When you apply a rewrite rule like (add x 0) => x, you do not replace the original expression. You add x to the e-class that contains add(x, 0), recording that these two things are equal.

This is the key property: rewrite rules are non-destructive. They accumulate knowledge rather than transforming the program in place. After all rules have been applied, an extraction pass chooses the best (lowest cost) representative from each e-class. The egg library demonstrated that this approach is practical for program synthesis and optimization at small scale, earning a POPL 2021 Distinguished Paper.

The trouble with general e-graphs is scaling. The union-find structure underlying egg is fast, but the graph can grow explosively. Some rewrite rules also create cycles. Commutativity (add a b = add b a) applied repeatedly creates an infinite loop at the representation level. Traditional e-graph implementations handle this with iteration limits and congruence closure, but neither is free. For production compilers targeting fast compilation, these costs matter.

The Acyclic Insight

Cranelift’s IR is in SSA form, which means values flow forward. Every instruction takes values as inputs and produces new values as outputs. Back edges in control flow (loops) are handled through block parameters, the SSA-form equivalent of phi nodes, but within the value graph, the structure is a directed acyclic graph.

This property is what the aegraph exploits. Because the underlying IR has no cycles in its value dependency graph, the e-graph built on top of it can also remain acyclic. Cranelift’s aegraph maintains a strict invariant: e-nodes only reference e-classes defined earlier in the program. This transforms equality saturation from an iterative, potentially non-terminating process into a single forward pass.

The practical consequence is significant. Traditional equality saturation requires running rules repeatedly until no new facts are discovered, a fixpoint computation. With an acyclic structure, one pass over the nodes in topological order is sufficient. Every node you process has already had all its inputs saturated. The result is a linear-time saturation algorithm where cost scales with program size plus the number of rewrites applied.

Memory bounds become predictable as well. General e-graphs can grow without bound when rules are expressive enough. The aegraph, constrained to acyclic structure, grows only as fast as the rewrite rules add new nodes to existing e-classes. Cranelift’s optimizer applies a bounded set of algebraic rewrites, so the graph stays manageable in practice.

ISLE and Declarative Optimization Rules

Cranelift writes its rewrite rules in ISLE, a domain-specific language for instruction selection and pattern matching. ISLE was originally designed for lowering Cranelift IR to machine instructions, but the same infrastructure now drives the mid-end optimizer.

A rule for folding addition with zero looks roughly like this:

(rule (simplify (iadd ty x (iconst ty 0))) x)

This reads as: when simplifying an integer add instruction where one operand is a constant zero, replace it with the other operand. The simplify term is the mid-end optimizer’s entry point. ISLE compiles these pattern-matching rules into efficient Rust decision trees at build time, so the runtime cost is a traversal of precompiled match logic rather than interpreted rule application.

In the aegraph context, these rules apply to e-classes rather than to specific nodes. When the rule fires, it adds the result to the e-class of the matched expression. The extraction phase later picks the best member of that e-class, which for a zero-add fold is just the plain value with no computation.

Having instruction selection and mid-end optimization share the same language is not purely an engineering convenience. It means the same pattern infrastructure handles both target-independent algebraic simplifications and target-specific lowering patterns. Rules can compose across these levels in ways that would require explicit interaction between separate subsystems in a more traditional architecture.

How This Compares to LLVM

LLVM’s middle end is a sequence of named passes. InstCombine handles peephole optimizations. GVN does global value numbering. SCCP does sparse conditional constant propagation. Each is an independent analysis and transformation, run in a specific order.

The phase ordering problem shows up concretely: InstCombine might simplify an expression in a way that makes it eligible for GVN, but if GVN already ran, the deduplication opportunity is gone unless you run another GVN pass. LLVM’s solution is to run instcombine multiple times at specific positions in the pass pipeline and to repeat the simplifycfg + instcombine + gvn sequence inside loop processing. This works well because the LLVM team has decades of experience tuning the ordering, but it is fundamentally heuristic.

The aegraph approach does not have a phase ordering problem in the same sense. When you saturate the e-graph, all applicable rules fire. Extraction then picks the best result. The question of whether running rule A before rule B would have helped is irrelevant; both fire, and their combined effects are visible at extraction time.

In practice, Cranelift’s rule set is currently more limited than LLVM’s optimization suite. LLVM also handles loop transformations (vectorization, unrolling, interchange) that operate at a different granularity than the per-instruction rewrites Cranelift’s aegraph focuses on. These are complementary rather than competing concerns; the aegraph targets the algebraic simplification layer, not the loop restructuring layer.

Scoped Extraction and Control Flow

One subtlety in applying e-graphs to a full function is control flow. An optimization valid inside a loop body might not be valid outside it. Cranelift handles this through scoped extraction: the aegraph is built over the entire function, but extraction respects the dominance tree of the control flow graph.

When extracting from an e-class, the extractor considers which members are valid at the current program point. A node that uses a loop-variant operand is not valid to hoist outside the loop. Dominance information gates which alternatives are usable at each point.

This is where the acyclic constraint and SSA form interact most tightly. Because values are defined once and referenced by later instructions, dominance relationships are already encoded in the structure of the graph. The extractor follows the same ordering as the program’s dataflow, so correctness constraints are structural rather than separately checked.

Side-effecting instructions, loads and stores being the primary cases, require additional care. A load cannot generally be duplicated or reordered freely. Cranelift’s aegraph tracks these instructions specially, preserving their ordering constraints while still allowing the pure value computations around them to be fully saturated and optimized.

Why This Fits Cranelift’s Goals

Cranelift is not trying to be LLVM. Its primary use cases are Wasmtime (WebAssembly compilation) and as an optional fast backend for Rust. Both care about compilation speed at least as much as peak code quality. A mid-end optimizer that runs in linear time with predictable memory use fits those constraints well.

The design also has a testing and maintenance advantage. Declarative rewrite rules in ISLE are individually testable and auditable. A rule that looks wrong is easy to isolate and remove. In a traditional pass-based system, an incorrect optimization might interact with several other passes before a bug surfaces, making root cause analysis expensive.

The broader significance is that the aegraph demonstrates a path to making equality saturation practical for production compiler middle ends. The key was not a new theoretical result but a clear-eyed observation about the structure that was already present in SSA-form IR. SSA form gives you a DAG for free, and a DAG gives you equality saturation without the scaling problems that made general e-graphs seem impractical. Cranelift’s implementation is a concrete existence proof that this tradeoff is worthwhile.

Was this interesting?