· 7 min read ·

The Third Color: What Tri-Color Mark-and-Sweep Actually Unlocks

Source: lobsters

Robert Nystrom’s “Baby’s First Garbage Collector” is one of the best introductions to garbage collection ever written. It is short, written in C, and leaves you with a mark-and-sweep collector that actually runs. The VM keeps a linked list of every allocated object; the mark phase recursively colors reachable objects black; the sweep phase walks the list and frees anything still uncolored. Two states, a pointer, a recursive walk. You can implement it in an afternoon.

Matheus Moreira’s follow-up article adds a third color. The change sounds cosmetic. It is not. The tri-color abstraction, first formalized by Dijkstra, Lamport, and colleagues in their 1978 paper on concurrent GC, is the conceptual foundation of the garbage collectors inside Lua 5.1, Go, Java’s G1 and ZGC, and most production collectors built over the last four decades. Understanding what the third color buys you is the difference between knowing how a GC works and knowing why real ones are built the way they are.

The Two-Color Problem

Nystrom’s two-color GC uses a single marked bit per object. During the mark phase, the collector recursively visits every object reachable from the root set (stack, globals) and sets that bit to 1. During the sweep phase, it walks all objects: free those with marked == 0, and reset marked to 0 on those with marked == 1 in preparation for the next cycle.

This works correctly, but it has a hard constraint: the entire mark phase must run to completion before the sweep begins, and nothing can modify the object graph in between. The reason is simple. Midway through a recursive mark traversal, some objects are marked and some are not. If the program (the “mutator” in GC parlance) were to run at this point and store a pointer from an already-marked object to an unmarked one, the collector would have no way to know it needs to visit that unmarked object. The mark phase has already passed through the marked object and will not revisit it. The unmarked object gets swept as garbage even though it is live.

This forces a stop-the-world design: pause all threads, run the full mark-and-sweep cycle, then resume. For small heaps, that pause is invisible. For large heaps or latency-sensitive applications, it is a serious problem.

Introducing the Third Color

The tri-color scheme addresses this by distinguishing between two previously conflated states. In a two-color GC, a marked object could mean either “we visited this object” or “we fully processed this object and all its children.” Tri-color separates those:

  • White: not yet reached, or determined to be garbage at the end of marking
  • Gray: reachable (discovered), but children not yet scanned
  • Black: fully scanned, all direct children have been colored gray or black

The algorithm replaces the recursive DFS with an explicit worklist:

// Initialize: all roots are gray
for each root in root_set:
    color(root, GRAY)
    push(worklist, root)

// Mark loop: process gray objects one at a time
while worklist is not empty:
    obj = pop(worklist)
    for each child of obj:
        if color(child) == WHITE:
            color(child, GRAY)
            push(worklist, child)
    color(obj, BLACK)

// Sweep: free all white objects
for each obj in heap:
    if color(obj) == WHITE:
        free(obj)
    else:
        color(obj, WHITE)  // reset for next cycle

This looks like a minor refactor. The crucial property it establishes is the tri-color invariant: no black object ever points directly to a white object. At any point during the mark loop, every object is in a well-defined state. Black objects have been fully processed. Gray objects are on the worklist. White objects have not been discovered yet, or are garbage.

Why the Invariant Enables Incrementality

Because the invariant holds at every step of the loop, you can pause the GC between any two iterations and resume later without correctness issues. This is the core of incremental garbage collection. Instead of running the entire mark phase atomically, you process a bounded number of gray objects per GC step, then yield back to the mutator.

Lua 5.1 introduced this approach, making its GC incremental. Each allocation does a small amount of GC work, keeping pause times proportional to the size of each individual GC step rather than the size of the heap. You can tune this with collectgarbage("setstepmul", n), which controls how much GC work is done per kilobyte of allocation. The Lua 5.1 reference manual calls these parameters “pause” and “stepmul” and gives you direct control over the trade-off between GC throughput and mutator throughput.

Go’s garbage collector uses the same underlying idea. The Go GC guide describes a concurrent mark phase where GC goroutines process the gray worklist while mutator goroutines continue running. Go keeps pause times under 500 microseconds in most workloads, down from multi-millisecond pauses in pre-1.5 versions.

The Write Barrier Problem

Incremental GC introduces a hazard that the two-color design does not have. If the mutator runs between GC steps, it can modify the object graph. Specifically, it can do this sequence:

  1. The GC has already colored object A black.
  2. The mutator stores a pointer from A to a newly created object C.
  3. The mutator removes the only other reference to C.
  4. The GC sweeps C as white (garbage) even though A is live and points to it.

This violates the invariant: a black object points to a white object, and the white object is not on any worklist. The collector frees a live object.

The fix is a write barrier: a small piece of code that runs on every heap pointer write and maintains the invariant. There are two canonical forms:

Insertion barrier (Dijkstra): When writing a pointer obj.field = ref, if obj is black, re-color it gray and add it back to the worklist. This prevents black objects from ever stably pointing to white objects.

Deletion barrier (Yuasa): When overwriting a pointer, if the old value points to a white object, immediately shade that white object gray. This preserves a “snapshot at the beginning” of the object graph.

Go uses a hybrid write barrier (introduced in Go 1.14) that shades both the old and new values on every pointer write. The proposal document explains the motivation: earlier versions required a stop-the-world “re-scan” of all goroutine stacks at the end of concurrent marking, because stack writes were not covered by write barriers. The hybrid approach shades both sides of each write, making the final STW phase much shorter.

Lua’s write barrier works differently. When you write a pointer into a black object, Lua re-grays the object (a “back barrier”). This is simpler to implement and sufficient for incremental, single-threaded GC, since there are no concurrent goroutines to worry about.

Write barriers are not free. They add a conditional check (and potentially a worklist push) to every heap pointer store. In practice this costs between 1% and 5% of throughput in typical workloads, a trade-off that incremental and concurrent GCs accept in exchange for bounded pause times.

Floating Garbage Is Acceptable

One consequence of the incremental approach is floating garbage: objects that become unreachable after the GC has already colored them gray or black are not collected in the current cycle. They will be collected in the next one. For most programs this is fine; for programs that allocate rapidly and need precise heap accounting, it requires some care.

Java’s ZGC takes this further. It uses colored pointers, storing GC metadata in unused bits of 64-bit pointers, and applies load barriers on every pointer read rather than just writes. This lets the GC relocate objects concurrently, achieving sub-millisecond pause times on heaps of hundreds of gigabytes. The cost is that every pointer dereference must check and potentially update the pointer. The same fundamental tri-color invariant underpins it, extended to handle concurrent relocation.

The Recursive Mark Was Always the Bottleneck

One practical lesson from Moreira’s article that gets less attention: the two-color recursive mark will stack overflow on deep object chains. A long linked list becomes a recursion depth equal to the list length. The tri-color worklist approach eliminates this because the call stack depth is now O(1) regardless of object graph depth. This is not just a theoretical concern; it is why production GCs have always used iterative mark loops. The transition from two colors to three also happens to fix the stack overflow problem without any additional effort.

What Comes After

Tri-color incremental mark-and-sweep is not the end of the GC design space. The next step is generational collection, which exploits the observation that most objects die young. By keeping a small nursery generation that is collected frequently and cheaply, and promoting survivors to an older generation that is collected less often, a generational collector can dramatically improve throughput. Lua 5.4 added a generational mode on top of its existing incremental GC. Java, Go, and .NET all use generational strategies at their core.

But the tri-color invariant remains central even in generational collectors. The inter-generational remembered sets (which track old-to-young pointers so that a minor collection can find all live young objects) are essentially a write barrier mechanism, logging cross-generation pointer writes so the collector does not miss roots.

Moreira’s article is worth working through because implementing it by hand forces you to confront the write barrier problem at the level where it is simplest to reason about. Most GC literature jumps straight to concurrent or generational designs where the same problems appear, but surrounded by significantly more complexity. Starting with a single-threaded incremental collector over a small object graph gives you an accurate mental model of the invariant before you have to think about memory ordering, evacuation, or concurrent root scanning.

Was this interesting?