Systems
Tri-Color Marking
Three colors that let a garbage collector trace the heap without stopping your program
Tri-color marking is a garbage-collection technique that partitions the heap into white, gray, and black objects so a collector can trace live data concurrently with the running program, using a write barrier to preserve the tri-color invariant.
- Trace costO(live heap)
- Colorswhite / gray / black
- Work queuethe gray set
- Terminates whenno gray left
- Mutator syncwrite barrier
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How tri-color marking works
A naive mark-sweep collector has to stop the world. It freezes every application thread, walks the entire object graph from the roots marking everything it can reach, then sweeps away whatever it didn't mark. The trace is safe precisely because nothing moves while it runs — the set of reachable objects can't change mid-walk. The price is a pause proportional to the live heap. On a 10 GB heap that can be hundreds of milliseconds, and your latency-sensitive service just missed its SLA.
Tri-color marking is the trick that lets the trace run while the program keeps executing. Edsger Dijkstra, Leslie Lamport, and three co-authors formalized it in their 1978 paper "On-the-Fly Garbage Collection: An Exercise in Cooperation." The idea is to give every object one of three colors:
- White — not yet proven reachable. At the start of a cycle, every object is white. Anything still white at the end is garbage.
- Gray — proven reachable, but not yet scanned. The collector knows this object is live but hasn't followed its outgoing pointers yet. Gray is the frontier of the search.
- Black — proven reachable and fully scanned. All of its children have been discovered (they're at least gray). The collector is done with a black object and will never look at it again this cycle.
The algorithm is just a graph traversal expressed in those colors. Shade the roots gray. Then repeatedly pick a gray object, shade all of its white children gray, and — once all its children are handled — shade it black. The gray set is the work queue: a worklist of "found but unexplored." When the gray set drains to empty, the search has reached a fixed point. Every reachable object is black; every white object is unreachable. Sweep the white ones.
This is breadth-first or depth-first search depending on whether the gray set is a queue or a stack — the colors don't care. What the colors add is a way to reason about a partially-complete trace, which is exactly what you need when the program is allowed to scribble on the graph while you walk it.
The tri-color invariant — and why it breaks
The whole scheme rests on one rule, the strong tri-color invariant:
No black object points directly to a white object.
Why does that matter? Black means "fully scanned — I will never revisit it." So if a black object ends up pointing to a white object, and that black pointer is the only path to the white object, the collector will finish the trace, never having followed that edge, and reclaim a live object. That's a use-after-free waiting to happen — the deadliest bug a memory manager can ship.
During a stop-the-world trace the invariant holds for free: nothing can create a black-to-white edge because the program is frozen. The danger appears the instant you let the program run concurrently. Consider this sequence, the canonical "lost object" race:
Initial: A(black) ──> B(gray) ──> C(white)
A is done. B is on the worklist. C is reachable only via B.
Mutator: A.field = C // A is black, C is white → invariant violated
B.field = null // last gray/white path to C is cut
Now: A(black) ──> C(white) and nothing else points to C
The collector has no gray object left that reaches C.
It finishes, C stays white, the sweep frees a live object. 💥
Two conditions had to coincide: the mutator wrote a white pointer into a black object, and it destroyed every other path the collector still had to that white object. Block either condition and the object survives. That's what a write barrier does.
Write barriers — the mutator's part of the deal
A write barrier is a small piece of code the compiler injects around every pointer store (obj.field = ptr) while a concurrent collection is in progress. It's how the mutator "cooperates" with the collector. Two classic designs preserve two different invariants:
- Dijkstra / incremental-update barrier. On
slot = white_ptr, if the object owningslotis black, shade the new target gray. This re-greys the white object so it lands back on the worklist. Preserves the strong invariant: black→white edges are never allowed to persist. - Yuasa / snapshot-at-the-beginning (SATB) barrier. Before overwriting a pointer (
slot = anything), shade the old value gray. This guarantees everything reachable at GC start stays reachable for this cycle, even if the program later disconnects it. Preserves the weak invariant: a black→white edge is tolerated as long as every white object is also reachable from some gray object.
The distinction is "which pointer do I protect?" Dijkstra protects the destination of the new edge; Yuasa protects the source value that's about to be lost. Both close the race in the example above — Dijkstra by re-greying C when A.field = C runs, Yuasa by greying C's old reachability when B.field = null overwrites the path.
There are also read barriers (used by moving collectors like Azul's C4 and OpenJDK's ZGC via load-reference barriers), which intercept pointer loads instead and are essential when objects relocate during the cycle. Read barriers fire far more often than write barriers, so they're only worth it when you need concurrent compaction, not just concurrent marking.
When to use tri-color marking
- Latency-sensitive services — trading systems, game servers, request handlers — where a 200 ms pause is a hard failure but slightly lower throughput is fine.
- Large heaps. Stop-the-world pause time scales with live heap; concurrent marking moves that work off the critical path, so multi-gigabyte heaps stay responsive.
- Multi-core machines with spare cores to run the collector concurrently with the mutator instead of stealing all of them at once.
- Incremental collectors on a single core, where you interleave tiny slices of marking between mutator steps to cap per-step pause time even without true parallelism.
It's overkill when pauses don't matter — a batch job that runs for an hour barely notices a 100 ms stop-the-world GC, and the simpler collector has higher throughput because it pays no barrier tax. Reference counting is the alternative when you want incremental reclamation without a tracing phase at all, at the cost of not collecting cycles.
Tri-color marking vs other reclamation strategies
| Tri-color concurrent | Stop-the-world mark-sweep | Reference counting | Generational copying | |
|---|---|---|---|---|
| Max pause | sub-millisecond (roots only) | O(live heap) — 10s–100s ms | O(1) amortized, spikes on cascades | O(young live set) |
| Mutator overhead | write barrier on every pointer store | none during normal execution | inc/dec on every assignment | write barrier (remembered set) |
| Throughput | lower (barrier + extra cores) | highest | moderate (counter churn) | high (cheap young allocation) |
| Collects cycles? | yes | yes | no (needs a backup tracer) | yes |
| Reclaims promptly? | at end of cycle | at end of cycle | immediately at zero count | at end of minor cycle |
| Floating garbage | yes — survives one cycle | none | none (immediate) | yes — promoted survivors |
| Used by | Go, Java G1/CMS, ZGC mark | early JVMs, simple runtimes | CPython, Swift, Rust Rc | HotSpot young gen, V8 scavenger |
The headline trade is pause time for throughput and a barrier tax. Tri-color marking buys you bounded pauses by making the mutator do a little work on every pointer write and by tolerating floating garbage — objects that died after the cycle started but won't be collected until the next one.
What the numbers actually say
- Go's collector targets sub-millisecond stop-the-world pauses. Before the 1.5 concurrent collector (2015), Go programs saw pauses in the tens to hundreds of milliseconds; the tri-color concurrent design (Dijkstra-style insertion barrier, later a hybrid barrier in 1.8) cut typical worst-case pauses below 1 ms, and the 1.8 release explicitly eliminated the stack re-scan that had kept some pauses in the 10–100 ms range.
- The trace itself is still O(live heap). Concurrency doesn't make marking cheaper — it just moves it off the pause path. A 10 GB live heap still touches ~10 GB of pointers; you've spread that across wall-clock time instead of bunching it into one freeze.
- Write-barrier cost is a few instructions per pointer store. A typical fast-path barrier is a load, a compare, and a predicted-not-taken branch — on the order of a nanosecond when the slow path isn't hit. Pointer-heavy code can see low-single-digit percent throughput loss; pointer-light numeric code barely notices.
- Floating garbage is bounded by one cycle's allocation. Anything that dies mid-cycle waits for the next cycle, so steady-state heap overhead is roughly the garbage produced during one collection's duration — a reason concurrent collectors run with more headroom than stop-the-world ones.
JavaScript implementation
A single-threaded model that interleaves the mutator and a Dijkstra-barrier collector. The barrier is the load-bearing line.
const WHITE = 0, GRAY = 1, BLACK = 2;
class Obj {
constructor(id) { this.id = id; this.color = WHITE; this.refs = []; }
}
class Heap {
constructor() { this.objects = []; this.roots = []; }
alloc(id) { const o = new Obj(id); this.objects.push(o); return o; }
// Dijkstra (incremental-update) write barrier:
// store `target` into `holder.refs[i]`; if holder is BLACK and target is WHITE,
// re-shade the target GRAY so the collector revisits it.
write(holder, i, target) {
if (holder.color === BLACK && target && target.color === WHITE) {
target.color = GRAY; // restore the strong invariant
this.gray.push(target);
}
holder.refs[i] = target;
}
}
function collect(heap) {
// 1. Shade roots gray. Everything else starts white.
for (const o of heap.objects) o.color = WHITE;
heap.gray = [];
for (const r of heap.roots) { r.color = GRAY; heap.gray.push(r); }
// 2. Drain the gray worklist. This loop can be sliced — do N items,
// yield to the mutator, resume. The barrier keeps it correct.
while (heap.gray.length) {
const o = heap.gray.pop();
for (const child of o.refs) {
if (child && child.color === WHITE) {
child.color = GRAY;
heap.gray.push(child);
}
}
o.color = BLACK; // fully scanned
}
// 3. Sweep: anything still white is unreachable.
const survivors = heap.objects.filter(o => o.color !== WHITE);
const freed = heap.objects.length - survivors.length;
heap.objects = survivors;
return freed;
}
Note that collect never inspects the program's call stack mid-loop — it relies entirely on the barrier in Heap.write to keep the gray set complete as the mutator runs between slices. Remove the barrier and the "lost object" race reappears.
Python implementation
The same algorithm, plus a deliberate demonstration of the race when the barrier is disabled — the famous failure this whole technique exists to prevent.
WHITE, GRAY, BLACK = 0, 1, 2
class Obj:
__slots__ = ('id', 'color', 'refs')
def __init__(self, oid):
self.id, self.color, self.refs = oid, WHITE, []
class Heap:
def __init__(self, barrier=True):
self.objects, self.roots, self.gray, self.barrier = [], [], [], barrier
def alloc(self, oid):
o = Obj(oid); self.objects.append(o); return o
def write(self, holder, i, target): # holder.refs[i] = target
if self.barrier and holder.color == BLACK and target and target.color == WHITE:
target.color = GRAY # Dijkstra insertion barrier
self.gray.append(target)
# pad refs list if needed, then store
while len(holder.refs) <= i:
holder.refs.append(None)
holder.refs[i] = target
def mark_init(heap):
for o in heap.objects: o.color = WHITE
heap.gray = []
for r in heap.roots:
r.color = GRAY; heap.gray.append(r)
def mark_step(heap, n=1): # do up to n units of work, then return
done = 0
while heap.gray and done < n:
o = heap.gray.pop()
for c in o.refs:
if c and c.color == WHITE:
c.color = GRAY; heap.gray.append(c)
o.color = BLACK
done += 1
return not heap.gray # True once the trace finishes
def sweep(heap):
keep = [o for o in heap.objects if o.color != WHITE]
freed = len(heap.objects) - len(keep)
heap.objects = keep
return freed
# --- the race, with the barrier turned OFF ---
def demo(barrier):
h = Heap(barrier=barrier)
A, B, C = h.alloc('A'), h.alloc('B'), h.alloc('C')
h.roots = [A]
h.write(A, 0, B); h.write(B, 0, C) # A -> B -> C
mark_init(h)
mark_step(h, 1) # scan A: B grays, A blackens
# mutator interleaves here, mid-trace:
h.write(A, 0, C) # A(black).field = C(white)
h.write(B, 0, None) # cut the only safe path to C
while not mark_step(h, 1): pass # finish the trace
return 'C' in [o.id for o in h.objects if o.color != WHITE]
assert demo(barrier=True) is True # barrier saves C
assert demo(barrier=False) is False # no barrier: C wrongly collected
The two asserts are the entire point of the article in four lines: with the barrier, the black-to-white store re-greys C and it survives; without it, C is silently freed while still live.
Variants worth knowing
Snapshot-at-the-beginning (Yuasa, 1990). Instead of protecting newly-stored pointers, protect deleted ones: before any overwrite, shade the old value gray. This conceptually photographs the heap at GC start and keeps everything in that photo alive for the cycle. It produces more floating garbage than Dijkstra but never needs to re-scan thread stacks, which simplifies termination.
Go's hybrid write barrier (1.8, 2017). Combines a Yuasa-style deletion barrier with a Dijkstra-style insertion barrier so that thread stacks can be marked black and left alone — no expensive stop-the-world stack re-scan at the end of marking. This is what pushed Go's worst-case pauses below a millisecond.
Incremental vs concurrent. Incremental marking interleaves small marking slices with the mutator on the same thread (V8's old-space marker works this way). Concurrent marking runs the collector on separate threads in parallel with the mutator. Both use tri-color colors and a write barrier; they differ only in scheduling.
Mark-compact and concurrent copying. When you also want to defragment, marking alone isn't enough — you must move objects, which invalidates pointers. ZGC and Shenandoah pair tri-color marking with a load/read barrier and colored pointers (bits stuffed into the address) so relocation can happen concurrently too.
Card marking. A coarse, cache-friendly write barrier: instead of recording individual pointers, divide the heap into fixed-size "cards" (often 512 bytes) and just dirty a byte in a side table when any pointer in that card is written. The collector later re-scans only dirty cards. Common in generational collectors for the old-to-young remembered set.
Common bugs and edge cases
- Forgetting the barrier on a single store path. One unbarriered pointer write — say, in hand-written assembly or an unsafe FFI boundary — reintroduces the lost-object race. The bug is non-deterministic and corrupts memory, making it among the hardest GC bugs to reproduce.
- Allocating white during marking. If new objects are born white mid-cycle, a black object can point to a brand-new white object the trace never reaches. Most collectors allocate objects black (or gray) during the mark phase to sidestep this entirely.
- Stack re-scan termination. Roots (stacks, registers) change constantly. A pure Dijkstra collector must re-scan stacks at the end, and if the mutator keeps mutating, termination can livelock. Hybrid and snapshot barriers exist largely to fix this.
- Confusing the weak and strong invariants. A Yuasa barrier permits black→white edges; that's correct under the weak invariant but looks like a bug if you assert the strong one. Match your assertions to your barrier.
- Floating garbage mistaken for a leak. Objects that die during a cycle linger until the next one. A heap that looks "too big" right after a collection may just be holding one cycle's worth of floating garbage, not leaking.
- Treating gray as a set when you need FIFO/LIFO order. The gray collection's order changes locality and pause distribution (DFS vs BFS) but never correctness — just don't drop items, and don't process an object before all its children are at least gray.
Frequently asked questions
What do white, gray, and black mean in tri-color marking?
White means not yet proven reachable — a candidate for collection. Gray means reachable but not fully scanned — its outgoing pointers haven't all been followed yet. Black means reachable and fully scanned — every object it points to is at least gray. When no gray objects remain, every white object is provably garbage.
What is the tri-color invariant?
The strong tri-color invariant says no black object may point directly to a white object. If that ever held without enforcement, a black object (already scanned, never revisited) could be the only path to a white object, which the collector would then wrongly reclaim. A write barrier restores the invariant whenever the program creates a black-to-white pointer.
Why does concurrent GC need a write barrier?
Because the program keeps mutating pointers while the collector traces. Without intervention, the mutator can hide a white object behind a black one: store a white reference into a black object, then delete the last gray/white path to it. The barrier intercepts that store and either shades the white object gray (Dijkstra) or shades the overwritten reference (Yuasa), so nothing live is lost.
What's the difference between a Dijkstra and a Yuasa write barrier?
A Dijkstra (incremental-update) barrier shades the NEW target gray when a black object gains a pointer to white, preserving the strong invariant. A Yuasa (snapshot-at-the-beginning) barrier shades the OLD value gray before a pointer is overwritten, preserving the weak invariant by keeping everything alive at GC start. Dijkstra reclaims more promptly; Yuasa needs no re-scan of roots.
Does tri-color marking eliminate stop-the-world pauses entirely?
No. Real collectors like Go's and Java's G1/ZGC still take short stop-the-world pauses to enable barriers, scan roots, and terminate the mark phase. The win is that the bulk of the trace — which is proportional to live heap size — runs concurrently, so pauses drop from tens of milliseconds to well under a millisecond on large heaps.
Why three colors instead of two?
A plain mark-sweep collector uses just marked and unmarked — two colors — but only because it stops the program, so the marked set never goes stale. Concurrency needs a third color to distinguish "reached but not yet scanned" (gray) from "reached and fully scanned" (black). The gray set is the work queue; the trace terminates exactly when it empties.