Systems

Mark-and-Sweep Garbage Collection

Trace everything you can still reach, then reclaim the rest

Mark-and-sweep is a tracing garbage collector that finds live memory by walking pointers from a set of roots, marking everything reachable, then sweeping the heap to reclaim every object left unmarked — in O(live) marking plus O(heap) sweeping per cycle.

Mark phaseO(live objects)
Sweep phaseO(heap size)
Extra space1 mark bit / object
Collects cyclesYes
First describedMcCarthy, 1960

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The one idea: reachability, not reference counts

A program's memory is a graph. The nodes are heap objects; the edges are pointers from one object to another. Some pointers, though, don't live in the heap at all — they sit in local variables on the stack, in CPU registers, and in global variables. Those are the roots. Everything your program can ever touch again, it must reach by following pointers starting from a root. If no chain of pointers connects a root to an object, the program has no way to name that object ever again — it is, by definition, garbage.

That single observation is the whole algorithm. John McCarthy spelled it out in 1960 in the paper that introduced Lisp, alongside the language itself: periodically, stop, paint every object reachable from the roots, then walk the entire heap and free anything you didn't paint. Mark, then sweep. No counting, no per-pointer bookkeeping — the collector reconstructs liveness from scratch each cycle by tracing the object graph.

This is why mark-and-sweep is called a tracing collector, in contrast to reference counting, which tracks liveness incrementally by storing a counter in each object and freeing it the instant the counter hits zero. The two techniques have opposite failure modes, and the contrast is the cleanest way to understand what tracing buys you — more on that below.

How a cycle runs: mark, then sweep

A collection cycle has exactly two phases, run back to back.

Phase 1 — Mark. Start with an empty worklist and push every root onto it. Pop an object, and if it isn't already marked, set its mark bit and push all the objects it points to. Repeat until the worklist is empty. This is just a graph traversal (a breadth-first search if the worklist is a queue, depth-first if it's a stack) over the reachable subgraph. When it finishes, every object the program can still reach is marked, and nothing else is.

    ROOTS                 HEAP
   ┌─────┐
   │stack│──▶ (A)──▶(B)──▶(D)        marked: A B D F
   │ reg │           │
   │globl│──▶ (F)    └──▶ ... (cycle back to A)
   └─────┘
                (C)──▶(E)            unmarked: C E  ← garbage
                 ▲      │            (they point at each other,
                 └──────┘             but no root reaches them)

Phase 2 — Sweep. Walk the heap linearly, object by object, from low address to high. For each object: if its mark bit is set, clear it (ready for next cycle) and move on. If it's clear, the object is unreachable — return its memory to the free list. The sweep touches every object on the heap, live or dead, which is why its cost is proportional to total heap size rather than to the amount of garbage.

Crucially, marking must complete before sweeping begins, and in the simplest design the program is paused for both — a stop-the-world collection. If the program could mutate pointers mid-mark, it might hand a live object a reference the collector has already passed, and the sweep would free memory still in use. Defeating that hazard without a full pause is the central engineering problem of modern collectors, and the reason tri-color marking and write barriers exist.

The cost, precisely

Let L be the number of live objects, E the number of pointers among them, and H the total heap size (live plus dead, measured in objects or bytes depending on the sweep granularity).

Marking is O(L + E). Each live object is pushed and popped once, and each outgoing pointer is examined once. Dead objects are never visited — that is the entire point. The mark bit makes "already visited?" an O(1) check, so there's no revisiting.
Sweeping is O(H). The collector must inspect every object to decide live-or-dead, including the garbage. This is the asymmetry that defines mark-sweep: marking ignores the dead, but the sweep cannot.
One cycle is O(L + E + H) = O(H). Since L ≤ H and a heap of H objects has at most O(H) pointers if object sizes are bounded, the whole cycle is linear in heap size.
Extra space is one bit per object for the mark, plus the worklist. A naive recursive mark can blow the stack on a deep linked list of L nodes (O(L) recursion depth), so production collectors use an explicit worklist — or pointer-reversal (the Deutsch–Schorr–Waite algorithm), which marks with O(1) auxiliary space by temporarily reversing pointers as it descends and restoring them as it returns.

The headline weakness is amortized throughput. Because the sweep is O(H) regardless of how little garbage there is, running the collector too often wastes work scanning a heap that's mostly live. The classic fix is to let garbage accumulate and collect rarely — but that bloats the heap and lengthens each pause. That tension is exactly what generational collectors exploit.

When mark-and-sweep is the right tool

When object graphs have cycles. Doubly-linked lists, graphs, parent↔child trees, observer/observable webs — any structure where objects point back at each other. Tracing collects these automatically; reference counting cannot without a separate cycle detector.
When you can tolerate pauses or afford a concurrent collector. Batch jobs, compilers, and build tools shrug off a 50 ms stop-the-world pause. Latency-sensitive servers need a concurrent variant, which is more complex but built on the same mark-sweep core.
When allocation is bursty and short-lived. Most objects die young, so combining mark-sweep with generational collection — frequent cheap collections of the young generation, rare full collections of the old — gives excellent throughput.
When you don't control every pointer store. Reference counting requires intercepting every assignment; tracing doesn't care how pointers got where they are, it just reads the graph at collection time.

Reach for something else when pauses are intolerable and you can't afford a concurrent GC (use arena/region allocation or manual management), or when the heap is enormous and mostly live (the O(H) sweep dominates — prefer a generational or region-based scheme that never scans the whole heap at once).

Mark-sweep vs the alternatives

	Mark-sweep	Mark-compact	Copying (semispace)	Reference counting	Generational
Liveness model	Tracing	Tracing	Tracing	Counting	Tracing (per region)
Collects cycles	Yes	Yes	Yes	No (leaks)	Yes
Cost per cycle	O(live + heap)	O(live + heap)	O(live only)	O(1) per pointer write	O(young live) usually
Fragmentation	Yes (free list)	None (compacts)	None (bump pointer)	Yes	Depends on old gen
Moves objects	No	Yes	Yes	No	Young gen: yes
Space overhead	1 mark bit + free list	1 mark bit	2× heap (two spaces)	1 counter / object	Remembered set + bits
Reclaim latency	End of cycle	End of cycle	End of cycle	Immediate	End of (minor) cycle
Real-world use	Boehm GC, old CRuby	JVM full GC, V8 old space	JVM young gen, V8 new space	CPython, Swift, Rust Rc	JVM (G1), V8, .NET CLR

The defining trade-offs: copying collectors are O(live) — they never touch garbage — but pay 2× memory and must move objects. Reference counting reclaims instantly and incrementally but leaks cycles and taxes every pointer write. Mark-compact eliminates fragmentation but moves every survivor. Mark-sweep is the simplest tracing collector that handles cycles, at the price of a whole-heap sweep and a fragmented free list. Most production runtimes combine several: a copying young generation backed by a mark-sweep or mark-compact old generation.

What the numbers actually say

The sweep dominates on healthy heaps. If 90% of a 4 GB heap is live (a common steady state), marking visits 3.6 GB of objects and the sweep still scans all 4 GB. The 10% that's garbage costs you a full-heap pass to find — the canonical argument for not collecting too eagerly.
One mark bit per object is ~0.4% overhead for 32-byte objects, but most collectors don't even spend that: they pack mark bits into a separate bitmap (a "mark map") so the sweep is a tight scan over a dense bit array, and the object headers stay clean for the CPU cache.
Pause time scales with live data, not garbage. A naive stop-the-world full GC on a 10 GB heap can pause 1–10 seconds. Concurrent mark-sweep collectors — Go's GC, the JVM's old CMS, ZGC, Shenandoah — push 99th-percentile pauses under 10 ms by marking while the program runs, regardless of heap size. ZGC advertises sub-millisecond pauses on terabyte heaps.
Concurrent marking isn't free throughput. The write barrier that preserves correctness adds roughly a few percent overhead to every pointer store. Go accepts about a 25% CPU tax during a GC cycle as the deliberate price of its low pause-time target.
Fragmentation is the silent cost. Because mark-sweep never moves objects, a long-running heap can end up unable to satisfy a large allocation despite having plenty of total free bytes — they're just scattered in too-small gaps. This is what eventually forces a compacting full GC in collectors that otherwise prefer mark-sweep.

JavaScript implementation

A self-contained mark-and-sweep collector over a toy object model. Objects are plain records with named fields; a field is a "pointer" if its value is another tracked object. The roots are whatever you hand the collector.

class Heap {
  constructor() { this.objects = new Set(); }       // every live-or-dead allocation

  alloc(fields = {}) {
    const obj = { __marked: false, fields };
    this.objects.add(obj);
    return obj;
  }

  // ── Phase 1: mark everything reachable from the roots ──
  mark(roots) {
    const worklist = [...roots];                     // explicit stack — no recursion
    while (worklist.length) {
      const obj = worklist.pop();
      if (!obj || obj.__marked) continue;            // already visited? O(1) skip
      obj.__marked = true;
      for (const value of Object.values(obj.fields)) {
        if (this.objects.has(value)) worklist.push(value);   // follow pointer
      }
    }
  }

  // ── Phase 2: reclaim every object left unmarked, reset bits ──
  sweep() {
    let freed = 0;
    for (const obj of this.objects) {
      if (obj.__marked) {
        obj.__marked = false;                        // clear for next cycle
      } else {
        this.objects.delete(obj);                    // return to "free list"
        freed++;
      }
    }
    return freed;
  }

  collect(roots) {
    this.mark(roots);
    return this.sweep();
  }
}

// ── Demo: build a cycle that no root can reach ──
const heap = new Heap();
const a = heap.alloc(), b = heap.alloc(), garbage1 = heap.alloc(), garbage2 = heap.alloc();
a.fields.next = b;            // root → a → b   (live)
garbage1.fields.peer = garbage2;
garbage2.fields.peer = garbage1;   // garbage1 ↔ garbage2  (an unreachable cycle)

const stackRoots = [a];       // only `a` is on the "stack"
console.log(heap.collect(stackRoots));   // → 2  (both cycle members reclaimed)
console.log(heap.objects.size);          // → 2  (a and b survive)

Two details mirror real collectors. First, the worklist is an explicit array, not recursion — a recursive mark on a million-node linked list overflows the call stack. Second, the sweep both reclaims unmarked objects and resets the mark bits on survivors, so the next cycle starts from a clean slate. The cycle garbage1 ↔ garbage2 is collected precisely because tracing never visits it — a reference-counting collector would leak it forever, since each holds a count of 1 from the other.

Python pseudocode: adding tri-color marking

The tri-color abstraction is what makes concurrent mark-and-sweep correct. Every object is white (unvisited), grey (visited, children pending), or black (visited, children done). This version exposes the colors explicitly and shows the invariant a write barrier must protect.

WHITE, GREY, BLACK = 0, 1, 2

class Obj:
    __slots__ = ('color', 'refs')
    def __init__(self, *refs):
        self.color = WHITE
        self.refs  = list(refs)        # outgoing pointers

def collect(all_objects, roots):
    # Reset: everything starts white.
    for o in all_objects:
        o.color = WHITE

    # Seed the grey set with the roots.
    grey = list(roots)
    for r in roots:
        r.color = GREY

    # Advance grey -> black until no grey remain.
    while grey:
        o = grey.pop()
        for child in o.refs:
            if child.color == WHITE:   # discover an unvisited object
                child.color = GREY
                grey.append(child)
        o.color = BLACK                # all children now at least grey

    # Sweep: anything still WHITE was never reached.
    live, freed = [], 0
    for o in all_objects:
        if o.color == WHITE:
            freed += 1                 # reclaim
        else:
            live.append(o)
    return live, freed

# The tri-color invariant (must hold for concurrent correctness):
#   No BLACK object may point directly to a WHITE object
#   without that WHITE object being reachable from some GREY object.
# A mutator that writes  black.ref = white  would hide `white` from the
# collector (black is already scanned, white is about to be swept). A WRITE
# BARRIER restores the invariant — e.g. Dijkstra's barrier re-greys the
# written-to white object; Yuasa's barrier greys the OLD value being
# overwritten (snapshot-at-the-beginning). Either way, no live object is lost.

The "strong tri-color invariant" — no black-to-white edge — is the formal guarantee that lets the program keep mutating the heap while marking runs. Stop-the-world collectors get it for free because nothing mutates during marking; concurrent collectors enforce it with a write barrier on every pointer store, which is exactly the few-percent throughput tax noted above.

Variants worth knowing

Mark-compact. Keep the mark phase; replace the sweep with a compaction pass that slides every live object toward one end of the heap and rewrites all pointers to it. Eliminates fragmentation and restores fast bump-pointer allocation, at the cost of moving survivors and a second (or third) heap pass. The JVM's full GC and V8's old-space collection are mark-compact.

Concurrent / incremental mark-sweep. Run marking in slices interleaved with the program, guarded by a write barrier and the tri-color invariant. This is how Go's collector, the JVM's (now-removed) CMS, ZGC, and Shenandoah keep pauses tiny. The marking is the same graph walk; the engineering is all about not losing objects the program touches mid-cycle.

Generational collection. Exploit the empirical fact that most objects die young. Allocate into a small nursery, collect it often and cheaply (usually by copying), and promote survivors to an old generation that's collected rarely with mark-sweep or mark-compact. A remembered set plus a write barrier tracks the few old→young pointers so a minor collection never has to scan the whole old heap. This is the throughput fix for mark-sweep's O(heap) sweep.

Conservative collection. When the runtime can't tell which stack slots and registers hold pointers (typical for C/C++), treat any word that looks like a heap address as a possible root. The Boehm–Demers–Weiser collector does this, letting unmodified C programs get tracing GC. The price is occasionally retaining a dead object because an integer happened to alias its address.

Deutsch–Schorr–Waite pointer reversal. Mark with O(1) extra space by reversing pointers as you descend into the graph and restoring them as you back out, encoding the traversal path inside the object graph itself. Historically vital when memory for an explicit worklist was unaffordable.

Common bugs and edge cases

Forgetting to clear mark bits. If the sweep doesn't reset survivors to unmarked, the next cycle sees everything as already-marked and frees nothing. Either clear bits during the sweep, or flip the meaning of "marked" each cycle (the classic mark-bit toggle trick).
Missing a root. The most dangerous GC bug. If a register or stack slot holding the only pointer to a live object isn't enumerated as a root, the collector frees memory the program is still using — a use-after-free that corrupts the heap. Precise stack maps and the JIT's GC-safepoint metadata exist to prevent exactly this.
Recursive marking blows the stack. A deep or long linked list can recurse L levels. Always use an explicit worklist or pointer reversal.
Mutating during marking without a barrier. In a concurrent collector, storing a white object into a black one without a write barrier violates the tri-color invariant and can lose a live object. Stop-the-world avoids this by definition; concurrent collectors must enforce it.
Treating the sweep cost as proportional to garbage. It isn't — it's proportional to total heap size. Collecting a nearly-full heap to reclaim a sliver of garbage is almost pure overhead.
Fragmentation starving large allocations. Plenty of free bytes, none of them contiguous enough. Mark-sweep can't fix this on its own; it needs an occasional compacting pass or a size-segregated free list to mitigate it.
Finalizers resurrecting objects. If a finalizer stores this back into a reachable object, an object the collector decided was dead becomes live again, requiring a re-scan. Finalization ordering and resurrection are a notorious source of subtle GC bugs.

Frequently asked questions

Why can't mark-and-sweep free unreachable cycles, but reference counting can?

It's the reverse. Mark-and-sweep collects cycles for free: if a clump of objects points only at each other but nothing in the clump is reachable from a root, the marking phase never visits any of them and the sweep reclaims them all. Reference counting is the technique that leaks cycles, because every object in the cycle keeps a non-zero count from its neighbour.

What are the GC roots in mark-and-sweep?

Roots are the pointers the program can reach without going through the heap: local variables and parameters on every thread's stack, CPU registers, global and static variables, and JNI/interned references held by the runtime. The collector treats these as definitely-live and traces outward from them. An object is garbage if and only if no path of pointers leads to it from any root.

How long does a mark-and-sweep pause last?

Naive stop-the-world mark-and-sweep pauses for the whole cycle: marking costs O(live objects) and sweeping costs O(heap size), so a multi-gigabyte heap can stall for hundreds of milliseconds to several seconds. Production collectors hide this by marking concurrently with the program and using a write barrier, cutting pauses to single-digit milliseconds (Go's GC targets sub-millisecond, ZGC and Shenandoah target under 10 ms regardless of heap size).

What's the difference between mark-sweep and mark-compact?

Both share the marking phase. Mark-sweep leaves survivors where they are and threads the freed gaps onto a free list, which is fast but fragments the heap. Mark-compact adds a third phase that slides all live objects to one end of the heap so free space is one contiguous block, eliminating fragmentation at the cost of moving every survivor and fixing up every pointer to it.

What is tri-color marking and why does it matter?

Tri-color marking colors each object white (unvisited), grey (visited but its children aren't), or black (visited with all children scanned). The collector advances grey objects to black until no grey remain; whatever is still white is garbage. The abstraction makes concurrent collection provably correct: as long as the program never lets a black object point directly to a white one without re-greying something (the tri-color invariant, enforced by a write barrier), the program can mutate the heap while marking is in progress.

Does mark-and-sweep need to stop the whole program?

The textbook version does, but it isn't required. Concurrent and incremental collectors run marking in slices interleaved with the program, using a write barrier to catch pointer updates that would otherwise hide a live object. The trade-off is throughput: barriers tax every pointer write, and the collector must conservatively retain anything that became reachable during the cycle — so concurrent GCs reclaim slightly less per cycle than a clean stop-the-world snapshot would.