Systems

Garbage Collection

Automatic memory management — free what's no longer reachable

Garbage collection (GC) automatically reclaims memory that's no longer reachable from the program. The runtime traces references from a "root set" (stack, globals, registers) and frees anything it can't reach. Modern GCs achieve millisecond pauses on heaps of gigabytes — the gulf from "stop-the-world for seconds" GC of the 1990s to today's parallel concurrent collectors is one of the great achievements of systems engineering.

  • Reference algorithmsMark-and-sweep, copying, generational, concurrent
  • Modern pause timesSub-10ms typical (G1, ZGC, Shenandoah)
  • Throughput vs latencyTradeoff — both can't be optimized simultaneously
  • Generational hypothesisMost objects die young; few live long
  • Used byJava, Go, .NET, JavaScript, Python (CPython hybrid), Ruby
  • Alternative approachesReference counting (Swift, Python), manual (C/C++), Rust ownership

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why have garbage collection?

Manual memory management is correct only when you free every allocation exactly once at the right time. Mistakes:

  • Memory leak. Forget to free → memory accumulates → process eventually crashes.
  • Use-after-free. Free, then access → undefined behavior, crash, or exploitable security vulnerability.
  • Double-free. Free the same allocation twice → heap corruption.
  • Dangling pointer. Pointer to freed memory accessed elsewhere → reads garbage or crashes.

These bugs are subtle, common, and the source of countless CVEs in C and C++ codebases. GC eliminates them entirely — the runtime knows exactly when something becomes unreachable and reclaims it. The cost is some runtime overhead and (historically) pause times.

Reachability — the foundation

An object is "reachable" if you can get to it via references starting from the root set:

  • Local variables on each thread's stack.
  • Global variables.
  • CPU registers and active call frames.
  • Some platform-specific roots (JIT code references, weak global tables).

Anything reachable from the roots must be kept alive. Anything not reachable can never be accessed by the program again — safely freeable.

The trick — tracing reachability efficiently and concurrently with the running application.

Basic GC algorithms

AlgorithmHow it worksProsCons
Reference countingEach object tracks ref count; freed when count = 0Immediate cleanup, predictableMisses cycles; per-assignment overhead
Mark-and-sweepMark live objects from roots; sweep frees unmarkedHandles cycles; conceptually simpleStop-the-world pauses; fragmentation
Mark-and-compactMark, then move live objects together to defragmentNo fragmentation; fast bump allocationCompacting is expensive; needs barriers
Copying (semi-space)Two heap halves; copy live objects to other halfCheap allocation; auto-defragmentingWastes 50% of heap; bad for long-lived data
GenerationalYoung gen collected often; old gen rarelyExploits "most die young"; mostly cheapCross-generation references need write barriers
Concurrent / incrementalGC and app run interleavedLow pause timesHigher CPU overhead, complex

Most modern GCs combine techniques — generational + concurrent + compacting. Java's G1, ZGC, and Shenandoah are all such hybrids; Go's GC is concurrent non-generational tri-color mark-and-sweep.

The generational hypothesis

Empirical fact across thousands of programs — most objects die young. Specifically:

  • ~95% of new allocations are unreachable within milliseconds.
  • The remaining 5% include long-lived data (caches, configuration, server-state).
  • Old objects rarely become garbage; they "live forever" relative to the GC cycle.

Generational GC organizes the heap into:

  • Young generation (eden). New allocations go here. Collected frequently — every few hundred ms. Most of the heap visited is dead, so collection is cheap (live objects are quickly copied out, the rest is bulk-cleared).
  • Survivor space(s). Objects that survived 1-2 young collections. Promoted to old gen if they survive more.
  • Old (tenured) generation. Long-lived objects. Collected rarely (every minute or so), but each collection is expensive (large heap to scan).

The young gen makes the common case fast; old gen is the rare case that takes longer. Net result — total GC time is much lower than non-generational across realistic workloads.

Java GC family — choose your tradeoff

GCTypePause goalBest for
SerialSingle-threaded stop-the-worldDoesn't matter — small heaps onlyTiny apps, low-throughput
Parallel (Throughput)Multi-threaded stop-the-worldMaximum throughput, longer pauses OKBatch jobs, no SLA on latency
G1Region-based, mostly concurrent, generational~100ms typicalGeneral-purpose, default since Java 9
ZGCRegion-based, fully concurrent, scalable to TBsSub-10ms regardless of heap sizeLatency-critical large-heap apps
ShenandoahConcurrent compactingSub-10msSame niche as ZGC; OpenJDK alternative
EpsilonNo-op (allocates, never collects)N/APerformance testing, short-lived processes

For new Java apps in 2025, the choice is usually G1 (default, well-tuned) or ZGC (when latency matters more than CPU efficiency).

Reference counting in CPython, Swift, Objective-C

Each object has a refcount — incremented on new references, decremented when references go away. When the count drops to 0, the object is freed immediately.

# CPython conceptually does this on every assignment:
a = [1, 2, 3]    # list refcount = 1
b = a            # list refcount = 2 (b also references it)
a = None         # list refcount = 1
b = None         # list refcount = 0 → list freed

Pros — immediate, deterministic cleanup. No pauses. Memory released as soon as it's not needed.

Cons — cycles. a = [b]; b = [a] — refcounts never reach 0 even after both a and b go out of scope. Solutions:

  • Cycle collector (CPython has one — runs periodically to find unreachable cycles).
  • Weak references — programmer marks one direction of the cycle as "weak" so it doesn't increment refcount. Used in Swift's ARC.
  • No support — early Objective-C and naive reference counting just leak.

Reference counting also has runtime cost — every reference assignment touches the refcount. In hot allocation paths, this overhead can exceed mark-and-sweep's amortized cost. Tracing GC wins on throughput; refcounting wins on latency predictability.

When GC behavior matters

  • Latency-sensitive services. Trading systems, real-time audio, game engines, video processing — even a 10ms pause is unacceptable. Tune GC carefully or use a low-pause GC (ZGC, Shenandoah) or move to manually-managed languages (Rust, C++).
  • High-allocation-rate workloads. Generational GC excels when most allocations die young. If your workload allocates many long-lived objects, tune the survivor sizes and old-gen capacity to match.
  • Memory-constrained environments. Embedded systems, mobile, FaaS containers. Less heap means more frequent collections; pick a GC that doesn't waste memory on overhead (e.g., copy GC requires 2× heap).
  • Long-running services with steady-state allocation. Tune for old-gen capacity and concurrent collection threshold to avoid full stop-the-world collections.

Alternatives — manual, ownership, escape analysis

  • Manual (C, C++). Programmer is responsible for malloc/free. Maximum performance and predictability; maximum bug surface.
  • RAII (C++). Resources tied to object lifetimes; destructors run automatically. Smart pointers (unique_ptr, shared_ptr) extend this for heap allocation. shared_ptr is reference counting; unique_ptr is single-owner.
  • Ownership (Rust). Compile-time enforcement of borrow rules — every value has exactly one owner; references must follow strict aliasing rules. No GC needed; no leaks possible (assuming Drop is correctly implemented). Rivals manual C with safety guarantees.
  • Escape analysis (some JVMs, Go). Compile-time detection that an allocation never escapes a function — replace with stack allocation. Eliminates GC pressure for the most common case (short-lived locals).

Common GC issues

  • Memory leaks via accidental references. Caches, listener lists, observer patterns — anything that holds strong references to objects beyond their useful lifetime. The GC can't reclaim them; they're "reachable but useless." Use weak references for caches and explicit cleanup.
  • Long pauses surprising production. Your dev test had a 100MB heap; production has 10GB. The full GC pause grows with heap size. Always test with production-realistic heap sizes and allocation rates.
  • Allocation-rate spikes triggering GC storms. A burst of allocations forces frequent young collections; if survivors-to-old-gen overflow, you get long old-gen pauses. Profile and tune.
  • Finalizers / destructors with side effects. Don't rely on them to run promptly (or at all). Java finalize() and Python __del__() run when the GC sees fit. For deterministic cleanup, use try/finally, with-statements, or Closeable patterns.
  • Reference counting + cycles. CPython has a cycle collector; Swift requires you to use weak references. Forgetting to break cycles in Swift is the canonical iOS memory leak.
  • Excessive boxing in JIT-compiled languages. Java and C# autobox primitive values into wrapper objects in some contexts — each box is a heap allocation. Avoid in hot loops; prefer primitive arrays over Integer[].

Frequently asked questions

What's the basic mark-and-sweep algorithm?

Two phases. (1) Mark — start from the root set (stack, registers, globals); recursively follow every reference, marking each visited object as "alive." (2) Sweep — scan the whole heap; free any object that wasn't marked. Then unmark all surviving objects for the next cycle. O(live + dead) per cycle. Simple, correct, but causes long stop-the-world pauses on large heaps.

What's the generational hypothesis?

Empirical observation that most objects die young — local variables, intermediate results, short-lived loop allocations all become unreachable quickly after creation. Few objects survive long. Generational GCs exploit this by allocating new objects in a small "young generation" and frequently collecting it (cheap because most are dead). Survivors get promoted to an "old generation" collected less often. 90%+ allocation churn happens in young; old gen sees orders of magnitude less work.

What's the difference between concurrent and parallel GC?

Parallel GC uses multiple threads but pauses the application entirely while collecting. Concurrent GC runs alongside the application — only short pauses for synchronization, not full stop-the-world. Concurrent is harder (race conditions between mutator and collector) but achieves much lower pause times. Modern Java GCs (G1, ZGC, Shenandoah) are concurrent.

What's reference counting and why isn't it used everywhere?

Each object stores a count of references to it. When count drops to 0, the object is freed immediately. Pros — predictable, no pauses, immediate cleanup. Cons — (1) cycles of references never decrement (memory leak), (2) every assignment requires updating counts (slower than tracing GC for hot allocation paths), (3) thread-safe counting requires atomics and contention. Swift, Python (CPython), Objective-C use it; cycle detection runs as a separate sweep when needed.

What's a stop-the-world pause?

When the GC stops all application threads to safely traverse references without them changing. Old GCs (Java's serial, early CMS) had pauses proportional to heap size — multiple seconds on a 32GB heap. Modern concurrent GCs (ZGC, Shenandoah) achieve sub-10ms pauses by doing most work concurrently with the application.

How does Go's GC achieve low pauses?

Concurrent tri-color mark-and-sweep with write barriers. The app and GC run concurrently — only brief stop-the-world pauses for stack scanning and termination check. Combined with non-generational design (Go traditionally avoided generational due to its allocation patterns), Go achieves pause times under 1ms even on multi-GB heaps. Trade-off — slightly higher CPU overhead than tuned generational GC.

When should I use a language with manual memory management?

When predictable, low-latency, no-pause performance matters more than productivity. Real-time systems (audio processing, robotics, embedded controllers), game engines (frame-rate-critical), high-frequency trading. Modern Rust gives manual-memory-grade control with compile-time safety — increasingly the choice for new low-level work, replacing C++ in many domains.