Systems
Garbage Collection
Automatic memory management — free what's no longer reachable
Garbage collection (GC) automatically reclaims memory that's no longer reachable from the program. The runtime traces references from a "root set" (stack, globals, registers) and frees anything it can't reach. Modern GCs achieve millisecond pauses on heaps of gigabytes — the gulf from "stop-the-world for seconds" GC of the 1990s to today's parallel concurrent collectors is one of the great achievements of systems engineering.
- Reference algorithmsMark-and-sweep, copying, generational, concurrent
- Modern pause timesSub-10ms typical (G1, ZGC, Shenandoah)
- Throughput vs latencyTradeoff — both can't be optimized simultaneously
- Generational hypothesisMost objects die young; few live long
- Used byJava, Go, .NET, JavaScript, Python (CPython hybrid), Ruby
- Alternative approachesReference counting (Swift, Python), manual (C/C++), Rust ownership
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
Why have garbage collection?
Manual memory management is correct only when you free every allocation exactly once at the right time. Mistakes:
- Memory leak. Forget to free → memory accumulates → process eventually crashes.
- Use-after-free. Free, then access → undefined behavior, crash, or exploitable security vulnerability.
- Double-free. Free the same allocation twice → heap corruption.
- Dangling pointer. Pointer to freed memory accessed elsewhere → reads garbage or crashes.
These bugs are subtle, common, and the source of countless CVEs in C and C++ codebases. GC eliminates them entirely — the runtime knows exactly when something becomes unreachable and reclaims it. The cost is some runtime overhead and (historically) pause times.
Reachability — the foundation
An object is "reachable" if you can get to it via references starting from the root set:
- Local variables on each thread's stack.
- Global variables.
- CPU registers and active call frames.
- Some platform-specific roots (JIT code references, weak global tables).
Anything reachable from the roots must be kept alive. Anything not reachable can never be accessed by the program again — safely freeable.
The trick — tracing reachability efficiently and concurrently with the running application.
Basic GC algorithms
| Algorithm | How it works | Pros | Cons |
|---|---|---|---|
| Reference counting | Each object tracks ref count; freed when count = 0 | Immediate cleanup, predictable | Misses cycles; per-assignment overhead |
| Mark-and-sweep | Mark live objects from roots; sweep frees unmarked | Handles cycles; conceptually simple | Stop-the-world pauses; fragmentation |
| Mark-and-compact | Mark, then move live objects together to defragment | No fragmentation; fast bump allocation | Compacting is expensive; needs barriers |
| Copying (semi-space) | Two heap halves; copy live objects to other half | Cheap allocation; auto-defragmenting | Wastes 50% of heap; bad for long-lived data |
| Generational | Young gen collected often; old gen rarely | Exploits "most die young"; mostly cheap | Cross-generation references need write barriers |
| Concurrent / incremental | GC and app run interleaved | Low pause times | Higher CPU overhead, complex |
Most modern GCs combine techniques — generational + concurrent + compacting. Java's G1, ZGC, and Shenandoah are all such hybrids; Go's GC is concurrent non-generational tri-color mark-and-sweep.
The generational hypothesis
Empirical fact across thousands of programs — most objects die young. Specifically:
- ~95% of new allocations are unreachable within milliseconds.
- The remaining 5% include long-lived data (caches, configuration, server-state).
- Old objects rarely become garbage; they "live forever" relative to the GC cycle.
Generational GC organizes the heap into:
- Young generation (eden). New allocations go here. Collected frequently — every few hundred ms. Most of the heap visited is dead, so collection is cheap (live objects are quickly copied out, the rest is bulk-cleared).
- Survivor space(s). Objects that survived 1-2 young collections. Promoted to old gen if they survive more.
- Old (tenured) generation. Long-lived objects. Collected rarely (every minute or so), but each collection is expensive (large heap to scan).
The young gen makes the common case fast; old gen is the rare case that takes longer. Net result — total GC time is much lower than non-generational across realistic workloads.
Java GC family — choose your tradeoff
| GC | Type | Pause goal | Best for |
|---|---|---|---|
| Serial | Single-threaded stop-the-world | Doesn't matter — small heaps only | Tiny apps, low-throughput |
| Parallel (Throughput) | Multi-threaded stop-the-world | Maximum throughput, longer pauses OK | Batch jobs, no SLA on latency |
| G1 | Region-based, mostly concurrent, generational | ~100ms typical | General-purpose, default since Java 9 |
| ZGC | Region-based, fully concurrent, scalable to TBs | Sub-10ms regardless of heap size | Latency-critical large-heap apps |
| Shenandoah | Concurrent compacting | Sub-10ms | Same niche as ZGC; OpenJDK alternative |
| Epsilon | No-op (allocates, never collects) | N/A | Performance testing, short-lived processes |
For new Java apps in 2025, the choice is usually G1 (default, well-tuned) or ZGC (when latency matters more than CPU efficiency).
Reference counting in CPython, Swift, Objective-C
Each object has a refcount — incremented on new references, decremented when references go away. When the count drops to 0, the object is freed immediately.
# CPython conceptually does this on every assignment:
a = [1, 2, 3] # list refcount = 1
b = a # list refcount = 2 (b also references it)
a = None # list refcount = 1
b = None # list refcount = 0 → list freed
Pros — immediate, deterministic cleanup. No pauses. Memory released as soon as it's not needed.
Cons — cycles. a = [b]; b = [a] — refcounts never reach 0 even after both a and b go out of scope. Solutions:
- Cycle collector (CPython has one — runs periodically to find unreachable cycles).
- Weak references — programmer marks one direction of the cycle as "weak" so it doesn't increment refcount. Used in Swift's ARC.
- No support — early Objective-C and naive reference counting just leak.
Reference counting also has runtime cost — every reference assignment touches the refcount. In hot allocation paths, this overhead can exceed mark-and-sweep's amortized cost. Tracing GC wins on throughput; refcounting wins on latency predictability.
When GC behavior matters
- Latency-sensitive services. Trading systems, real-time audio, game engines, video processing — even a 10ms pause is unacceptable. Tune GC carefully or use a low-pause GC (ZGC, Shenandoah) or move to manually-managed languages (Rust, C++).
- High-allocation-rate workloads. Generational GC excels when most allocations die young. If your workload allocates many long-lived objects, tune the survivor sizes and old-gen capacity to match.
- Memory-constrained environments. Embedded systems, mobile, FaaS containers. Less heap means more frequent collections; pick a GC that doesn't waste memory on overhead (e.g., copy GC requires 2× heap).
- Long-running services with steady-state allocation. Tune for old-gen capacity and concurrent collection threshold to avoid full stop-the-world collections.
Alternatives — manual, ownership, escape analysis
- Manual (C, C++). Programmer is responsible for malloc/free. Maximum performance and predictability; maximum bug surface.
- RAII (C++). Resources tied to object lifetimes; destructors run automatically. Smart pointers (unique_ptr, shared_ptr) extend this for heap allocation. shared_ptr is reference counting; unique_ptr is single-owner.
- Ownership (Rust). Compile-time enforcement of borrow rules — every value has exactly one owner; references must follow strict aliasing rules. No GC needed; no leaks possible (assuming Drop is correctly implemented). Rivals manual C with safety guarantees.
- Escape analysis (some JVMs, Go). Compile-time detection that an allocation never escapes a function — replace with stack allocation. Eliminates GC pressure for the most common case (short-lived locals).
Common GC issues
- Memory leaks via accidental references. Caches, listener lists, observer patterns — anything that holds strong references to objects beyond their useful lifetime. The GC can't reclaim them; they're "reachable but useless." Use weak references for caches and explicit cleanup.
- Long pauses surprising production. Your dev test had a 100MB heap; production has 10GB. The full GC pause grows with heap size. Always test with production-realistic heap sizes and allocation rates.
- Allocation-rate spikes triggering GC storms. A burst of allocations forces frequent young collections; if survivors-to-old-gen overflow, you get long old-gen pauses. Profile and tune.
- Finalizers / destructors with side effects. Don't rely on them to run promptly (or at all). Java finalize() and Python __del__() run when the GC sees fit. For deterministic cleanup, use try/finally, with-statements, or Closeable patterns.
- Reference counting + cycles. CPython has a cycle collector; Swift requires you to use weak references. Forgetting to break cycles in Swift is the canonical iOS memory leak.
- Excessive boxing in JIT-compiled languages. Java and C# autobox primitive values into wrapper objects in some contexts — each box is a heap allocation. Avoid in hot loops; prefer primitive arrays over Integer[].
Frequently asked questions
What's the basic mark-and-sweep algorithm?
Two phases. (1) Mark — start from the root set (stack, registers, globals); recursively follow every reference, marking each visited object as "alive." (2) Sweep — scan the whole heap; free any object that wasn't marked. Then unmark all surviving objects for the next cycle. O(live + dead) per cycle. Simple, correct, but causes long stop-the-world pauses on large heaps.
What's the generational hypothesis?
Empirical observation that most objects die young — local variables, intermediate results, short-lived loop allocations all become unreachable quickly after creation. Few objects survive long. Generational GCs exploit this by allocating new objects in a small "young generation" and frequently collecting it (cheap because most are dead). Survivors get promoted to an "old generation" collected less often. 90%+ allocation churn happens in young; old gen sees orders of magnitude less work.
What's the difference between concurrent and parallel GC?
Parallel GC uses multiple threads but pauses the application entirely while collecting. Concurrent GC runs alongside the application — only short pauses for synchronization, not full stop-the-world. Concurrent is harder (race conditions between mutator and collector) but achieves much lower pause times. Modern Java GCs (G1, ZGC, Shenandoah) are concurrent.
What's reference counting and why isn't it used everywhere?
Each object stores a count of references to it. When count drops to 0, the object is freed immediately. Pros — predictable, no pauses, immediate cleanup. Cons — (1) cycles of references never decrement (memory leak), (2) every assignment requires updating counts (slower than tracing GC for hot allocation paths), (3) thread-safe counting requires atomics and contention. Swift, Python (CPython), Objective-C use it; cycle detection runs as a separate sweep when needed.
What's a stop-the-world pause?
When the GC stops all application threads to safely traverse references without them changing. Old GCs (Java's serial, early CMS) had pauses proportional to heap size — multiple seconds on a 32GB heap. Modern concurrent GCs (ZGC, Shenandoah) achieve sub-10ms pauses by doing most work concurrently with the application.
How does Go's GC achieve low pauses?
Concurrent tri-color mark-and-sweep with write barriers. The app and GC run concurrently — only brief stop-the-world pauses for stack scanning and termination check. Combined with non-generational design (Go traditionally avoided generational due to its allocation patterns), Go achieves pause times under 1ms even on multi-GB heaps. Trade-off — slightly higher CPU overhead than tuned generational GC.
When should I use a language with manual memory management?
When predictable, low-latency, no-pause performance matters more than productivity. Real-time systems (audio processing, robotics, embedded controllers), game engines (frame-rate-critical), high-frequency trading. Modern Rust gives manual-memory-grade control with compile-time safety — increasingly the choice for new low-level work, replacing C++ in many domains.