Concurrency
Hazard Pointers
How lock-free readers stop the garbage collector that doesn't exist
Hazard pointers let lock-free readers publish the nodes they're using so no other thread frees that memory out from under them, solving the use-after-free problem in concurrent data structures with O(1) amortized reclamation.
- InventedMaged Michael, 2004
- Read-path cost1 store + 1 fence
- Reclaim costO(R) scan, O(1) amortized
- Slots per threadusually 1–2
- StandardizedC++26 std::hazard_pointer
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The race that has no winner
Picture a lock-free linked list. Thread A is walking it: it has just loaded a pointer to node X and is about to read X->value. In the same instant, Thread B unlinks X from the list with a successful compare-and-swap and calls free(X). A nanosecond later A dereferences a pointer into freed memory. The allocator may have already recycled that block for something else, so A reads garbage, corrupts state, or segfaults.
This is the central, nasty problem of lock-free programming. Removing a node from the structure is the easy part — a single CAS does it. Knowing when it is safe to actually free the memory is the hard part, because in a lock-free design no thread is ever allowed to block waiting for the others. A tracing garbage collector solves this for you (it never frees something still reachable), which is why lock-free code in Java or Go feels deceptively easy. In C, C++, or Rust there is no collector, so you have to build a tiny one — and hazard pointers are the smallest one that works.
The idea, due to Maged Michael in 2004, is disarmingly simple: before a thread dereferences a shared pointer, it writes that pointer into a globally visible per-thread slot called a hazard pointer. A thread that wants to free a node first scans every hazard pointer; if the node's address appears in any of them, the free is postponed. The reader announces "this one's mine — don't touch it," and the reclaimer respects the announcement. No locks, no reference counts on the hot path.
The protocol, precisely
Each thread owns a small fixed number of hazard-pointer slots (one or two is the common case). The slots live in a shared array so every thread can read every other thread's slots, but each slot has a single writer — only its owning thread ever stores to it. That single-writer property is what makes the read path cheap.
Protecting a pointer. Because the node can be unlinked between the time you read the head pointer and the time you publish it, you can't just store-and-go. You must publish, then re-read the source, and confirm it didn't change:
repeat:
p = *source // load the shared pointer
hp = p // publish it in my hazard slot
fence(seq_cst) // make the publish globally visible BEFORE re-read
if *source == p: // still the same? then my hazard pointer is valid
break // safe to dereference p
// otherwise it changed under me — retry
That sequence is the heart of the whole scheme. The store-load fence between publishing hp and re-reading *source is mandatory: it guarantees that if a reclaiming thread later reads my hazard pointer and sees nothing, then my dereference must not have happened yet — so it is genuinely free to reclaim. Drop the fence and the protocol is silently broken on any weakly-ordered CPU.
Retiring a node. When a thread removes a node, it does not free it. It appends the node to a thread-local retire list and increments a counter. When the retire list reaches a threshold R, the thread runs a scan:
- Collect every non-null hazard pointer across all threads into a set
H(sorting it, or dropping it in a hash set, makes lookups fast). - For each node on my retire list: if its address is in
H, keep it for later. Otherwise, no one is watching it —freeit now.
The threshold is chosen so that at least half the retired nodes get freed each scan, which is what gives the amortized bound. The canonical choice is R = (1 + k) · H_total for a small constant k, where H_total is the total number of hazard-pointer slots in the program. Since at most H_total nodes can be protected at any instant, a retire list of size R > H_total guarantees progress.
The cost, in big-O and in cycles
Let N be the number of threads, K the hazard pointers per thread, and H = N·K the total slot count.
- Protect (read path): O(1) — one store, one memory fence, one re-read, repeated only on the rare retry. No contended atomics, no cache-line ping-pong.
- Retire: O(1) to append to the local list.
- Scan (reclaim): O(H + R) — read all
Hslots, then check each of theRretired nodes against them. WithR = Θ(H)and at leastR/2nodes freed per scan, the amortized cost per reclaimed node is O(1). - Worst-case memory overhead: at most
N · R = O(N²·K)unreclaimed (retired-but-not-yet-freed) nodes across the system — bounded, unlike epoch-based reclamation where a single stalled thread can pin unbounded garbage.
That bounded-garbage guarantee is the headline property. With hazard pointers, a thread that stalls in the middle of an operation pins at most K nodes — exactly the ones in its slots. An epoch- or QSBR-based reclaimer, by contrast, lets one sleeping thread block reclamation of every node retired since it last announced a quiescent state, which can be the entire working set.
When to reach for hazard pointers
- Read-mostly lock-free structures — stacks, queues, hash maps, lists — where you need safe reclamation without a GC and can't tolerate the cache contention of reference counting.
- Hard real-time or bounded-memory systems where one slow thread must not be able to pin unbounded garbage.
- Code that must close the ABA window cheaply, without 128-bit tagged pointers or double-width CAS.
Reach for something else when: traversals are long and you'd be publishing a new hazard pointer at every hop (epoch-based reclamation amortizes better there — one announcement covers a whole critical section); or when you have a real garbage collector already (just let it do its job); or when the structure is read-only after construction (no reclamation needed at all).
Hazard pointers vs other reclamation schemes
| Hazard pointers | Epoch-based (EBR) | RCU / QSBR | Reference counting | Lock + free | |
|---|---|---|---|---|---|
| Read-path cost | store + fence per pointer | 1 announce per critical section | near-zero (quiescence) | atomic inc/dec (contended) | lock acquire |
| Worst-case garbage | bounded: O(N²·K) | unbounded if a thread stalls | unbounded if a thread stalls | bounded | bounded |
| Per-node overhead | none (slots are per-thread) | none | none | a counter word | none |
| Lock-free / wait-free safe | yes (lock-free) | yes | blocking-ish (grace period) | yes, but slow path | no — not lock-free |
| Solves ABA | yes (prevents reuse) | yes | yes | partial | n/a |
| Best for | short ops, bounded memory | long read traversals | kernel, read-mostly | shared ownership graphs | simplicity |
The defining trade-off is read-path cost versus worst-case memory. Hazard pointers pay a fence per protected pointer to cap the garbage; EBR and RCU make reads nearly free but let a stalled thread pin everything. That's why the Linux kernel uses RCU (where readers are short and a grace period is acceptable) while a userspace allocator that must never run away on memory will prefer hazard pointers.
What the numbers actually say
- One store plus one fence per protected pointer. On x86 the fence is an
mfenceor a locked instruction — roughly 20–30 cycles. That's the entire read-path tax, and it doesn't grow with thread count, unlike a contended reference-count increment which can cost hundreds of cycles when the cache line is hot across cores. - Scan runs once every R retirements. With
R = 2·Hand the half-freed guarantee, the amortized reclamation cost is a small constant per node. A 64-thread program with 2 slots each hasH = 128, so a scan reads 128 words and clears a ~256-node retire list — microseconds, infrequently. - Bounded garbage: ≤ N·R nodes. The same 64-thread system holds at most
64 × 256 ≈ 16,000retired-but-unfreed nodes in the absolute worst case — kilobytes to a few megabytes, never unbounded. - Patent-encumbered, then freed. IBM filed a patent application on Michael's technique in 2002 (US 2004/0107227, "Method for efficient implementation of dynamic lock-free data structures with safe memory reclamation"). The pending application chilled adoption for years; IBM ultimately let it lapse — it was abandoned in 2010 — and that removal of the patent cloud is what eventually cleared the path into C++26.
JavaScript model (single-threaded simulation)
JavaScript has no shared-memory threads on the main path and no manual free, so a literal implementation isn't possible. But the protocol is easy to model — a registry of per-thread hazard slots and a retire-then-scan reclaimer — which makes the algorithm concrete:
// A toy single-threaded model of the hazard-pointer protocol.
// "free" just means: this node may be recycled now.
class HazardDomain {
constructor() {
this.slots = new Map(); // threadId -> Set of protected node refs
this.retired = []; // nodes removed but not yet freed
this.R = 4; // scan threshold (tiny, for demonstration)
}
// A thread publishes "I'm using `node`" before dereferencing it.
protect(threadId, node) {
if (!this.slots.has(threadId)) this.slots.set(threadId, new Set());
this.slots.get(threadId).add(node);
return node; // caller may now safely dereference
}
// Thread is done with `node` — clear its hazard slot.
release(threadId, node) {
this.slots.get(threadId)?.delete(node);
}
// Node was unlinked from the structure. Don't free yet — retire it.
retire(node, onFree) {
this.retired.push({ node, onFree });
if (this.retired.length >= this.R) this.scan();
}
// Free every retired node that no live hazard pointer protects.
scan() {
const hazardous = new Set();
for (const set of this.slots.values())
for (const n of set) hazardous.add(n);
const survivors = [];
for (const entry of this.retired) {
if (hazardous.has(entry.node)) survivors.push(entry); // still watched
else entry.onFree(entry.node); // safe to free
}
this.retired = survivors;
}
}
// --- usage: a lock-free pop that hands a node to retire() ---
const hp = new HazardDomain();
function popAndRetire(threadId, node) {
hp.protect(threadId, node); // announce before touching it
const value = node.value; // safe dereference
hp.release(threadId, node); // done reading
hp.retire(node, n => { n.recycled = true; });
return value;
}
The two details that matter even in this toy: protect must precede the dereference, and retire never frees directly — it always goes through scan, which is the only place a node dies.
C++ implementation (the real thing)
Here is a stripped-down but faithful implementation for a fixed thread count, illustrating the fence-and-recheck protect and the retire/scan cycle. This is the shape of what folly::hazptr and std::hazard_pointer generalize:
#include <atomic>
#include <vector>
#include <algorithm>
constexpr int MAX_THREADS = 64;
constexpr int HP_PER_THREAD = 1;
constexpr int H = MAX_THREADS * HP_PER_THREAD;
constexpr int R = 2 * H; // scan threshold
// One global array of single-writer hazard slots.
std::atomic<void*> g_hazard[H];
thread_local int t_id; // 0..MAX_THREADS-1, assigned at start
thread_local std::vector<void*> t_retired; // this thread's retire list
// Publish `*source` into my hazard slot, defending against concurrent unlink.
template <class T>
T* protect(int slot, const std::atomic<T*>& source) {
T* p = source.load(std::memory_order_acquire);
while (true) {
g_hazard[t_id * HP_PER_THREAD + slot].store(p, std::memory_order_seq_cst);
// store-load fence (seq_cst above) — publish must precede the re-read
T* p2 = source.load(std::memory_order_acquire);
if (p2 == p) return p; // hazard pointer is now valid
p = p2; // it moved; republish and retry
}
}
void clear(int slot) {
g_hazard[t_id * HP_PER_THREAD + slot].store(nullptr, std::memory_order_release);
}
template <class T>
void retire(T* node) {
t_retired.push_back(node);
if ((int)t_retired.size() >= R) scan<T>();
}
template <class T>
void scan() {
// 1. Snapshot every non-null hazard pointer.
std::vector<void*> hazardous;
hazardous.reserve(H);
for (int i = 0; i < H; ++i) {
void* p = g_hazard[i].load(std::memory_order_acquire);
if (p) hazardous.push_back(p);
}
std::sort(hazardous.begin(), hazardous.end());
// 2. Free retired nodes that no one is protecting; keep the rest.
std::vector<void*> survivors;
for (void* n : t_retired) {
if (std::binary_search(hazardous.begin(), hazardous.end(), n))
survivors.push_back(n); // still hazardous — defer
else
delete static_cast<T*>(n); // safe to reclaim
}
t_retired.swap(survivors);
}
Two correctness anchors. First, the seq_cst store in protect is doing real work — it is the store-load fence that orders "publish my hazard pointer" before "re-read the source," so a concurrent reclaimer can never both miss my hazard pointer and free a node I went on to read. Second, retire is the only producer for the retire list and scan is the only consumer, both thread-local, so the list itself needs no synchronization.
Variants worth knowing
Pass-the-buck. Herlihy, Luchangco, Martin, and Moir's 2002–2005 design (predating Michael's published name) is the wait-free cousin: instead of dropping unfreed nodes back on its own list, a reclaiming thread hands responsibility for a still-hazardous node to the very thread protecting it. It achieves wait-freedom at the cost of more bookkeeping.
Hazard eras / interval-based reclamation (IBR). A pair of hybrids — Hazard Eras (Ramalhete & Correia, 2017) and IBR (Wen et al., 2018) — that stamp each node with a birth and retire epoch and give each thread a single "era" hazard slot. It gets EBR-like read performance — one announcement per operation rather than one per pointer — while keeping hazard-pointer-style bounded garbage. The best of both worlds for long traversals.
Optimistic access (folly). Folly's implementation lets readers run optimistically and only validate at the end, batching fences and using a global retirement domain with a backpressure mechanism so that no single thread's retire list grows pathologically.
Drop-the-anchor / DEBRA. Refinements that reduce the fence cost or bound the per-thread reclamation work more tightly, used in research-grade lock-free libraries.
Common bugs and edge cases
- Missing the store-load fence. The single most common and most subtle bug. Without a full fence between publishing the hazard pointer and re-reading the source, a weakly-ordered CPU (ARM, POWER) can reorder them, and the protocol silently allows a use-after-free. It works on x86 only by accident of TSO if you used a locked store; relax it and it breaks in production.
- Forgetting to re-check after publishing. Publishing a hazard pointer is not enough — the node may have been unlinked and retired in the gap before the publish became visible. You must re-read the source and confirm the pointer is still installed; otherwise retry.
- Not clearing the slot. If a thread protects a node and never clears the slot, that node — and every address the allocator might reuse for it — is pinned forever, leaking memory and stalling reclamation.
- Recursion / nesting beyond your slot budget. If an operation needs to hold three nodes live at once but you allocated two slots per thread, you'll overwrite a live hazard pointer and reopen the race. Size
Kto the deepest simultaneous dereference. - Reusing a node before it leaves every retire list. A node can be retired by one thread, survive a scan because another thread protects it, and sit on the retire list. Freeing or reusing it out of band corrupts that thread's bookkeeping.
- Confusing "unlinked" with "free". Removing a node from the structure with a successful CAS does not make it free. It only becomes free after a scan finds no hazard pointer on it. Treating unlink as free is exactly the bug hazard pointers exist to prevent.
Frequently asked questions
What problem do hazard pointers actually solve?
The use-after-free race in lock-free data structures. One thread reads a pointer to a node and starts dereferencing it; a second thread unlinks and frees that same node. Without coordination the reader touches freed memory — a crash or silent corruption. Hazard pointers let the reader publish "I'm using this node" so the writer defers the free until it's safe.
How are hazard pointers different from reference counting?
Reference counting mutates a shared counter on every acquire and release, which bounces the cache line between cores and is itself a synchronization bottleneck. A hazard pointer is a single-writer, per-thread slot — only the owning thread writes it, everyone else only reads it — so the read path has no contended atomic increments. The cost moves to a periodic O(R) scan at reclamation time instead.
Do hazard pointers solve the ABA problem?
Indirectly, yes. ABA happens when a value changes from A to B back to A and a compare-and-swap is fooled into thinking nothing happened — usually because the original A node was freed and the allocator handed the same address back. Hazard pointers prevent that node from being freed and reused while any thread is watching it, which closes the most common ABA window without needing tagged pointers.
Why is reclamation deferred instead of immediate?
A retiring thread can't safely free a node the instant it's unlinked, because another thread may have grabbed a pointer to it microseconds earlier. Instead it adds the node to a thread-local retire list. When that list crosses a threshold R, the thread scans every other thread's hazard pointers and frees only the retired nodes that no one is protecting. This batches the cost into an O(R) sweep that amortizes to O(1) per node.
How many hazard pointers does a thread need?
Just enough for the maximum number of nodes one thread dereferences simultaneously during a single operation. A Treiber stack needs one. A Michael-Scott queue or a linked-list traversal that holds a current and a next node needs two. The total slot count across the program is bounded by (number of threads) × (hazard pointers per thread), which is why the reclamation scan is cheap.
Where are hazard pointers used in real systems?
Facebook's Folly library ships a production hazard_ptr implementation used inside its concurrent maps and queues, and hazard pointers were standardized into C++26 as std::hazard_pointer. IBM filed a patent application on the technique in 2002 (Maged Michael; published as US 2004/0107227), but that application was abandoned in 2010, so the technique is no longer patent-encumbered — which is part of what cleared the way into the C++ standard.