Concurrency
The ABA Problem
When "nothing changed" is a lie your lock-free code believes
The ABA problem is a concurrency bug where a compare-and-swap succeeds because a value changed from A to B and back to A, so CAS can't tell the memory was touched — corrupting lock-free stacks and queues. Fixed with tagged pointers, hazard pointers, or LL/SC.
- ClassLock-free correctness bug
- Root causeCAS compares value, not history
- Classic victimTreiber stack pop
- Tag fix costDouble-width CAS (CMPXCHG16B)
- Immune hardwareLL/SC (ARM, RISC-V, POWER)
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How the ABA problem happens
Lock-free algorithms replace mutexes with a single hardware instruction: compare-and-swap. CAS takes a memory address, an expected value, and a new value. It writes the new value only if the address still holds the expected one, and reports whether it succeeded — all atomically. The usual pattern is read-modify-CAS in a loop: read the current value, compute an update, then CAS it in; if the CAS fails because someone else got there first, retry from the fresh value.
The whole pattern leans on one assumption: if my CAS expecting A succeeds, the location was untouched since I read A. The ABA problem is the case where that assumption is false. Here is the interleaving, with the canonical lock-free stack (the Treiber stack, R. Kent Treiber, IBM, 1986):
- Thread 1 wants to pop. It reads
head = Aand readsA.next = B. It is about toCAS(head, A, B). - Thread 1 is preempted right here. The OS scheduler hands the CPU to other threads.
- Thread 2 pops
A(head becomesB), popsB(head becomesC), and frees both nodesAandB. - Thread 2 pushes a brand-new node. The allocator hands back the same address that
Aused to occupy, so head is nowAagain — but it's a different object with a differentnext. - Thread 1 wakes up and runs
CAS(head, A, B). The bits match — head isA— so the CAS succeeds. It writeshead = B.
But B was freed in step 3. Thread 1 just resurrected a dangling pointer as the head of a live stack. The next pop dereferences freed memory: a crash, silent corruption, or a security exploit. The CAS did exactly what it promised — it just promised the wrong thing.
The precise mechanism: value identity vs. logical identity
CAS performs a bitwise equality test on a single machine word. On x86 the instruction is LOCK CMPXCHG; the comparison is "are these 8 bytes identical?" That is value identity. What lock-free correctness actually needs is logical identity: "is this the same node I reasoned about, with the same history?" The ABA problem is the gap between the two.
The bug requires three ingredients, and removing any one prevents it:
- Recycling. The value
Amust come back. For pointers this is the memory allocator reusing an address; for counters it's wraparound or symmetric inc/dec. - A stale read. A thread holds an expected value across a window in which the world can change — i.e., it reads
A, stalls, and CASes later. - Value-only comparison. The CAS checks the bits, not whether the location was written. (This is precisely the ingredient that memory ordering does not fix — fences control visibility, not identity.)
The complexity story is what makes ABA so insidious: a correct lock-free pop is still O(1) amortized with the bug present. Performance numbers look perfect. The bug only manifests under a specific scheduler interleaving plus allocator reuse, so it can pass millions of single-threaded and even lightly-contended tests, then corrupt memory in production once every few billion operations under load. ABA is a tail-probability bug, not a throughput bug.
When you have to worry about ABA
- You CAS a pointer that can be freed and re-allocated. The headline case — Treiber stacks, Michael–Scott queues, lock-free freelists. If a popped node's memory can return as a new node, you are exposed.
- You CAS a recyclable index or generation counter that can return to a prior value within one thread's read-modify-CAS window.
- You are on x86/x86-64, where CAS is value-comparing
CMPXCHG. Hardware with LL/SC (ARM, RISC-V, POWER) is structurally immune — see below.
You do not need ABA defenses when the CASed value is monotonic and never reused (a strictly increasing sequence number that never wraps), when nodes are never freed during the structure's lifetime (arena/pool with no reuse), or when you're using a garbage-collected runtime that won't reclaim a node while any thread still references it — which is why lock-free stacks are famously easy in Java and hard in C++.
Comparison of ABA defenses
| Tagged pointer (counter) | Hazard pointers | Epoch / RCU reclamation | LL/SC hardware | GC runtime | DCAS / transactional | |
|---|---|---|---|---|---|---|
| What it tracks | (pointer, version) word | Per-thread "in-use" set | Global epoch counters | Cache-line write flag | Live reference graph | Multi-word atomicity |
| Stops false CAS success? | Yes (until tag wraps) | Yes (node never freed) | Yes (deferred free) | Yes (any write fails SC) | Yes (no premature free) | Yes |
| Extra per-op cost | Wider CAS, +0 reads | ~2 stores + 1 fence/op | Cheap reads, batched frees | None (it's the instruction) | GC pauses / barriers | High; rare HW support |
| Memory overhead | Tag bits per word | O(threads) hazard slots | Retire lists per thread | None | GC heap headroom | Varies |
| Failure mode | Tag wraparound (narrow tags) | Forgotten hazard = use-after-free | Stalled reader delays reclaim | Spurious SC failures → retry | Pause-time, footprint | Aborts under contention |
| Typical use | C/C++ on x86 (CMPXCHG16B) | Folly, libcds, concurrent C++ | Linux kernel RCU, userspace | ARM/RISC-V/POWER lock-free | Java, C#, Go data structures | Hardware TM (Intel TSX) |
The honest summary: tagged pointers are the cheapest if your platform has a wide enough double-width CAS, hazard pointers and epoch reclamation are the portable general answer (they also solve the underlying use-after-free), LL/SC dodges the whole thing for free on the right CPU, and a GC runtime makes the problem evaporate at the cost of GC.
What the numbers actually say
- A 16-bit tag wraps after 65,536 successful CAS writes. At a modest 10 million updates/second per contended location, that's a full wraparound every 6.5 milliseconds — so a narrow tag is not a fix, it just makes ABA rarer by a factor that's far too small. This is why the practical solution is the double-width CAS.
- x86-64
CMPXCHG16Batomically compares-and-swaps 128 bits: 64 bits of pointer plus a 64-bit tag. A 64-bit counter at 10 M writes/sec wraps after ≈ 58,000 years — effectively never. That's why "tagged pointer" in practice means "double-width CAS," not "16-bit counter." - Hazard pointers add roughly two atomic stores plus one memory fence per protected access — on the order of tens of nanoseconds — but turn the free into a deferred scan, so reclamation is batched and amortized.
- A single CAS is ~10–30 cycles uncontended; the ABA defenses don't change that order of magnitude. The cost of getting it wrong, by contrast, is a use-after-free — unbounded.
JavaScript: simulating the interleaving
JavaScript is single-threaded, so it can't exhibit a real data race — but that makes it a clean teaching tool: we model "shared memory" plus an explicit allocator that recycles addresses, and step the interleaving by hand. This is exactly the sequence the 3D visualization animates.
// A toy heap where freed slots get handed back out — the source of ABA.
class RecyclingHeap {
constructor() { this.cells = new Map(); this.freeList = []; this.next = 1; }
alloc(next) {
const addr = this.freeList.length ? this.freeList.shift() : this.next++; // hands back the first freed slot
this.cells.set(addr, { next }); // a stack node: { next }
return addr; // an "address"
}
free(addr) { this.cells.delete(addr); this.freeList.push(addr); } // recycled!
read(addr) { return this.cells.get(addr); }
}
// Naive Treiber-stack head, vulnerable to ABA.
function cas(box, expected, val) { // value-comparing CAS
if (box.head === expected) { box.head = val; return true; }
return false;
}
const heap = new RecyclingHeap();
const A = heap.alloc(null); // A.next = null
const B = heap.alloc(A); // B.next = A
const box = { head: B }; // stack: head -> B -> A
// Thread 1 begins a pop: reads head=B, then B.next=A, plans CAS(head, B, A)
const t1_expected = B;
const t1_newHead = heap.read(B).next; // = A, captured NOW
// Thread 1 is preempted. Thread 2 runs a full pop, pop, push:
let nextHead = heap.read(box.head).next; // pop B: read .next BEFORE freeing
heap.free(box.head); box.head = nextHead; // free B, head -> A
nextHead = heap.read(box.head).next; // pop A: read .next (= null)
heap.free(box.head); box.head = nextHead; // free A, head -> null
const A2 = heap.alloc(null); // push: allocator reuses B's freed address
box.head = A2; // head -> B-address again (new object!)
// Thread 1 resumes — its CAS bits still "match":
const ok = cas(box, t1_expected, t1_newHead); // head === B-address, so SUCCEEDS (bug!)
console.log(ok, box.head); // true, 1 -> head now points at FREED node A (addr 1)
The fix is to compare a (pointer, tag) pair instead of a bare pointer. Every successful CAS bumps the tag, so the stale (B, tag0) no longer equals the live (B, tag3):
function casTagged(box, expPtr, expTag, newPtr) {
if (box.ptr === expPtr && box.tag === expTag) {
box.ptr = newPtr; box.tag = expTag + 1; // monotonic version bump
return true;
}
return false; // ABA caught: tag moved on
}
Python: tagged-pointer CAS that defeats ABA
Same idea in Python, written as the read-modify-CAS retry loop you'd actually deploy. The state is a (value, version) tuple, and the CAS only succeeds if both match — so a value that left and came back is rejected by its version.
from dataclasses import dataclass
@dataclass
class Tagged:
value: object
version: int = 0 # monotonic counter, the ABA antidote
class AtomicTagged:
def __init__(self, value):
self._state = Tagged(value, 0)
def load(self):
s = self._state
return s.value, s.version # read BOTH halves together
def cas(self, exp_value, exp_version, new_value):
s = self._state
if s.value == exp_value and s.version == exp_version:
self._state = Tagged(new_value, exp_version + 1)
return True
return False # value OR version mismatch -> fail
def increment(reg: AtomicTagged):
while True: # standard retry loop
v, ver = reg.load()
if reg.cas(v, ver, v + 1): # version makes A->B->A detectable
return v + 1
# A bare value-CAS would see A again and wrongly succeed;
# the version moved from ver to ver+2 across the round trip, so it fails and retries.
Note the structural rule that survives both languages: you must read the pointer and its tag in a single atomic load, and write them in a single atomic CAS. If you read them separately you've just reintroduced a race on the tag itself.
Defenses and variants worth knowing
Tagged pointers (version counters / "ABA tags"). Pack a counter beside the pointer and CAS the wide word. On x86-64 this needs CMPXCHG16B (128-bit). A cheaper trick on 64-bit platforms steals the unused high pointer bits (current hardware uses only 48 bits of virtual address) for a ~16-bit tag — but that tag is narrow enough to wrap, so it only reduces, not eliminates, the risk.
Hazard pointers (Maged Michael, 2004). Each thread publishes the pointers it's about to dereference into per-thread "hazard" slots. A node is only freed once no hazard slot points at it. This kills ABA by killing the recycling: A can't come back as a new node while anyone might still CAS against it. See reference counting for the conceptual cousin.
Epoch-based reclamation and RCU. Defer every free until all threads have passed through a quiescent state (advanced an epoch). The Linux kernel's RCU is the production-scale example. Like hazard pointers, it prevents premature reuse rather than detecting it.
Load-linked / store-conditional (LL/SC). A hardware primitive on ARM (LDXR/STXR), RISC-V (LR/SC), and POWER. The store-conditional fails if the cache line was written at all since the load-linked — so an A→B→A round trip fails the SC even though the value is identical. Structurally immune to ABA, at the cost of spurious failures from unrelated cache traffic.
Double-compare-and-swap (DCAS) and hardware transactional memory. Compare-and-swap two independent words atomically, or wrap a region in a transaction (Intel TSX). Rarely available portably, but sidesteps ABA by widening the atomic unit.
Common bugs and edge cases
- "My counter only goes up, so I'm safe" — until it wraps. A 32-bit sequence number at high throughput wraps in seconds; a wrapped counter that lands on a stale expected value is textbook ABA. Width matters.
- Narrow stolen-bit tags. A 16-bit tag in pointer high bits feels free but wraps after 65,536 writes — under contention that's milliseconds. Use a true double-width CAS if you need a real guarantee.
- Reading pointer and tag in two instructions. If the tagged word isn't loaded and stored as one atomic unit, you've moved the race from the pointer to the (pointer, tag) read. The whole word must be one CAS.
- Forgetting a hazard pointer on one access path. Hazard-pointer schemes are only as safe as their least-careful dereference; one unprotected read reopens use-after-free and, with it, ABA.
- Assuming a GC language has no ABA. It's true for pointers the GC tracks — but a CAS on a recycled index or a wrapping version int in a Java/Go ring buffer can still ABA, because the GC doesn't protect plain integers.
- Trusting that a memory fence fixes it. Barriers order visibility; they don't give CAS a memory of past writes. ABA is an identity problem, not an ordering problem.
Frequently asked questions
What exactly is the ABA problem?
A thread reads a value A from shared memory, gets preempted, and meanwhile other threads change the value to B and then back to A. When the first thread resumes and runs compare-and-swap expecting A, the CAS succeeds — even though the world changed underneath it. CAS compares bits, not history, so it cannot tell A is the same A it originally read.
Why doesn't compare-and-swap detect the change?
Compare-and-swap only asks one question: does memory still hold the expected bit pattern? It has no memory of how many times the location was written. If a pointer is freed and the allocator hands the same address back for a new node, the bits match and CAS happily swaps — installing a stale pointer into a live data structure.
How do tagged pointers fix the ABA problem?
You pack a monotonic version counter alongside the pointer and CAS the whole (pointer, counter) word at once. Every successful write bumps the counter, so even if the pointer returns to A, the counter is now higher and the CAS comparing the old (A, n) tag fails. A 16-bit tag wraps after 65,536 writes; a 64-bit double-width CAS gives a tag wide enough to never wrap in practice.
What's the difference between ABA and a use-after-free bug?
They're cousins. ABA is what happens when a memory address is reused: the address comes back to life as a different object, and a lagging CAS treats it as the original. Safe memory reclamation schemes — hazard pointers, epoch-based reclamation, RCU — solve both by ensuring a node is never freed while any thread might still dereference or CAS against it.
Does load-linked/store-conditional avoid ABA?
Yes. LL/SC (on ARM, RISC-V, POWER) tracks whether the cache line was written between the load-linked and the store-conditional, not just whether the value matches. Any intervening store — even one that restores the original value — makes the store-conditional fail. That's strictly stronger than value-comparing CAS, which is exactly why x86's CMPXCHG is vulnerable and LL/SC is not.
Is the ABA problem only about pointers?
No. Any CAS on a recycled value is exposed — a slot index in a free-list, a sequence number that wraps, a version stamp that overflows. Pointers are the classic case because allocators aggressively recycle addresses, but a plain integer counter that decrements to A, increments to B, and decrements back to A triggers the same false-success.