Data Structures

Van Emde Boas Tree

Q: Why O(log log U) and not O(log n)?

Comparison-based balanced BSTs answer predecessor in O(log n) — that is the comparison-model lower bound. Van Emde Boas exploits the fact that keys are integers from a bounded universe [0, U), giving access to non-comparison primitives like indexing and modular arithmetic. The recursion T(U) = T(sqrt(U)) + O(1) solves to T(U) = O(log log U). Concretely, for U = 2^32 (4 billion keys), log U = 32 but log log U = 5 — a 6x speedup. For U = 2^64, log log U = 6 vs log U = 64. The catch is that comparison-model lower bounds do not apply because we use indexing into arrays of size proportional to the universe.

Q: How does the sqrt(U) recursion work?

Split each key x of bit-length k into a high half (top k/2 bits) and low half (bottom k/2 bits). The vEB tree of universe U has a summary (a vEB tree of universe sqrt(U) tracking which clusters are non-empty) and an array of sqrt(U) clusters (each a vEB tree of universe sqrt(U)). Insert(x): if cluster x.high is empty, recurse into summary to mark it; recurse into cluster x.high with x.low. The recursion depth is log log U because each level halves the bit-length. Successor(x): try the local cluster; if no successor there, find the next non-empty cluster via summary and return that cluster's min. Each level does O(1) extra work on top of one recursive call.

Q: Why is space O(U) — and how does y-fast trie fix it?

A naive vEB tree allocates the full sqrt(U) cluster array eagerly, summing to S(U) = sqrt(U) * S(sqrt(U)) + S(sqrt(U)) ≈ O(U) bits. For U = 2^32 that is 512 MB even with one element — wildly wasteful. The y-fast trie (Willard 1983) replaces vEB's universe array with a hash table of x-fast tries containing only the non-empty positions. Result: O(n log log U) space (proportional to n, not U) while preserving O(log log U) query time amortized. The x-fast trie achieves O(log log U) successor by binary-searching the binary tree levels using a perfect hash. Real implementations of integer ordered sets in production (Tries in IP routers) use x-fast / y-fast variants.

Q: What is the relation to x-fast and y-fast tries?

All three target predecessor / successor on a bounded universe in O(log log U). vEB tree (1975): elegant recursion, O(U) space. x-fast trie (Willard 1983): a binary trie with a hash per level for fast prefix queries; O(n log U) space, O(log log U) successor query, O(log U) update. y-fast trie (also Willard 1983): partitions n keys into balanced BSTs of size O(log U); the BST representatives form an x-fast trie. Best-of-both: O(n) space, O(log log U) successor query, O(log log U) amortized update. y-fast trie is the production-quality vEB-style structure when memory matters.

Q: When does it beat skip list / red-black?

When keys are integers in a bounded universe and the universe is small enough that log U fits in a few CPU words. For U = 2^32, predecessor is 5 levels for vEB vs ~30 levels for a skip list of n = 10^9 elements — 6x fewer comparisons. But cache behavior matters: vEB requires O(log log U) random pointer chases through different clusters, often missing L1/L2 cache. A flat sorted array with a binary search tuned for cache (e.g. the Eytzinger layout) can match or beat vEB for n in the millions, despite being O(log n). Real-world wins for vEB occur in router longest-prefix match and integer-key priority queues with small U where the 5-vs-30 comparison advantage dominates.

Q: Why are real implementations rare?

Three reasons. First, the O(U) space of textbook vEB is impractical — only y-fast trie is space-efficient and it is harder to implement (requires a perfect hash table, careful representative-balancing). Second, the constant factor per O(log log U) step is high — pointer chasing, hash lookups, dynamic memory. A well-tuned 4-way B-tree or skiplist often beats vEB on n up to ~10^7. Third, modern integer-set workloads frequently benefit from amortized hashing or radix sort more than from vEB's specific predecessor query. Production code shipping vEB is rare — most appearances are in routing-table research and theoretical algorithm libraries. Boost and STL provide neither. The seminal use case — Thorup's 1996 O(m + n log log n) shortest-path algorithm — is more often realized via simpler radix heaps in practice.

Recursively halve the universe at √U — predecessor and successor in O(log log U)

A van Emde Boas (vEB) tree is a tree-based data structure that stores n integers from a universe of size U and supports membership, insert, delete, predecessor, and successor in O(log log U) time per operation — exponentially faster than the O(log n) of comparison-based balanced BSTs when U is small. The structure recursively partitions the universe into √U blocks, each itself a vEB tree. Invented by Peter van Emde Boas in 1975. Used in network packet routers (longest-prefix match) and integer-key priority queues for shortest paths in O(m log log U).

Per-opO(log log U)
SpaceO(U) (or O(n log log U) lazy)
Universe halvingat √U
Inventedvan Emde Boas 1975
Lazy varianty-fast trie
Beats balanced BST whenU = 2^O(log n)

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why vEB matters

Theoretically optimal predecessor. O(log log U) per predecessor query for integer keys from a bounded universe — exponentially better than the O(log n) of any comparison-based structure. The Patrascu-Thorup lower bound shows this is optimal in the cell-probe model up to constants for U polynomial in n.
Foundational for integer priority queues. Thorup's 1996 result: shortest paths in O(m + n log log n) on integer-weight graphs uses a vEB-style queue to extract the min in O(log log W) per pop, where W bounds edge weights. Beats Fibonacci-heap Dijkstra's O(m + n log n) when weights are small integers.
Teaches recursion-on-universe. The clean T(U) = T(√U) + O(1) recurrence solves to log log U. The pattern "halve the bit-length, not the count" appears in fusion trees, signature sort, and integer hashing — vEB is the entry point to learning it.
Network packet routers. IP-prefix lookup tables for IPv4 (U = 2^32) and IPv6 (U = 2^128) need fast predecessor on integer keys — vEB and its space-efficient variants ship in core routing hardware. Cisco's NetFlow uses vEB-derived structures.
OS scheduler timer wheels. Linux's hierarchical timing wheels and BSD callout queues approximate vEB-like structures for O(log log T) timer expiry over a bounded time horizon.
Stepping stone to fusion trees. Fusion trees (Fredman, Willard 1990) achieve O(log n / log log n) predecessor in word-RAM via packed-comparisons; the design intuition starts from vEB's hierarchical universe split.

Structure

Recursive layout. A vEB tree of universe size U has: a min and max value at the top level (lazy storage); a summary structure (a smaller vEB tree of universe √U tracking which sub-clusters are non-empty); and an array cluster[0..√U - 1] of sub-vEB trees, each with universe √U.
Bit-split. Encode key x using k = log U bits. Define x.high = floor(x / √U) — the top k/2 bits — and x.low = x mod √U — the bottom k/2 bits. Then x lives in cluster[x.high] at position x.low.
Min and max stored explicitly. The minimum and maximum of the entire tree are stored at the root (and recursively at every sub-vEB) without descending. This is the trick that drives the O(log log U) bound: a single recursive call per operation, not log U levels.
Min outside. The min value is stored at the root but not in any cluster — it is "outside" the recursion. This avoids the recursion needing to deal with the smallest element, halving the work.
Empty representation. An empty vEB tree has min = max = NIL. A single-element tree has min = max = the element, with no clusters allocated.
Recursion depth. Universe halves bit-wise per level: U → √U → √(√U) → ... = U^(1/2^d) at depth d. Reaches universe size 2 after log log U levels.

Core operations

Member(x): O(log log U). If x is min or max, return true. Else recurse into cluster[x.high] for x.low.
Successor(x): O(log log U). If x < min, return min. Else look in cluster[x.high]: if cluster[x.high] is non-empty and its max > x.low, recurse to find x.low's successor in that cluster, prepend x.high. Otherwise consult summary to find the next non-empty cluster (call summary.successor(x.high)); return its min prepended.
Predecessor(x): O(log log U). Symmetric to successor.
Insert(x): O(log log U). Special case empty / one-element trees. Otherwise: if cluster[x.high] was empty, recursively insert x.high into summary AND set the new cluster's min to x.low without further recursion. Otherwise recurse into cluster[x.high] with x.low. Critical optimization: at most one recursive call per level (either summary OR cluster, not both).
Delete(x): O(log log U). Symmetric: if cluster[x.high] becomes empty, remove x.high from summary; otherwise recurse into cluster.
Why one recursion. Each operation makes a single recursive call into a sub-vEB on universe √U, plus O(1) work. T(U) = T(√U) + O(1) solves to T(U) = O(log log U). If we instead recursed into both summary and cluster naively, we'd get O(log U).

Space analysis

Naive recurrence. S(U) = √U * S(√U) + S(√U) + O(1). Solving: S(U) = O(U).
Concrete cost. U = 2^16 (65,536 keys): S = 64 KB. U = 2^32: S = 4 GB — prohibitive even for a single key. U = 2^64: 1.8 * 10^19 bytes — physically impossible.
Lazy allocation. Allocate clusters and summary lazily on first insertion. With n keys, only O(n log log U) clusters are ever touched, giving practical space O(n log log U) at the cost of pointer-based indirection.
x-fast trie. Replace vEB's universe array with a hash table containing only the non-empty positions. Predecessor in O(log log U) via per-level hash lookups. Space O(n log U).
y-fast trie. Group n keys into n / log U buckets of balanced BSTs of size log U each. The bucket representatives form an x-fast trie. Space O(n), per-op time O(log log U) amortized. The production-quality choice when memory matters.
Direct comparison. For n = 10^6, U = 2^32: vEB lazy 200 MB, y-fast trie 20 MB, balanced BST 24 MB.

A worked successor example

Setup. U = 16, so √U = 4. Keys stored: {2, 3, 4, 5, 7, 14, 15}. Each key x splits into x.high (top 2 bits) and x.low (bottom 2 bits). Clusters: cluster[0] = {2, 3} (high = 00, lows {10, 11}); cluster[1] = {4, 5, 7} (high = 01, lows {00, 01, 11}); cluster[3] = {14, 15} (high = 11, lows {10, 11}). Summary tracks {0, 1, 3}.
Successor(5). 5 is not min (which is 2). 5.high = 1, 5.low = 1. Look at cluster[1]: max is 7 (low = 11). 1 (5.low) < 11 (cluster max), so the successor is in cluster[1]. Recurse: cluster[1].successor(1) finds low = 11 (key 7). Result: high * √U + low = 1 * 4 + 3 = 7.
Successor(7). 7.high = 1, 7.low = 11 (which equals cluster[1].max). Cluster[1] has no successor. Consult summary: summary.successor(1) returns 3. cluster[3].min is 14 (low = 10). Result: 3 * 4 + 2 = 14.
Successor(15). Equals max — no successor in this universe. Return NIL.
Recursion depth. log2 log2 16 = log2(4) = 2 levels of recursion. Each step does O(1) work plus one recursive call. Total operations for any successor: ~10 array indexes and pointer dereferences.

Common misconceptions

"Fastest in practice." Theoretically yes for predecessor; practically no for n < 10^6. The constant factor per O(log log U) step is high — pointer chasing, conditional branches, possible cache misses. A flat sorted array with cache-aware binary search often beats vEB on small n. Production integer ordered sets typically use B-trees, hash tables, or radix-based structures.
"Always O(log log U) memory." The textbook recurrence gives S(U) = O(U), not O(log log U). Without the y-fast trie or lazy allocation, a vEB tree for U = 2^32 needs gigabytes regardless of how few keys it stores.
"Works for arbitrary keys." Only integers from a bounded universe. Strings, floats, or unbounded integers do not fit; you must hash or quantize them first.
"Membership is fast like a hash." Hash tables answer membership in O(1) average. vEB membership is O(log log U) — slower for membership-only workloads. The advantage shows for predecessor/successor queries where hashes give nothing.
"Recurrence is T(U) = 2 T(√U) + O(1)." No. The correct recurrence has a single recursive call per level — either summary OR cluster, not both. Two recursive calls would give T(U) = O(log U), not log log U.
"vEB is the way to do shortest paths in O(m + n log log n)." Not literally. Thorup's algorithm uses vEB-style hierarchical bucketing but is delicate; many implementations use radix heaps with the same asymptotic bound at lower constants.
"Just like a B-tree." Different. B-trees are comparison-based with O(log_B n) depth and use blocks of size B; vEB is non-comparison and exploits integer arithmetic. Their bounds and applicability are different.

Practical alternatives

y-fast trie. O(n) space, O(log log U) amortized per op. The space-efficient successor of vEB. Production-quality implementations exist in research libraries (libsdsl, succinct).
x-fast trie. O(n log U) space, O(log log U) successor, O(log U) update. Useful when reads dominate writes.
Fusion tree. Fredman, Willard 1990 — O(log n / log log n) predecessor in word-RAM. Different bound shape than vEB but theoretically interesting on word-RAM with w = log U bits.
Radix heap. Ahuja et al. 1990 — bucketed priority queue for non-decreasing extractions. Used in practice for Dijkstra over bounded integer weights with O(m + n log W / log log W) total.
4-way B-tree. Branching factor 4 with cache-aligned nodes. Beats vEB on real benchmarks for n < 10^6 due to cache friendliness, despite worse asymptotic bound.
Hash table + sorted bucket. Cuckoo or robin hood hash for O(1) membership; secondary skip-list or BST for ordered queries. Production ordered-integer sets (Java's TreeSet, C++'s std::set) take a balanced-BST approach because vEB's universe-size dependency is awkward.

Frequently asked questions

Why O(log log U) and not O(log n)?

Comparison-based balanced BSTs answer predecessor in O(log n) — that is the comparison-model lower bound. Van Emde Boas exploits the fact that keys are integers from a bounded universe [0, U), giving access to non-comparison primitives like indexing and modular arithmetic. The recursion T(U) = T(sqrt(U)) + O(1) solves to T(U) = O(log log U). Concretely, for U = 2^32 (4 billion keys), log U = 32 but log log U = 5 — a 6x speedup. For U = 2^64, log log U = 6 vs log U = 64. The catch is that comparison-model lower bounds do not apply because we use indexing into arrays of size proportional to the universe.

How does the sqrt(U) recursion work?

Split each key x of bit-length k into a high half (top k/2 bits) and low half (bottom k/2 bits). The vEB tree of universe U has a summary (a vEB tree of universe sqrt(U) tracking which clusters are non-empty) and an array of sqrt(U) clusters (each a vEB tree of universe sqrt(U)). Insert(x): if cluster x.high is empty, recurse into summary to mark it; recurse into cluster x.high with x.low. The recursion depth is log log U because each level halves the bit-length. Successor(x): try the local cluster; if no successor there, find the next non-empty cluster via summary and return that cluster's min. Each level does O(1) extra work on top of one recursive call.

Why is space O(U) — and how does y-fast trie fix it?

A naive vEB tree allocates the full sqrt(U) cluster array eagerly, summing to S(U) = sqrt(U) * S(sqrt(U)) + S(sqrt(U)) ≈ O(U) bits. For U = 2^32 that is 512 MB even with one element — wildly wasteful. The y-fast trie (Willard 1983) replaces vEB's universe array with a hash table of x-fast tries containing only the non-empty positions. Result: O(n log log U) space (proportional to n, not U) while preserving O(log log U) query time amortized. The x-fast trie achieves O(log log U) successor by binary-searching the binary tree levels using a perfect hash. Real implementations of integer ordered sets in production (Tries in IP routers) use x-fast / y-fast variants.

What is the relation to x-fast and y-fast tries?

All three target predecessor / successor on a bounded universe in O(log log U). vEB tree (1975): elegant recursion, O(U) space. x-fast trie (Willard 1983): a binary trie with a hash per level for fast prefix queries; O(n log U) space, O(log log U) successor query, O(log U) update. y-fast trie (also Willard 1983): partitions n keys into balanced BSTs of size O(log U); the BST representatives form an x-fast trie. Best-of-both: O(n) space, O(log log U) successor query, O(log log U) amortized update. y-fast trie is the production-quality vEB-style structure when memory matters.

When does it beat skip list / red-black?

When keys are integers in a bounded universe and the universe is small enough that log U fits in a few CPU words. For U = 2^32, predecessor is 5 levels for vEB vs ~30 levels for a skip list of n = 10^9 elements — 6x fewer comparisons. But cache behavior matters: vEB requires O(log log U) random pointer chases through different clusters, often missing L1/L2 cache. A flat sorted array with a binary search tuned for cache (e.g. the Eytzinger layout) can match or beat vEB for n in the millions, despite being O(log n). Real-world wins for vEB occur in router longest-prefix match and integer-key priority queues with small U where the 5-vs-30 comparison advantage dominates.

Why are real implementations rare?

Three reasons. First, the O(U) space of textbook vEB is impractical — only y-fast trie is space-efficient and it is harder to implement (requires a perfect hash table, careful representative-balancing). Second, the constant factor per O(log log U) step is high — pointer chasing, hash lookups, dynamic memory. A well-tuned 4-way B-tree or skiplist often beats vEB on n up to ~10^7. Third, modern integer-set workloads frequently benefit from amortized hashing or radix sort more than from vEB's specific predecessor query. Production code shipping vEB is rare — most appearances are in routing-table research and theoretical algorithm libraries. Boost and STL provide neither. The seminal use case — Thorup's 1996 O(m + n log log n) shortest-path algorithm — is more often realized via simpler radix heaps in practice.