Data Structures
Red-Black Tree
Self-balancing BST with looser rules — fewer rotations, faster mutations
A red-black tree is a binary search tree where every node is colored red or black, and a small set of color-and-balance rules keep the tree's height within 2× log n. It guarantees O(log n) for all operations, like AVL trees, but rebalances with fewer rotations on insert and delete — which is why std::map, Java's TreeMap, and the Linux kernel use it.
- Search / insert / deleteO(log n)
- Max height2 × log₂(n + 1)
- Rotations on insertAt most 2
- Rotations on deleteAt most 3
- vs AVL treeSlightly slower lookups, faster mutations
- Used bystd::map, Java TreeMap, Linux kernel CFS scheduler
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How a red-black tree works
It's a BST with one extra bit per node — red or black — and five rules that together cap the tree's height at 2 × log₂(n+1):
- Every node is either red or black.
- The root is black.
- Every leaf (NIL/sentinel) is black.
- If a node is red, both its children are black. (No red-red sequences along any path.)
- Every path from a given node to any descendant leaf contains the same number of black nodes (the "black height" property).
The interplay matters. Rule 5 means every path has the same number of black nodes. Rule 4 means red nodes can't be adjacent — so red nodes are at most half of any path. The longest possible path has alternating black-red-black-red, which is at most twice the length of an all-black path. Hence: height ≤ 2 × log₂(n+1).
Red-black vs AVL trees
| Red-black tree | AVL tree | |
|---|---|---|
| Balance criterion | Heights differ by at most 2× | Heights differ by at most 1 |
| Max height | 2 × log₂(n+1) | 1.44 × log₂(n+2) |
| Lookup speed | Slightly slower | Slightly faster |
| Insert rotations (worst) | 2 | O(log n) recoloring + 2 rotations |
| Delete rotations (worst) | 3 | O(log n) |
| Memory per node | + 1 color bit (often padded to 1 byte) | + height/balance factor (1 byte) |
| Best for | Mutation-heavy workloads | Read-heavy workloads |
| Used by | std::map, TreeMap, Linux kernel | Some database indexes, theoretical CS |
For a generic ordered map exposed as a library, RB tree wins because library users mix reads and writes — and RB has no horrible worst case on writes. Specialty workloads with strong read bias might prefer AVL.
Insert rebalancing — five cases
After a standard BST insert (the new node colored red so we don't violate the black-height rule), one of three situations holds:
- Parent is black. No fixup needed — done.
- Parent is red, uncle is red. Recolor parent and uncle to black, grandparent to red. Recurse upward (now grandparent might violate red-red).
- Parent is red, uncle is black. Rotate around parent or grandparent (depends on whether the new node is on the inside or outside of the parent's subtree). At most 2 rotations and we're done — no further upward propagation.
The bound: case 2 may recurse up the tree (O(log n) recoloring), but it does no rotations. Case 3 ends the operation in O(1) more work. Total: at most 2 rotations per insert, O(log n) recolorings (which are O(1) memory writes each).
When to use a red-black tree
- Anywhere you'd use a BST. If you need an ordered map / set with O(log n) guarantees, RB tree is the default. Most languages' standard libraries pick it.
- Schedulers. Linux's Completely Fair Scheduler (CFS) keeps runnable processes ordered by virtual runtime in an RB tree. Pick the leftmost (lowest vruntime) to run next; insert/remove on every context switch.
- Range queries with frequent updates. Time-windowed analytics, IP routing tables (often), event timelines. RB trees support efficient successor/predecessor traversal and range scans.
- In-memory database indexes. Some embedded databases use RB trees as primary indexes — though most go with B-trees once the data exceeds memory.
Skip RB trees when you need persistent (immutable) ordered maps (use a balanced tree of a functional flavor like a finger tree), when concurrency is the dominant concern (use a skip list — naturally lock-free with CAS), or when keys are short and uniform (a hash table with O(1) average is faster).
JavaScript: minimal RB tree skeleton
const RED = 0, BLACK = 1;
const NIL = { color: BLACK, left: null, right: null, parent: null }; // sentinel
class RBNode {
constructor(key, value, color = RED) {
this.key = key; this.value = value;
this.color = color;
this.left = NIL; this.right = NIL; this.parent = NIL;
}
}
class RedBlackTree {
constructor() { this.root = NIL; }
search(key) {
let n = this.root;
while (n !== NIL) {
if (key === n.key) return n.value;
n = key < n.key ? n.left : n.right;
}
return undefined;
}
insert(key, value) {
// Standard BST insert + fixup; full implementation is ~80 lines
// Pseudocode flow:
// 1. Walk down the tree to find insertion point
// 2. Create new red node, attach as child
// 3. Run insertFixup to restore RB invariants:
// - while node's parent is red, decide case (uncle red? side?)
// - recolor and/or rotate appropriately
// - propagate upward if recoloring case
// 4. Color the root black
// ...
}
// Left rotation: y = x.right; x.right = y.left; y.left = x; ...
// Right rotation: mirror image
// delete + deleteFixup follow the same pattern but with more cases
}
The full, correct implementation runs ~250 lines (insert + insertFixup + delete + deleteFixup + both rotations + iterators). Refer to CLRS chapter 13 for the standard pseudocode if you need to implement one yourself.
When RB is the wrong choice
- Concurrent access. RB rebalancing rotations require multi-node locks. Skip lists are lock-free with CAS; B-trees with optimistic concurrency are easier. Reach for those for high-contention workloads.
- External storage (disk, network). The branching factor is 2 — way too low for disk. Each level is a separate cache line / disk read. B-trees with branching factor in the hundreds win — same height bound (log_b n) but vastly fewer cache misses.
- Hash-only workloads. If you only need point queries with no ordering, a hash table is O(1) average vs O(log n).
- Very small n (under ~30). A sorted array with binary search beats any tree on small data — better cache locality, no pointer overhead.
Common red-black tree bugs and edge cases
- Forgetting the NIL sentinel. Real implementations use a single shared sentinel for all "leaves" — simplifies the rebalancing code by ensuring every node has non-null children. Using actual null pointers means dozens of null checks scattered through the rebalance logic.
- Wrong case detection in fixup. The five insert cases (and similarly for delete) are easy to mistype. Always implement with reference to a known-good source (CLRS, Sedgewick) and write extensive randomized tests.
- Missing the "color the root black" step. The root must be black after every operation. If the fixup recurses up to the root and recolors it red, you must explicitly fix it.
- Comparing keys that aren't comparable. Like any BST, RB trees need a total ordering. Mixing strings and numbers, NaN values, or unstable comparison functions silently produces a tree that violates BST invariants.
- Implementing your own. The standard library version is ten years debugged. Don't roll your own unless you have a specific reason.
Frequently asked questions
What are the red-black tree invariants?
Five rules. (1) Every node is red or black. (2) The root is black. (3) Every leaf (NIL sentinel) is black. (4) A red node's children are both black — no two reds in a row. (5) Every path from a node to its descendant leaves contains the same number of black nodes (the "black height"). Together these guarantee height ≤ 2 × log₂(n+1).
How is a red-black tree different from an AVL tree?
AVL is more strictly balanced (heights of left and right subtrees differ by at most 1) — slightly faster lookups but more rotations on insert/delete. Red-black is looser (height ≤ 2 × log n) — slightly slower lookups but cheaper mutations. RB trees do at most 2 rotations per insert; AVL may need O(log n). For workloads with many mutations, RB wins; for read-heavy workloads, AVL has a tiny edge.
Why is the Linux kernel CFS scheduler a red-black tree?
The Completely Fair Scheduler picks the next process to run by extracting the minimum vruntime from a set of runnable processes. Insertions and deletions happen on every context switch — extremely mutation-heavy. RB tree's cheap mutations beat AVL here. Scheduler operations are O(log n) in the number of runnable processes.
What's a rotation in a red-black tree?
A local restructuring that moves a node up by one level while pushing its parent down — preserving the BST invariant. Two flavors — left rotation (right child becomes the parent) and right rotation (left child becomes the parent). Rotations rebalance the tree without violating ordering. Both run in O(1).
Why color nodes red or black?
It's the simplest way to encode local balance information. The five rules together force a specific structural property — the longest root-to-leaf path is at most twice the shortest. Two bits per node would suffice; one bit (red/black) does the job. The colors aren't structural — they're a coloring on top of an otherwise plain BST that lets the algorithm cheaply detect when rotations are needed.
Are red-black trees balanced?
Yes — log n + O(1) levels — but not as strictly balanced as AVL. The "black height" is the same on every path from a node to its leaves, but the actual height (counting both colors) can be up to 2× the black height because every black node may have one red child below it. So the longest path is at most 2× the shortest, giving height ≤ 2 × log₂(n+1).
When would I implement my own red-black tree?
Almost never. The standard library version (std::map in C++, TreeMap in Java) is more thoroughly tested and faster than anything you'll write. Implement one for educational understanding or for embedded systems where you can't pull in a library. For everything else, use the standard.