Data Structures
Union-Find
Are these two items in the same group? — answered in O(α(n))
Union-Find (also called Disjoint Set Union) tracks a partition of n elements into disjoint groups. Two operations — find (which group is X in?) and union (merge the groups of X and Y) — both run in O(α(n)) amortized, where α is the inverse Ackermann function (effectively a small constant for any imaginable n). Used in Kruskal's MST, percolation simulations, and connected-component tracking.
- Find (with path compression + union by rank)O(α(n)) amortized — effectively O(1)
- Union (with both optimizations)O(α(n)) amortized
- SpaceO(n) — one parent pointer + rank per element
- α(n) for n = 10^80 (atoms in universe)Less than 5
- Vs naive approach (no optimizations)O(log n) instead of O(α)
- Used inKruskal's MST, image segmentation, network connectivity
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How union-find works
Each element points to a "parent." Roots point to themselves. Finding the root of an element walks up the parent chain. Two elements are in the same group if they have the same root.
- find(x): walk up x's parent chain until reaching a root.
- union(x, y): find their roots; if different, attach one under the other.
Naive implementations cost O(n) for find in the worst case (long chain). Two optimizations bring it down to O(α(n)) amortized — virtually constant.
The two optimizations that make it fast
Path compression
When you walk up the chain from x to its root, point every node along the path directly at the root. Subsequent finds for those nodes are O(1).
function find(x) {
if (parent[x] !== x) {
parent[x] = find(parent[x]); // recurse, then collapse path
}
return parent[x];
}
Iterative version avoids recursion stack:
function find(x) {
let root = x;
while (parent[root] !== root) root = parent[root];
// Second pass: compress
while (parent[x] !== root) {
const next = parent[x];
parent[x] = root;
x = next;
}
return root;
}
Union by rank
When merging two trees, attach the shorter under the taller. This keeps the maximum depth at O(log n) even without path compression. Rank is an upper bound on depth — incremented only when two equal-rank trees merge.
function union(x, y) {
const rx = find(x), ry = find(y);
if (rx === ry) return false; // already in same group
if (rank[rx] < rank[ry]) parent[rx] = ry;
else if (rank[rx] > rank[ry]) parent[ry] = rx;
else { parent[ry] = rx; rank[rx]++; }
return true;
}
Why both optimizations together?
| Optimizations | Find amortized | Union amortized | Tree height |
|---|---|---|---|
| None | O(n) worst case | O(n) worst case | O(n) |
| Path compression only | O(log n) amortized | O(log n) amortized | Variable |
| Union by rank only | O(log n) per find | O(log n) per union | O(log n) |
| Both | O(α(n)) amortized | O(α(n)) amortized | Effectively bounded |
Either optimization alone gives O(log n). Together they multiply down to O(α(n)) — effectively O(1) — proven by Tarjan in 1975 in one of the deepest results in classical algorithm analysis.
JavaScript implementation
class UnionFind {
constructor(n) {
this.parent = Array.from({ length: n }, (_, i) => i);
this.rank = new Array(n).fill(0);
this.componentCount = n;
}
find(x) {
let root = x;
while (this.parent[root] !== root) root = this.parent[root];
// Path compression
while (this.parent[x] !== root) {
const next = this.parent[x];
this.parent[x] = root;
x = next;
}
return root;
}
union(x, y) {
const rx = this.find(x), ry = this.find(y);
if (rx === ry) return false;
// Union by rank
if (this.rank[rx] < this.rank[ry]) {
this.parent[rx] = ry;
} else if (this.rank[rx] > this.rank[ry]) {
this.parent[ry] = rx;
} else {
this.parent[ry] = rx;
this.rank[rx]++;
}
this.componentCount--;
return true;
}
connected(x, y) { return this.find(x) === this.find(y); }
}
Python implementation
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
self.components = n
def find(self, x):
root = x
while self.parent[root] != root:
root = self.parent[root]
# Path compression
while self.parent[x] != root:
self.parent[x], x = root, self.parent[x]
return root
def union(self, x, y):
rx, ry = self.find(x), self.find(y)
if rx == ry: return False
if self.rank[rx] < self.rank[ry]:
self.parent[rx] = ry
elif self.rank[rx] > self.rank[ry]:
self.parent[ry] = rx
else:
self.parent[ry] = rx
self.rank[rx] += 1
self.components -= 1
return True
Famous union-find problems
Kruskal's MST
function kruskal(n, edges) {
// edges: [[u, v, weight], ...]
edges.sort((a, b) => a[2] - b[2]);
const uf = new UnionFind(n);
const mst = [];
for (const [u, v, w] of edges) {
if (uf.union(u, v)) {
mst.push([u, v, w]);
if (mst.length === n - 1) break;
}
}
return mst;
}
Sort edges by weight, take each one if it doesn't create a cycle (which union-find checks in α(n)). Total: O(m log m) for the sort + O(m α(n)) for the unions ≈ O(m log m).
Number of Islands II
Given a grid initially all water, you add land cells one at a time. After each addition, report the number of islands. Each addition unions the new land cell with up to 4 neighboring land cells. The component count from union-find is the answer. O(k α(n)) for k operations.
Accounts Merge
Given a list of accounts (each with a list of emails), merge accounts that share at least one email. Map each email to a node, union the emails in each account, then group emails by root. Linear in total emails.
When to reach for union-find
- Connectivity over a stream of edge additions. Kruskal's MST is the canonical example. Any algorithm that processes edges and asks "are these connected?" benefits.
- Connected components in image processing. Pixel grouping, region labeling, image segmentation — union neighboring pixels with similar properties.
- Equivalence classes / set partitioning. "Group these accounts by some equivalence relation." Union-find is the partition data structure.
- Cycle detection in undirected graphs. Process edges; if both endpoints already share a root, the edge would create a cycle. Faster than DFS for offline / batch processing.
- Network failure tolerance. Track which nodes are still connected as edges go down. Union-find handles additions; for deletions, link-cut trees or other dynamic structures are required.
Common union-find bugs
- Forgetting path compression. Without it, finds become O(log n) and the algorithm degrades. Always include it for production use.
- Implementing recursive find on deep trees. Path compression naturally limits depth, but during the first deep find you may stack-overflow. Use the iterative two-pass version above for safety.
- Initializing rank inconsistently. All ranks should start at 0. Initial parents must be self-references (each element is its own root in its own group).
- Returning the wrong rank semantics. Rank is NOT the exact tree depth — it's an upper bound that's easier to maintain. Don't try to use rank for anything other than the union-by-rank heuristic.
- Confusing "weight" with "rank." Some textbooks use "union by size" (count of elements) instead of rank. Both work and give the same asymptotic bound; pick one and stick with it.
Frequently asked questions
What's the inverse Ackermann function?
α(n) is the inverse of an extremely fast-growing function. For all practical inputs (n ≤ 2^65536, far more than atoms in the universe), α(n) ≤ 4. So O(α(n)) is effectively O(1) — just with a formal asymptotic that proves the bound for any conceivable input size. The proof that union-find with path compression and union-by-rank achieves this bound is one of the most beautiful results in algorithms (Tarjan, 1975).
What's path compression?
When walking up the tree to find a root, point every node along the path directly at the root. Subsequent finds for those nodes are O(1). Without path compression, each find is O(tree height). With it, repeated finds amortize to nearly constant time. The tree gets flatter every time we touch it.
What's union by rank?
When merging two trees, attach the shorter tree under the root of the taller one. This keeps tree heights small — log n at most without path compression, and α(n) effectively when combined. The "rank" is an upper bound on the tree's depth (not exactly the depth — it's an estimate that's easier to maintain).
How does Kruskal's algorithm use union-find?
Kruskal's MST processes edges in sorted order, adding each one if it doesn't create a cycle. "Doesn't create a cycle" = "the two endpoints are in different components." Union-find tracks the components: find(u) and find(v) are equal iff they're already in the same component (would create a cycle); if different, take the edge and union them. m × O(α(n)) = O(m α(n)) total — almost linear in the edge count.
Can union-find handle deletes?
Not naturally. Deleting an element requires splitting a group, which is much harder than merging. Solutions exist but are complex (link-cut trees, Euler-tour trees) and slower. If you need both union and split, plain union-find isn't the right tool.
Why use union-find instead of just BFS/DFS for connectivity?
BFS/DFS computes connectivity from scratch — O(V + E) per query. If you have many connectivity queries interleaved with edge additions, union-find amortizes to O(α(n)) per operation. For static graphs, BFS/DFS once. For dynamic graphs with mostly-merge updates, union-find.
How does union-find help with image segmentation?
Treat each pixel as a node. For each pair of adjacent pixels with similar color, union them. After processing all adjacent pairs, find(pixel) returns a group id; pixels with the same id are in the same connected region. Linear time in pixel count via union-find; trivial to parallelize the unions.