Data Structures
Centroid Decomposition
Halve the tree at every step — log N depth, distance queries in seconds
Centroid decomposition recursively splits a tree at its centroid — the node whose removal leaves balanced subtrees. The result is a recursion tree of depth log N, the basis for tree distance queries and divide-and-conquer on trees.
- BuildO(N log N)
- Centroid-tree depth≤ log₂ N
- Distance queryO(log N) typical
- Distance updateO(log N) per ancestor
- MemoryO(N log N)
- N = 10⁵~17 levels deep
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How centroid decomposition works
Consider a tree of N nodes. A centroid is a node whose removal partitions the remaining tree into subtrees each of size ≤ ⌊N/2⌋. Every tree has at least one centroid (sometimes two — adjacent — when the tree is even in size). You find it by computing subtree sizes from an arbitrary root, then walking toward the heaviest child until every remaining subtree has size ≤ N/2.
Centroid decomposition runs as a recursion:
- Find the centroid c of the current component.
- Make c a node of the centroid tree.
- Remove c. For each remaining component, recurse — the result becomes a child of c in the centroid tree.
The centroid tree has depth ≤ log₂ N because each recursive call at least halves component size. Crucially, the centroid tree is independent of the original tree's shape: a 10⁵-node linear path has a centroid tree of depth ~17, even though the original has depth 10⁵.
The key property: every path in the original tree passes through the centroid that becomes the lowest common ancestor of its endpoints in the centroid tree. That makes distance and counting problems decomposable — for each centroid c, look only at paths whose midpoint or LCA in the centroid tree is c.
Worked example — distance queries
Take a binary-tree-shaped tree of 15 nodes rooted at 1. The centroid sits somewhere near the middle — say node 4 — whose removal splits the tree into three roughly-7-node components. Inside each component find the local centroid (a node like 2 or 11), recurse, until every component has size 1.
For each node u and each of its centroid-tree ancestors c, precompute dist(u, c) in the original tree — at most log N entries per node. Total storage: O(N log N).
To answer "what's the closest red node to u", walk u's centroid-tree ancestors. At each ancestor c, look up the minimum dist(red, c) over all red nodes recorded at c, then add dist(u, c). Take the minimum across log N ancestors. Each query: O(log N) time, sometimes O(log² N) when updates are interleaved.
Numbers for N = 10⁵: 17 ancestors per node, 17 lookups per query. A red-update touches 17 ancestor records. Compare to brute force — O(N) BFS per query, 10⁵ ops × 10⁵ queries = 10¹⁰ ops, far too slow.
When to use centroid decomposition
- Tree distance queries with updates — closest-red, k-th nearest, weight-conditioned shortest path.
- Counting paths in a tree with a property — paths of length k, paths whose XOR equals a value, paths with sum divisible by k.
- Divide and conquer on trees where the problem splits at any balanced cut — find pairs (u, v) satisfying some predicate.
- Offline batched queries that benefit from the centroid-tree ancestor chain caching mechanism.
- Augmented data structures over trees where path-shaped structures (HLD) don't apply.
Avoid it when the operation needs path-aggregate queries with point updates — HLD is faster and simpler. Avoid it when updates change tree structure — fall back to link-cut trees.
Centroid decomposition vs HLD vs Euler tour
| Centroid decomposition | Heavy-light decomposition | Euler tour + segment tree | |
|---|---|---|---|
| Path query | Limited / awkward | O(log² N) | No native support |
| Distance query | O(log N) typical | O(log² N) with extra work | O(log N) for LCA + O(1) depth |
| Counting paths-with-property | O(N log² N) total | Awkward | Awkward |
| Subtree query | Awkward | O(log N) | O(log N) |
| Build | O(N log N) | O(N) | O(N) |
| Memory | O(N log N) | O(N) | O(N) |
| Code length | ~80 lines | ~80 lines | ~50 lines |
The three are complementary. Path query: HLD. Subtree query: Euler tour. Distance and counting: centroid.
Pseudo-code
subtreeSize[v] = size of v's subtree (re-computed per component)
deleted[v] = whether v has been removed
function dfsSize(u, parent):
subtreeSize[u] = 1
for v in children(u):
if v == parent or deleted[v]: continue
subtreeSize[u] += dfsSize(v, u)
return subtreeSize[u]
function findCentroid(u, parent, totalSize):
for v in children(u):
if v == parent or deleted[v]: continue
if subtreeSize[v] > totalSize / 2:
return findCentroid(v, u, totalSize)
return u
function decompose(u, centroidParent):
totalSize = dfsSize(u, -1)
c = findCentroid(u, -1, totalSize)
centroidTreeParent[c] = centroidParent
deleted[c] = true
for v in neighbours(c):
if not deleted[v]:
decompose(v, c)
JavaScript implementation
class CentroidDecomp {
constructor(tree) {
this.n = tree.length;
this.tree = tree;
this.size = new Int32Array(this.n);
this.deleted = new Uint8Array(this.n);
this.parent = new Int32Array(this.n).fill(-1); // centroid-tree parent
this.decompose(0, -1);
}
dfsSize(u, p) {
this.size[u] = 1;
for (const v of this.tree[u]) {
if (v === p || this.deleted[v]) continue;
this.dfsSize(v, u);
this.size[u] += this.size[v];
}
}
findCentroid(u, p, total) {
for (const v of this.tree[u]) {
if (v === p || this.deleted[v]) continue;
if (this.size[v] > total / 2) {
// re-orient size for the walk
this.size[u] = total - this.size[v];
return this.findCentroid(v, u, total);
}
}
return u;
}
decompose(u, centroidParent) {
this.dfsSize(u, -1);
const total = this.size[u];
const c = this.findCentroid(u, -1, total);
this.parent[c] = centroidParent;
this.deleted[c] = 1;
for (const v of this.tree[c]) {
if (!this.deleted[v]) this.decompose(v, c);
}
}
ancestors(u) {
const path = [];
while (u !== -1) { path.push(u); u = this.parent[u]; }
return path; // root of centroid tree last
}
}
// Example
const tree = [[1,2], [0,3,4], [0,5,6], [1], [1], [2], [2]];
const cd = new CentroidDecomp(tree);
console.log(cd.parent); // centroid-tree parents
console.log(cd.ancestors(3)); // path 3 → ... → root
Python implementation — closest-red query
Combine the centroid tree with per-centroid sorted distance lists, and you can answer "closest red node to u" with point updates. Each centroid keeps a min-heap of distances to red nodes recorded under it; query and update walk u's O(log N) centroid ancestors.
import heapq
from collections import defaultdict, deque
class ClosestRed:
def __init__(self, tree):
self.n = len(tree)
self.tree = tree
self.size = [0] * self.n
self.deleted = [False] * self.n
self.parent = [-1] * self.n
# For each node u, distance to each centroid ancestor
self.dist_to = [dict() for _ in range(self.n)]
# For each centroid c, heap of (dist, node) for red nodes
self.heap = defaultdict(list)
# For each centroid c, current "alive" status of red node counts
self.is_red = [False] * self.n
self._decompose(0, -1)
def _dfs_size(self, root):
order = []
stack = [(root, -1)]
while stack:
u, p = stack.pop()
self.size[u] = 1
order.append((u, p))
for v in self.tree[u]:
if v == p or self.deleted[v]: continue
stack.append((v, u))
for u, p in reversed(order):
if p != -1 and not self.deleted[p]:
self.size[p] += self.size[u]
def _find_centroid(self, root):
self._dfs_size(root)
total = self.size[root]
u, p = root, -1
while True:
heavy = -1
for v in self.tree[u]:
if v == p or self.deleted[v]: continue
if self.size[v] > total // 2: heavy = v; break
if heavy == -1: return u
self.size[u] = total - self.size[heavy]
p, u = u, heavy
def _bfs_dist(self, c):
# Distances from centroid c to every alive node in its component
d = {c: 0}
q = deque([c])
while q:
u = q.popleft()
for v in self.tree[u]:
if v in d or self.deleted[v]: continue
d[v] = d[u] + 1
q.append(v)
return d
def _decompose(self, root, centroid_parent):
c = self._find_centroid(root)
self.parent[c] = centroid_parent
for u, dist in self._bfs_dist(c).items():
self.dist_to[u][c] = dist
self.deleted[c] = True
for v in self.tree[c]:
if not self.deleted[v]:
self._decompose(v, c)
def toggle_red(self, u):
self.is_red[u] = not self.is_red[u]
if self.is_red[u]:
cur = u
while cur != -1:
heapq.heappush(self.heap[cur], (self.dist_to[u][cur], u))
cur = self.parent[cur]
# (lazy deletion: heap entries are validated at query time)
def closest_red(self, u):
best = float('inf')
cur = u
while cur != -1:
# Drop stale heap entries
while self.heap[cur] and not self.is_red[self.heap[cur][0][1]]:
heapq.heappop(self.heap[cur])
if self.heap[cur]:
d_cu = self.dist_to[u][cur]
d_centroid = self.heap[cur][0][0]
best = min(best, d_cu + d_centroid)
cur = self.parent[cur]
return best if best < float('inf') else -1
Each toggle walks log N centroid ancestors and pushes log N heap entries. Each query also walks log N ancestors and at most pops a few stale entries amortised. Total per operation: O(log² N) with the lazy-delete trick.
Variants
Path-counting
Count pairs (u, v) whose tree distance equals k. For each centroid c, BFS to get distances from c to all alive nodes. Pair distances d_u + d_v = k contribute one count — implementable with a sorted multiset or a frequency array. Total: O(N log N).
Path-XOR queries
Replace distance with XOR of edge weights on the path. Centroid c sees all paths through itself; count pairs (u, v) with xor[u] ^ xor[v] = target using a hashmap. Same O(N log N).
Dynamic centroid decomposition
Edge insertions/deletions need link-cut trees or top trees — pure centroid decomposition is static. Approximate dynamic variants exist (rebuild every √N updates) but are rarely worth the complexity.
Centroid + segment tree per centroid
For ordered queries (k-th smallest distance), maintain a segment tree of distances at each centroid. Updates and queries become O(log² N).
Common bugs and edge cases
- Forgetting to mark deleted before recursing. The recursion must mark the centroid as deleted before diving into children, or DFS revisits it and the size DFS becomes nonsense.
- Subtree-size reuse across components. Re-running
dfsSizefor each component is mandatory; sizes computed before the deletion are stale for the post-deletion subgraph. - Centroid not found because of stale sizes. When walking toward the heaviest child, the walk needs current sizes — flip
size[u] = total − size[v]as you descend, or recompute from each subtree root. - Stack depth on chain-shaped trees. The size DFS can recurse 10⁵ deep on a linear input — convert to iterative BFS or raise the recursion limit.
- Distance arrays sized wrong. Each node stores one distance per centroid ancestor — at most log N per node, but the worst case is sometimes N for poor centroid choices. Use a hashmap, not a fixed array.
- Heap of stale entries growing unbounded. Lazy deletion on heaps requires validating the top entry against current state at every query; otherwise the heap leaks memory and slows queries.
Frequently asked questions
What is a tree centroid?
A centroid is a node whose removal leaves no subtree with more than ⌊N/2⌋ nodes. Every tree has one or two centroids — never more. You find one in O(N) by computing subtree sizes from any root and walking toward the heaviest child until the heaviest subtree drops to ≤ N/2.
Why does centroid decomposition give O(N log N) preprocessing?
Each recursion level processes the full N nodes (computing centroid, walking depths). Each split at least halves the component size, so the recursion is at most log N deep. N work × log N levels = O(N log N) total. The same shape as merge sort or median-of-medians.
How does centroid decomposition speed up tree distance queries?
Every pair of nodes (u, v) shares some centroid c in the centroid tree where the path u — v passes through c. Stored: for each node u, the list of (centroid, distance) pairs along its ancestor chain in the centroid tree — O(log N) entries. To query distance, find each centroid common to u and v, take min of d(u, c) + d(c, v). Total query work O(log N) — sometimes log² with extra updates.
What problems besides distance use centroid decomposition?
Counting paths in a tree whose total weight is divisible by k, finding the closest red node to each node with dynamic colour updates, the K-th smallest distance in a tree, and the path-with-specific-edge-count counting problem. Any problem where 'paths can be decomposed at a balanced midpoint' fits the pattern.
Is the centroid tree the same shape as the original tree?
No. The centroid tree is a balanced recursion structure: depth ≤ log N regardless of how skewed the original tree was. A 10⁵-node path graph (linear chain) has a centroid tree of depth ~17, even though the original tree has depth 10⁵. That depth flattening is the whole point.
How does centroid decomposition differ from heavy-light decomposition?
HLD partitions the tree into paths and uses segment trees for path queries. Centroid decomposition splits the tree at balanced cuts and answers distance/aggregate queries that don't fit a chain shape. HLD is great for path queries; centroid is great for counting pairs and distance metrics.