What is a tree centroid?

A centroid is a node whose removal leaves no subtree with more than ⌊N/2⌋ nodes. Every tree has one or two centroids — never more. You find one in O(N) by computing subtree sizes from any root and walking toward the heaviest child until the heaviest subtree drops to ≤ N/2.

Why does centroid decomposition give O(N log N) preprocessing?

Each recursion level processes the full N nodes (computing centroid, walking depths). Each split at least halves the component size, so the recursion is at most log N deep. N work × log N levels = O(N log N) total. The same shape as merge sort or median-of-medians.

How does centroid decomposition speed up tree distance queries?

Every pair of nodes (u, v) shares some centroid c in the centroid tree where the path u — v passes through c. Stored: for each node u, the list of (centroid, distance) pairs along its ancestor chain in the centroid tree — O(log N) entries. To query distance, find each centroid common to u and v, take min of d(u, c) + d(c, v). Total query work O(log N) — sometimes log² with extra updates.

What problems besides distance use centroid decomposition?

Counting paths in a tree whose total weight is divisible by k, finding the closest red node to each node with dynamic colour updates, the K-th smallest distance in a tree, and the path-with-specific-edge-count counting problem. Any problem where 'paths can be decomposed at a balanced midpoint' fits the pattern.

Is the centroid tree the same shape as the original tree?

No. The centroid tree is a balanced recursion structure: depth ≤ log N regardless of how skewed the original tree was. A 10⁵-node path graph (linear chain) has a centroid tree of depth ~17, even though the original tree has depth 10⁵. That depth flattening is the whole point.

How does centroid decomposition differ from heavy-light decomposition?

HLD partitions the tree into paths and uses segment trees for path queries. Centroid decomposition splits the tree at balanced cuts and answers distance/aggregate queries that don't fit a chain shape. HLD is great for path queries; centroid is great for counting pairs and distance metrics.

Centroid Decomposition — Tree Distance Queries in O(N log N)

How centroid decomposition works

Consider a tree of N nodes. A centroid is a node whose removal partitions the remaining tree into subtrees each of size ≤ ⌊N/2⌋. Every tree has at least one centroid (sometimes two — adjacent — when the tree is even in size). You find it by computing subtree sizes from an arbitrary root, then walking toward the heaviest child until every remaining subtree has size ≤ N/2.

Centroid decomposition runs as a recursion:

Find the centroid c of the current component.
Make c a node of the centroid tree.
Remove c. For each remaining component, recurse — the result becomes a child of c in the centroid tree.

The centroid tree has depth ≤ log₂ N because each recursive call at least halves component size. Crucially, the centroid tree is independent of the original tree's shape: a 10⁵-node linear path has a centroid tree of depth ~17, even though the original has depth 10⁵.

The key property: every path in the original tree passes through the centroid that becomes the lowest common ancestor of its endpoints in the centroid tree. That makes distance and counting problems decomposable — for each centroid c, look only at paths whose midpoint or LCA in the centroid tree is c.

Worked example — distance queries

Take a binary-tree-shaped tree of 15 nodes rooted at 1. The centroid sits somewhere near the middle — say node 4 — whose removal splits the tree into three roughly-7-node components. Inside each component find the local centroid (a node like 2 or 11), recurse, until every component has size 1.

For each node u and each of its centroid-tree ancestors c, precompute dist(u, c) in the original tree — at most log N entries per node. Total storage: O(N log N).

To answer "what's the closest red node to u", walk u's centroid-tree ancestors. At each ancestor c, look up the minimum dist(red, c) over all red nodes recorded at c, then add dist(u, c). Take the minimum across log N ancestors. Each query: O(log N) time, sometimes O(log² N) when updates are interleaved.

Numbers for N = 10⁵: 17 ancestors per node, 17 lookups per query. A red-update touches 17 ancestor records. Compare to brute force — O(N) BFS per query, 10⁵ ops × 10⁵ queries = 10¹⁰ ops, far too slow.

When to use centroid decomposition

Tree distance queries with updates — closest-red, k-th nearest, weight-conditioned shortest path.
Counting paths in a tree with a property — paths of length k, paths whose XOR equals a value, paths with sum divisible by k.
Divide and conquer on trees where the problem splits at any balanced cut — find pairs (u, v) satisfying some predicate.
Offline batched queries that benefit from the centroid-tree ancestor chain caching mechanism.
Augmented data structures over trees where path-shaped structures (HLD) don't apply.

Avoid it when the operation needs path-aggregate queries with point updates — HLD is faster and simpler. Avoid it when updates change tree structure — fall back to link-cut trees.

Centroid decomposition vs HLD vs Euler tour

	Centroid decomposition	Heavy-light decomposition	Euler tour + segment tree
Path query	Limited / awkward	O(log² N)	No native support
Distance query	O(log N) typical	O(log² N) with extra work	O(log N) for LCA + O(1) depth
Counting paths-with-property	O(N log² N) total	Awkward	Awkward
Subtree query	Awkward	O(log N)	O(log N)
Build	O(N log N)	O(N)	O(N)
Memory	O(N log N)	O(N)	O(N)
Code length	~80 lines	~80 lines	~50 lines

The three are complementary. Path query: HLD. Subtree query: Euler tour. Distance and counting: centroid.

Pseudo-code

subtreeSize[v] = size of v's subtree (re-computed per component)
deleted[v] = whether v has been removed

function dfsSize(u, parent):
    subtreeSize[u] = 1
    for v in children(u):
        if v == parent or deleted[v]: continue
        subtreeSize[u] += dfsSize(v, u)
    return subtreeSize[u]

function findCentroid(u, parent, totalSize):
    for v in children(u):
        if v == parent or deleted[v]: continue
        if subtreeSize[v] > totalSize / 2:
            return findCentroid(v, u, totalSize)
    return u

function decompose(u, centroidParent):
    totalSize = dfsSize(u, -1)
    c = findCentroid(u, -1, totalSize)
    centroidTreeParent[c] = centroidParent
    deleted[c] = true
    for v in neighbours(c):
        if not deleted[v]:
            decompose(v, c)

JavaScript implementation

class CentroidDecomp {
  constructor(tree) {
    this.n = tree.length;
    this.tree = tree;
    this.size = new Int32Array(this.n);
    this.deleted = new Uint8Array(this.n);
    this.parent = new Int32Array(this.n).fill(-1);  // centroid-tree parent
    this.decompose(0, -1);
  }

  dfsSize(u, p) {
    this.size[u] = 1;
    for (const v of this.tree[u]) {
      if (v === p || this.deleted[v]) continue;
      this.dfsSize(v, u);
      this.size[u] += this.size[v];
    }
  }

  findCentroid(u, p, total) {
    for (const v of this.tree[u]) {
      if (v === p || this.deleted[v]) continue;
      if (this.size[v] > total / 2) {
        // re-orient size for the walk
        this.size[u] = total - this.size[v];
        return this.findCentroid(v, u, total);
      }
    }
    return u;
  }

  decompose(u, centroidParent) {
    this.dfsSize(u, -1);
    const total = this.size[u];
    const c = this.findCentroid(u, -1, total);
    this.parent[c] = centroidParent;
    this.deleted[c] = 1;
    for (const v of this.tree[c]) {
      if (!this.deleted[v]) this.decompose(v, c);
    }
  }

  ancestors(u) {
    const path = [];
    while (u !== -1) { path.push(u); u = this.parent[u]; }
    return path;  // root of centroid tree last
  }
}

// Example
const tree = [[1,2], [0,3,4], [0,5,6], [1], [1], [2], [2]];
const cd = new CentroidDecomp(tree);
console.log(cd.parent);            // centroid-tree parents
console.log(cd.ancestors(3));      // path 3 → ... → root

Python implementation — closest-red query

Combine the centroid tree with per-centroid sorted distance lists, and you can answer "closest red node to u" with point updates. Each centroid keeps a min-heap of distances to red nodes recorded under it; query and update walk u's O(log N) centroid ancestors.

import heapq
from collections import defaultdict, deque

class ClosestRed:
    def __init__(self, tree):
        self.n = len(tree)
        self.tree = tree
        self.size = [0] * self.n
        self.deleted = [False] * self.n
        self.parent = [-1] * self.n
        # For each node u, distance to each centroid ancestor
        self.dist_to = [dict() for _ in range(self.n)]
        # For each centroid c, heap of (dist, node) for red nodes
        self.heap = defaultdict(list)
        # For each centroid c, current "alive" status of red node counts
        self.is_red = [False] * self.n
        self._decompose(0, -1)

    def _dfs_size(self, root):
        order = []
        stack = [(root, -1)]
        while stack:
            u, p = stack.pop()
            self.size[u] = 1
            order.append((u, p))
            for v in self.tree[u]:
                if v == p or self.deleted[v]: continue
                stack.append((v, u))
        for u, p in reversed(order):
            if p != -1 and not self.deleted[p]:
                self.size[p] += self.size[u]

    def _find_centroid(self, root):
        self._dfs_size(root)
        total = self.size[root]
        u, p = root, -1
        while True:
            heavy = -1
            for v in self.tree[u]:
                if v == p or self.deleted[v]: continue
                if self.size[v] > total // 2: heavy = v; break
            if heavy == -1: return u
            self.size[u] = total - self.size[heavy]
            p, u = u, heavy

    def _bfs_dist(self, c):
        # Distances from centroid c to every alive node in its component
        d = {c: 0}
        q = deque([c])
        while q:
            u = q.popleft()
            for v in self.tree[u]:
                if v in d or self.deleted[v]: continue
                d[v] = d[u] + 1
                q.append(v)
        return d

    def _decompose(self, root, centroid_parent):
        c = self._find_centroid(root)
        self.parent[c] = centroid_parent
        for u, dist in self._bfs_dist(c).items():
            self.dist_to[u][c] = dist
        self.deleted[c] = True
        for v in self.tree[c]:
            if not self.deleted[v]:
                self._decompose(v, c)

    def toggle_red(self, u):
        self.is_red[u] = not self.is_red[u]
        if self.is_red[u]:
            cur = u
            while cur != -1:
                heapq.heappush(self.heap[cur], (self.dist_to[u][cur], u))
                cur = self.parent[cur]
        # (lazy deletion: heap entries are validated at query time)

    def closest_red(self, u):
        best = float('inf')
        cur = u
        while cur != -1:
            # Drop stale heap entries
            while self.heap[cur] and not self.is_red[self.heap[cur][0][1]]:
                heapq.heappop(self.heap[cur])
            if self.heap[cur]:
                d_cu = self.dist_to[u][cur]
                d_centroid = self.heap[cur][0][0]
                best = min(best, d_cu + d_centroid)
            cur = self.parent[cur]
        return best if best < float('inf') else -1

Each toggle walks log N centroid ancestors and pushes log N heap entries. Each query also walks log N ancestors and at most pops a few stale entries amortised. Total per operation: O(log² N) with the lazy-delete trick.

Variants

Path-counting

Count pairs (u, v) whose tree distance equals k. For each centroid c, BFS to get distances from c to all alive nodes. Pair distances d_u + d_v = k contribute one count — implementable with a sorted multiset or a frequency array. Total: O(N log N).

Path-XOR queries

Replace distance with XOR of edge weights on the path. Centroid c sees all paths through itself; count pairs (u, v) with xor[u] ^ xor[v] = target using a hashmap. Same O(N log N).

Dynamic centroid decomposition

Edge insertions/deletions need link-cut trees or top trees — pure centroid decomposition is static. Approximate dynamic variants exist (rebuild every √N updates) but are rarely worth the complexity.

Centroid + segment tree per centroid

For ordered queries (k-th smallest distance), maintain a segment tree of distances at each centroid. Updates and queries become O(log² N).

Common bugs and edge cases

Forgetting to mark deleted before recursing. The recursion must mark the centroid as deleted before diving into children, or DFS revisits it and the size DFS becomes nonsense.
Subtree-size reuse across components. Re-running dfsSize for each component is mandatory; sizes computed before the deletion are stale for the post-deletion subgraph.
Centroid not found because of stale sizes. When walking toward the heaviest child, the walk needs current sizes — flip size[u] = total − size[v] as you descend, or recompute from each subtree root.
Stack depth on chain-shaped trees. The size DFS can recurse 10⁵ deep on a linear input — convert to iterative BFS or raise the recursion limit.
Distance arrays sized wrong. Each node stores one distance per centroid ancestor — at most log N per node, but the worst case is sometimes N for poor centroid choices. Use a hashmap, not a fixed array.
Heap of stale entries growing unbounded. Lazy deletion on heaps requires validating the top entry against current state at every query; otherwise the heap leaks memory and slows queries.

Centroid Decomposition

Interactive visualization

Watch the 60-second explainer

How centroid decomposition works

Worked example — distance queries

When to use centroid decomposition

Centroid decomposition vs HLD vs Euler tour

Pseudo-code

JavaScript implementation

Python implementation — closest-red query

Variants

Path-counting

Path-XOR queries

Dynamic centroid decomposition

Centroid + segment tree per centroid

Common bugs and edge cases

Frequently asked questions

Interactive visualization

Watch the 60-second explainer

How centroid decomposition works

Worked example — distance queries

When to use centroid decomposition

Centroid decomposition vs HLD vs Euler tour

Pseudo-code

JavaScript implementation

Python implementation — closest-red query

Variants

Path-counting

Path-XOR queries

Dynamic centroid decomposition

Centroid + segment tree per centroid

Common bugs and edge cases

Frequently asked questions

Related concepts