Data Structures

Persistent Segment Tree

Q: What problems does a persistent segment tree solve?

The classic application is 'k-th smallest element in range [l, r]'. Build a persistent segment tree where version i is the histogram of arr[0..i] — each leaf counts occurrences of a value. Then version r minus version l-1 (subtree-wise) gives the histogram of arr[l..r], and you find the k-th smallest by walking down: descend left if the left-subtree count >= k, else descend right with k decremented. O(log n) per query. Other applications: count distinct in range, find the smallest value >= x in range, offline interval-graph algorithms, version-control inspection.

Every version of the array survives — query any past state in O(log n)

A persistent segment tree keeps every historical version of the underlying array available for query. Each update creates only O(log n) new nodes (one per level on the path from root to the modified leaf), sharing the rest with the previous version.

Update costO(log n) new nodes
Query any versionO(log n)
Space after q updatesO(n + q log n)
Techniquepath copying
Classic problemk-th smallest in range
Foundational paperDriscoll et al. 1989

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why a persistent segment tree

A regular segment tree answers range-aggregate queries (sum, min, max, gcd) and point updates in O(log n), but it has only one state. After you call update(i, v), the previous value at index i is gone — you can't query the array as it was before the update. For many algorithmic problems, that's exactly what you need.

The classic example is "k-th smallest element in range [l, r]". You can't solve this with a regular segment tree without re-processing the entire range each query. But if you build a sequence of persistent segment trees — version i is the histogram of arr[0..i] — then the range [l, r] is just version r minus version l−1, and you can walk the difference tree in O(log n) to find the k-th smallest. The problem becomes tractable on a million-element array.

The price: O(log n) extra memory per update. For n = 10⁵ and q = 10⁵ updates that's ~1.7 million extra nodes, about 25-30 MB. Acceptable for competitive programming. For production workloads with millions of versions you usually need garbage collection to reclaim old roots, but the algorithmic core is unchanged.

Path copying — the one trick

Every persistent data structure built on trees uses the same trick: when an update modifies a node, allocate a new copy of that node and every node above it on the path to the root. Don't touch any other node — its child pointers are still valid. The new root is the head of the new version. The old root still points at the old (unchanged) structure.

For a balanced binary tree of n leaves, the path from root to any leaf has exactly ⌈log₂ n⌉ + 1 nodes. So each update allocates O(log n) new nodes. For n = 10⁶, that's 20 nodes — about 320 bytes per update. Compare to copying the entire tree (~30 MB per update) and you see why structural sharing is essential.

function update(node, l, r, idx, value):
    new_node = new Node()
    new_node.left  = node.left
    new_node.right = node.right
    if l == r:
        new_node.aggregate = value
        return new_node
    mid = (l + r) / 2
    if idx <= mid:
        new_node.left = update(node.left, l, mid, idx, value)
    else:
        new_node.right = update(node.right, mid+1, r, idx, value)
    new_node.aggregate = combine(new_node.left.aggregate, new_node.right.aggregate)
    return new_node

Read this carefully. We allocate new_node and initialize both children to point at the original node's children — structural sharing. Then we recurse on the side that contains idx, overwriting that child pointer with the new path. Aggregate is recomputed from the (possibly new) children. The recursion bottoms out when l == r and we set the leaf's value.

The textbook application: k-th smallest in range

You're given an array of n integers and q queries of the form "in arr[l..r], what's the k-th smallest value?" Naive solution: O(q · (r − l) · log(r − l)) by sorting. Persistent segment tree solution: O(n log n) preprocessing, O(log n) per query.

Build phase. For each i in 0..n−1, build version i as a histogram of values in arr[0..i]. Specifically, version i is a segment tree indexed by value (range V = [min, max]) where each leaf counts occurrences. To create version i+1, start from version i and increment the leaf for arr[i+1] by 1 (one persistent update, O(log V) new nodes).

Query phase. For a query (l, r, k), walk versions r and l−1 simultaneously. At each level, the count in [l..r] is leftCount(version r) − leftCount(version l−1). If that count ≥ k, descend left; otherwise subtract it from k and descend right. Bottom out at a leaf — that's the k-th smallest value.

struct Node {
    int count;
    Node *left, *right;
    Node(int c = 0) : count(c), left(nullptr), right(nullptr) {}
};

Node* build(int l, int r) {
    Node* node = new Node();
    if (l == r) return node;
    int mid = (l + r) / 2;
    node->left  = build(l, mid);
    node->right = build(mid + 1, r);
    return node;
}

Node* update(Node* prev, int l, int r, int idx) {
    Node* node = new Node(prev->count + 1);
    if (l == r) return node;
    int mid = (l + r) / 2;
    if (idx <= mid) {
        node->left  = update(prev->left, l, mid, idx);
        node->right = prev->right;
    } else {
        node->left  = prev->left;
        node->right = update(prev->right, mid + 1, r, idx);
    }
    return node;
}

int kth(Node* u, Node* v, int l, int r, int k) {
    if (l == r) return l;
    int leftCount = v->left->count - u->left->count;
    int mid = (l + r) / 2;
    if (k <= leftCount) return kth(u->left, v->left, l, mid, k);
    return kth(u->right, v->right, mid + 1, r, k - leftCount);
}

This is the SPOJ "MKTHNUM" problem solution in 30 lines. Same template solves "count distinct in [l, r]" and many offline-query problems.

Version control over arrays

Beyond the k-th-smallest pattern, persistent segment trees serve as version-controlled arrays. You can hold a sequence of versions in memory, jump between them in O(1) (just keep an array of roots), and answer range queries on any of them in O(log n). Common usages:

Undo systems. Each user edit produces a new version. The undo stack is an array of roots. Constant-time jump to any past state.
Snapshotting databases. Every committed transaction is a new persistent root. Older snapshots remain queryable without locking the active tree.
Functional algorithmic programming. ML and Haskell libraries that need range queries on immutable arrays use persistent segment trees as their internal representation.
Offline range-query batching. Sort queries by right endpoint, build a persistent tree on the fly, answer queries as you go. Reduces some O(n √q) problems to O((n + q) log n).

Persistent segment tree vs alternatives

	Persistent seg tree	Regular seg tree	Wavelet tree	Sqrt-decomposition
Range aggregate query	O(log n) any version	O(log n) current only	O(log V) any range	O(√n)
Point update	O(log n) new nodes	O(log n) in-place	No	O(1)
k-th smallest in range	O(log n)	No	O(log V)	O(√n log n)
Versioned queries	Yes, any past version	No	Static only	Block snapshots
Memory after q updates	O(n + q log n)	O(n)	O(n log V) static	O(n)
Code length	~60 lines	~40 lines	~80 lines	~25 lines
Best for	k-th, versioned, offline	Single mutable state	Static rank/select	Weird ops, small n

Python — minimal persistent segment tree

class Node:
    __slots__ = ('count', 'left', 'right')
    def __init__(self, count=0, left=None, right=None):
        self.count = count
        self.left = left
        self.right = right

def build(l, r):
    if l == r: return Node()
    mid = (l + r) // 2
    return Node(0, build(l, mid), build(mid + 1, r))

def update(prev, l, r, idx):
    if l == r:
        return Node(prev.count + 1)
    mid = (l + r) // 2
    if idx <= mid:
        return Node(prev.count + 1, update(prev.left, l, mid, idx), prev.right)
    else:
        return Node(prev.count + 1, prev.left, update(prev.right, mid + 1, r, idx))

def kth_smallest(u, v, l, r, k):
    if l == r: return l
    left_count = v.left.count - u.left.count
    mid = (l + r) // 2
    if k <= left_count:
        return kth_smallest(u.left, v.left, l, mid, k)
    return kth_smallest(u.right, v.right, mid + 1, r, k - left_count)

# Build versions: roots[i] = persistent segment tree after inserting arr[0..i]
arr = [3, 1, 4, 1, 5, 9, 2, 6]
V_MAX = max(arr)
roots = [build(0, V_MAX)]
for x in arr:
    roots.append(update(roots[-1], 0, V_MAX, x))

# Query: 3rd smallest in arr[2..6] (0-indexed). Expected: sorted(arr[2:7]) = [1,2,4,5,9], 3rd = 4
ans = kth_smallest(roots[2], roots[7], 0, V_MAX, 3)
print(f"3rd smallest in arr[2..6] = {ans}")   # 4

Variants and refinements

Persistent segment tree with lazy propagation

Range updates with lazy tags in a persistent setting are subtle. Each push-down must allocate new nodes for both children. The amortized analysis still gives O(log n) new nodes per range update, but the constant factor is higher and the code is harder. Most competitive programmers avoid persistent lazy unless absolutely required.

Implicit persistent segment tree

For huge value ranges (e.g., values up to 10⁹), don't pre-allocate the entire 4·V skeleton. Allocate nodes on demand: when you first descend into a sub-range, create the node only then. Memory becomes O(q log V) instead of O(n + q log V) — much friendlier when n << V.

Persistent treap (implicit key)

A sister structure for sequence operations: insert/delete at arbitrary index, reverse range, all in O(log n) expected and persistent. Treap nodes get path-copied like segment tree nodes. The split-merge primitives compose cleanly with persistence.

Wavelet tree

An alternative to persistent segment trees for static rank/select problems. Solves k-th-smallest-in-range without updates. Less general but smaller constant factor.

Common pitfalls

Forgetting to share unchanged children. The whole point of persistence is structural sharing. If you accidentally copy both subtrees instead of just the path, memory blows from O(log n) to O(n) per update.
Aliasing bugs in pointer-heavy code. Pointer-based implementations need careful constructor semantics — every Node() must initialize both child pointers to the parent's previous children unless this update modifies that subtree. Easy bug: leaving one child as nullptr by accident.
Memory pressure from too many versions. A million versions × 20 new nodes × 16 bytes = 320 MB. Pre-allocate a node pool; consider GC for production deployments.
Subtracting versions of different shapes. The k-th-smallest trick assumes both trees have the same shape (same domain). When you build versions incrementally that's automatic, but if you build versions from independent updates it's not — verify the invariant.
Recursion stack depth. For n = 10⁵, recursion depth is ~17 — safe. For n = 10⁶, depth is ~20 — still safe. For n > 10⁸, consider iterative implementations.
Confusing partial vs full persistence. A persistent segment tree built this way is fully persistent — you can update any version, not just the latest. The data structure doesn't care which root you pass in.

Frequently asked questions

How does path copying work in a persistent segment tree?

To update an element, walk from the root to the leaf representing that index. At each node on the path, allocate a new node that's a copy of the original — but with one child pointer rewritten to point at the new path. The other child still points at the original (shared) subtree. The new root is the head of the new version. Old root remains valid and points at the unchanged old tree. For a segment tree of n leaves, the path has exactly ceil(log2 n) nodes, so each update creates O(log n) new nodes — about 20 for n = 1 million.

How much memory does a persistent segment tree use?

The initial tree is O(n) nodes — usually allocated as 4n for safety. Each of the q updates adds O(log n) new nodes for the new path. So after q updates total memory is O(n + q log n). For n = 100,000 and q = 100,000, that's roughly 100,000 + 100,000 * 17 = 1.8 million nodes. Each node typically takes 12-16 bytes (two child pointers plus aggregate value), so the whole structure fits in ~30 MB. Manageable for competitive programming; for production workloads with millions of versions, you usually need garbage collection to reclaim old versions.

What problems does a persistent segment tree solve?

The classic application is "k-th smallest element in range [l, r]". Build a persistent segment tree where version i is the histogram of arr[0..i] — each leaf counts occurrences of a value. Then version r minus version l-1 (subtree-wise) gives the histogram of arr[l..r], and you find the k-th smallest by walking down: descend left if the left-subtree count >= k, else descend right with k decremented. O(log n) per query. Other applications: count distinct in range, find the smallest value >= x in range, offline interval-graph algorithms, version-control inspection.

How is a persistent segment tree different from a regular one?

A regular segment tree has a single mutable structure. Updates overwrite nodes in place. You can answer queries on the current state but not on past states. A persistent segment tree never overwrites — every update returns a new root for the new version, and old roots remain valid. You can query any past version, mix updates from any base version (full persistence), or even merge versions (confluent persistence). The trade-off: O(log n) extra memory per update versus O(1) for in-place mutation, and slightly more code (~20 lines extra).

Is a persistent segment tree fast enough for production?

For competitive programming and most algorithmic workloads, yes — typical updates and queries run in 1-5 microseconds. The constant factor is 3-4x a non-persistent segment tree because of allocation overhead. For high-throughput production systems (databases, time-series stores), purpose-built persistent structures like LSM-trees or copy-on-write B-trees are usually better. Persistent segment trees shine when you need exact O(log n) range queries on arbitrary historical versions — niche, but invaluable when needed.

How do you garbage-collect old versions?

In a managed language (Java, Python, JavaScript) the runtime GC handles it automatically — when no reference to an old root remains, the entire chain of nodes reachable only from that root is collected. In a manual-memory language (C++) you typically use reference counting per node: each node tracks how many parents reference it; when the count drops to zero the node is freed and its children's counts decrement. Reference counting plays well with the path-copying pattern because shared subtrees are referenced by every version that contains them.