Data Structures

Interval Tree

Augmented BST for "what overlaps this range?"

An interval tree is a balanced binary search tree augmented with each subtree's maximum endpoint. It finds every interval overlapping a query range in O(log n + k) time, where k is the number of overlapping intervals — the data structure behind calendar conflict checks, genomic range queries, and packet flow timeouts.

  • InsertO(log n)
  • DeleteO(log n)
  • Overlap queryO(log n + k)
  • Stabbing queryO(log n + k)
  • SpaceO(n)

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How an interval tree works

You're booking a meeting room from 14:00 to 15:30. The room already has 200 reservations. Does your slot conflict with any of them? Brute-force checking is O(n). An interval tree answers in O(log n + k) — logarithmic descent, plus only the time needed to read out the actual conflicts.

The structure is simple in shape, clever in augmentation. Each node holds:

  • An interval [low, high].
  • The standard BST keys (left and right children, balanced via red-black or AVL).
  • A max field: the largest high value across all intervals in this node's subtree.

The tree is keyed on the low endpoints, so it's an ordered BST under that comparator. The augmentation — max — is what lets you prune subtrees during overlap queries.

To find any interval overlapping the query [qLow, qHigh]:

  1. Start at the root.
  2. If the current node's interval overlaps the query (i.e., node.low ≤ qHigh AND node.high ≥ qLow), report it.
  3. If the left child exists and its max ≥ qLow, descend left — some interval there might extend right enough to overlap. Otherwise, prune the entire left subtree.
  4. Always also descend right (unless the current node's low > qHigh, in which case the right subtree is too far right to matter either).

The max field is the difference between O(n) and O(log n + k). Without it, you'd have to scan every node.

When to use an interval tree

  • Calendar conflict detection. "Show me every meeting between 9 AM and 10 AM Tuesday." Every modern calendar app uses some flavor of interval tree or interval skip list internally.
  • Genomic range queries. "Find every annotated gene that overlaps chromosome 3 positions 12,000,000–12,001,000." BEDTools, samtools, and IGV all rely on interval trees.
  • Network flow tables. Packet routing rules with port-range matches; the kernel checks "does this packet's destination port fall in any active range?"
  • Real-time scheduling. Deadline-monotonic schedulers want fast "what tasks are runnable in this quantum?" lookups — a stabbing query at now.
  • Geometry beyond 1D. Window queries on 2D rectangles via segment-tree-of-interval-trees, or higher-dim data via R-trees layered on interval trees.

Interval tree vs segment tree vs related structures

Interval treeSegment treeRange treeSweep + sorted list
StoresExplicit intervalsPre-built grid over coordinate rangeSorted-by-coordinate pointsSorted endpoints
InsertO(log n)O(log n) on fixed structureHard — usually batch-buildO(log n)
Overlap queryO(log n + k)O(log n) for count or aggregateO(log² n + k)O(log n + k) with range scan
Aggregate (sum/min over range)Not nativeNative — its specialtyNot nativeNot native
MemoryO(n)O(n) on coordinate range, often 4nO(n log n) for 2DO(n)
Best forSparse, dynamic intervalsStatic range with aggregate queriesMultidim range countOffline 1D problems
Static or dynamicFully dynamicStatic structure, dynamic valuesMostly staticEasy to update

Confusingly, "segment tree" sometimes refers to the interval-tree-like structure de Berg's textbook describes. In competitive programming "segment tree" almost always means the 4n-node aggregate tree. Always check which one your library or paper means.

JavaScript implementation — schedule conflict detection

Below is a working interval tree using a simple BST as the base (in production use red-black or AVL for guaranteed log-time updates). The findConflicts method demonstrates the pruning trick.

class IntervalNode {
  constructor(low, high, payload) {
    this.low = low;
    this.high = high;
    this.payload = payload;
    this.max = high;        // augmentation
    this.left = null;
    this.right = null;
  }
}

class IntervalTree {
  constructor() { this.root = null; }

  static overlap(a, b) {
    return a.low <= b.high && b.low <= a.high;
  }

  insert(low, high, payload) {
    const node = new IntervalNode(low, high, payload);
    this.root = this._insert(this.root, node);
  }

  _insert(root, node) {
    if (root === null) return node;
    if (node.low < root.low) root.left  = this._insert(root.left,  node);
    else                     root.right = this._insert(root.right, node);
    if (node.high > root.max) root.max = node.high;   // maintain augmentation
    return root;
  }

  // Returns every interval that overlaps the query range.
  findConflicts(qLow, qHigh) {
    const results = [];
    this._search(this.root, qLow, qHigh, results);
    return results;
  }

  _search(node, qLow, qHigh, out) {
    if (node === null) return;
    if (node.low <= qHigh && node.high >= qLow) out.push(node);
    // PRUNE: if left subtree's max-high < qLow, no interval there can overlap.
    if (node.left !== null && node.left.max >= qLow) {
      this._search(node.left, qLow, qHigh, out);
    }
    // Right subtree: only worth visiting if the query extends to the right
    // of the current node's low.
    if (qHigh >= node.low) {
      this._search(node.right, qLow, qHigh, out);
    }
  }

  // Boolean shortcut: any conflict at all?
  hasConflict(qLow, qHigh) {
    return this._anyOverlap(this.root, qLow, qHigh);
  }

  _anyOverlap(node, qLow, qHigh) {
    if (node === null) return false;
    if (node.low <= qHigh && node.high >= qLow) return true;
    if (node.left !== null && node.left.max >= qLow) {
      return this._anyOverlap(node.left, qLow, qHigh);
    }
    return this._anyOverlap(node.right, qLow, qHigh);
  }
}

// Usage: meeting-room booking conflict detection.
const calendar = new IntervalTree();
calendar.insert(900,  1030, "Standup");
calendar.insert(1100, 1200, "1:1 with PM");
calendar.insert(1330, 1500, "Design review");
calendar.insert(1530, 1630, "Customer call");

console.log(calendar.hasConflict(1400, 1430));
// true — overlaps "Design review"

console.log(calendar.findConflicts(1000, 1200).map(n => n.payload));
// ["Standup", "1:1 with PM"]

The single-line node.left.max >= qLow check is the entire reason this is faster than a linear scan. It says: "If the deepest-reaching interval in the left subtree still doesn't reach my query, don't bother going there."

Python implementation

class IntervalNode:
    __slots__ = ('low', 'high', 'payload', 'max', 'left', 'right')
    def __init__(self, low, high, payload=None):
        self.low, self.high, self.payload = low, high, payload
        self.max = high
        self.left = self.right = None

class IntervalTree:
    def __init__(self):
        self.root = None

    def insert(self, low, high, payload=None):
        node = IntervalNode(low, high, payload)
        self.root = self._insert(self.root, node)

    def _insert(self, root, node):
        if root is None:
            return node
        if node.low < root.low:
            root.left = self._insert(root.left, node)
        else:
            root.right = self._insert(root.right, node)
        if node.high > root.max:
            root.max = node.high
        return root

    def find_conflicts(self, q_low, q_high):
        out = []
        self._search(self.root, q_low, q_high, out)
        return out

    def _search(self, node, q_low, q_high, out):
        if node is None:
            return
        if node.low <= q_high and node.high >= q_low:
            out.append(node)
        if node.left is not None and node.left.max >= q_low:
            self._search(node.left, q_low, q_high, out)
        if q_high >= node.low:
            self._search(node.right, q_low, q_high, out)

    def has_conflict(self, q_low, q_high):
        return self._any_overlap(self.root, q_low, q_high)

    def _any_overlap(self, node, q_low, q_high):
        if node is None:
            return False
        if node.low <= q_high and node.high >= q_low:
            return True
        if node.left is not None and node.left.max >= q_low:
            return self._any_overlap(node.left, q_low, q_high)
        return self._any_overlap(node.right, q_low, q_high)

# Real-world: SAM/BAM read pile-up over genomic regions.
genes = IntervalTree()
genes.insert(12000000, 12015000, "BRCA1")
genes.insert(12018000, 12030000, "BRCA1-AS1")
genes.insert(12100000, 12150000, "NBR2")
hits = genes.find_conflicts(12012000, 12020000)
print([g.payload for g in hits])   # ['BRCA1', 'BRCA1-AS1']

For Python, the intervaltree PyPI package wraps a red-black-balanced version with a friendlier API; banyan.SortedDict with a "max" augmentation also works. For genomics, Quinlan and Hall's BEDTools handles billions of intervals using a forest of interval trees per chromosome.

Variants

  • Red-black-augmented interval tree. The standard production form — a red-black tree where each node carries the max field, updated during rotations. Guarantees O(log n) inserts and deletes.
  • Centered interval tree. Pick a center point and split intervals into "left of center", "right of center", and "spanning center". The spanning set is stored at the node, sorted by both endpoints. Often simpler to teach but tricky to keep balanced under updates.
  • Interval skip list. Replace the BST with a skip list of low endpoints, augmented similarly. Slightly slower in cache terms but trivially concurrent — used in some DB engines.
  • Persistent interval tree. Use path-copy persistence so each modification yields a new version sharing structure with the old. Useful for snapshots, MVCC, and undo systems.
  • Stabbing-only structures. If you only need stabbing queries (point in interval) and not range overlap, an interval array sorted by low with binary search and a parallel maxRight prefix-array is simpler and faster in cache.
  • 2D / multi-dim extensions. Segment-tree-of-interval-trees handles 2D rectangle stabbing. R-trees generalize the bounding-volume idea to k dimensions for spatial DBs (PostGIS, MongoDB geo).

Common bugs and edge cases

  • Overlap vs containment confusion. The standard predicate a.low ≤ b.high AND a.high ≥ b.low matches any overlap, including pure containment. If you want only "partial overlap, not containment", filter the results — the tree itself can't distinguish.
  • Forgetting to update max on rotation. A red-black rotation rearranges three subtrees; the max of the rotated node and its replacement both need recomputation. Skipping this silently breaks pruning, with no runtime error.
  • Half-open vs closed intervals. Calendar slots [09:00, 10:00) and meetings [10:00, 11:00) do not overlap; with closed intervals they do. Pick a convention and apply it consistently in the comparison.
  • Equal endpoints across many intervals. If hundreds of meetings start at 09:00 sharp, BST insert order matters for balance. Always use a self-balancing tree, never a plain BST.
  • Treating the max field as static. Deleting the deepest-reaching interval in a subtree should reduce the parent's max down to whatever's now deepest. A common bug is to update max only on insert, leaving stale values that hurt pruning quality but don't cause wrong answers.
  • Returning a single interval when many overlap. The boolean hasConflict answer is a hint, not the full result — it might short-circuit on the first hit. Always use findConflicts when you need to display a list.

Frequently asked questions

Why O(log n + k) and not O(log n)?

An interval-overlap query can return many intervals (k of them), and you must visit each one to report it. The 'log n' is the descent cost; the '+ k' is the unavoidable cost of producing the output. If you only need the count, augment the tree with subtree size and answer in pure O(log n).

What's the difference between an interval tree and a segment tree?

Interval trees store n explicit intervals and answer 'which intervals overlap [a, b]?' in O(log n + k). Segment trees pre-discretize a 1D coordinate range into 4n nodes and answer aggregate questions like 'how many intervals cover this point?' or 'min/sum over [a, b]?' in O(log n). Interval trees are sparse and dynamic; segment trees are dense and great for range aggregates.

How does the max-endpoint augmentation work?

Each node stores the maximum 'high' endpoint of any interval in its subtree. During an overlap query, when descending, you can prune the entire left subtree if its max-high is less than the query's low — no interval there could possibly extend far enough to overlap. That's the trick that gets O(log n + k).

Where are interval trees used in production?

Calendar applications (Google Calendar, Outlook conflict checks), DNA range queries (BEDTools, htslib for bioinformatics), DBMS internal scheduling, network packet flow timeouts, video editor track collision, and OS deadline schedulers. Anywhere you ask 'does this range conflict with anything I've already booked?'

What if intervals only contain other intervals — never crossing?

Standard interval trees handle overlap as a single binary predicate; they don't distinguish 'A is contained in B' from 'A and B partially overlap'. If you need containment specifically (e.g., XML/JSON nesting), augment by storing both endpoints sortedly or use a nested-interval tree. The base structure still works — you just filter the result.

Can interval trees handle stabbing queries?

Yes — a stabbing query (find all intervals containing point p) is just an overlap query against the degenerate interval [p, p]. The standard algorithm handles it directly in O(log n + k).