What's the difference between a B-tree and a B+ tree?

In a B-tree, every node stores both keys and the associated values. In a B+ tree, internal nodes are pure search index — only the leaves hold values. The leaves are also linked left-to-right, which makes range scans dramatically faster.

Why do databases use B+ trees instead of hash tables?

Hash tables can't answer range queries (WHERE age BETWEEN 30 AND 40), and they don't keep data sorted on disk for sequential reads. B+ trees give O(log n) point lookups and O(k) range scans on linked leaves — both essential for SQL workloads.

What does fanout mean and why does it matter?

Fanout is how many children each internal node has. With 4 KB pages and 16-byte keys, fanout sits around 250. A 4-level tree fits 250⁴ ≈ 4 billion keys; in practice 256 million entries fit in 4 levels because nodes don't always fill completely.

How does insertion split a leaf?

When a leaf overflows past its capacity, it splits in half: the median key is copied (not moved) up to the parent as a separator, and the leaves are linked through the sibling pointer. If the parent overflows, it splits too — the median there is moved up, since internal nodes don't hold values.

Are B+ trees still used in modern systems?

Yes — InnoDB (MySQL), PostgreSQL's btree access method, SQLite, SQL Server, Oracle, LMDB, and BoltDB all use B+ trees. The challengers are LSM trees (RocksDB, Cassandra) for write-heavy workloads and log-structured indexes for SSDs.

Why are leaves linked into a list?

So a range scan can walk from one leaf to the next without climbing back up to the parent. After locating the start of the range with O(log n) descent, scanning k consecutive entries is O(k) — basically a linked-list traversal across leaves.

B+ Tree — How It Works, Splits, Range Scans & Code

How a B+ tree works

A B+ tree is a generalisation of the binary search tree where each node has many children instead of two. The shape is shallow and wide — a 4-level tree with fanout 250 indexes about 250⁴ ≈ 4 billion keys. Reading any key from disk costs at most 4 page reads.

Two structural rules separate B+ from B-tree:

Internal nodes are pure index. They hold separator keys and child pointers. No row data lives in them.
Leaves hold all the data, and they're linked. Each leaf has a pointer to its right sibling. A range scan descends once, then walks leaves left-to-right.

            [50 | 90]                ← internal: separators only
            /    |    \
   [10 30] [60 70 80] [95 99]        ← leaves: actual rows
   ↔----↔  ↔-------↔  ↔----↔         ← linked siblings

To find a key, walk down comparing against separators. To range-scan from key 25 to 95, descend to the leaf containing 25, then follow sibling pointers until you pass 95. No backtracking up the tree — sequential disk reads, prefetcher-friendly, the SQL BETWEEN operator's best friend.

Why this shape, on disk

The cost model on disk is dominated by page reads, not comparisons. A typical 4 KB or 8 KB page holds hundreds of (key, pointer) entries — call that the fanout b. The number of page reads to find a key is log_b n, which is tiny when b is in the hundreds:

Keys	Tree levels (b = 250)
1,000	2
62,500	3
15,600,000	4
4,000,000,000	5

The root is almost always cached in RAM, so a 4-level tree often costs just 3 disk reads — and the bottom one is the only one that touches actual data. That's why "the index fits in memory but the data doesn't" is the normal operating regime.

When B+ trees shine

Range queries — between, less-than, greater-than, ORDER BY with LIMIT.
Mixed read/write loads — writes amplify only ~log_b n pages, but the tree stays balanced.
Disk- and SSD-backed storage where page-aligned IO matters more than comparison count.
Prefix scans — composite indexes naturally answer "all rows where (a, b, c) starts with (5, 7)".

If your workload is purely random point inserts at write-heavy throughput (Cassandra, time-series ingest), an LSM tree often beats a B+ tree. If you need durability without locking, copy-on-write B+ trees (LMDB) skip the write-ahead log entirely.

B+ tree vs B-tree vs LSM tree

	B+ tree	B-tree	LSM tree	AVL / RB tree	Hash index	Skip list
Point lookup	O(log_b n) ~3-4 IOs	O(log_b n) similar	O(L) levels checked	O(log n) in-memory	O(1) avg	O(log n) probabilistic
Range scan	Excellent (linked leaves)	Slower (in-order traversal)	Merge across L levels	In-order traversal	Not supported	Excellent
Insert throughput	Moderate (in-place writes)	Moderate	High (sequential writes only)	O(log n) in-memory	O(1) avg	O(log n) probabilistic
Write amplification	1× to 3×	1× to 3×	10× to 30×	N/A (no disk)	1×	1×
Read amplification	1× (one leaf read)	1× per read	L× (check each level)	N/A	1×	O(log n) hops
Space amplification	~1.5× (50% min fill)	~1.5×	2×–10× (uncompacted)	1× + pointers	1× + load factor	1× + ~2 pointers/node
Where used	InnoDB, Postgres, SQLite, LMDB	Filesystems (NTFS, HFS+)	RocksDB, Cassandra, LevelDB	std::map, Java TreeMap	Memcached, Redis hash	Redis sorted set, RocksDB memtable

The headline numbers aren't asymptotic — they're amplification ratios. A B+ tree's small read amplification is why DBAs reach for it first; an LSM's small write amplification per request (despite high background amplification) is why ingest-heavy systems pick it instead.

What the numbers actually look like

4 levels indexes 256M keys at fanout 250, assuming nodes are 80% full: 250⁴ × 0.8⁴ ≈ 1.6B in theory, ~256M observed in practice once you account for partial pages.
Minimum occupancy ≈ 50%. A leaf with capacity 250 must hold at least 125 keys before it can release a sibling on delete. Worst-case space waste: 2× the data size.
Write amplification of 1× to 3×. A single row insert dirties one leaf page; a leaf split dirties two leaves and one parent. Compare with LSM trees at 10× to 30×.
Range scan reads exactly ⌈k / leaf_capacity⌉ + log_b n pages for k results. With capacity 250 and 1M results, that's ~4000 page reads vs ~1M comparisons in a binary tree.

JavaScript implementation (in-memory B+ tree)

// In-memory B+ tree, fanout = ORDER. Real DBs page nodes; this version stays simple.
const ORDER = 4;

class Leaf {
  constructor() { this.keys = []; this.vals = []; this.next = null; this.isLeaf = true; }
}

class Internal {
  constructor() { this.keys = []; this.children = []; this.isLeaf = false; }
}

class BPlusTree {
  constructor() { this.root = new Leaf(); }

  // Returns leaf containing key (or where key would be inserted).
  _findLeaf(key) {
    let n = this.root;
    while (!n.isLeaf) {
      let i = 0;
      while (i < n.keys.length && key >= n.keys[i]) i++;
      n = n.children[i];
    }
    return n;
  }

  get(key) {
    const leaf = this._findLeaf(key);
    const idx  = leaf.keys.indexOf(key);
    return idx === -1 ? undefined : leaf.vals[idx];
  }

  insert(key, val) {
    const leaf = this._findLeaf(key);
    const idx  = leaf.keys.findIndex(k => k >= key);
    const at   = idx === -1 ? leaf.keys.length : idx;
    if (leaf.keys[at] === key) { leaf.vals[at] = val; return; }   // upsert
    leaf.keys.splice(at, 0, key);
    leaf.vals.splice(at, 0, val);
    if (leaf.keys.length < ORDER) return;
    this._splitLeaf(leaf);
  }

  _splitLeaf(leaf) {
    const mid     = Math.ceil(leaf.keys.length / 2);
    const sibling = new Leaf();
    sibling.keys  = leaf.keys.splice(mid);
    sibling.vals  = leaf.vals.splice(mid);
    sibling.next  = leaf.next;
    leaf.next     = sibling;
    this._insertIntoParent(leaf, sibling.keys[0], sibling);
  }

  _insertIntoParent(left, sepKey, right) {
    if (left === this.root) {
      const r = new Internal();
      r.keys = [sepKey];
      r.children = [left, right];
      this.root = r;
      return;
    }
    // (For brevity: production code keeps a parent pointer or stack
    // built during _findLeaf to walk the split up the tree.)
  }

  // Range scan from start (inclusive) to end (exclusive)
  *range(start, end) {
    let n = this._findLeaf(start);
    while (n) {
      for (let i = 0; i < n.keys.length; i++) {
        if (n.keys[i] >= end) return;
        if (n.keys[i] >= start) yield [n.keys[i], n.vals[i]];
      }
      n = n.next;
    }
  }
}

Two real-world implementation details get glossed over in textbooks. First, internal-node splits move the median key up (it doesn't appear in either child); leaf-node splits copy the median up so it still appears in the right child as a real key. Second, production B+ trees keep a parent pointer or build a path stack during search — it's how splits propagate without a second descent.

Python implementation (range scan + split)

ORDER = 4

class Leaf:
    __slots__ = ('keys', 'vals', 'next')
    def __init__(self):
        self.keys, self.vals, self.next = [], [], None
    is_leaf = True

class Internal:
    __slots__ = ('keys', 'children')
    def __init__(self):
        self.keys, self.children = [], []
    is_leaf = False

class BPlusTree:
    def __init__(self):
        self.root = Leaf()

    def _find_leaf_with_path(self, key):
        path, n = [], self.root
        while not n.is_leaf:
            i = 0
            while i < len(n.keys) and key >= n.keys[i]: i += 1
            path.append((n, i))
            n = n.children[i]
        return n, path

    def get(self, key):
        leaf, _ = self._find_leaf_with_path(key)
        for i, k in enumerate(leaf.keys):
            if k == key: return leaf.vals[i]
        return None

    def insert(self, key, val):
        leaf, path = self._find_leaf_with_path(key)
        i = 0
        while i < len(leaf.keys) and leaf.keys[i] < key: i += 1
        if i < len(leaf.keys) and leaf.keys[i] == key:
            leaf.vals[i] = val
            return
        leaf.keys.insert(i, key)
        leaf.vals.insert(i, val)
        if len(leaf.keys) >= ORDER:
            self._split_leaf(leaf, path)

    def _split_leaf(self, leaf, path):
        mid = len(leaf.keys) // 2
        sib = Leaf()
        sib.keys, leaf.keys = leaf.keys[mid:], leaf.keys[:mid]
        sib.vals, leaf.vals = leaf.vals[mid:], leaf.vals[:mid]
        sib.next, leaf.next = leaf.next, sib
        sep = sib.keys[0]
        # Walk up the path, splitting internal nodes as needed
        right = sib
        while path:
            parent, idx = path.pop()
            parent.keys.insert(idx, sep)
            parent.children.insert(idx + 1, right)
            if len(parent.keys) < ORDER: return
            mid = len(parent.keys) // 2
            sep = parent.keys[mid]
            new = Internal()
            new.keys = parent.keys[mid + 1:]
            new.children = parent.children[mid + 1:]
            parent.keys = parent.keys[:mid]
            parent.children = parent.children[:mid + 1]
            right = new
        # Bumped past the root: grow the tree taller
        new_root = Internal()
        new_root.keys = [sep]
        new_root.children = [self.root, right]
        self.root = new_root

    def range_scan(self, start, end):
        leaf, _ = self._find_leaf_with_path(start)
        while leaf is not None:
            for k, v in zip(leaf.keys, leaf.vals):
                if k >= end: return
                if k >= start: yield k, v
            leaf = leaf.next

The split has three distinct cases: leaf overflow (copy median up), internal overflow (move median up), and root overflow (allocate a new root). Most textbook B+ tree bugs come from forgetting that copy-vs-move distinction.

Variants used in production

Copy-on-write B+ tree (LMDB, BoltDB). Instead of in-place writes, every modified page is rewritten as a new page; the root is swapped atomically. No write-ahead log, no torn writes, snapshot reads for free. The trade-off is write amplification — a single change rewrites every page on the path to the root.

Bw-tree (Microsoft Hekaton, CMU Peloton). Lock-free B+ tree variant that appends "delta records" to nodes instead of mutating them. A periodic consolidation collapses deltas back into the base node. Optimised for many-core CPUs.

Adaptive Radix Tree (ART). Not strictly a B+ tree, but a common in-memory replacement. ART uses 4/16/48/256-way nodes that grow as keys arrive, plus path compression. Used by HyPer, DuckDB, and TigerBeetle as the in-memory index of choice.

Fractal tree (Tokutek, MyRocks's predecessor). Adds a per-node message buffer; writes are queued and flushed lazily on read. Trades read amplification for dramatically better write throughput.

Compressed B+ tree. Many real systems prefix-compress keys within a leaf and run-length-encode duplicates. PostgreSQL's btree_gin, MySQL's compressed pages, and LMDB's prefix compression are examples.

FB-tree / OLFIT-style concurrent B+ trees. Use optimistic concurrency control (versioning + retry) instead of latch coupling. The default in modern in-memory engines.

Common bugs and edge cases

Forgetting to update the sibling pointer. A leaf split that doesn't link the new sibling makes range scans skip half the data, silently.
Confusing copy-up (leaves) with move-up (internals). If you remove the median from the leaf during a split, point lookups for that key fail.
Underflow merging that forgets to fix the parent separator. Delete is harder than insert; missing the separator update creates phantom keys.
Treating equal keys ambiguously. Production B+ trees use unique tie-breakers (row IDs) to disambiguate duplicates and keep range scans deterministic.
Not handling root collapse. When the root has exactly one child after a delete, you must replace the root with that child or the tree grows artificially tall.
Concurrent splits without latch coupling. Two writers can split the same node simultaneously and lose one update — production engines use optimistic concurrency or hand-over-hand latching.
Page-aligned writes that aren't page-aligned. A node that crosses a 4 KB boundary defeats the cache benefit of the whole structure.

B+ Tree

Interactive visualization

Watch the 60-second explainer

How a B+ tree works

Why this shape, on disk

When B+ trees shine

B+ tree vs B-tree vs LSM tree

What the numbers actually look like

JavaScript implementation (in-memory B+ tree)

Python implementation (range scan + split)

Variants used in production

Common bugs and edge cases

Frequently asked questions

Interactive visualization

Watch the 60-second explainer

How a B+ tree works

Why this shape, on disk

When B+ trees shine

B+ tree vs B-tree vs LSM tree

What the numbers actually look like

JavaScript implementation (in-memory B+ tree)

Python implementation (range scan + split)

Variants used in production

Common bugs and edge cases

Frequently asked questions

Related concepts