Data Structures
Rope (data structure)
Balanced tree of string fragments — edit gigabytes in log time
A rope is a balanced binary tree whose leaves hold string fragments. By storing characters in leaves and the cumulative length in internal nodes, ropes turn insert, delete, and substring-concat into O(log n) operations on text — making them the data structure behind text editors, IDEs, persistent buffers, and collaborative editing systems.
- ConcatO(1) unbalanced, O(log n) balanced
- Split / Insert / DeleteO(log n)
- Index char iO(log n)
- Substring [i, j]O(log n + (j−i))
- Persistent snapshotO(log n)
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How a rope works
Imagine a 100 MB log file open in your editor. You position the cursor at byte 50,000,000 and type one character. With an array-backed buffer, the editor must shift 50 MB of bytes one position to the right — 50 million memcpy ops for a single keystroke. A rope does the same edit by walking down a tree, allocating one new leaf, and rewiring two pointers. The 50 MB on either side is untouched.
The structure itself is straightforward:
- Leaf nodes hold a short string (typically 64–1024 bytes).
- Internal nodes have two children, plus a single number:
weight= total length of the left subtree.
To find character i, descend the tree: at each internal node, if i < weight go left; otherwise subtract weight from i and go right. At a leaf, index into the leaf's string. Done in O(log n).
The key operations all reduce to two primitives:
- Concat(A, B) creates a new internal node with
left = A,right = B,weight = length(A). Trivially O(1) ignoring rebalance. - Split(rope, i) walks down to position
i, breaks the tree along the path, and returns two ropes representing[0..i)and[i..end). O(log n).
From these, everything else follows: insert(rope, i, s) = concat(left, s, right) where (left, right) = split(rope, i). delete(rope, i, j) = concat(split(rope, i)[0], split(rope, j)[1]).
When to use a rope
- Multi-megabyte text editors. xi-editor (Google), Helix, Nim's compiler buffer, and Atom's old buffer used ropes for files in the 100 MB+ range where gap buffers thrash on bulk edits.
- Persistent text and undo. Each edit yields a new tree sharing structure with the old. A 10 MB document with 1,000 undo states costs ~10 MB plus 1,000 × log n nodes — not 10 GB.
- Collaborative editing / CRDTs. Operational transformation and CRDTs need cheap insert-at-position and snapshot — exactly what ropes give. Yjs and Automerge use rope-like structures internally.
- Append-mostly streaming. Concatenation is O(1) (or O(log n) with rebalance), so accumulating a large output buffer doesn't pay the realloc tax of a contiguous string.
- String slicing libraries. Ocaml's
Rope, Java'sfastutilrope, Boost'sboost::flat_string— anywhere you want substring views without copying.
Rope vs gap buffer vs piece table
| Rope | Gap buffer | Piece table | Plain array / vector | |
|---|---|---|---|---|
| Insert at position | O(log n) | O(1) at gap, O(n) elsewhere (gap move) | O(log n) with balanced tree of pieces | O(n) |
| Delete range | O(log n) | O(1) at gap | O(log n) | O(n) |
| Concat | O(1) unbalanced, O(log n) balanced | O(n) | O(1) — append a piece | O(n) |
| Random read (char i) | O(log n) | O(1) | O(log n) | O(1) |
| Cache locality | Poor — pointer chasing | Excellent — one big array | Moderate | Excellent |
| Memory overhead | ~30–50% (tree nodes) | ~10% (gap) | Tiny — just piece descriptors + 2 buffers | 0% |
| Persistent / cheap snapshot | Native | No | Easy | No |
| Used by | xi-editor, Helix, Nim, Y.js | Emacs, Sam | VS Code, Word | String / Vec<char> |
The headline tradeoff: ropes win the asymptotic war (O(log n) for everything), but lose the cache war. For a 1 KB string in a textarea, every approach is fine. For a 100 MB log file with edits anywhere, gap buffers crawl and ropes shine. VS Code's piece table is the middle ground that won "best practical engineering" in the 2018 rewrite — it's a tree of pieces that point into two append-only buffers (original file + appends), which is cache-friendly while still O(log n).
JavaScript implementation — text editor with O(log n) edits
This is a teaching implementation: leaf-or-internal nodes, split, concat, insert, delete, and toString. In production you'd add AVL or splay rebalancing, leaf-merging on adjacent-small leaves, and metadata caches (line counts).
const LEAF_MAX = 64; // characters per leaf — tune for cache size
class Rope {
constructor() { this.root = null; }
static fromString(s) {
const r = new Rope();
r.root = Rope._buildBalanced(s, 0, s.length);
return r;
}
static _buildBalanced(s, lo, hi) {
if (hi - lo <= LEAF_MAX) return { kind: 'leaf', str: s.slice(lo, hi), len: hi - lo };
const mid = (lo + hi) >> 1;
const left = Rope._buildBalanced(s, lo, mid);
const right = Rope._buildBalanced(s, mid, hi);
return { kind: 'node', left, right, weight: left.len ?? Rope._len(left), len: (left.len ?? Rope._len(left)) + (right.len ?? Rope._len(right)) };
}
static _len(n) { return n.kind === 'leaf' ? n.len : n.len; }
// Concat — O(1), produces an internal node referencing both subtrees.
static concat(a, b) {
if (a === null) return b;
if (b === null) return a;
return { kind: 'node', left: a, right: b, weight: Rope._len(a), len: Rope._len(a) + Rope._len(b) };
}
// Split into [0, i) and [i, end). O(log n) on a balanced tree.
static split(node, i) {
if (node === null) return [null, null];
if (node.kind === 'leaf') {
if (i === 0) return [null, node];
if (i >= node.len) return [node, null];
const left = { kind: 'leaf', str: node.str.slice(0, i), len: i };
const right = { kind: 'leaf', str: node.str.slice(i), len: node.len - i };
return [left, right];
}
if (i < node.weight) {
const [a, b] = Rope.split(node.left, i);
return [a, Rope.concat(b, node.right)];
} else {
const [a, b] = Rope.split(node.right, i - node.weight);
return [Rope.concat(node.left, a), b];
}
}
// Index — return char at position i in O(log n).
charAt(i) {
let node = this.root;
while (node.kind !== 'leaf') {
if (i < node.weight) node = node.left;
else { i -= node.weight; node = node.right; }
}
return node.str[i];
}
insert(i, s) {
const [left, right] = Rope.split(this.root, i);
const middle = Rope.fromString(s).root;
this.root = Rope.concat(Rope.concat(left, middle), right);
}
delete(start, end) {
const [left, mid_right] = Rope.split(this.root, start);
const [, right] = Rope.split(mid_right, end - start);
this.root = Rope.concat(left, right);
}
// Materialize back to a string — O(n).
toString() {
const out = [];
const walk = (n) => {
if (n === null) return;
if (n.kind === 'leaf') out.push(n.str);
else { walk(n.left); walk(n.right); }
};
walk(this.root);
return out.join('');
}
}
// Demo: edit "Hello, world!" without copying the whole string per change.
const doc = Rope.fromString("Hello, world!");
doc.insert(7, "wonderful ");
console.log(doc.toString()); // "Hello, wonderful world!"
doc.delete(5, 7);
console.log(doc.toString()); // "Hellowonderful world!"
The above is unbalanced — repeated left-leaning concats build a stick. Real implementations sit on top of an AVL or splay tree, or use Boehm's Fibonacci-rebalance trigger.
Python implementation
LEAF_MAX = 64
class Rope:
def __init__(self, root=None):
self.root = root
@staticmethod
def from_string(s):
return Rope(Rope._build(s, 0, len(s)))
@staticmethod
def _build(s, lo, hi):
if hi - lo <= LEAF_MAX:
return {'kind': 'leaf', 'str': s[lo:hi], 'len': hi - lo}
mid = (lo + hi) // 2
left = Rope._build(s, lo, mid)
right = Rope._build(s, mid, hi)
return {
'kind': 'node',
'left': left,
'right': right,
'weight': left['len'],
'len': left['len'] + right['len'],
}
@staticmethod
def _len(n):
return 0 if n is None else n['len']
@staticmethod
def concat(a, b):
if a is None:
return b
if b is None:
return a
return {
'kind': 'node',
'left': a,
'right': b,
'weight': Rope._len(a),
'len': Rope._len(a) + Rope._len(b),
}
@staticmethod
def split(node, i):
if node is None:
return None, None
if node['kind'] == 'leaf':
if i == 0: return None, node
if i >= node['len']: return node, None
return ({'kind': 'leaf', 'str': node['str'][:i], 'len': i},
{'kind': 'leaf', 'str': node['str'][i:], 'len': node['len'] - i})
if i < node['weight']:
a, b = Rope.split(node['left'], i)
return a, Rope.concat(b, node['right'])
else:
a, b = Rope.split(node['right'], i - node['weight'])
return Rope.concat(node['left'], a), b
def char_at(self, i):
n = self.root
while n['kind'] != 'leaf':
if i < n['weight']:
n = n['left']
else:
i -= n['weight']
n = n['right']
return n['str'][i]
def insert(self, i, s):
left, right = Rope.split(self.root, i)
middle = Rope.from_string(s).root
self.root = Rope.concat(Rope.concat(left, middle), right)
def delete(self, start, end):
left, mid_right = Rope.split(self.root, start)
_, right = Rope.split(mid_right, end - start)
self.root = Rope.concat(left, right)
def to_string(self):
out = []
def walk(n):
if n is None: return
if n['kind'] == 'leaf': out.append(n['str'])
else: walk(n['left']); walk(n['right'])
walk(self.root)
return ''.join(out)
For real Python use, the pyrsistent library has a PRope; for genuine performance, drop to the C extension cffi bindings of xi-rope or use a piece-table-based editor backend.
Variants
- AVL rope. Wrap each internal node with AVL height, rotating on imbalance. Guarantees O(log n) splits and concats. Boehm's original 1995 paper used a Fibonacci-spaced threshold instead.
- Splay rope. After each access, splay the touched node to the root. Amortized O(log n) with great cache behavior on locality-of-reference workloads (typing keeps editing near one cursor).
- B-tree rope (xi-rope). Use a B-tree with leaf chunks of ~1 KB and metadata stored at internal nodes (line counts, invalid-UTF-8 markers). Vastly better cache behavior than binary ropes.
- Persistent rope. Path-copy on edit so each modification yields a new immutable rope sharing structure with the old. The basis for cheap undo and concurrent reads.
- Augmented rope. Each internal node carries derived metadata: code-point counts, line counts, grapheme cluster counts, syntax-tree synchronization markers. Lets cursor moves by line / character / grapheme run in O(log n) too.
- Hybrid rope+gap. Use a rope at coarse granularity, gap buffers within each leaf. Combines the asymptotic wins of ropes with cache-locality wins of gaps for sustained-cursor edits.
Common bugs and edge cases
- Failure to rebalance. Repeated
insertat the end of a rope without rebalance produces a left-leaning stick — a 1 GB document becomes a depth-1,000,000 tree, and your "O(log n)" lookups are now O(n). Always pair the rope with a balancing strategy. - Splitting a multi-byte UTF-8 sequence. If you index into a UTF-8 byte stream and a leaf boundary lands inside a multi-byte char, the rope shows two corrupted halves. Either index by code point (slower) or use UTF-8-aware leaves that refuse to split inside a sequence.
- Stale weights after edit. When you split or concat, the parent's
weightfield needs to reflect the new left-subtree length. A common bug is to updatelenbut forgetweight; the next index lookup walks the wrong way and silently returns garbage. - Forgetting that ropes share structure. If you mutate a leaf's
strin place, you break every other rope that shares that subtree (undo states, concurrent readers). Treat nodes as immutable; allocate new ones on edit. - O(n)
toStringin a hot path. Editor authors sometimes callrope.toString()on every keystroke for redrawing. Don't — read directly from the rope, or maintain a viewport-cached substring. - Adjacent-small-leaf fragmentation. After many tiny inserts, the tree fills with 1-character leaves — overhead dominates. Periodically merge adjacent leaves whose combined size is below a threshold.
Historical note
Boehm, Atkinson, and Plass introduced the rope in their 1995 paper "Ropes: an Alternative to Strings", motivated by Cedar Mesa, an experimental editor at Xerox PARC that needed to handle multi-megabyte buffers on 1990s-era hardware. The paper popularized the Fibonacci-rebalance trigger and the standard split/concat primitives. Today, the Rust ecosystem has xi-rope (Raph Levien) and jumprope; OCaml has Rope; and CRDTs like Yjs use rope-flavored sequence structures internally.
Frequently asked questions
Why is rope insertion O(log n) when inserting into an array is O(n)?
Arrays must shift every byte after the insertion point, costing O(n). Ropes split the tree at the insertion position (O(log n) descent), create one new internal node, and concatenate. No bytes move. The character data is shared with the rest of the tree, untouched.
Why don't browsers use ropes for textarea editing?
Most textareas hold under 100 KB and most edits are at one cursor position, so a gap buffer beats a rope on cache locality. Ropes start winning around 10 MB or when edits scatter across the document — where shifting bytes becomes the bottleneck. VS Code's text engine uses a piece table for similar reasons.
Rope vs gap buffer vs piece table — which is right for an editor?
Gap buffer wins for small files with cursor-localized edits — Emacs uses it. Piece table wins for large files with append-mostly history (good undo) — VS Code uses it. Rope wins for very large files, scattered edits, or scenarios that need cheap clone/snapshot — used in xi-editor, Helix, Nim's compiler, and CRDTs for collaborative editing.
How do you keep a rope balanced?
Most production ropes piggyback on a self-balancing tree: AVL ropes, splay ropes, or red-black ropes. The 'rebalance threshold' approach in Boehm's original 1995 paper uses the Fibonacci sequence — flatten and rebuild whenever the tree is more imbalanced than the n-th Fibonacci-bounded shape allows. Modern implementations (xi-rope) use a B-tree with leaf chunks of ~1 KB.
Are ropes good for Unicode?
They're agnostic — ropes index by byte (or by code point if you choose), and a multi-byte UTF-8 sequence must never be split across leaves. Production implementations like xi-rope augment leaf nodes with metadata (line counts, code-point counts, grapheme cluster counts) so cursor moves and line counts run in O(log n) too.
Can ropes support cheap snapshots and undo?
Yes — ropes are a natural fit for persistent data structures. After an edit, the new tree shares all unchanged subtrees with the old one. A 1-billion-character document with 1,000 undo states costs roughly 1 billion + (1,000 × log n) characters of memory, not 1 trillion.