Data Structures
Fenwick Tree (Binary Indexed Tree)
Twelve lines of bit-tricks for O(log n) prefix sums
A Fenwick tree, also called a binary indexed tree, supports point updates and prefix sums on an array in O(log n) per operation using just one array of length n. It is the smallest, fastest range-aggregate structure when sums or XORs are enough.
- UpdateO(log n)
- Prefix queryO(log n)
- Range queryO(log n)
- BuildO(n)
- Spacen integers
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How a Fenwick tree works
You have an array of n numbers and a stream of two operations: update(i, delta) adds delta to position i, and prefixSum(i) returns the sum of positions 1 through i. The naive solutions sit at opposite extremes:
- Plain array: update is O(1), prefix sum is O(n).
- Maintained prefix-sum array: prefix sum is O(1), update is O(n) (you have to fix every later prefix).
Fenwick noticed that you can split the difference. Each cell tree[i] stores the sum of a range whose length is the value of the lowest set bit of i. Cell 12 (binary 1100, lowest bit = 4) stores arr[9..12]. Cell 8 (binary 1000, lowest bit = 8) stores arr[1..8]. Cell 6 (binary 0110, lowest bit = 2) stores arr[5..6].
The result is a forest of partial sums. To compute prefixSum(13), you visit tree[13] (covers 13..13), then tree[12] (covers 9..12), then tree[8] (covers 1..8), then stop. Three reads. The path always halves: 13 → 12 → 8 → 0 strips one bit per step, so the loop runs at most ⌊log₂ n⌋ + 1 times.
The bit trick that drives both update and query is i & -i, which extracts the lowest set bit of i by exploiting two's complement. To update, you walk up by adding the lowest set bit (i → i + (i & -i)). To query, you walk down by subtracting it (i → i - (i & -i)). Two loops, three lines each, both O(log n).
Why it's fast in practice
A Fenwick tree node is one machine word. Update and query are tight loops with one bitwise AND, one negation, one addition, one subtraction, and one array access per iteration — so roughly 5 cycles per level. On a modern CPU with branch prediction warmed up, a Fenwick tree handles about 50 million updates per second on an array of size 10^6. A segment tree on the same workload runs at maybe 15-20 million.
The build step is also linear: write the array values into the BIT, then for each i propagate tree[i] into tree[i + (i & -i)] in one pass. That's O(n), versus the naive build that does n separate updates for O(n log n).
When to use a Fenwick tree
- Point updates and prefix or range sums on a dynamic array.
- Counting how many elements seen so far are less than x — the canonical inversion-count problem.
- 2D "submatrix sum" with a 2D BIT (one BIT-update operation per dimension nested).
- Order-statistic queries (k-th smallest) by binary lifting on the BIT.
- Anywhere you'd reach for a segment tree but only need an invertible operation (sum, XOR, count).
Reach for a segment tree instead when you need range min/max/gcd (no inverse exists) or when you need range updates with arbitrary monoids — Fenwick handles range update + range sum but not range min.
Fenwick tree vs segment tree vs sqrt-decomposition
| Fenwick tree | Segment tree | Sqrt-decomposition | |
|---|---|---|---|
| Update | O(log n) | O(log n) | O(1) + O(√n) for block fixup |
| Range query | O(log n) | O(log n) | O(√n) |
| Memory | n integers | ≈4n integers | n + √n integers |
| Code length | ~12 lines | ~40 lines | ~25 lines |
| Operations supported | Sum, XOR, any group | Any associative monoid | Any associative monoid |
| Range updates | Sum only, with 2-BIT trick | Native, with lazy propagation | Native, store block delta |
| Persistence | Hard | Easy (path-copying) | Easy (block snapshots) |
| Constant factor | Smallest of the three | ~2-3× Fenwick | Larger but cache-friendly |
Sqrt-decomposition is worth knowing because it's the easiest to extend to weird operations (mode of a range, number of distinct elements with Mo's algorithm) where neither tree structure carries you through.
Pseudo-code
tree = array of size n+1, all zero # 1-indexed
function update(i, delta):
while i <= n:
tree[i] += delta
i += i & -i
function prefixSum(i):
sum = 0
while i > 0:
sum += tree[i]
i -= i & -i
return sum
function rangeSum(l, r):
return prefixSum(r) - prefixSum(l - 1)
JavaScript implementation
class Fenwick {
constructor(n) {
this.n = n;
this.tree = new Int32Array(n + 1); // 1-indexed
}
update(i, delta) {
for (; i <= this.n; i += i & -i) this.tree[i] += delta;
}
prefix(i) {
let s = 0;
for (; i > 0; i -= i & -i) s += this.tree[i];
return s;
}
range(l, r) { return this.prefix(r) - this.prefix(l - 1); }
}
// O(n) build: write values, then propagate each cell into its parent
function buildFenwick(values) {
const n = values.length;
const bit = new Fenwick(n);
for (let i = 1; i <= n; i++) bit.tree[i] = values[i - 1];
for (let i = 1; i <= n; i++) {
const p = i + (i & -i);
if (p <= n) bit.tree[p] += bit.tree[i];
}
return bit;
}
Int32Array matters: V8 specialises tight loops over typed arrays into the same machine code you'd get from C. A regular Array hits boxed-number paths and runs 3-5× slower for the same algorithm.
Python implementation — inversion count
The classic Fenwick interview question: count how many pairs (i, j) with i < j satisfy arr[i] > arr[j]. Brute force is O(n²); merge sort gets O(n log n) but is a chore to inline; a Fenwick tree gets there in a dozen lines.
class Fenwick:
def __init__(self, n):
self.n = n
self.tree = [0] * (n + 1)
def update(self, i, delta=1):
while i <= self.n:
self.tree[i] += delta
i += i & -i
def prefix(self, i):
s = 0
while i > 0:
s += self.tree[i]
i -= i & -i
return s
def count_inversions(arr):
# Coordinate-compress values into 1..k
rank = {v: i + 1 for i, v in enumerate(sorted(set(arr)))}
bit = Fenwick(len(rank))
inv = 0
for x in arr:
v = rank[x]
# Count elements already inserted that are GREATER than v
inv += bit.prefix(len(rank)) - bit.prefix(v)
bit.update(v, 1)
return inv
print(count_inversions([3, 1, 2, 5, 4])) # 3
Each element triggers one query and one update — 2n BIT operations, total time O(n log n). For n = 10^6 that's ~40M cell touches, well under a second in Python.
Variants
1-indexed vs 0-indexed
The arithmetic i & -i only works when i ≥ 1 — for i = 0 it returns 0 and the loop hangs. Two ways to live with this: keep the BIT 1-indexed and call update(i+1, ...) from your 0-indexed array, or shift the indexing internally. The first is simpler; the second saves no time and is a bug magnet.
2D Fenwick tree
For a matrix with point updates and submatrix sums, nest two BIT loops: for r += r&-r ... for c += c&-c. Memory becomes O(rows × cols), update and query are O(log r · log c). Don't use it for sparse 2D — switch to a hashmap-backed BIT or a kd-tree.
Range update + range sum (BIT2)
Maintain two BITs: B1 tracks the additive delta at each position, B2 tracks delta · (left-1). Then prefixSum(i) = i·B1.prefix(i) - B2.prefix(i). Range update at [l, r] becomes 4 BIT updates; range query becomes 2 prefix queries each on 2 BITs. Still all O(log n).
Order statistics (find k-th smallest)
If the BIT counts occurrences (frequencies), you can find the k-th smallest in O(log n) — not O(log² n) — by binary lifting from the highest bit downward. At each step, decide whether the next bit's subtree contains enough cumulative count to include rank k; if yes, descend. This is sometimes called "BIT walk" or "indexed sets".
Persistent / multidimensional variants
Pure Fenwick trees aren't easy to make persistent (path-copy is awkward without explicit nodes). For persistence, switch to a segment tree. For weighted variants — say, the BIT stores a tuple — you can store any group, but inverses must exist (so multiplication mod prime is fine; matrix product where matrices may be singular is not).
Common bugs and edge cases
- Mixing 0- and 1-indexing. One off-by-one and either the loop runs forever or you skip the first element. Pick a convention and document it at the top of the class.
- Allocating
treeat size n instead of n+1. The 1-indexed slot at position n needs to exist; otherwise updates at the last position write out of bounds. - Forgetting coordinate compression on inversion count. If values are up to 10^9 you cannot index a BIT by raw value — compress to ranks 1..k first.
- Signed integer overflow. Sums of up to 10^5 values bounded by 10^9 fit in a signed 32-bit int with no margin. Use 64-bit accumulators when in doubt.
- Range update with the wrong inverse. The 2-BIT range-update trick only works for the "add a constant to every cell" operation, because addition has an inverse. Don't try to extend it to range-multiply or range-assign — those need a segment tree with lazy propagation.
- Querying
prefix(0). By construction this returns 0 — but check that yourrange(l, r)caller passesl - 1, notl, sol = 1doesn't underflow.
Frequently asked questions
What is the difference between a Fenwick tree and a segment tree?
Both give O(log n) updates and queries. A Fenwick tree is half the code, half the constants, half the memory — but it only supports invertible operations like sum and XOR. A segment tree handles arbitrary monoids (min, max, gcd) and supports lazy propagation for range updates, at the cost of 4n memory and roughly 2-3× slower constants.
Why is a Fenwick tree 1-indexed?
The update step moves to i + (i & -i) and the query step moves to i - (i & -i). When i is 0, the lowest-set-bit operation gives 0, so both loops would never advance. Starting at index 1 sidesteps this — the math works cleanly only with 1-based indexing.
What does i & -i compute?
It isolates the lowest set bit of i — the value 2^k where k is the position of the rightmost 1 in the binary representation of i. For i = 12 (binary 1100), the lowest set bit is 4. This single trick is what makes the Fenwick tree's update and query loops walk O(log n) ancestors instead of O(n).
Can a Fenwick tree do range updates?
Yes, with two BITs storing the difference array. One BIT tracks the running delta and another tracks the i·delta correction term, so a range update is O(log n) and a prefix-sum query is also O(log n). For range update + range query you need both — see the BIT2 trick.
Why is Fenwick tree popular in competitive programming?
Two reasons: roughly 12 lines of code (versus 40+ for a segment tree), and constant factors small enough that a 10^7-operation solution fits in a 1-second time limit. For point-update + prefix-sum problems specifically, no other structure matches its cycles-per-op.
How do you count inversions in O(n log n)?
Coordinate-compress the array values into 1..k, then walk left to right. For each element with compressed value v, query the prefix sum [1..v] to count earlier elements that are at most v, subtract from i to get earlier elements greater than v (those are inversions ending here), then add 1 to position v. Total: 2n BIT operations.