Data Structures
2D Fenwick Tree (Binary Indexed Tree)
Nest two BIT loops, get submatrix sums in O(log² N)
A 2D Fenwick tree nests two BIT loops to support point updates and submatrix prefix sums in O(log N · log M). It is the canonical structure for dynamic 2D range-aggregate queries on rectangular grids.
- UpdateO(log N · log M)
- Submatrix queryO(log N · log M)
- BuildO(NM)
- MemoryN · M integers
- 1000×1000 grid~100 cells/op
- OperationAny group (sum, XOR)
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How a 2D Fenwick tree works
The 1D Fenwick tree stores partial sums of an array along ranges whose lengths are powers of two — every bit[i] covers arr[i − lowbit(i) + 1 .. i]. The 2D version applies the same trick to each axis independently. A cell bit[r][c] stores the sum of a rectangular subgrid whose row-span equals lowbit(r) and column-span equals lowbit(c).
Concretely, with N = M = 8, cell bit[4][6] stores the sum of arr[1..4][5..6] — a 4×2 rectangle ending at (4, 6). Cell bit[8][8] stores the entire grid. Cell bit[3][3] stores only arr[3][3] because both rows and columns have lowbit = 1.
To update position (r, c) by δ, walk row-up via r += r & −r until you exceed N. For every row touched, walk column-up via c += c & −c until you exceed M, adding δ at each (r, c) pair. To query the prefix sum over [1..r][1..c], walk both axes downward by subtracting low-bits, accumulating bit[r][c]. Both operations touch O(log N · log M) cells.
To get a submatrix sum from (r1, c1) to (r2, c2), apply 2D inclusion-exclusion:
submatrix(r1,c1,r2,c2) = prefix(r2,c2)
- prefix(r1-1,c2)
- prefix(r2,c1-1)
+ prefix(r1-1,c1-1)
Four prefix queries — each O(log² N) — give the rectangle in total O(log² N) time.
Cost in concrete numbers
On a 1000 × 1000 grid, one BIT operation touches roughly log₂(1000)² ≈ 100 cells. Compare that to the brute-force alternatives:
- Re-scan the grid for every query: O(NM) = 10⁶ cells per query.
- Static 2D prefix-sum array: O(1) query but updates cost O(NM) — useless if the grid mutates.
- 2D Fenwick tree: 100 cells per query and per update — six orders of magnitude better than re-scan when both operations interleave.
For a 10⁶-cell grid with mixed update/query workload, expect roughly 500k operations per second in C++, ~100k in JavaScript with Int32Array. Image-filter passes, dynamic heat-maps and live leaderboards all sit in this range.
When to use a 2D Fenwick tree
- Dynamic submatrix sums where both rows and columns are bounded and dense.
- 2D dominance counting (after coordinate compression) — how many earlier points sit below-left of a new point.
- Online image processing where pixel updates require integral-image refreshes (medical imaging, satellite raster live edits).
- Game boards with up-to-thousands of cells and frequent area scoring.
- Statistical heat-maps where a 2D BIT is cheaper than re-deriving cumulative tables on each tick.
Avoid it when the grid is sparse (use a hashmap-backed nested BIT, or a kd-tree), or when you need range min/max (no group inverse — fall back to 2D sparse tables or a 2D segment tree).
2D Fenwick tree vs 2D segment tree vs 2D prefix-sum
| 2D Fenwick tree | 2D segment tree | Static 2D prefix-sum | |
|---|---|---|---|
| Update | O(log N · log M) | O(log N · log M) | O(NM) — must rebuild |
| Submatrix query | O(log N · log M) | O(log N · log M) | O(1) |
| Memory | N · M ints | ~16 · N · M ints | N · M ints |
| Code length | ~25 lines | ~120 lines | ~10 lines |
| Operations supported | Any group (sum, XOR) | Any monoid (min, gcd) | Any group |
| Range updates | Yes (4 BITs, 2D BIT2 trick) | Yes, with 2D lazy propagation | No (only static) |
| Constant factor | Smallest dynamic option | ~4–8× slower than 2D BIT | Fastest queries; useless if dynamic |
The 2D segment tree's appeal is range min/max, but it eats roughly 16× the memory of a 2D BIT — a 10³ × 10³ board uses 64 MB instead of 4 MB. For sum-only problems, the BIT wins on every axis except code complexity (the BIT is shorter).
Pseudo-code
tree[1..N][1..M] = 0 # 1-indexed
function update(r, c, delta):
i = r
while i <= N:
j = c
while j <= M:
tree[i][j] += delta
j += j & -j
i += i & -i
function prefixSum(r, c):
s = 0
i = r
while i > 0:
j = c
while j > 0:
s += tree[i][j]
j -= j & -j
i -= i & -i
return s
function submatrix(r1, c1, r2, c2):
return prefixSum(r2, c2)
- prefixSum(r1-1, c2)
- prefixSum(r2, c1-1)
+ prefixSum(r1-1, c1-1)
JavaScript implementation
class Fenwick2D {
constructor(n, m) {
this.n = n; this.m = m;
// (n+1) rows × (m+1) cols, flat Int32Array for cache locality
this.tree = new Int32Array((n + 1) * (m + 1));
}
_idx(r, c) { return r * (this.m + 1) + c; }
update(r, c, delta) {
for (let i = r; i <= this.n; i += i & -i) {
for (let j = c; j <= this.m; j += j & -j) {
this.tree[this._idx(i, j)] += delta;
}
}
}
prefix(r, c) {
let s = 0;
for (let i = r; i > 0; i -= i & -i) {
for (let j = c; j > 0; j -= j & -j) {
s += this.tree[this._idx(i, j)];
}
}
return s;
}
submatrix(r1, c1, r2, c2) {
return this.prefix(r2, c2)
- this.prefix(r1 - 1, c2)
- this.prefix(r2, c1 - 1)
+ this.prefix(r1 - 1, c1 - 1);
}
}
// Example: 4×4 grid, set a few cells, query a 2×2 block
const bit = new Fenwick2D(4, 4);
bit.update(1, 1, 3);
bit.update(2, 2, 5);
bit.update(2, 3, 2);
bit.update(3, 1, 7);
console.log(bit.submatrix(2, 2, 3, 3)); // 5 + 2 = 7
The flat Int32Array with a manual index function is roughly 3× faster than an array of arrays in V8, because nested boxed arrays force pointer chasing and bounds checks on every dereference. For very wide grids consider tiling the inner BIT into cache-line-sized rows.
Python implementation
class Fenwick2D:
def __init__(self, n, m):
self.n, self.m = n, m
self.tree = [[0] * (m + 1) for _ in range(n + 1)]
def update(self, r, c, delta):
i = r
while i <= self.n:
j = c
while j <= self.m:
self.tree[i][j] += delta
j += j & -j
i += i & -i
def prefix(self, r, c):
s = 0
i = r
while i > 0:
j = c
while j > 0:
s += self.tree[i][j]
j -= j & -j
i -= i & -i
return s
def submatrix(self, r1, c1, r2, c2):
return (self.prefix(r2, c2)
- self.prefix(r1 - 1, c2)
- self.prefix(r2, c1 - 1)
+ self.prefix(r1 - 1, c1 - 1))
bit = Fenwick2D(4, 4)
for r, c, v in [(1,1,3), (2,2,5), (2,3,2), (3,1,7)]:
bit.update(r, c, v)
print(bit.submatrix(2, 2, 3, 3)) # 7
Variants
Range update + range query (2D BIT2)
Maintain four 2D BITs storing different correction terms — a direct generalisation of the 1D BIT2 trick. A rectangular update at [r1, c1, r2, c2] becomes 16 point updates (four BITs × four inclusion-exclusion corners); a rectangular query becomes 16 prefix queries. Total work stays O(log² N) but constants grow by ~16×.
Offline coordinate compression
For sparse inputs over a large coordinate range, read all updates and queries up front, compress (r, c) values to ranks 1..k, build a k × k dense 2D BIT (or smaller if rectangles allow). Memory becomes O(events²) instead of O(N · M).
BIT-of-BITs versus BIT-of-segtrees
A nested 2D BIT is symmetric — both axes use BIT semantics. A BIT-of-segtrees lets the inner axis support range min/max while the outer remains a BIT. Useful when one axis has point updates and the other needs richer queries.
Persistent 2D BIT (rare)
Pure persistence is awkward — path-copy across both axes inflates memory to O(events · log² N). For persistence problems, switch to a 2D segment tree or to offline sweep-line techniques.
Common bugs and edge cases
- Forgetting 1-indexing on both axes. Either dimension's lowbit hangs at zero. Allocate
(N+1) × (M+1)and shift inputs by +1 when reading from a 0-indexed source. - Submatrix off-by-one. The inclusion-exclusion uses
r1−1andc1−1— notr1andc1. Passprefix(0, c)safely; the function returns 0 by construction. - Overflow. Even on modest grids, a million summed cells with values up to 10⁶ overflow signed 32-bit. Use 64-bit accumulators (BigInt or 64-bit typed arrays) whenever the cell range times the cell count can exceed 2³¹.
- Memory explosion. A 10⁴ × 10⁴ 2D BIT is 100 million ints — 400 MB at 4 bytes each. Coordinate-compress first if your grid is sparser than 10% filled.
- Naïve build instead of O(NM). Calling
updatefor every cell is O(NM log² N) — for a 1000 × 1000 grid that's 10⁸ ops, several seconds. Instead initialise the array of values and propagate each cell to its 2D parent in one pass. - Mismatched n and m in inner loop. Using
j <= ninstead ofj <= min the inner walk works only on a square grid. Non-square grids silently drop columns.
Frequently asked questions
How fast is a 2D Fenwick tree in practice?
On a 1000×1000 grid, each update touches log₂(1000) ≈ 10 rows × 10 columns = 100 cells. Each cell update is a few cycles. A modern CPU handles around half a million 2D BIT updates per second on a million-cell grid — fast enough for interactive image filtering or live-board game state, but not for full-frame video at 60 fps.
Why O(log²) and not O(log)?
Each dimension contributes one logarithmic factor independently. The outer loop walks log N rows; for every row touched, the inner loop walks log M columns. Multiply the two — log N · log M — and that is your work per update or query. The 2D BIT cannot beat this by being clever: rank-and-coordinate-compressed inputs still need both walks.
When is a 2D Fenwick tree the wrong choice?
Sparse grids. If only a few thousand cells are non-zero in a 10⁶ × 10⁶ space, an offline sweep with a 1D BIT (Mo's algorithm in 2D), a kd-tree, or a hash-backed BIT beats the dense O(NM) memory cost. The 2D BIT is also a poor fit for range min/max — no inverse exists, so the trick that powers BIT for sum cannot be ported.
Can a 2D Fenwick tree do range updates plus range queries?
Yes, with four 2D BITs in parallel, generalising the BIT2 trick from 1D. Each rectangular update adds a constant to a submatrix; queries return the rectangular sum. Cost stays O(log² N) per operation but constants quadruple, so check whether your problem really needs it before writing 100 lines of 2D BIT bookkeeping.
How does the inversion-count trick generalise to 2D?
It becomes the 2D dominance problem: count pairs where one point strictly dominates another in both coordinates. Sort by one axis, use a 1D BIT indexed by the other after coordinate compression. The 2D BIT itself appears when both axes are dynamic — for example, counting kings dominating points as both move.
Why is the submatrix sum formula prefix(r2,c2) − prefix(r1−1,c2) − prefix(r2,c1−1) + prefix(r1−1,c1−1)?
Inclusion-exclusion. The large prefix overcounts everything above-left of (r2, c2). Subtracting the two strip prefixes removes that overhang twice in the corner, so you add the corner prefix back. Four 2D queries — still O(log²) per submatrix.