Data Structures

Fenwick Tree (Binary Indexed Tree)

Twelve lines of bit-tricks for O(log n) prefix sums

A Fenwick tree, also called a binary indexed tree, supports point updates and prefix sums on an array in O(log n) per operation using just one array of length n. It is the smallest, fastest range-aggregate structure when sums or XORs are enough.

  • UpdateO(log n)
  • Prefix queryO(log n)
  • Range queryO(log n)
  • BuildO(n)
  • Spacen integers

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How a Fenwick tree works

You have an array of n numbers and a stream of two operations: update(i, delta) adds delta to position i, and prefixSum(i) returns the sum of positions 1 through i. The naive solutions sit at opposite extremes:

  • Plain array: update is O(1), prefix sum is O(n).
  • Maintained prefix-sum array: prefix sum is O(1), update is O(n) (you have to fix every later prefix).

Fenwick noticed that you can split the difference. Each cell tree[i] stores the sum of a range whose length is the value of the lowest set bit of i. Cell 12 (binary 1100, lowest bit = 4) stores arr[9..12]. Cell 8 (binary 1000, lowest bit = 8) stores arr[1..8]. Cell 6 (binary 0110, lowest bit = 2) stores arr[5..6].

The result is a forest of partial sums. To compute prefixSum(13), you visit tree[13] (covers 13..13), then tree[12] (covers 9..12), then tree[8] (covers 1..8), then stop. Three reads. The path always halves: 13 → 12 → 8 → 0 strips one bit per step, so the loop runs at most ⌊log₂ n⌋ + 1 times.

The bit trick that drives both update and query is i & -i, which extracts the lowest set bit of i by exploiting two's complement. To update, you walk up by adding the lowest set bit (i → i + (i & -i)). To query, you walk down by subtracting it (i → i - (i & -i)). Two loops, three lines each, both O(log n).

Why it's fast in practice

A Fenwick tree node is one machine word. Update and query are tight loops with one bitwise AND, one negation, one addition, one subtraction, and one array access per iteration — so roughly 5 cycles per level. On a modern CPU with branch prediction warmed up, a Fenwick tree handles about 50 million updates per second on an array of size 10^6. A segment tree on the same workload runs at maybe 15-20 million.

The build step is also linear: write the array values into the BIT, then for each i propagate tree[i] into tree[i + (i & -i)] in one pass. That's O(n), versus the naive build that does n separate updates for O(n log n).

When to use a Fenwick tree

  • Point updates and prefix or range sums on a dynamic array.
  • Counting how many elements seen so far are less than x — the canonical inversion-count problem.
  • 2D "submatrix sum" with a 2D BIT (one BIT-update operation per dimension nested).
  • Order-statistic queries (k-th smallest) by binary lifting on the BIT.
  • Anywhere you'd reach for a segment tree but only need an invertible operation (sum, XOR, count).

Reach for a segment tree instead when you need range min/max/gcd (no inverse exists) or when you need range updates with arbitrary monoids — Fenwick handles range update + range sum but not range min.

Fenwick tree vs segment tree vs sqrt-decomposition

Fenwick treeSegment treeSqrt-decomposition
UpdateO(log n)O(log n)O(1) + O(√n) for block fixup
Range queryO(log n)O(log n)O(√n)
Memoryn integers≈4n integersn + √n integers
Code length~12 lines~40 lines~25 lines
Operations supportedSum, XOR, any groupAny associative monoidAny associative monoid
Range updatesSum only, with 2-BIT trickNative, with lazy propagationNative, store block delta
PersistenceHardEasy (path-copying)Easy (block snapshots)
Constant factorSmallest of the three~2-3× FenwickLarger but cache-friendly

Sqrt-decomposition is worth knowing because it's the easiest to extend to weird operations (mode of a range, number of distinct elements with Mo's algorithm) where neither tree structure carries you through.

Pseudo-code

tree = array of size n+1, all zero  # 1-indexed

function update(i, delta):
    while i <= n:
        tree[i] += delta
        i += i & -i

function prefixSum(i):
    sum = 0
    while i > 0:
        sum += tree[i]
        i -= i & -i
    return sum

function rangeSum(l, r):
    return prefixSum(r) - prefixSum(l - 1)

JavaScript implementation

class Fenwick {
  constructor(n) {
    this.n = n;
    this.tree = new Int32Array(n + 1);  // 1-indexed
  }
  update(i, delta) {
    for (; i <= this.n; i += i & -i) this.tree[i] += delta;
  }
  prefix(i) {
    let s = 0;
    for (; i > 0; i -= i & -i) s += this.tree[i];
    return s;
  }
  range(l, r) { return this.prefix(r) - this.prefix(l - 1); }
}

// O(n) build: write values, then propagate each cell into its parent
function buildFenwick(values) {
  const n = values.length;
  const bit = new Fenwick(n);
  for (let i = 1; i <= n; i++) bit.tree[i] = values[i - 1];
  for (let i = 1; i <= n; i++) {
    const p = i + (i & -i);
    if (p <= n) bit.tree[p] += bit.tree[i];
  }
  return bit;
}

Int32Array matters: V8 specialises tight loops over typed arrays into the same machine code you'd get from C. A regular Array hits boxed-number paths and runs 3-5× slower for the same algorithm.

Python implementation — inversion count

The classic Fenwick interview question: count how many pairs (i, j) with i < j satisfy arr[i] > arr[j]. Brute force is O(n²); merge sort gets O(n log n) but is a chore to inline; a Fenwick tree gets there in a dozen lines.

class Fenwick:
    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)

    def update(self, i, delta=1):
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix(self, i):
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

def count_inversions(arr):
    # Coordinate-compress values into 1..k
    rank = {v: i + 1 for i, v in enumerate(sorted(set(arr)))}
    bit = Fenwick(len(rank))
    inv = 0
    for x in arr:
        v = rank[x]
        # Count elements already inserted that are GREATER than v
        inv += bit.prefix(len(rank)) - bit.prefix(v)
        bit.update(v, 1)
    return inv

print(count_inversions([3, 1, 2, 5, 4]))  # 3

Each element triggers one query and one update — 2n BIT operations, total time O(n log n). For n = 10^6 that's ~40M cell touches, well under a second in Python.

Variants

1-indexed vs 0-indexed

The arithmetic i & -i only works when i ≥ 1 — for i = 0 it returns 0 and the loop hangs. Two ways to live with this: keep the BIT 1-indexed and call update(i+1, ...) from your 0-indexed array, or shift the indexing internally. The first is simpler; the second saves no time and is a bug magnet.

2D Fenwick tree

For a matrix with point updates and submatrix sums, nest two BIT loops: for r += r&-r ... for c += c&-c. Memory becomes O(rows × cols), update and query are O(log r · log c). Don't use it for sparse 2D — switch to a hashmap-backed BIT or a kd-tree.

Range update + range sum (BIT2)

Maintain two BITs: B1 tracks the additive delta at each position, B2 tracks delta · (left-1). Then prefixSum(i) = i·B1.prefix(i) - B2.prefix(i). Range update at [l, r] becomes 4 BIT updates; range query becomes 2 prefix queries each on 2 BITs. Still all O(log n).

Order statistics (find k-th smallest)

If the BIT counts occurrences (frequencies), you can find the k-th smallest in O(log n) — not O(log² n) — by binary lifting from the highest bit downward. At each step, decide whether the next bit's subtree contains enough cumulative count to include rank k; if yes, descend. This is sometimes called "BIT walk" or "indexed sets".

Persistent / multidimensional variants

Pure Fenwick trees aren't easy to make persistent (path-copy is awkward without explicit nodes). For persistence, switch to a segment tree. For weighted variants — say, the BIT stores a tuple — you can store any group, but inverses must exist (so multiplication mod prime is fine; matrix product where matrices may be singular is not).

Common bugs and edge cases

  • Mixing 0- and 1-indexing. One off-by-one and either the loop runs forever or you skip the first element. Pick a convention and document it at the top of the class.
  • Allocating tree at size n instead of n+1. The 1-indexed slot at position n needs to exist; otherwise updates at the last position write out of bounds.
  • Forgetting coordinate compression on inversion count. If values are up to 10^9 you cannot index a BIT by raw value — compress to ranks 1..k first.
  • Signed integer overflow. Sums of up to 10^5 values bounded by 10^9 fit in a signed 32-bit int with no margin. Use 64-bit accumulators when in doubt.
  • Range update with the wrong inverse. The 2-BIT range-update trick only works for the "add a constant to every cell" operation, because addition has an inverse. Don't try to extend it to range-multiply or range-assign — those need a segment tree with lazy propagation.
  • Querying prefix(0). By construction this returns 0 — but check that your range(l, r) caller passes l - 1, not l, so l = 1 doesn't underflow.

Frequently asked questions

What is the difference between a Fenwick tree and a segment tree?

Both give O(log n) updates and queries. A Fenwick tree is half the code, half the constants, half the memory — but it only supports invertible operations like sum and XOR. A segment tree handles arbitrary monoids (min, max, gcd) and supports lazy propagation for range updates, at the cost of 4n memory and roughly 2-3× slower constants.

Why is a Fenwick tree 1-indexed?

The update step moves to i + (i & -i) and the query step moves to i - (i & -i). When i is 0, the lowest-set-bit operation gives 0, so both loops would never advance. Starting at index 1 sidesteps this — the math works cleanly only with 1-based indexing.

What does i & -i compute?

It isolates the lowest set bit of i — the value 2^k where k is the position of the rightmost 1 in the binary representation of i. For i = 12 (binary 1100), the lowest set bit is 4. This single trick is what makes the Fenwick tree's update and query loops walk O(log n) ancestors instead of O(n).

Can a Fenwick tree do range updates?

Yes, with two BITs storing the difference array. One BIT tracks the running delta and another tracks the i·delta correction term, so a range update is O(log n) and a prefix-sum query is also O(log n). For range update + range query you need both — see the BIT2 trick.

Why is Fenwick tree popular in competitive programming?

Two reasons: roughly 12 lines of code (versus 40+ for a segment tree), and constant factors small enough that a 10^7-operation solution fits in a 1-second time limit. For point-update + prefix-sum problems specifically, no other structure matches its cycles-per-op.

How do you count inversions in O(n log n)?

Coordinate-compress the array values into 1..k, then walk left to right. For each element with compressed value v, query the prefix sum [1..v] to count earlier elements that are at most v, subtract from i to get earlier elements greater than v (those are inversions ending here), then add 1 to position v. Total: 2n BIT operations.