Data Structures

Sqrt Decomposition

Split the array into √N blocks — queries cost √N, updates cost one

Sqrt decomposition splits an array into √N blocks and stores aggregate-per-block. Updates are O(1), range queries are O(√N), and the code fits in 30 lines. It is the simplest non-trivial range-query structure.

  • Point updateO(1)
  • Range queryO(√N)
  • Range updateO(√N) (lazy add)
  • BuildO(N)
  • MemoryN + √N integers
  • N = 10⁵316 blocks of 316

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How sqrt decomposition works

You have an array of N numbers, point updates, and range-sum queries. Two naive solutions sit at opposite ends:

  • Plain array: update O(1), range query O(N).
  • Maintained prefix-sum array: range query O(1), update O(N).

Sqrt decomposition meets in the middle. Pick a block size B ≈ √N. Slice the array into ⌈N/B⌉ contiguous blocks. For every block, store the precomputed aggregate (sum, in this case). The array itself stays intact.

Point update: change one cell, fix that cell's block aggregate. Two memory writes — O(1).

Range query over [l, r]: walk through three pieces.

  1. The partial left block — cells from l to the end of its block, summed cell-by-cell. At most B cells.
  2. The span of full blocks — every block fully covered by [l, r]. Read each block's aggregate directly. At most N/B blocks.
  3. The partial right block — cells from the start of r's block to r. At most B cells.

Total work: 2B + N/B cells. Setting B = √N minimises this to 2√N + √N = 3√N — that's where the structure's name comes from.

Worked example

Take 16 elements and B = 4, so 4 blocks of 4:

arr   = [3, 1, 4, 1 | 5, 9, 2, 6 | 5, 3, 5, 8 | 9, 7, 9, 3]
block = [    9     |     22      |     21     |     28    ]

Range query [3, 12] (0-indexed, inclusive on both ends):

  • Partial left block (block 0 from index 3): arr[3] = 1.
  • Full blocks: blocks 1 and 2 → 22 + 21 = 43.
  • Partial right block (block 3 from index 12 to 12): arr[12] = 9.

Total: 1 + 43 + 9 = 53. We touched 1 + 2 + 1 = 4 cells instead of 10. With N = 16 and √N = 4, the savings are visible; with N = 10⁵, the savings are dramatic — 316 cells per query versus 100,000.

When to use sqrt decomposition

  • Quick implementations under contest time pressure — you can write it correctly in 5 minutes.
  • Operations that don't compose nicely into a segment-tree node — distinct counts, mode, range gcd with frequencies, k-th smallest in a range.
  • Offline batch queries via Mo's algorithm — sort queries by (left-block, right) and amortise total work.
  • Cache-friendly workloads — 316-element blocks fit comfortably in L1, beating segment-tree pointer chasing.
  • Range updates where lazy propagation is overkill — keep a per-block delta, push when needed.

Reach for a segment tree or Fenwick tree when N exceeds 10⁶ or when the query rate is so high that √N starts to dominate.

Sqrt decomposition vs Fenwick tree vs segment tree

Sqrt-decompFenwick treeSegment tree
Point updateO(1)O(log n)O(log n)
Range queryO(√n)O(log n)O(log n)
Range updateO(√n) (lazy)O(log n) (BIT2 trick, sum only)O(log n) (lazy)
Code length~25 lines~12 lines~40 lines
Supports min/max/gcdYesNo (needs inverse)Yes
Supports distinct/modeYes (frequency block)NoHard (heavy node state)
Constant factorSmallest (cache-friendly)Very small2–3× larger
PersistenceEasy (block snapshots)HardEasy (path-copy)

The trade-off is asymptotic versus practical: √N is worse than log N on paper, but the constant factor and code simplicity often make sqrt decomposition the right tool for medium-N problems.

Pseudo-code

B = ceil(sqrt(N))
blocks = array of ceil(N / B) sums, computed on build

function update(i, value):
    blocks[i / B] += value - arr[i]
    arr[i] = value

function query(l, r):
    s = 0
    blockL = l / B
    blockR = r / B
    if blockL == blockR:
        for i in l..r: s += arr[i]
        return s
    for i in l..(blockL + 1) * B - 1: s += arr[i]
    for b in (blockL + 1)..(blockR - 1): s += blocks[b]
    for i in blockR * B..r: s += arr[i]
    return s

JavaScript implementation

class SqrtDecomposition {
  constructor(arr) {
    this.n = arr.length;
    this.b = Math.max(1, Math.ceil(Math.sqrt(this.n)));
    this.arr = Int32Array.from(arr);
    const numBlocks = Math.ceil(this.n / this.b);
    this.blocks = new Int32Array(numBlocks);
    for (let i = 0; i < this.n; i++) this.blocks[Math.floor(i / this.b)] += this.arr[i];
  }

  update(i, value) {
    const block = Math.floor(i / this.b);
    this.blocks[block] += value - this.arr[i];
    this.arr[i] = value;
  }

  query(l, r) {
    let s = 0;
    const blockL = Math.floor(l / this.b);
    const blockR = Math.floor(r / this.b);
    if (blockL === blockR) {
      for (let i = l; i <= r; i++) s += this.arr[i];
      return s;
    }
    for (let i = l; i < (blockL + 1) * this.b; i++) s += this.arr[i];
    for (let b = blockL + 1; b < blockR; b++) s += this.blocks[b];
    for (let i = blockR * this.b; i <= r; i++) s += this.arr[i];
    return s;
  }
}

const sd = new SqrtDecomposition([3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3]);
console.log(sd.query(3, 12));  // 53
sd.update(7, 0);
console.log(sd.query(3, 12));  // 53 - 6 = 47

Python implementation — distinct counts with frequency blocks

One reason to learn sqrt decomposition is that it supports operations segment trees can't. Counting distinct elements in a range with point updates is one such problem: each block stores a hashmap of value → count, and the answer is the size of the union of block hashmaps (plus partial-block scans).

import math
from collections import Counter, defaultdict

class DistinctSqrt:
    def __init__(self, arr):
        self.n = len(arr)
        self.b = max(1, int(math.sqrt(self.n)))
        self.arr = list(arr)
        self.freq = []  # one Counter per block
        for i in range(0, self.n, self.b):
            self.freq.append(Counter(arr[i:i + self.b]))

    def update(self, i, value):
        block = i // self.b
        old = self.arr[i]
        self.freq[block][old] -= 1
        if self.freq[block][old] == 0: del self.freq[block][old]
        self.freq[block][value] += 1
        self.arr[i] = value

    def distinct(self, l, r):
        block_l = l // self.b
        block_r = r // self.b
        seen = defaultdict(int)
        if block_l == block_r:
            for v in self.arr[l:r + 1]: seen[v] += 1
            return len(seen)
        for v in self.arr[l:(block_l + 1) * self.b]: seen[v] += 1
        for b in range(block_l + 1, block_r):
            for v in self.freq[b]: seen[v] += 1
        for v in self.arr[block_r * self.b:r + 1]: seen[v] += 1
        return len(seen)

ds = DistinctSqrt([1, 2, 1, 3, 2, 4, 1, 5, 1])
print(ds.distinct(0, 8))   # 5
print(ds.distinct(2, 6))   # 4

The total work is O(√N) per query — exactly the same shape as the sum version. The cost of carrying a Counter per block is O(N) extra memory, but you get an operation no segment tree handles cleanly.

Variants

Range update with lazy add

Store a add[block] per block. A range update touches at most two partial blocks (cell-by-cell apply) and N/B full blocks (bump add[block]). Range queries read both arr[i] + add[block(i)]. All operations stay O(√N).

Mo's algorithm

Offline queries sorted by (block of L, R). When you move the active window, each pointer travels O(N · √Q) total across all queries — sub-O(N) per query amortised. Used for range mode, range distinct, range frequency-gcd. Sqrt decomposition is the block-size choice that makes the sort work.

Persistent sqrt decomposition

Each update snapshots the affected block plus the changed cell — O(√N) memory per version. Useful when you need "what was the sum in version k" alongside live mutation.

Adaptive block sizes

If updates dominate queries 100:1, pick a smaller B so update work shrinks. Theoretical sweet spot is B = √(query_count / update_count · N). Most contest problems just use ⌈√N⌉ and accept the constant-factor loss.

Common bugs and edge cases

  • Single-block ranges. When blockL == blockR the three-piece loop falls apart — the "full blocks" middle is empty and the two partials overlap. Always handle this case first with a direct cell scan from l to r.
  • Block index off-by-one. The right partial starts at blockR * B, not blockR * B + 1. The full-blocks loop goes from blockL + 1 to blockR − 1 inclusive, not blockR.
  • Last-block underflow. If N isn't divisible by B, the last block is short. Bounds-check the partial-right loop with min((blockR+1)*B, N) to avoid reading past the array.
  • Updates that forget to fix the block aggregate. Mutating arr[i] without updating blocks[i/B] produces a structure that silently drifts. Use the delta form blocks[i/B] += new - old and write to arr[i] afterward.
  • Wrong B for the workload. Distinct-count queries often have answer counts proportional to block scans, not block count. Profile and tune. The default ⌈√N⌉ is fine for sum, suboptimal for mode or distinct.
  • Floating-point sqrt. Computing B via Math.sqrt in JavaScript can give one-off values like 9 when you wanted 10. Always wrap with Math.max(1, Math.ceil(Math.sqrt(N))).

Frequently asked questions

Why √N as the block size?

Range queries touch up to two partial blocks (left edge, right edge) and a span of full blocks. The partial-block work is O(block_size); the full-block work is O(N / block_size). Setting these equal gives block_size = √N, the value that minimises total per-query work. For N = 10⁵ that's 316 blocks of size 316.

Is sqrt decomposition slower than a segment tree?

Asymptotically yes — O(√N) versus O(log N). For N = 10⁵, √N ≈ 316 versus log₂N ≈ 17 — roughly 19× more cells per query. But sqrt-decomp has smaller constants, is cache-friendlier, and is shorter to write. For N ≤ 10⁵ with 10⁶ queries, both fit in a second in C++.

When does sqrt decomposition beat segment trees?

When the operation doesn't compose well — counting distinct elements in a range, finding the mode, k-th smallest of a range with updates. These problems don't have an associative merge that fits a segment tree node; storing a frequency table per block does. The Mo's algorithm framework is built on this insight.

What is Mo's algorithm?

An offline technique: sort range queries by (block of left endpoint, right endpoint), then sweep a moving window. Each pointer moves O(N√N) total time across all queries. Used for problems with no easy online query — distinct counts, range mode, range gcd of frequencies. Sqrt decomposition is the engine.

How do you handle updates with sqrt decomposition?

Point update: O(1) — change the cell, change its block's aggregate. Range update with lazy add: store a delta-per-block, apply on query. Range update with arbitrary monoid: O(√N) — flush lazy to affected partials, mark full blocks. The structure is forgiving — almost any operation has a workable variant.

What size of N is the sweet spot?

N around 10⁴ to 10⁵ with 10⁵ to 10⁶ queries. Below 10³ even naive O(N) per query fits; above 10⁶ the √N factor (≈ 1000) blows the time budget. In that window sqrt decomposition is the sweet spot of simplicity and speed.