Data Structures
Sqrt Decomposition
Split the array into √N blocks — queries cost √N, updates cost one
Sqrt decomposition splits an array into √N blocks and stores aggregate-per-block. Updates are O(1), range queries are O(√N), and the code fits in 30 lines. It is the simplest non-trivial range-query structure.
- Point updateO(1)
- Range queryO(√N)
- Range updateO(√N) (lazy add)
- BuildO(N)
- MemoryN + √N integers
- N = 10⁵316 blocks of 316
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How sqrt decomposition works
You have an array of N numbers, point updates, and range-sum queries. Two naive solutions sit at opposite ends:
- Plain array: update O(1), range query O(N).
- Maintained prefix-sum array: range query O(1), update O(N).
Sqrt decomposition meets in the middle. Pick a block size B ≈ √N. Slice the array into ⌈N/B⌉ contiguous blocks. For every block, store the precomputed aggregate (sum, in this case). The array itself stays intact.
Point update: change one cell, fix that cell's block aggregate. Two memory writes — O(1).
Range query over [l, r]: walk through three pieces.
- The partial left block — cells from
lto the end of its block, summed cell-by-cell. At most B cells. - The span of full blocks — every block fully covered by
[l, r]. Read each block's aggregate directly. At most N/B blocks. - The partial right block — cells from the start of
r's block tor. At most B cells.
Total work: 2B + N/B cells. Setting B = √N minimises this to 2√N + √N = 3√N — that's where the structure's name comes from.
Worked example
Take 16 elements and B = 4, so 4 blocks of 4:
arr = [3, 1, 4, 1 | 5, 9, 2, 6 | 5, 3, 5, 8 | 9, 7, 9, 3]
block = [ 9 | 22 | 21 | 28 ]
Range query [3, 12] (0-indexed, inclusive on both ends):
- Partial left block (block 0 from index 3):
arr[3] = 1. - Full blocks: blocks 1 and 2 →
22 + 21 = 43. - Partial right block (block 3 from index 12 to 12):
arr[12] = 9.
Total: 1 + 43 + 9 = 53. We touched 1 + 2 + 1 = 4 cells instead of 10. With N = 16 and √N = 4, the savings are visible; with N = 10⁵, the savings are dramatic — 316 cells per query versus 100,000.
When to use sqrt decomposition
- Quick implementations under contest time pressure — you can write it correctly in 5 minutes.
- Operations that don't compose nicely into a segment-tree node — distinct counts, mode, range gcd with frequencies, k-th smallest in a range.
- Offline batch queries via Mo's algorithm — sort queries by (left-block, right) and amortise total work.
- Cache-friendly workloads — 316-element blocks fit comfortably in L1, beating segment-tree pointer chasing.
- Range updates where lazy propagation is overkill — keep a per-block delta, push when needed.
Reach for a segment tree or Fenwick tree when N exceeds 10⁶ or when the query rate is so high that √N starts to dominate.
Sqrt decomposition vs Fenwick tree vs segment tree
| Sqrt-decomp | Fenwick tree | Segment tree | |
|---|---|---|---|
| Point update | O(1) | O(log n) | O(log n) |
| Range query | O(√n) | O(log n) | O(log n) |
| Range update | O(√n) (lazy) | O(log n) (BIT2 trick, sum only) | O(log n) (lazy) |
| Code length | ~25 lines | ~12 lines | ~40 lines |
| Supports min/max/gcd | Yes | No (needs inverse) | Yes |
| Supports distinct/mode | Yes (frequency block) | No | Hard (heavy node state) |
| Constant factor | Smallest (cache-friendly) | Very small | 2–3× larger |
| Persistence | Easy (block snapshots) | Hard | Easy (path-copy) |
The trade-off is asymptotic versus practical: √N is worse than log N on paper, but the constant factor and code simplicity often make sqrt decomposition the right tool for medium-N problems.
Pseudo-code
B = ceil(sqrt(N))
blocks = array of ceil(N / B) sums, computed on build
function update(i, value):
blocks[i / B] += value - arr[i]
arr[i] = value
function query(l, r):
s = 0
blockL = l / B
blockR = r / B
if blockL == blockR:
for i in l..r: s += arr[i]
return s
for i in l..(blockL + 1) * B - 1: s += arr[i]
for b in (blockL + 1)..(blockR - 1): s += blocks[b]
for i in blockR * B..r: s += arr[i]
return s
JavaScript implementation
class SqrtDecomposition {
constructor(arr) {
this.n = arr.length;
this.b = Math.max(1, Math.ceil(Math.sqrt(this.n)));
this.arr = Int32Array.from(arr);
const numBlocks = Math.ceil(this.n / this.b);
this.blocks = new Int32Array(numBlocks);
for (let i = 0; i < this.n; i++) this.blocks[Math.floor(i / this.b)] += this.arr[i];
}
update(i, value) {
const block = Math.floor(i / this.b);
this.blocks[block] += value - this.arr[i];
this.arr[i] = value;
}
query(l, r) {
let s = 0;
const blockL = Math.floor(l / this.b);
const blockR = Math.floor(r / this.b);
if (blockL === blockR) {
for (let i = l; i <= r; i++) s += this.arr[i];
return s;
}
for (let i = l; i < (blockL + 1) * this.b; i++) s += this.arr[i];
for (let b = blockL + 1; b < blockR; b++) s += this.blocks[b];
for (let i = blockR * this.b; i <= r; i++) s += this.arr[i];
return s;
}
}
const sd = new SqrtDecomposition([3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3]);
console.log(sd.query(3, 12)); // 53
sd.update(7, 0);
console.log(sd.query(3, 12)); // 53 - 6 = 47
Python implementation — distinct counts with frequency blocks
One reason to learn sqrt decomposition is that it supports operations segment trees can't. Counting distinct elements in a range with point updates is one such problem: each block stores a hashmap of value → count, and the answer is the size of the union of block hashmaps (plus partial-block scans).
import math
from collections import Counter, defaultdict
class DistinctSqrt:
def __init__(self, arr):
self.n = len(arr)
self.b = max(1, int(math.sqrt(self.n)))
self.arr = list(arr)
self.freq = [] # one Counter per block
for i in range(0, self.n, self.b):
self.freq.append(Counter(arr[i:i + self.b]))
def update(self, i, value):
block = i // self.b
old = self.arr[i]
self.freq[block][old] -= 1
if self.freq[block][old] == 0: del self.freq[block][old]
self.freq[block][value] += 1
self.arr[i] = value
def distinct(self, l, r):
block_l = l // self.b
block_r = r // self.b
seen = defaultdict(int)
if block_l == block_r:
for v in self.arr[l:r + 1]: seen[v] += 1
return len(seen)
for v in self.arr[l:(block_l + 1) * self.b]: seen[v] += 1
for b in range(block_l + 1, block_r):
for v in self.freq[b]: seen[v] += 1
for v in self.arr[block_r * self.b:r + 1]: seen[v] += 1
return len(seen)
ds = DistinctSqrt([1, 2, 1, 3, 2, 4, 1, 5, 1])
print(ds.distinct(0, 8)) # 5
print(ds.distinct(2, 6)) # 4
The total work is O(√N) per query — exactly the same shape as the sum version. The cost of carrying a Counter per block is O(N) extra memory, but you get an operation no segment tree handles cleanly.
Variants
Range update with lazy add
Store a add[block] per block. A range update touches at most two partial blocks (cell-by-cell apply) and N/B full blocks (bump add[block]). Range queries read both arr[i] + add[block(i)]. All operations stay O(√N).
Mo's algorithm
Offline queries sorted by (block of L, R). When you move the active window, each pointer travels O(N · √Q) total across all queries — sub-O(N) per query amortised. Used for range mode, range distinct, range frequency-gcd. Sqrt decomposition is the block-size choice that makes the sort work.
Persistent sqrt decomposition
Each update snapshots the affected block plus the changed cell — O(√N) memory per version. Useful when you need "what was the sum in version k" alongside live mutation.
Adaptive block sizes
If updates dominate queries 100:1, pick a smaller B so update work shrinks. Theoretical sweet spot is B = √(query_count / update_count · N). Most contest problems just use ⌈√N⌉ and accept the constant-factor loss.
Common bugs and edge cases
- Single-block ranges. When
blockL == blockRthe three-piece loop falls apart — the "full blocks" middle is empty and the two partials overlap. Always handle this case first with a direct cell scan fromltor. - Block index off-by-one. The right partial starts at
blockR * B, notblockR * B + 1. The full-blocks loop goes fromblockL + 1toblockR − 1inclusive, notblockR. - Last-block underflow. If N isn't divisible by B, the last block is short. Bounds-check the partial-right loop with
min((blockR+1)*B, N)to avoid reading past the array. - Updates that forget to fix the block aggregate. Mutating
arr[i]without updatingblocks[i/B]produces a structure that silently drifts. Use the delta formblocks[i/B] += new - oldand write toarr[i]afterward. - Wrong B for the workload. Distinct-count queries often have answer counts proportional to block scans, not block count. Profile and tune. The default ⌈√N⌉ is fine for sum, suboptimal for mode or distinct.
- Floating-point sqrt. Computing B via
Math.sqrtin JavaScript can give one-off values like 9 when you wanted 10. Always wrap withMath.max(1, Math.ceil(Math.sqrt(N))).
Frequently asked questions
Why √N as the block size?
Range queries touch up to two partial blocks (left edge, right edge) and a span of full blocks. The partial-block work is O(block_size); the full-block work is O(N / block_size). Setting these equal gives block_size = √N, the value that minimises total per-query work. For N = 10⁵ that's 316 blocks of size 316.
Is sqrt decomposition slower than a segment tree?
Asymptotically yes — O(√N) versus O(log N). For N = 10⁵, √N ≈ 316 versus log₂N ≈ 17 — roughly 19× more cells per query. But sqrt-decomp has smaller constants, is cache-friendlier, and is shorter to write. For N ≤ 10⁵ with 10⁶ queries, both fit in a second in C++.
When does sqrt decomposition beat segment trees?
When the operation doesn't compose well — counting distinct elements in a range, finding the mode, k-th smallest of a range with updates. These problems don't have an associative merge that fits a segment tree node; storing a frequency table per block does. The Mo's algorithm framework is built on this insight.
What is Mo's algorithm?
An offline technique: sort range queries by (block of left endpoint, right endpoint), then sweep a moving window. Each pointer moves O(N√N) total time across all queries. Used for problems with no easy online query — distinct counts, range mode, range gcd of frequencies. Sqrt decomposition is the engine.
How do you handle updates with sqrt decomposition?
Point update: O(1) — change the cell, change its block's aggregate. Range update with lazy add: store a delta-per-block, apply on query. Range update with arbitrary monoid: O(√N) — flush lazy to affected partials, mark full blocks. The structure is forgiving — almost any operation has a workable variant.
What size of N is the sweet spot?
N around 10⁴ to 10⁵ with 10⁵ to 10⁶ queries. Below 10³ even naive O(N) per query fits; above 10⁶ the √N factor (≈ 1000) blows the time budget. In that window sqrt decomposition is the sweet spot of simplicity and speed.