Data Structures
Spatial Hashing
Stop comparing everything to everything — only check what's nearby
Spatial hashing buckets objects into a uniform grid by hashing each object's cell coordinates, so collision and neighbor queries only compare objects in the same or adjacent cells — turning an O(n²) all-pairs scan into a near O(n) sweep.
- Build / rebuildO(n)
- Neighbor queryO(1) expected
- All-pairs passO(n) at bounded density
- Worst case (one cell)O(n²)
- Ideal cell size≈ largest object diameter
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How spatial hashing works
Say you have 10,000 particles bouncing around and you need to know which ones touch. The honest answer is to compare every pair — but that's 10000 × 9999 / 2 ≈ 50 million distance checks per frame, and at 60 frames a second your CPU is doing 3 billion comparisons just for collision detection. The world is mostly empty space, yet you keep asking whether the particle in the top-left corner is touching the one in the bottom-right.
Spatial hashing throws out all those impossible pairs cheaply. Overlay an imaginary uniform grid on the world. Every object falls into exactly one cell, found by integer division of its position:
cx = floor(x / cellSize)
cz = floor(z / cellSize)
File each object into a bucket keyed by its cell. The "hash" in the name is literal: instead of allocating a giant 2D array sized to the whole world, you hash the integer pair (cx, cz) into a hash table, so only non-empty cells consume memory. Building the index is one linear pass — hash each object's cell, append it to that bucket's list.
Now the query is local. To find everything that might collide with object A, compute A's cell and look only at that cell plus its 8 neighbors (26 neighbors in 3D). If the cell size is at least the diameter of the largest object, two objects can only overlap when they share a cell or are one cell apart — so this 3×3 window provably contains every candidate. The far side of the grid is never touched.
The result: each object compares against a small, roughly constant number of neighbors instead of all n − 1 of them. The 50-million-check nightmare collapses to a few hundred thousand checks, and the empty regions of the world cost nothing.
When to use spatial hashing
- Real-time physics broad phase — finding collision candidates among thousands of similar-sized rigid bodies or particles that move every frame.
- Boids, flocking, and SPH fluids — each agent needs its neighbors within a fixed radius; the grid hands them over in O(1).
- Uniform object sizes and roughly even spread — the grid's fixed resolution is a perfect fit when objects don't vary wildly in scale.
- Unbounded or huge worlds — hashing the cell coordinate (rather than indexing a dense array) keeps memory proportional to occupied cells, not world size.
Reach for a different structure when objects vary enormously in size (a BVH or loose octree adapts to scale), when the distribution is extremely sparse or clustered (a quadtree subdivides only where needed), or when the data is static and you query it many times without rebuilding (sort-based sweep-and-prune or a built tree amortizes better).
Spatial hashing vs other broad-phase structures
| Spatial hash grid | Quadtree / Octree | BVH | Sweep and prune | k-d tree | |
|---|---|---|---|---|---|
| Build / rebuild | O(n) | O(n log n) | O(n log n) | O(n log n) sort, O(n) resort | O(n log n) |
| Neighbor query | O(1) expected | O(log n) | O(log n) | O(n) worst, near O(1) if sorted | O(log n) |
| Handles size variance | Poorly (one fixed cell) | Well (subdivides) | Very well | Poorly | Moderately |
| Handles non-uniform density | Poorly (hot cells) | Well | Well | Moderately | Well |
| Cache locality | Excellent (flat arrays) | Poor (pointer chasing) | Poor | Excellent | Poor |
| Memory overhead | Hash table + bucket lists | Tree nodes per region | 2n − 1 nodes | 2 endpoint arrays per axis | n internal nodes |
| Best for | Many uniform, moving objects | Sparse / clustered scenes | Varied sizes, ray queries | Few axes of motion, 1D coherence | Nearest-neighbor on points |
The headline trade-off: spatial hashing buys you the cheapest rebuild and the best cache behavior, at the cost of being brittle when object sizes or densities are uneven. A fixed grid has no opinion about where the action is — it spends the same effort on empty cells as on crowded ones — which is exactly why it shines when the action is everywhere and nowhere in particular.
What the numbers actually say
- 10,000 objects, all-pairs vs hash grid. Brute force is ≈ 50 million pair tests per frame. With a grid sized to object diameter and ~1 object per cell, each object probes 9 cells holding on average ~9 objects, so ≈ 10,000 × 9 ≈ 90,000 tests — a ~550× reduction. At ~3 ns per distance check that's the difference between 150 ms and 0.27 ms per frame.
- Build cost is one linear pass. Hashing a cell coordinate and appending to a bucket is a handful of integer ops — for 10,000 objects that's on the order of 50 µs, negligible next to the query savings.
- Memory scales with occupancy, not world size. A 4096 × 4096 cell world is 16.7 million cells; a dense array of empty buckets would burn 134 MB at 8 bytes each. A hash table storing only the ~10,000 occupied cells uses well under 1 MB.
- Cell size is the whole game. Too small (¼ object diameter) and one object spans 16 cells, so you insert it 16 times and probe a larger window to catch neighbors a full diameter away. Too large (4× diameter) and 16× more objects share each bucket, multiplying pair tests 16-fold. The sweet spot — cell ≈ largest object diameter — is the only setting that keeps both insert and query cheap.
JavaScript implementation
A flat hash grid for 2D circles. The key trick is hashing the integer cell pair into a single string (or a packed integer) so the bucket map stays sparse.
class SpatialHash {
constructor(cellSize) {
this.cellSize = cellSize;
this.buckets = new Map(); // "cx,cz" -> array of objects
}
_key(cx, cz) { return cx + ',' + cz; }
_cell(x, z) { return [Math.floor(x / this.cellSize), Math.floor(z / this.cellSize)]; }
clear() { this.buckets.clear(); }
insert(obj) {
const [cx, cz] = this._cell(obj.x, obj.z);
const k = this._key(cx, cz);
let bucket = this.buckets.get(k);
if (!bucket) this.buckets.set(k, bucket = []);
bucket.push(obj);
}
// every object in the 3x3 neighborhood around (x, z)
*near(x, z) {
const [cx, cz] = this._cell(x, z);
for (let dx = -1; dx <= 1; dx++)
for (let dz = -1; dz <= 1; dz++) {
const bucket = this.buckets.get(this._key(cx + dx, cz + dz));
if (bucket) yield* bucket;
}
}
}
// Broad phase: report all overlapping pairs in O(n) at bounded density.
function findCollisions(objects, cellSize) {
const grid = new SpatialHash(cellSize);
for (const o of objects) grid.insert(o);
const hits = [];
for (const a of objects) {
for (const b of grid.near(a.x, a.z)) {
if (a.id >= b.id) continue; // dedupe: test each pair once
const dx = a.x - b.x, dz = a.z - b.z;
const r = a.radius + b.radius;
if (dx * dx + dz * dz <= r * r) hits.push([a, b]);
}
}
return hits;
}
Two details carry the performance. First, the a.id >= b.id guard dedupes pairs — without it every collision is found twice (once from each side) and you do double the comparisons. Second, set cellSize to the largest object's diameter; the 3×3 probe is only correct if no object can reach beyond an adjacent cell.
Python implementation
from collections import defaultdict
from math import floor
class SpatialHash:
def __init__(self, cell_size):
self.cell_size = cell_size
self.buckets = defaultdict(list)
def _cell(self, x, z):
return (floor(x / self.cell_size), floor(z / self.cell_size))
def clear(self):
self.buckets.clear()
def insert(self, obj):
self.buckets[self._cell(obj.x, obj.z)].append(obj)
def near(self, x, z):
cx, cz = self._cell(x, z)
for dx in (-1, 0, 1):
for dz in (-1, 0, 1):
yield from self.buckets.get((cx + dx, cz + dz), ())
def find_collisions(objects, cell_size):
grid = SpatialHash(cell_size)
for o in objects:
grid.insert(o)
hits = []
for a in objects:
for b in grid.near(a.x, a.z):
if a.id >= b.id: # test each pair once
continue
dx, dz = a.x - b.x, a.z - b.z
r = a.radius + b.radius
if dx * dx + dz * dz <= r * r:
hits.append((a, b))
return hits
# Fixed-radius nearest neighbors — the boids/flocking query.
def neighbors_within(grid, x, z, radius):
out = []
for o in grid.near(x, z):
dx, dz = o.x - x, o.z - z
if dx * dx + dz * dz <= radius * radius:
out.append(o)
return out
Using defaultdict(list) as the bucket store means an empty world allocates nothing — the table grows only as cells get occupied. Note the squared-distance comparison: never call sqrt in the inner loop when comparing against a threshold, since squaring the threshold gives the same answer for free.
Variants worth knowing
Hash grid vs dense array grid. If the world is bounded and small, skip the hash and use a flat 2D array indexed by cz * width + cx. Array indexing beats hashing on speed and cache locality. The hash version only earns its keep for large or unbounded worlds where a dense array would waste memory on empty cells.
Hierarchical / multi-resolution grids. When objects span very different sizes, a single cell size can't fit all of them. Keep several grids at doubling resolutions and insert each object into the level whose cell is closest to its size — the idea behind a hierarchical spatial hash and loose octrees. Big objects live in coarse grids, small ones in fine grids.
Spatial hashing as an infinite grid (Teschner's hash). A well-known trick hashes the cell with (cx·p1 XOR cz·p2 XOR cy·p3) mod tableSize using large primes, mapping an unbounded integer grid into a fixed-size table. Distinct cells can collide in the table, so you still verify the actual cell on lookup, but it bounds memory to a constant.
Z-order / Morton-coded cells. Instead of a string key, interleave the bits of cx and cz into a single Morton code. Nearby cells get nearby codes, which improves cache behavior and lets you sort objects into a contiguous, GPU-friendly array — the basis of GPU particle systems.
Incremental update vs full rebuild. If most objects are stationary, remove and reinsert only the ones that crossed a cell boundary this frame. If everything moves, clearing the whole table and rebuilding from scratch is usually faster — one tight O(n) loop with no per-object bookkeeping.
Common bugs and edge cases
- Cell size mismatched to object size. If an object is larger than a cell it can overlap something two cells away, and the 3×3 probe misses it — silent missed collisions. Size cells to the largest object, or insert large objects into every cell they touch.
- Forgetting to dedupe pairs. Without the
a.id < b.idguard you test and resolve each collision twice, doubling work and sometimes double-applying impulses in a physics step. - Negative coordinates and
floor. UseMath.floor, not integer truncation or| 0— truncation rounds toward zero, so positions of−0.5and+0.5both map to cell 0 and collisions across the origin break. - Not clearing the grid between frames. Reusing buckets without clearing accumulates stale entries; objects appear to collide with their own ghosts from previous frames.
- The clustering worst case. If your objects pile into one cell (a settled heap of particles, a tower of boxes), that bucket holds them all and the inner loop is O(n²) again. Watch for pathological scenes and consider a larger cell or a sub-grid for hot cells.
- Hash collisions in the unbounded variant. When two distinct cells map to the same table slot, you must compare the stored cell coordinate, not just the slot — otherwise you'll test objects that are actually far apart.
Frequently asked questions
What cell size should a spatial hash use?
Set the cell size to roughly the diameter of your largest moving object — about twice its radius. Then any two objects that overlap must share a cell or sit in an adjacent one, so a 3×3 (or 3×3×3 in 3D) neighborhood probe is guaranteed to catch every collision. Cells much smaller than that object span many buckets and waste work; cells much larger pack too many objects into one bucket and drift back toward O(n²).
Is spatial hashing faster than a quadtree or BVH?
For uniformly distributed objects of similar size that move every frame, yes — spatial hashing rebuilds in O(n) with no pointer chasing and queries hit a fixed handful of cells. Quadtrees and BVHs win when the distribution is wildly non-uniform or sparse, because a fixed grid either wastes memory on empty cells or piles everything into one. Most real-time physics engines use a hash grid for the broad phase precisely because objects are roughly uniform.
Why hash the cell instead of using a 2D array?
A dense 2D array needs memory proportional to the bounding volume, which explodes for large or unbounded worlds — a 10,000 × 10,000 cell world is 100 million slots even if only 500 are occupied. Hashing the (cx, cz) coordinate into a hash table allocates buckets only for cells that actually contain objects, so memory scales with the number of objects, not the size of the world.
How does spatial hashing avoid reporting the same collision twice?
When you probe a 3×3 neighborhood for object A, you'll find object B; later when you probe B's neighborhood you'll find A again. Deduplicate by only testing pairs where A's id is less than B's id, or by maintaining a per-frame set of tested pairs. Without it you do double the work and may resolve each collision twice.
What happens when all objects cluster into one cell?
If every object falls into a single bucket — a tight pile of particles, a stacked tower — that bucket holds all n objects and testing within it is O(n²) again. Spatial hashing only gives O(n) when density per cell is bounded by a constant. Pathological clustering is the worst case; mitigations include a larger cell size, a secondary structure for hot cells, or a hierarchical grid.
Should I rebuild the grid every frame or update it incrementally?
For thousands of fast-moving objects, clearing and rebuilding from scratch each frame is usually faster and simpler — it's a single O(n) pass with great cache behavior. Incremental updates (removing an object from its old cell and reinserting into the new one) only pay off when most objects are stationary or move slowly, so few of them actually change cells per frame.