Data Structures

Consistent Hashing

Add or remove a server, move only 1/N of the keys

Consistent hashing maps both keys and servers onto a ring so that adding or removing a server moves only 1/N of the keys, not all of them. It's the technique behind every modern distributed cache, sharded database, and CDN edge.

  • LookupO(log N) ring search
  • Add / remove nodeMoves ~K/N keys
  • Virtual nodes (typical)100–500 per server
  • Load std-dev (v=100)≈ ±10%
  • Memory per nodeO(v) ring entries

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

Why modular hashing breaks at scale

The naive way to shard data across servers is server = hash(key) % N. Pick N=4 servers, hash a key to some integer, take it mod 4, route there. Beautifully simple — and a disaster the first time you scale up.

If you grow from 4 to 5 servers, every key gets remapped: previously hash(k) % 4 = 2, now hash(k) % 5 = 4. Roughly 4/5 of all keys land on a different server. For a cache, that's a global cache miss event. For a database, that's a 4-server-to-5-server data migration. For a CDN, that's every customer suddenly hitting a cold edge.

Consistent hashing was introduced by Karger et al. in 1997 to solve exactly this. The idea: map both keys and servers into the same hash space (say 0 to 2³²−1), arranged conceptually as a circle. Each key is owned by the next server clockwise from its hash position. Adding a server inserts one new "owner" arc; only keys that fall into that arc move. Removing a server hands its arc to the next server clockwise.

Hash ring (32-bit space wrapped into a circle):

         0
         ┊
  S₃ ←──┘├──→ S₁         keys hash to a point on the circle
       k₁ │              and "walk clockwise" to find their server
          │
          ├──→ S₂
          │
          ┊
        k₂│
          │
       S₂ │ ← removing S₂ reassigns only k₂'s arc to S₁; everything else stays put
          ┊
       2³²−1

Virtual nodes — why 100 is better than 1

One position per server gives terrible load balance. With 4 servers placed randomly on a 4-billion-position ring, one server can easily own 40% of the space and another 10%. The fix is to place each server many times (typically 100 to 500), each at a different hashed position. Now the law of large numbers smooths out the arc lengths.

The standard deviation of load per server scales as 1/√v where v is the number of virtual nodes. So:

Virtual nodes per serverStd-dev of load
1~100% (catastrophic)
10~32%
100~10%
500~4.5%
1000~3.2%

Virtual nodes also let you weight servers asymmetrically — a beefier machine gets more virtual nodes. And they decouple the number of replicas from the number of machines, which is essential for fault-tolerant designs that walk the ring forward to find N distinct physical servers for replication.

When to reach for consistent hashing

  • Distributed caches — memcached pools, Redis cluster proxies (Twemproxy, Codis).
  • Sharded databases — Cassandra, DynamoDB, Riak partition by ring position.
  • CDN edge selection — Akamai-style routing of customer requests to consistent edges.
  • Load balancers with affinity — Maglev, Envoy's ring-hash policy keep TCP/HTTP sessions sticky to a backend even as the backend pool changes.
  • Distributed task queues — pin a task ID to a worker so retries land on the same machine.

Consistent vs modular vs rendezvous hashing

Consistent (ring)Modular (key % N)Rendezvous (HRW)Jump hashMaglevMulti-probe
Lookup timeO(log N · v)O(1)O(N)O(log N)O(1) table lookupO(k) probes
MemoryO(N · v)O(N)O(N)O(1)O(M) lookup table (M ≫ N)O(N)
Keys moved on add/remove1/N (with v=∞)~all1/N exactly1/N (only last bucket removable)~1/N1/N
Load balance1/√v deviationPerfect (until reshuffle)ExcellentExcellentTunable via table sizeExcellent
Weighted nodesYes (more vnodes)NoYes (HRW with weights)NoYes (offsets)Limited
Arbitrary node removalYesYes (catastrophic)YesNoYesYes
Used byCassandra, Dynamo, memcachedToy systems, naive shardsHDFS, Ceph CRUSH, Riak CoreGoogle internal, gRPCGoogle L4 LBDiscord, some load balancers

The honest summary: consistent hashing won the early distributed-systems era because virtual nodes are simple and visualisable. Rendezvous hashing is mathematically cleaner and gives perfect 1/N migration with smaller memory but O(N) per lookup. Jump hash is the smallest and fastest but inflexible. Maglev is what Google's L4 load balancer actually runs because lookups must hit cache.

What the numbers actually say

  • Adding 1 node moves K/N keys, not K. If you go from 10 to 11 servers, ~9% of keys move — versus ~91% with modular hashing.
  • Hot keys still happen. Consistent hashing makes distribution uniform, but if one key gets a disproportionate fraction of requests (a celebrity post, a viral product), one server takes the heat. Bounded-load consistent hashing or per-key load shedding mitigates it.
  • Standard deviation drops as 1/√v. 100 virtual nodes per server gets you within ±10% load; the diminishing return is real beyond 500.
  • Memory cost is real on giant clusters. 1000 servers × 200 vnodes × ~32 bytes per ring entry ≈ 6.4 MB of ring metadata. Workable, but Maglev and jump hash exist because some teams want it tighter.

JavaScript implementation

// Consistent hash ring with virtual nodes.
// Uses 32-bit FNV-1a as the hash; a real implementation would use murmur3 or xxhash.

function fnv1a(str) {
  let h = 2166136261 >>> 0;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 16777619) >>> 0;
  }
  return h;
}

class ConsistentHash {
  constructor(vnodes = 150) {
    this.vnodes = vnodes;
    this.ring = [];                // sorted [hash, nodeId] pairs
  }

  addNode(nodeId) {
    for (let i = 0; i < this.vnodes; i++) {
      const h = fnv1a(`${nodeId}#${i}`);
      this._insert(h, nodeId);
    }
  }

  removeNode(nodeId) {
    this.ring = this.ring.filter(([, n]) => n !== nodeId);
  }

  _insert(h, nodeId) {
    let lo = 0, hi = this.ring.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid][0] < h) lo = mid + 1;
      else hi = mid;
    }
    this.ring.splice(lo, 0, [h, nodeId]);
  }

  // Walk clockwise from hash(key) to the first ring entry; that's the owner.
  getNode(key) {
    if (this.ring.length === 0) return null;
    const h = fnv1a(key);
    let lo = 0, hi = this.ring.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid][0] >= h) hi = mid - 1;
      else lo = mid + 1;
    }
    return this.ring[lo % this.ring.length][1];   // wrap around
  }

  // Get N distinct physical owners (for replication).
  getNodes(key, n) {
    const seen = new Set();
    const out  = [];
    if (this.ring.length === 0) return out;
    const h = fnv1a(key);
    let lo = 0, hi = this.ring.length - 1;
    while (lo <= hi) {
      const mid = (lo + hi) >>> 1;
      if (this.ring[mid][0] >= h) hi = mid - 1;
      else lo = mid + 1;
    }
    for (let i = 0; i < this.ring.length && out.length < n; i++) {
      const node = this.ring[(lo + i) % this.ring.length][1];
      if (!seen.has(node)) { seen.add(node); out.push(node); }
    }
    return out;
  }
}

Three details worth flagging. First, the ring is kept sorted so lookups are binary search; resorting on every insert would be O(N²). Second, getNodes walks forward past virtual nodes of the same physical server until it finds N distinct ones — that's how replication works on a Dynamo-style ring. Third, FNV-1a is fine for demos but production should use murmur3 or xxhash for better avalanche behaviour.

Python implementation — virtual nodes + jump hash + rendezvous

import bisect
import hashlib

def hash32(s):
    return int.from_bytes(hashlib.md5(s.encode()).digest()[:4], 'big')

class ConsistentHash:
    def __init__(self, vnodes=150):
        self.vnodes = vnodes
        self.ring = []           # sorted (hash, node) pairs

    def add_node(self, node):
        for i in range(self.vnodes):
            h = hash32(f"{node}#{i}")
            bisect.insort(self.ring, (h, node))

    def remove_node(self, node):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key):
        if not self.ring: return None
        h = hash32(key)
        i = bisect.bisect_left(self.ring, (h,))
        return self.ring[i % len(self.ring)][1]

    def get_nodes(self, key, n):
        if not self.ring: return []
        h = hash32(key)
        i = bisect.bisect_left(self.ring, (h,))
        seen, out = set(), []
        for j in range(len(self.ring)):
            node = self.ring[(i + j) % len(self.ring)][1]
            if node not in seen:
                seen.add(node); out.append(node)
                if len(out) == n: return out
        return out


# Jump consistent hash (Lamping & Veach, Google 2014).
# Maps a 64-bit key to one of `num_buckets` buckets in O(log n) time, no memory.
def jump_hash(key, num_buckets):
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (1 << 31) / ((key >> 33) + 1))
    return b


# Rendezvous (Highest Random Weight) hashing.
# O(N) per lookup but no virtual nodes needed; perfect 1/N migration.
def rendezvous(key, nodes):
    return max(nodes, key=lambda n: hash32(f"{key}|{n}"))

Three flavours, three tradeoffs. ConsistentHash is the workhorse — flexible, supports add/remove/weight. jump_hash is what you reach for when memory and latency are critical and your bucket count is monotonic (you add buckets but never remove arbitrary ones). rendezvous is what you pick when you want better load distribution than ring hashing and N is small enough that O(N) per lookup is fine.

Variants worth knowing

Jump consistent hash (Google 2014). Tiny algorithm, zero memory, fastest production option. Limitation: removing buckets requires renumbering, so it only handles append-style scaling.

Rendezvous hashing (HRW, Thaler & Ravishankar 1996). For each key, hash (key, server) for every server and pick the highest. O(N) per lookup but perfect 1/N migration. Used by Ceph CRUSH, HDFS replica placement, Riak Core.

Maglev hashing (Google 2016). Pre-computes a fixed-size lookup table (typically 65,537 entries) where each table slot maps to a backend. Lookup is one memory read. Designed for L4 load balancers where lookup must be cache-friendly. Used in Google's network LB and some Envoy builds.

Multi-probe consistent hashing (Discord 2017). Hash key with k different functions, pick the server whose ring distance is smallest. Achieves rendezvous-like balance at O(k) cost per lookup — typically k=21.

Bounded-load consistent hashing (Mirrokni et al. 2018). Caps each server at (1+ε) times average load. If the natural owner is full, walk forward to the next under-loaded server. Used inside Vimeo and some Google services.

Anchor hash (2018). Constant memory, O(1) lookups, supports arbitrary add/remove. Newer option seeing some adoption in custom load balancers.

Consistent hashing with weights. All ring schemes naturally support weighting by giving each server's virtual-node count proportional to its capacity. The simplest "more weight = more vnodes" approach works well when capacity ratios are within ~10×.

Common bugs and edge cases

  • Too few virtual nodes. 10 vnodes gives ±32% load deviation — practically random. 100–500 is the sane production range.
  • Bad hash function. A weak hash (CRC32, Java's hashCode) clusters virtual nodes and ruins load balance. Use murmur3, xxhash, or SipHash.
  • Forgetting to wrap. If hash(key) exceeds the largest ring position, you must wrap to the smallest. Off-by-one here causes intermittent miss-routes.
  • Replication walking ring positions, not physical nodes. Walking forward N positions can hit the same physical server through different virtual nodes. Production code skips duplicates explicitly.
  • Hot keys overwhelming one server. Consistent hashing balances keys, not traffic. A single celebrity user can hot-spot one shard. Use bounded-load variants or per-key sharding.
  • Adding nodes during a hot read window. Mid-rebalance, some keys map old, some map new. Most production systems pre-warm the new node's caches before flipping the routing.
  • Mixing client-side and server-side rings. Different clients with different node lists route inconsistently. Either ship the ring centrally (etcd, ZooKeeper) or use a deterministic algorithm seeded the same way everywhere.

Frequently asked questions

What problem does consistent hashing solve?

Modulo hashing (key % N) breaks catastrophically when N changes — adding or removing a server reshuffles nearly every key. Consistent hashing keeps the mapping stable: adding a node moves only the keys that fall into its new arc on the ring (≈1/N of all keys).

Why are virtual nodes needed?

Without virtual nodes, key distribution depends on lucky hash placement — one server can end up with twice the load of another. Each physical node is given many ring positions (typically 100–500), which evens out load and decouples replica count from physical machine count.

Is consistent hashing the same as rendezvous hashing?

No. Consistent hashing puts servers on a ring; rendezvous (highest-random-weight) computes hash(key, server) for every server and picks the highest. Rendezvous gives better load balance without virtual nodes, but each lookup is O(N) instead of O(log N).

What's jump hash and when do I use it?

Jump hash (Google, 2014) is a tiny algorithm that maps a 64-bit key to one of N buckets in O(log N) time and zero memory. It's faster and more uniform than ring hashing but doesn't support arbitrary node removal — only the rightmost bucket can be dropped.

How is consistent hashing used in real systems?

DynamoDB and Cassandra partition by ring position; Discord, Akamai, and Riak use it for sharding; memcached's libketama and Twemproxy use it for cache routing; load balancers like Maglev and Envoy use variants for connection affinity.

How many virtual nodes per physical node?

Typically 100–500 for production. With v vnodes per physical node, the standard deviation of load is roughly 1/√v. 100 vnodes → ±10% per server; 500 vnodes → ±4.5%. The cost is memory for the ring; adding a node is O(v log Nv) for v positions.