Sampling Algorithms

Reservoir Sampling

k uniformly random items from a stream of unknown length

Reservoir sampling picks k random samples from a stream of unknown length — each item ends with probability k/n. O(n) time, O(k) space. Vitter's classic Algorithm R (1985).

Time (Algorithm R)O(n) — one random per item
Time (Algorithm L)O(k (1 + log(n/k)))
SpaceO(k) — reservoir + counter
Probability of inclusionexactly k/n per item
Stream typeUnknown length, one pass
PublishedVitter 1985 (k≥1), Knuth 1981 (k=1)

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How Algorithm R works

You have a stream of items arriving one at a time. You don't know how many there will be — could be 10, could be 10 billion. You want to keep k items chosen uniformly at random from the entire stream when it ends. The catch: you must decide what to keep as items arrive, with only O(k) memory available.

Algorithm R (Vitter 1985):

Fill the reservoir with the first k items.
For each subsequent item i (where i > k), pick a random integer j uniformly from [0, i).
If j < k, replace reservoir[j] with item i; otherwise discard item i.

That's it — the entire algorithm. When the stream ends, the reservoir contains k items chosen uniformly at random from the entire stream.

Why every item has probability k/n at the end

Consider the k=1 case (single sample). At step i, we keep the current item with probability 1/i. For any earlier item j (j < i), the probability it's still in the reservoir is:

P(j survives to step i) = (1/j) · ∏_{t=j+1}^{i} (1 − 1/t)
                       = (1/j) · (j/i) = 1/i

So at every step i, each of the i items seen so far has probability 1/i of being in the reservoir. At the end (i = n), every item has probability 1/n. Uniform.

For k ≥ 1, the proof generalizes: each existing reservoir slot survives the next item with probability (i−1)/i; multiplied through, each item ends with probability k/n.

Worked example — k = 3, n = 7

Stream: A, B, C, D, E, F, G. Reservoir size k = 3.

Step	Item	Action	Reservoir after
1	A	Fill slot 0	[A, _, _]
2	B	Fill slot 1	[A, B, _]
3	C	Fill slot 2	[A, B, C]
4	D	j = rand(0, 4) = 1 → replace slot 1	[A, D, C]
5	E	j = rand(0, 5) = 4 (≥ 3) → discard	[A, D, C]
6	F	j = rand(0, 6) = 0 → replace slot 0	[F, D, C]
7	G	j = rand(0, 7) = 5 (≥ 3) → discard	[F, D, C]

Final reservoir: {F, D, C}. Each of A, B, C, D, E, F, G had probability 3/7 ≈ 0.43 of ending here — independent of when it arrived.

Cost in real terms

On a 10-billion-item stream at 1 million items per second (a typical analytics or telemetry feed), Algorithm R uses 10 billion random-number generations — about 10 seconds of pure RNG cost in modern CPUs. Algorithm L reduces this to roughly k · log(n/k) random generations — for k = 100 and n = 10⁹, that's ~2,300 random numbers, finishing in microseconds. The reservoir itself fits in cache regardless of stream size.

Reservoir sampling vs alternatives

	Algorithm R (Vitter)	Algorithm L (Li)	Algorithm A-Res / weighted	Bernoulli sampling	Two-pass uniform
Time	O(n)	O(k (1 + log(n/k)))	O(n log k)	O(n)	O(n)
Random calls	n	k log(n/k)	n	n	k
Sample size	Exact k	Exact k	Exact k (weighted)	Random, mean ≈ p·n	Exact k
Needs n upfront	No	No	No	No	Yes
Memory	O(k)	O(k)	O(k) heap	O(p·n)	O(k)
Best for	Default streams	Huge streams	Weighted items	Approx fraction	Random access available

When to use reservoir sampling

Streaming analytics. Datadog, Honeycomb, Jaeger, and Prometheus use reservoir sampling to keep a representative subset of metric / trace data when the stream rate exceeds the storage budget.
Database TABLESAMPLE. BigQuery, Snowflake, PostgreSQL TABLESAMPLE BERNOULLI all wrap reservoir sampling for "give me 1000 random rows from this trillion-row table" queries that won't scan twice.
Distributed system telemetry. Per-request trace sampling that keeps "approximately 1 in N" without coordination — each node runs its own reservoir.
Machine-learning data sampling. Online learners that need a uniformly-random validation set from a streaming source — Spotify's and YouTube's recommendation pipelines use weighted variants.
Reservoir-based unit tests. "Verify some random subset of cases pass" without enumerating cases upfront.

JavaScript implementation

// Algorithm R — basic reservoir sampling.
class Reservoir {
  constructor(k) { this.k = k; this.reservoir = []; this.i = 0; }
  observe(item) {
    this.i++;
    if (this.reservoir.length < this.k) {
      this.reservoir.push(item);
    } else {
      const j = Math.floor(Math.random() * this.i);
      if (j < this.k) this.reservoir[j] = item;
    }
  }
  sample() { return this.reservoir.slice(); }
}

// Usage on a stream
const sampler = new Reservoir(3);
for (const item of ['A', 'B', 'C', 'D', 'E', 'F', 'G']) sampler.observe(item);
console.log(sampler.sample());  // 3 items, each with probability 3/7

// ────────────────────────────────────────────────────────────────
// Algorithm L — Vitter 1987 / Li 1994. Faster on huge streams.
class FastReservoir {
  constructor(k) {
    this.k = k; this.reservoir = []; this.i = 0;
    this.W = Math.exp(Math.log(Math.random()) / k);
    this.next = k + Math.floor(Math.log(Math.random()) / Math.log(1 - this.W)) + 1;
  }
  observe(item) {
    this.i++;
    if (this.reservoir.length < this.k) {
      this.reservoir.push(item);
    } else if (this.i === this.next) {
      const j = Math.floor(Math.random() * this.k);
      this.reservoir[j] = item;
      this.W *= Math.exp(Math.log(Math.random()) / this.k);
      this.next = this.i + Math.floor(Math.log(Math.random()) / Math.log(1 - this.W)) + 1;
    }
  }
  sample() { return this.reservoir.slice(); }
}

Python implementation

import random, math

class Reservoir:
    def __init__(self, k):
        self.k = k
        self.reservoir = []
        self.i = 0

    def observe(self, item):
        self.i += 1
        if len(self.reservoir) < self.k:
            self.reservoir.append(item)
        else:
            j = random.randint(0, self.i - 1)
            if j < self.k:
                self.reservoir[j] = item

    def sample(self):
        return list(self.reservoir)

# Streaming usage
sampler = Reservoir(3)
for item in 'ABCDEFG':
    sampler.observe(item)
print(sampler.sample())

# Algorithm L — exponentially fewer random calls
class FastReservoir:
    def __init__(self, k):
        self.k = k; self.reservoir = []; self.i = 0
        self.W = math.exp(math.log(random.random()) / k)
        self.next = k + int(math.log(random.random()) / math.log(1 - self.W)) + 1

    def observe(self, item):
        self.i += 1
        if len(self.reservoir) < self.k:
            self.reservoir.append(item)
        elif self.i == self.next:
            j = random.randrange(self.k)
            self.reservoir[j] = item
            self.W *= math.exp(math.log(random.random()) / self.k)
            self.next = self.i + int(math.log(random.random()) / math.log(1 - self.W)) + 1

Variants and extensions

Weighted reservoir sampling (Efraimidis–Spirakis 2006). Each item has weight w_i; the goal is to sample with probability proportional to weight. For each item draw r_i = U(0,1)^(1/w_i); keep the k items with the largest r_i. One pass, O(n log k) using a min-heap.
Distributed reservoir merge. Spark, Flink, and BigQuery partition the stream into shards, run a reservoir per shard, then merge. Merging two reservoirs of size k with counts (a, b) into one reservoir of size k: include each candidate with probability proportional to its origin's count.
Sliding-window variants. "Sample over the last w items only." The textbook reservoir doesn't support deletion; Babcock-Datar-Motwani (2002) give a different algorithm using priority-sampling.
Stratified reservoir. Sample k from each of multiple strata (e.g., per user, per region) — run k parallel reservoirs keyed by stratum.
Reservoir for distinct counts. Combined with HyperLogLog, you can estimate both the count of distinct items AND a sample of them — used in advertising bid-stream analysis.

Famous reservoir-sampling problems

LeetCode 382 — Linked List Random Node

Given the head of a singly linked list, return a random node uniformly at random. The list length isn't known, can't be cached. Classic k = 1 reservoir: walk the list, keep the current node with probability 1/i at each step. O(n) time per call, O(1) space — exact uniform sample without computing length first.

Distributed-trace sampling at Twitter

Twitter's Finagle / Zipkin / Census tracing libraries sample roughly 1 trace per 10,000 requests using reservoir sampling. Without it, recording every trace at peak load (~200k req/s) would produce 17 billion traces per day — impossible to store. With reservoir sampling, the storage budget stays fixed (say 100k traces/day) and the sample remains uniform across the entire day, even though traffic spikes and dips.

Common bugs and edge cases

Off-by-one in the random call. Use rand(0, i) inclusive of 0, exclusive of i (range [0, i)). Including i shifts the probability; excluding 0 breaks the uniform guarantee for slot 0.
Confusing i with the index of the current item. i is the 1-indexed count of items seen so far (1, 2, 3, ...), not the array index. Off-by-one here silently bias the sample.
Bias from poor RNG. If Math.random() has a short period or a non-uniform distribution, the sample inherits the bias. Use a cryptographic RNG (window.crypto.getRandomValues) or a known-good PRNG for high-stakes sampling.
Re-sampling on every observation. The standard algorithm is INCREMENTAL — you call observe() once per item and the reservoir is always up to date. Don't rerun from scratch on every new item.
Forgetting that without-replacement is built in. Algorithm R samples without replacement (each item appears at most once in the reservoir). If you want with-replacement, run k independent k=1 reservoirs.
Treating Algorithm L as approximate. Algorithm L gives EXACTLY the same distribution as Algorithm R — it just skips intermediate steps using a geometric distribution. Don't use it as a "fast approximation" — it's exact and 100× faster.

Frequently asked questions

Why does keeping the current item with probability 1/i give a uniform sample?

Inductive proof for k = 1: at step i, item i is kept with probability 1/i. For any earlier item j (j < i), its probability of still being in the reservoir is (1/j) · ∏_{t=j+1..i} (1 − 1/t) = (1/j) · j/i = 1/i. So every item — past and present — has probability 1/i of being the chosen one. At the end (i = n), every item has probability 1/n. Uniform sampling proven.

How does the k > 1 version work?

Fill the reservoir with the first k items. For each subsequent item i (i > k), draw a random integer j in [0, i). If j < k, replace reservoir[j] with item i — otherwise discard. Each item ends up in the reservoir with probability exactly k/n at the end of the stream. The proof generalizes the k=1 case: each existing slot survives each new item with probability (i−1)/i, giving k/n by induction.

What's the difference between Algorithm R and Algorithm L?

Algorithm R (the basic version) calls the random generator once per stream item — O(n) random numbers for n items. Algorithm L (Li 1994) skips ahead by computing how many items can be safely ignored before the next reservoir replacement. It calls the random generator O(k(1 + log(n/k))) times — orders of magnitude faster on large streams. Same end distribution, different inner loop.

Can I use reservoir sampling on a parallel stream?

Yes — partition the stream into shards, run a reservoir per shard, then merge. Merging two reservoirs of sizes m and n (with counts a and b items seen) into one reservoir of size m: include each of the m+n items with probability proportional to weights a/(a+b) and b/(a+b). Each shard reports its reservoir and item count; a merge step gives the final uniform sample. Used in Spark, Flink, and BigQuery for TABLESAMPLE-style operations.

What's weighted reservoir sampling?

Each item i has weight w_i; the goal is to sample with probability proportional to weight. Efraimidis–Spirakis (2006): for each item draw r_i = U(0,1)^(1/w_i), keep the k items with the largest r_i. This sample has the right distribution and works in one pass with a min-heap of size k — O(n log k). Used in machine-learning subset selection where some examples matter more than others.

Where is reservoir sampling used in practice?

Hadoop / Spark for TABLESAMPLE — produces uniform random subsets of giant tables without scanning twice. Database query optimizers for cardinality estimation. Online machine-learning systems for prediction-time evaluation. Stream-processing telemetry — Twitter, Datadog, and Prometheus all do reservoir sampling for trace-rate control. Distributed-tracing systems (Jaeger, Honeycomb) use it to keep one trace per request budget.

Does the stream need to fit in memory?

No — that's the whole point. Reservoir sampling uses O(k) memory regardless of stream length n. The stream can be larger than RAM, larger than disk, or even infinite (in which case "the sample at any time" is the meaningful output). This is the canonical algorithm for sampling from streams that arrive faster than they can be stored.