Probability
Hypergeometric Distribution
Counting successes when sampling without replacement from a finite population
Hypergeometric(N, K, n) counts successes in n draws WITHOUT replacement from a population of N items with K successes. Mass function C(K, k)·C(N−K, n−k)/C(N, n). Differs from binomial (with replacement) when n/N is large. Foundation of Fisher's exact test, lottery odds, and capture-recapture estimation.
- PMFP(X = k) = C(K, k) C(N − K, n − k) / C(N, n)
- Mean / variancenK/N; nK(N−K)(N−n) / (N²(N−1))
- Binomial limitn/N < 5% → Binomial(n, K/N) good approx
- FPC factor(N − n)/(N − 1) — finite-population correction
- Famous testsFisher's exact, capture-recapture, lottery
- Range of Xmax(0, n − (N−K)) ≤ X ≤ min(n, K)
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
How the hypergeometric distribution works
Setup. A finite population of N items contains K "successes" and N − K "failures". You sample n items without replacement — each item drawn is removed, so successive draws are not independent. Let X be the number of successes among the n drawn. Then X follows the hypergeometric distribution with parameters (N, K, n), and
P(X = k) = C(K, k) · C(N − K, n − k) / C(N, n),
where C(a, b) = a!/(b!(a − b)!) is the binomial coefficient. The intuition: out of C(N, n) equally likely n-subsets of the population, the number that include exactly k successes is "choose k of the K successes" times "choose n − k of the N − K failures". Divide by the total.
The support of X is max(0, n − (N − K)) ≤ k ≤ min(n, K). The lower bound says you cannot have fewer successes than the n-sample's surplus over failures; the upper bound says you cannot have more successes than either the sample size or the population's success count.
Mean, variance, and the finite-population correction
The mean of the hypergeometric is
E[X] = n · K / N.
Identical to a binomial with p = K/N. Order and replacement do not affect the marginal probability that any specific draw is a success — each draw, regardless of position, has probability K/N of being a success (this is a symmetry argument, easy to prove by counting orderings).
The variance is
Var(X) = n · (K/N) · (1 − K/N) · (N − n)/(N − 1).
The first three factors are the binomial variance n·p·(1 − p). The last is the finite-population correction (FPC), equal to 1 when n = 1 and 0 when n = N. The hypergeometric is always at most as variable as the corresponding binomial, and strictly less variable when n > 1. Intuition: each draw without replacement removes uncertainty about what is left, tightening the count.
Worked example — quality control
A lot of N = 100 microchips contains K = 8 defective units (mean defect rate 8%). The QA team randomly samples n = 20 chips for destructive testing. What is the probability that X = 0 defective chips are found — i.e. the lot passes inspection?
P(X = 0) = C(8, 0) C(92, 20) / C(100, 20)
= 1 · C(92, 20) / C(100, 20)
≈ 0.1697.
About a 17% chance the lot passes even though 8% are defective. Compare to a binomial approximation (sampling with replacement, p = 0.08, n = 20): P(X = 0) = 0.92²⁰ ≈ 0.1887 — slightly higher because binomial overstates variability. The sampling fraction n/N = 20/100 = 20% > 5%, so hypergeometric matters here.
If instead n/N were small — say N = 10,000 with K = 800 defective and n = 20 sampled — the binomial approximation P(X = 0) ≈ 0.1887 is accurate to 4 decimal places, because removing 20 chips from 10,000 barely affects subsequent probabilities.
Worked example — lottery odds
A 6/49 lottery: N = 49 numbered balls, K = 6 of them are your chosen numbers, and the draw is n = 6 without replacement. The number X of your numbers in the draw is Hypergeometric(49, 6, 6).
P(X = 6) = C(6, 6) C(43, 0) / C(49, 6) = 1 / 13,983,816 ≈ 7.15 × 10⁻⁸.
P(X = 5) = C(6, 5) C(43, 1) / C(49, 6) = 6 · 43 / 13,983,816 ≈ 1.84 × 10⁻⁵.
P(X = 4) = C(6, 4) C(43, 2) / C(49, 6) = 15 · 903 / 13,983,816 ≈ 9.69 × 10⁻⁴.
P(X = 3) = C(6, 3) C(43, 3) / C(49, 6) = 20 · 12341 / 13,983,816 ≈ 0.01765.
Expected number matched is 6 · 6 / 49 ≈ 0.735. About 1 in 13 tickets matches 3, the cheapest payout tier; jackpot 1 in 14 million.
Variants and relatives
- Negative hypergeometric. Sample without replacement until r failures occur; let X count successes drawn. Models "draw until you've rejected r times". Useful when the stopping criterion is failure count rather than sample size.
- Multivariate hypergeometric. Population partitioned into c categories with counts K₁, …, K_c summing to N; draw n without replacement, count successes X₁, …, X_c in each category. PMF involves a product of binomial coefficients over a single big denominator. Used in genetics (allele counts), urn models, multi-class enrichment.
- Fisher's noncentral hypergeometric. Generalises by adding a parameter ω that gives different sampling weights to successes vs failures. Used to model retrospective sampling in case-control studies.
- Wallenius noncentral hypergeometric. Sequential sampling without replacement where the relative odds change at each step. Different from Fisher's noncentral.
- Beta-binomial. The compound distribution if the success probability has a Beta prior — used when the population's K/N itself is uncertain. Approaches hypergeometric in the conjugate-posterior limit.
JavaScript — hypergeometric PMF, mean, and sampling
// Log binomial coefficient for numerical stability with large N
function lnChoose(n, k) {
if (k < 0 || k > n) return -Infinity;
let s = 0;
for (let i = 0; i < k; i++) s += Math.log(n - i) - Math.log(i + 1);
return s;
}
function hypergeoPMF(N, K, n, k) {
return Math.exp(lnChoose(K, k) + lnChoose(N - K, n - k) - lnChoose(N, n));
}
function hypergeoMean(N, K, n) { return n * K / N; }
function hypergeoVar(N, K, n) {
return n * (K / N) * (1 - K / N) * (N - n) / (N - 1);
}
// Sample one draw without replacement
function sampleHypergeo(N, K, n) {
let successes = K, total = N, x = 0;
for (let i = 0; i < n; i++) {
if (Math.random() < successes / total) { x++; successes--; }
total--;
}
return x;
}
// Lottery probabilities
console.log(hypergeoPMF(49, 6, 6, 6)); // ≈ 7.15e-8 (jackpot)
console.log(hypergeoPMF(49, 6, 6, 5)); // ≈ 1.84e-5 (match-5)
// Quality control: 100 chips, 8 defective, sample 20
console.log(hypergeoPMF(100, 8, 20, 0)); // ≈ 0.1697 (P(no defectives))
console.log(hypergeoMean(100, 8, 20)); // 1.6 expected defectives
Why the PMF formula works
Count the favourable outcomes and divide by the total. There are C(N, n) equally likely unordered subsets of size n drawn from the population. To count subsets with exactly k successes: choose which k of the K successes are included (C(K, k) ways), then choose which n − k of the N − K failures are included (C(N − K, n − k) ways). Multiply, divide by C(N, n) — that is the PMF.
An equivalent ordered derivation: each ordering of n draws is equally likely. There are P(N, n) = N!/(N − n)! orderings. Fix a particular pattern of k successes (e.g. SSSFF for k = 3, n − k = 2) — this has K · (K − 1) · (K − 2) · (N − K) · (N − K − 1) orderings. There are C(n, k) such patterns. Divide P(N, n) into ordered count gives the same result. Either viewpoint works; the unordered count is cleaner.
When to reach for hypergeometric
- Sampling without replacement from a fixed population. Quality control by destructive testing, audit sampling, capture-recapture surveys — anywhere the population is finite and items are not returned.
- 2×2 contingency tables with small or fixed margins. Fisher's exact test uses the hypergeometric exact null distribution; the only valid test when expected cell counts are below 5.
- Card games and combinatorial probability. Poker, Bridge, Magic: the Gathering — drawing without replacement is the rule. Hypergeometric gives exact hand probabilities.
- Survey sampling with high sampling fraction. Polling a small town (n/N > 5%), auditing a hospital ward, sampling a small focus group — finite-population correction matters.
- Capture-recapture population estimation. Tag and release, then recapture; the count of tagged animals in the recapture is hypergeometric. Lincoln-Petersen estimator inverts this.
- Gene-set enrichment analysis. Does a gene set overlap a query list more than chance? Test with a hypergeometric tail probability (one-tailed Fisher's exact).
Common pitfalls
- Using binomial when n/N is large. Binomial overestimates variance and undercounts tight cells. Always check the sampling fraction; if n/N > 5%, use hypergeometric.
- Forgetting the finite-population correction. Standard errors for sample proportions in survey sampling include √(FPC). Omit it and you overstate uncertainty by up to 100% when n approaches N.
- Confusing K with the unknown. In quality control, K is the (unknown) number of defectives we want to infer; in lottery, K is known. Make sure you know which parameter is the inferential target.
- Range errors. X must satisfy max(0, n − (N − K)) ≤ X ≤ min(n, K). The lower bound is non-trivial when n is large relative to the failures. A negative lower bound makes no sense.
- Asymptotic approximations without checking conditions. Hypergeometric → binomial only when n/N is small; → Poisson when n is large but np is bounded; → Normal when all of n, K, N − K are large. Use the right limit for the regime.
- Symmetry confusion. Hypergeometric(N, K, n) and Hypergeometric(N, n, K) have the same distribution — drawing n with K successes vs drawing K from a population of N with n successes labelled. Useful for code re-use but a frequent source of mislabelling.
Applications
Fisher's exact test — 2×2 contingency tables
Suppose 100 patients (50 treated, 50 control) experience outcomes. The treated group sees 12 successes, the control 6. Is treatment effective? Under the null hypothesis of independence, conditioning on margins, the count in the treated-success cell is Hypergeometric(100, 18, 50). The p-value is the tail probability P(X ≥ 12) — a finite sum of hypergeometric PMFs. Exact regardless of sample size; the gold standard when chi-squared's approximation is unreliable (expected cell count < 5).
Capture-recapture (Lincoln-Petersen estimator)
Tag K = 100 fish in a lake and release. Later catch n = 150 fish; find X = 20 tagged. Under random mixing, X ~ Hypergeometric(N, 100, 150). The MLE for the unknown lake population is N̂ ≈ n · K / X = 150 · 100 / 20 = 750 fish. Confidence intervals from hypergeometric tails. The Schnabel multi-sample generalisation and Cormack-Jolly-Seber survival models extend this. Used in wildlife biology, epidemiology (estimating disease prevalence from positivity rates), and human population estimation.
Quality control — acceptance sampling
A lot of N items is accepted if a random sample of n contains at most c defectives. Under hypergeometric assumption, the lot's operating characteristic curve — P(accept) as a function of true defect rate K/N — is a sum of hypergeometric PMFs over k = 0 to c. Tables in MIL-STD-105 (now ANSI/ASQ Z1.4) for accept-reject criteria are entirely built on hypergeometric calculations.
Bridge and Poker hand probabilities
Drawing without replacement from a deck. P(four-of-a-kind in 5-card hand) = C(13, 1) · C(4, 4) · C(48, 1) / C(52, 5) = 624 / 2,598,960 ≈ 0.000240 — that's the multivariate hypergeometric with the appropriate cell structure. All canonical poker probabilities reduce to hypergeometric calculations.
Gene-set enrichment analysis (bioinformatics)
From an experiment, you identify n significant genes out of a genome of N. A gene set of interest contains K genes; X of the n significant genes fall in the set. Test enrichment with a one-tailed Fisher's exact P(X ≥ x_obs) — a hypergeometric tail. Tools like DAVID, GSEA-CLI, and enrichr all use this.
Frequently asked questions
What's the difference between hypergeometric and binomial?
Both count successes in n draws from a population with success probability p = K/N. Binomial assumes WITH replacement — every draw is independent, p stays constant. Hypergeometric assumes WITHOUT replacement — each draw reduces both the population and the count of remaining successes (if the draw was a success). The two coincide when n is much smaller than N (the population is effectively infinite from the sample's perspective). The rule of thumb is n/N < 5%: below that threshold, binomial is a good approximation to hypergeometric; above it, hypergeometric is meaningfully tighter. The variance ratio is the finite-population correction (N − n)/(N − 1) — equal to 1 when n = 1 and approaching 0 when n = N (no uncertainty if you sample everything).
How is hypergeometric used in lottery analysis?
A standard 6/49 lottery has N = 49 balls, K = 6 "winning" numbers (the ones you picked), and you draw n = 6 without replacement. The number X of your numbers matched in the draw is Hypergeometric(49, 6, 6). P(X = 6) = C(6, 6) C(43, 0) / C(49, 6) = 1/13,983,816 ≈ 7.15 × 10⁻⁸ — the standard "roughly one in fourteen million" jackpot odds. P(X = 5) = C(6, 5) C(43, 1) / C(49, 6) = 258 / 13,983,816 ≈ 1.84 × 10⁻⁵. Every lottery odds calculation reduces to a hypergeometric formula because draws are explicitly without replacement.
What's Fisher's exact test, and how does hypergeometric power it?
Fisher's exact test answers "is the association in a 2×2 contingency table statistically significant?" for small samples. Under the null hypothesis of independence, with row and column margins fixed, the count in any cell is Hypergeometric(N, K, n) where N is the total, K and n are the marginal totals. The exact p-value is the hypergeometric tail probability of observing a count at least as extreme as the data. Used in epidemiology (case-control studies with small samples), bioinformatics (gene-enrichment analysis), and any 2×2 table where chi-squared's large-sample approximation isn't trusted (expected cell count below 5). R: fisher.test().
What's the mean and variance of the hypergeometric?
Mean: E[X] = n · K / N. Same as binomial with p = K/N — sampling order doesn't affect the average. Variance: Var(X) = n · (K/N) · (1 − K/N) · (N − n)/(N − 1). The first three factors are binomial variance; the last is the finite-population correction (FPC). When n = 1, FPC = 1; when n = N, FPC = 0 (no variance — you sampled the whole population). Standard error of the sample proportion X/n in survey sampling uses √(FPC × p(1−p)/n) — this is why pollsters can use n = 1500 to estimate national opinion with ~2.5% margin, FPC matters when sampling a large fraction of a finite population (a small town, a hospital ward).
How does capture-recapture work with hypergeometric?
To estimate N (unknown population size of wildlife): tag K animals, release them, later catch n animals and count X tagged. Under random mixing, X ~ Hypergeometric(N, K, n). Maximum likelihood estimate: N̂ ≈ n · K / X (Lincoln-Petersen estimator). Example — tag 100 fish, later catch 150, find 20 tagged — N̂ ≈ 150 · 100 / 20 = 750 fish in the lake. Confidence intervals come from hypergeometric tails. Modern variants (Schnabel multi-sample, Cormack-Jolly-Seber for survival) generalise this to multiple capture occasions. Foundation of population biology.
When does the hypergeometric reduce to the Poisson?
Three limits, three distributions. (1) Binomial limit: when n is small relative to N (n/N → 0), Hypergeometric(N, K, n) → Binomial(n, K/N) — sampling without replacement looks like independence. (2) Poisson limit: when n is large, K is large, but K/N is small AND n·K/N stays bounded — Hypergeometric → Poisson(n·K/N). Same conditions as binomial-to-Poisson. (3) Normal limit: when n, K, N − K are all large and the variance is large, X is approximately Normal with mean nK/N and variance from the FPC formula — used to compute Fisher exact p-values for large tables. So the hypergeometric sits at the centre of a family of approximations.
Binomial vs hypergeometric vs Poisson — and friends
When to use which discrete count distribution.
| Distribution | Replacement? | Population | Mean | Variance | When to use |
|---|---|---|---|---|---|
| Binomial(n, p) | With (independent trials) | Effectively infinite | np | np(1−p) | Coin flips; n/N < 5% sampling |
| Hypergeometric(N, K, n) | Without | Finite N, K successes | nK/N | nK(N−K)(N−n) / (N²(N−1)) | Quality control, lottery, Fisher's test |
| Poisson(λ) | n/a (count of rare events) | Continuous-time origin | λ | λ | Rare events, n large p small, np ≈ λ |
| Negative binomial(r, p) | With | Effectively infinite | r(1−p)/p | r(1−p)/p² | Count of trials until r successes |
| Negative hypergeometric(N, K, r) | Without | Finite | r(N−K+1)/(K+1) | Closed form (complex) | Without-replacement stop after r failures |
| Multinomial(n, p) | With (c categories) | Effectively infinite | np_i | np_i(1−p_i) | Multi-category counts, polls |
| Multivariate hypergeometric | Without (c categories) | Finite | nK_i/N | nK_i(N−K_i)(N−n)/(N²(N−1)) | Genetics, urn models, multi-class enrichment |