Chi-Squared Distribution
Sum of k squared standard normals — mean k, variance 2k
The chi-squared distribution χ²(k) is the sum of k independent squared standard-normal variables. Mean k, variance 2k. Powers goodness-of-fit tests, contingency-table independence tests, and variance estimation.
- Definition: χ²(k) = Z₁² + Z₂² + ... + Z_k², Zᵢ ~ N(0,1)
- Mean: k
- Variance: 2k
- Density: x^(k/2−1) e^(−x/2) / (2^(k/2) Γ(k/2))
- Special case of: Gamma(k/2, 1/2)
- Authors: Helmert 1876, Pearson 1900
The definition
Let Z₁, Z₂, …, Z_k be independent standard normal random variables (mean 0, variance 1). The sum of their squares:
X = Z₁² + Z₂² + ... + Z_k²
X ~ χ²(k) "chi-squared with k degrees of freedom"
k is the degrees of freedom. The density is:
f(x) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2)) for x > 0
= 0 for x ≤ 0
This is exactly a Gamma distribution with shape α = k/2 and rate β = 1/2 — chi-squared is a special case of Gamma.
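As a sanity check on the density and the Gamma identity, here is a minimal stdlib-only Python sketch (Python 3.8+ assumed; `chi2_pdf` and `gamma_pdf` are illustrative names, not a library API). It evaluates both densities in log space with `math.lgamma`, which avoids overflowing Γ(k/2) for large k, and confirms numerically that the χ²(5) density integrates to about 1.

```python
import math


def chi2_pdf(x: float, k: int) -> float:
    """Density of chi-squared with k degrees of freedom; 0 for x <= 0."""
    if x <= 0:
        return 0.0
    half_k = k / 2
    # log f(x) = (k/2 - 1) log x - x/2 - (k/2) log 2 - log Gamma(k/2)
    log_f = (half_k - 1) * math.log(x) - x / 2 - half_k * math.log(2) - math.lgamma(half_k)
    return math.exp(log_f)


def gamma_pdf(x: float, shape: float, rate: float) -> float:
    """Gamma density in the shape/rate parametrization; 0 for x <= 0."""
    if x <= 0:
        return 0.0
    log_f = shape * math.log(rate) + (shape - 1) * math.log(x) - rate * x - math.lgamma(shape)
    return math.exp(log_f)


# chi2(k) coincides with Gamma(shape k/2, rate 1/2) at every point we try.
for k in (1, 2, 5, 10):
    for x in (0.5, 1.0, 3.0, 12.0):
        assert math.isclose(chi2_pdf(x, k), gamma_pdf(x, k / 2, 0.5), rel_tol=1e-12)

# Midpoint-rule check that the k = 5 density integrates to (nearly) 1 on (0, 100].
step = 0.001
area = sum(chi2_pdf((i + 0.5) * step, 5) * step for i in range(100_000))
print(f"integral of the chi2(5) density over (0, 100]: {area:.6f}")
```

If SciPy is available, `scipy.stats.chi2.pdf(x, k)` should give the same values.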
Mean and variance
For Z standard normal: E[Z²] = Var(Z) = 1. So summing k independent squared standards:
E[χ²(k)] = k · 1 = k
Var(χ²(k)) = k · Var(Z²) = k · 2 = 2k
(using E[Z⁴] = 3 ⇒ Var(Z²) = 3 − 1 = 2)
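The moments can also be checked by simulating the definition directly. This Monte Carlo sketch (seed and sample size chosen arbitrarily, `sample_chi2` a hypothetical helper) draws χ²(5) as a sum of five squared standard normals and compares the sample mean and variance with k and 2k; it also estimates E[Z⁴], the step the variance derivation relies on.

```python
import random
import statistics


def sample_chi2(k: int, rng: random.Random) -> float:
    """One draw of chi2(k), built from the definition: the sum of k squared N(0, 1) draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(42)  # arbitrary fixed seed so reruns give the same numbers
k = 5
draws = [sample_chi2(k, rng) for _ in range(100_000)]
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(f"sample mean {sample_mean:.3f} vs theory {k}")
print(f"sample variance {sample_var:.3f} vs theory {2 * k}")

# The E[Z^4] = 3 step can be checked the same way.
z4 = statistics.fmean(rng.gauss(0.0, 1.0) ** 4 for _ in range(100_000))
print(f"E[Z^4] estimate {z4:.3f} vs theory 3")
```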
The standard deviation is √(2k), which grows more slowly than the mean. Concretely:
| k | Mean | Variance | Mode | Std dev |
|---|---|---|---|---|
| 1 | 1 | 2 | 0 | 1.41 |
| 2 | 2 | 4 | 0 | 2.00 |
| 5 | 5 | 10 | 3 | 3.16 |
| 10 | 10 | 20 | 8 | 4.47 |
| 30 | 30 | 60 | 28 | 7.75 |
| 100 | 100 | 200 | 98 | 14.14 |
The mode is max(k − 2, 0): for k ≥ 2 it is at k − 2, which is positive only when k > 2. For k = 1 and k = 2 the density is monotone decreasing on x > 0 (and unbounded near 0 when k = 1), so the mode sits at 0.
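The table's formulas are easy to regenerate, and the mode can be cross-checked by locating the peak of the density numerically. This sketch uses a plain grid search with step 0.01 (an arbitrary resolution; `grid_mode` is a hypothetical helper). For k = 1 and k = 2 the argmax lands on the smallest grid point, which is how a monotone decreasing density shows up on a grid.

```python
import math


def chi2_logpdf(x: float, k: int) -> float:
    """Log-density of chi2(k), valid for x > 0."""
    return (k / 2 - 1) * math.log(x) - x / 2 - (k / 2) * math.log(2) - math.lgamma(k / 2)


def grid_mode(k: int, step: float = 0.01, upper: float = 150.0) -> float:
    """Grid point in (0, upper] where the density is largest; accurate to about `step`."""
    grid = [step * i for i in range(1, int(round(upper / step)) + 1)]
    return max(grid, key=lambda x: chi2_logpdf(x, k))


for k in (1, 2, 5, 10, 30, 100):
    std = math.sqrt(2 * k)
    print(f"k={k:>3}  mean={k:>3}  var={2 * k:>3}  std={std:5.2f}  "
          f"mode formula={max(k - 2, 0):>2}  grid argmax={grid_mode(k):6.2f}")
```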
Pearson's goodness-of-fit test
Suppose you observe counts O₁, O₂, …, O_k across k categories and want to test whether they match expected counts E₁, E₂, …, E_k under some hypothesis. The chi-squared statistic:
X² = Σ (Oᵢ − Eᵢ)² / Eᵢ
Under the null hypothesis, X² approximately follows χ²(k − 1 − p) where p is the number of free parameters estimated from the data.
Worked example — testing dice fairness. Roll a die 60 times, observe (8, 12, 9, 11, 7, 13) hits per face. Expected count under uniformity: 10 per face. Compute:
X² = (8-10)²/10 + (12-10)²/10 + (9-10)²/10 + (11-10)²/10 + (7-10)²/10 + (13-10)²/10
= 4/10 + 4/10 + 1/10 + 1/10 + 9/10 + 9/10
= 28/10 = 2.8
df = 6 - 1 = 5
χ² critical at α=0.05 with 5 df: 11.07
X² = 2.8 < 11.07 → fail to reject; consistent with fair die.
If the observed X² had been 12, we would reject at the 0.05 level (12 > 11.07, p ≈ 0.035), which is evidence that the die is biased.
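The die calculation, plus an exact p-value, fits in a few lines of standard-library Python. `chi2_sf` and `pearson_gof` are hand-rolled helpers (assumptions of this sketch, not library functions); `chi2_sf` evaluates the chi-squared upper tail through the power series of the regularized incomplete gamma function. With SciPy available, `scipy.stats.chisquare(observed, expected)` would be the usual route.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(chi2(k) > x).

    Uses the power series for the regularized lower incomplete gamma
    function P(k/2, x/2); adequate for the moderate x used here.
    """
    if x <= 0:
        return 1.0
    a, z = k / 2, x / 2
    term = total = 1 / a  # n = 0 term of sum z^n / (a (a+1) ... (a+n))
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def pearson_gof(observed, expected, n_estimated_params=0):
    """Pearson X^2 statistic, degrees of freedom and p-value."""
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated_params
    return x2, df, chi2_sf(x2, df)


observed = [8, 12, 9, 11, 7, 13]      # 60 rolls, from the worked example
expected = [sum(observed) / 6] * 6    # 10 per face under a fair die
x2, df, p = pearson_gof(observed, expected)
print(f"X^2 = {x2:.2f}, df = {df}, p = {p:.3f}")
print("reject at 0.05" if p < 0.05 else "fail to reject at 0.05")
```

The `n_estimated_params` argument is where the "− p" in χ²(k − 1 − p) enters.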
Contingency tables — testing independence
For a contingency table with r rows and c columns, expected count Eᵢⱼ = (row total)(col total)/(grand total). The statistic is the same:
X² = Σᵢ,ⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ
df = (r − 1)(c − 1)
For a 2×3 table, df = (2−1)(3−1) = 2. For a 5×5 table, df = (5−1)(5−1) = 16. The row and column totals act as constraints: once they are fixed, only (r − 1)(c − 1) of the r·c cells can vary freely.
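A sketch of the independence test for an r × c table, following the formulas above. The function name and the 2 × 3 table are made up for illustration. Because this table has df = 2, and χ²(2) is exponential with rate 1/2, the p-value has the closed form e^(−X²/2), so no special functions are needed here.

```python
import math


def independence_test(table):
    """Pearson X^2 for an r x c table of counts; returns (X^2, df, expected counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # E_ij = (row total)(column total) / (grand total)
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    x2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df, expected


table = [[10, 20, 30],   # hypothetical counts, not real data
         [20, 20, 20]]
x2, df, expected = independence_test(table)
p = math.exp(-x2 / 2)    # closed-form chi2 tail, valid only because df == 2
print(f"expected counts: {expected}")
print(f"X^2 = {x2:.3f}, df = {df}, p = {p:.4f}")
```

For general df the tail needs the incomplete gamma function; SciPy's `scipy.stats.chi2_contingency` covers the whole test if it is available.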
Chi-squared vs related distributions
| Distribution | Form | Mean | Variance | Use case |
|---|---|---|---|---|
| χ²(k) | Σ Zᵢ², Zᵢ ~ N(0,1) | k | 2k | Goodness-of-fit, independence tests, variance estimation |
| χ²(k, λ) | Σ Zᵢ², Zᵢ ~ N(μᵢ, 1), λ = Σ μᵢ² | k + λ | 2(k + 2λ) | Power calculations |
| Gamma(α, β) | n/a | α/β | α/β² | Generic positive RV; χ²(k) = Gamma(k/2, 1/2) with β a rate |
| Exponential(λ) | n/a | 1/λ | 1/λ² | χ²(2) = Exp(rate 1/2) |
| t(ν) | Z/√(χ²(ν)/ν) | 0 (ν > 1) | ν/(ν−2) (ν > 2) | Small-sample means |
| F(m, n) | (χ²(m)/m)/(χ²(n)/n) | n/(n−2) (n > 2) | 2n²(m+n−2)/(m(n−2)²(n−4)) (n > 4) | Variance ratio, ANOVA |
Chi-squared is the foundation; t and F are constructed from chi-squareds and standard normals.
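The constructions in the table can be simulated literally. This Monte Carlo sketch (hypothetical helper names, arbitrary seed) builds t(10) and F(4, 12) draws from independent standard normals and chi-squareds, then compares sample moments with the table's formulas: mean 0 and variance 10/8 = 1.25 for t(10), mean 12/10 = 1.2 for F(4, 12).

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed for reproducibility


def chi2_draw(k: int) -> float:
    """chi2(k) as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


def t_draw(nu: int) -> float:
    """Student t: Z / sqrt(chi2(nu) / nu), with Z drawn independently of the chi-squared."""
    return rng.gauss(0.0, 1.0) / (chi2_draw(nu) / nu) ** 0.5


def f_draw(m: int, n: int) -> float:
    """Snedecor F: ratio of two independent chi-squareds, each divided by its df."""
    return (chi2_draw(m) / m) / (chi2_draw(n) / n)


N = 50_000
t_samples = [t_draw(10) for _ in range(N)]
f_samples = [f_draw(4, 12) for _ in range(N)]
t_mean, t_var = statistics.fmean(t_samples), statistics.variance(t_samples)
f_mean = statistics.fmean(f_samples)
print(f"t(10): mean {t_mean:.3f} (theory 0), variance {t_var:.3f} (theory 1.25)")
print(f"F(4, 12): mean {f_mean:.3f} (theory 1.2)")
```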
Connection to sample variance
If X₁, …, Xₙ are i.i.d. N(μ, σ²) and s² = Σ(Xᵢ − X̄)²/(n − 1) is the sample variance, then:
(n − 1) s² / σ² ~ χ²(n − 1)
Why n − 1, not n? Because one degree of freedom is used up in estimating μ by X̄. This relationship is what makes chi-squared central to confidence intervals for σ². Writing χ²_p(ν) for the upper-p critical value (the point with probability p to its right):
95% CI for σ²: [(n−1)s²/χ²_{0.025}(n−1), (n−1)s²/χ²_{0.975}(n−1)]
The interval is asymmetric because chi-squared is skewed. For n = 20 and s² = 4, with χ²_{0.025}(19) = 32.85 and χ²_{0.975}(19) = 8.91, the 95% CI for σ² is [19·4/32.85, 19·4/8.91] = [2.31, 8.53].
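The interval can be computed without printed tables by inverting the upper tail numerically. This stdlib sketch uses a hand-rolled incomplete-gamma series for the tail probability and bisects for the critical values; the helper names are hypothetical, and SciPy's `scipy.stats.chi2.isf` would return the same critical values more directly. It reproduces the n = 20, s² = 4 example.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """P(chi2(k) > x) via the series for the regularized lower incomplete gamma P(k/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = k / 2, x / 2
    term = total = 1 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def chi2_upper_critical(p: float, k: int) -> float:
    """Critical value c with P(chi2(k) > c) = p, by bisection (the tail shrinks as c grows)."""
    lo, hi = 0.0, k + 50 * math.sqrt(2 * k) + 50
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, k) > p:
            lo = mid  # too much mass still to the right: move the bracket up
        else:
            hi = mid
    return (lo + hi) / 2


def variance_ci(s2: float, n: int, level: float = 0.95) -> tuple:
    """Equal-tailed chi-squared confidence interval for sigma^2 from a normal sample."""
    alpha = 1 - level
    df = n - 1
    big = chi2_upper_critical(alpha / 2, df)        # chi2_{0.025}(df) when level = 0.95
    small = chi2_upper_critical(1 - alpha / 2, df)  # chi2_{0.975}(df) when level = 0.95
    return df * s2 / big, df * s2 / small


low, high = variance_ci(s2=4.0, n=20)
print(f"95% CI for sigma^2: [{low:.2f}, {high:.2f}]")
```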
Where chi-squared shows up
- Pearson's goodness-of-fit. Test whether categorical data follows a hypothesized distribution. One of the most widely used statistical tests.
- Contingency tables. Test independence of two categorical variables, used everywhere from medicine to A/B testing.
- Likelihood ratio tests (Wilks's theorem). Under the null and standard regularity conditions, twice the log-likelihood ratio of two nested models is asymptotically χ²(df), where df is the difference in their numbers of free parameters.
- Variance estimation. Confidence intervals for σ² use the chi-squared distribution of the scaled sample variance.
- ANOVA. In the sum-of-squares decomposition with normal errors, the treatment and error sums of squares divided by σ² are independent chi-squareds under the null.
- Linear regression diagnostics. With normal errors, the residual sum of squares divided by σ² is chi-squared with n − p degrees of freedom (p is the number of estimated coefficients).
- Mahalanobis distance. For x ~ N_d(μ, Σ) with known μ and Σ, (x − μ)ᵀΣ⁻¹(x − μ) is χ²(d), which makes it useful for outlier detection.
- Hidden Markov models. Likelihood ratio tests can compare nested HMM architectures, although the χ² reference distribution is unreliable when the models differ in their number of hidden states, because Wilks's regularity conditions fail there.
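Mahalanobis-based outlier flagging can be sketched in two dimensions, where Σ⁻¹ is easy to write by hand and the χ²(2) tail is e^(−x/2), so the 5% cutoff is −2 ln 0.05 ≈ 5.99. The mean, covariance, seed and sample size below are hypothetical. The simulation checks that about 5% of genuine N(μ, Σ) points exceed the cutoff, which is the false-alarm rate the χ²(d) result predicts.

```python
import math
import random


def mahalanobis_sq_2d(x, mu, cov):
    """(x - mu)^T Sigma^{-1} (x - mu) for 2-D points, inverting the 2 x 2 Sigma explicitly."""
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    inv = ((s22 / det, -s12 / det), (-s21 / det, s11 / det))
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)


mu = (1.0, -2.0)                # hypothetical mean
cov = ((4.0, 1.2), (1.2, 1.0))  # hypothetical covariance (positive definite)
cutoff = -2 * math.log(0.05)    # upper 5% point of chi2(2), about 5.99

# Sample N(mu, cov) as mu + L z, where L is the Cholesky factor with L L^T = cov.
l11 = math.sqrt(cov[0][0])
l21 = cov[1][0] / l11
l22 = math.sqrt(cov[1][1] - l21 ** 2)

rng = random.Random(1)  # arbitrary seed
N = 50_000
flagged = 0
for _ in range(N):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    point = (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)
    if mahalanobis_sq_2d(point, mu, cov) > cutoff:
        flagged += 1
print(f"fraction flagged as outliers: {flagged / N:.4f} (chi2(2) predicts 0.05)")
```

With μ and Σ estimated from the same data, the χ²(d) reference is only approximate.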
Useful approximations
For large k, chi-squared approaches Normal by the central limit theorem:
(χ²(k) − k) / √(2k) → N(0, 1) as k → ∞
A sharper approximation due to Fisher: √(2χ²(k)) − √(2k − 1) is approximately N(0, 1). This is more accurate for moderate k (say k ≥ 10) than the direct Normal approximation.
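Comparing both approximations with the exact tail at the upper 5% critical values (taken from standard chi-squared tables) shows Fisher's version landing closer. The exact tail uses a hand-rolled incomplete-gamma series as a stdlib-only stand-in for a library routine; the helper names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def chi2_sf(x: float, k: int) -> float:
    """Exact P(chi2(k) > x) via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = k / 2, x / 2
    term = total = 1 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - lower))


def normal_approx_sf(x: float, k: int) -> float:
    """Tail from the CLT approximation (chi2(k) - k) / sqrt(2k) ~ N(0, 1)."""
    return 1 - STD_NORMAL.cdf((x - k) / math.sqrt(2 * k))


def fisher_approx_sf(x: float, k: int) -> float:
    """Tail from Fisher's approximation sqrt(2 chi2(k)) - sqrt(2k - 1) ~ N(0, 1)."""
    return 1 - STD_NORMAL.cdf(math.sqrt(2 * x) - math.sqrt(2 * k - 1))


# Upper 5% critical values, so the exact tail at each x is 0.05.
cases = [(10, 18.307), (30, 43.773), (100, 124.342)]
for k, x in cases:
    exact = chi2_sf(x, k)
    print(f"k={k:>3}: exact {exact:.4f}  normal {normal_approx_sf(x, k):.4f}  "
          f"Fisher {fisher_approx_sf(x, k):.4f}")
```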
Common pitfalls
- Expected counts too small. The chi-squared approximation becomes unreliable when expected counts are small; a common rule of thumb requires Eᵢ ≥ 5 in every cell. Use Fisher's exact test or pool categories with sparse counts.
- Forgetting to adjust df for estimated parameters. If you estimate p parameters from the data (e.g., fitting a Normal's μ and σ before testing fit), subtract p from the degrees of freedom.
- Confusing one-sided and two-sided tests. Chi-squared goodness-of-fit is inherently one-sided (reject only for large X²); low X² values indicate good fit, not bad.
- Using chi-squared for ordered categories. Chi-squared treats categories as nominal — ordered alternatives (e.g., Cochran-Armitage trend test) are more powerful when ordering is meaningful.
- Yates's continuity correction. For 2×2 tables, Yates's correction subtracts 0.5 from |O − E| before squaring. It makes the test more conservative, and its effect shrinks as n grows; for small samples Fisher's exact test is often preferred.
- Mistaking p-value direction. Reject the null when X² is large (top tail). A small p-value comes from a large test statistic.
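Two of these pitfalls, small expected counts and Yates's correction, can be made concrete for a 2 × 2 table. This sketch (hypothetical function name and counts) reports the smallest expected count alongside X², with and without the correction. It clamps |O − E| − 0.5 at zero, a common safeguard so the correction never overshoots; SciPy's `scipy.stats.chi2_contingency` applies Yates's correction to 2 × 2 tables by default (`correction=True`).

```python
def pearson_2x2(table, yates=False):
    """Pearson X^2 for a 2 x 2 table of counts, optionally with Yates's correction.

    Returns (X^2, smallest expected count) so sparse tables can be spotted.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    x2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            deviation = abs(observed - expected)
            if yates:
                deviation = max(deviation - 0.5, 0.0)  # shrink |O - E| by 0.5, never below 0
            x2 += deviation ** 2 / expected
    return x2, min_expected


table = [[10, 20], [20, 10]]  # hypothetical counts
x2_plain, min_e = pearson_2x2(table)
x2_yates, _ = pearson_2x2(table, yates=True)
print(f"X^2 = {x2_plain:.3f}, with Yates = {x2_yates:.3f}, smallest expected = {min_e}")
if min_e < 5:
    print("warning: an expected count is below 5; consider Fisher's exact test")
```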
History
Friedrich Helmert derived the distribution of the sample variance from normal data in 1876, one of the earliest appearances of chi-squared in the literature. Karl Pearson rediscovered and named it in his 1900 paper "On the criterion that a given system of deviations from the probable...", introducing chi-squared goodness-of-fit testing in the same paper. R. A. Fisher's later work (1922, 1925) clarified the degrees-of-freedom adjustment when parameters are estimated. The name comes from Pearson's notation, χ², rather than from any mathematician's surname.
Frequently asked questions
What does "degrees of freedom" mean for chi-squared?
Degrees of freedom k is the number of independent standard normals being squared and summed. For k = 1, χ² is the distribution of Z² for a single standard normal, with a density that is unbounded at 0. For k = 10, χ² is the sum of 10 squared standard normals: mean 10, variance 20, and much more spread out. In a goodness-of-fit test the degrees of freedom equal (number of categories) − (number of parameters estimated) − 1, and in an r × c contingency table they equal (r − 1)(c − 1); in both cases the count reflects how much the data constrain the test statistic.
How is chi-squared used in goodness-of-fit testing?
Pearson's chi-squared statistic is X² = Σ (Oᵢ − Eᵢ)² / Eᵢ summed over categories, where Oᵢ is the observed count and Eᵢ is the expected count under the null hypothesis. Under the null, X² approximately follows χ²(k − 1 − p), where k is the number of categories and p is the number of parameters estimated. If X² exceeds the critical value (e.g., 11.07 at α = 0.05 for 5 df), reject the null. Classic example: testing whether a die is fair. With six faces and no estimated parameters there are 5 df, so X² > 11.07 rejects fairness at the 5% significance level. That is not the same as being 95% sure the die is biased; it means a fair die would produce a statistic this large less than 5% of the time.
Why mean k and variance 2k?
If Z ~ N(0,1) then E[Z²] = 1 and Var(Z²) = E[Z⁴] − (E[Z²])² = 3 − 1 = 2 (using the fact that the fourth moment of a standard normal is 3). Summing k independent squared standards: E[χ²ₖ] = k · 1 = k, Var(χ²ₖ) = k · 2 = 2k. Note that the standard deviation grows like √(2k), much slower than the mean — so as k grows, χ²(k) becomes relatively more concentrated around k. By the CLT, (χ²(k) − k)/√(2k) → N(0,1) as k → ∞.
How does chi-squared relate to the t and F distributions?
Student's t with ν degrees of freedom is Z / √(χ²(ν)/ν) where Z is standard normal independent of the chi-squared. Snedecor's F with (m, n) degrees of freedom is (χ²(m)/m) / (χ²(n)/n) — a ratio of independent chi-squareds divided by their degrees of freedom. So chi-squared is the building block: t for testing means with unknown variance, F for testing ratios of variances or comparing nested models. Chi-squared, t, and F together form the trinity of classical statistical testing.
When does the chi-squared approximation break down?
When expected counts in any cell are too small (rule of thumb: Eᵢ < 5). With small expected counts, the cell counts are far from Normal, so the sum of their standardized squares is poorly approximated by χ². Use Fisher's exact test for small contingency tables, pool sparse cells, or use Monte Carlo simulation to estimate the exact p-value. The chi-squared approximation is asymptotic: n must be large enough for the CLT to apply. For 2×2 tables, Yates's continuity correction subtracts 0.5 from |O − E| before squaring.
What is the noncentral chi-squared?
If Z₁, ..., Z_k are independent with Zᵢ ~ N(μᵢ, 1), then Σ Zᵢ² ~ χ²(k, λ) — noncentral chi-squared with noncentrality λ = Σ μᵢ². When all μᵢ = 0 it reduces to the standard (central) chi-squared. Used for power calculations in hypothesis testing: under the alternative hypothesis, the test statistic follows a noncentral chi-squared, and the noncentrality parameter measures effect size. Bigger λ = more power = easier to reject the null.
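A quick simulation supports the noncentral moments k + λ and 2(k + 2λ) listed in the comparison table. The μᵢ values, seed, sample size and helper name are hypothetical choices for this sketch.

```python
import random
import statistics


def noncentral_chi2_draw(means, rng):
    """Sum of squares of independent N(mu_i, 1) draws, i.e. one chi2(k, lambda) draw."""
    return sum(rng.gauss(mu, 1.0) ** 2 for mu in means)


rng = random.Random(3)              # arbitrary seed
means = [1.0, 0.5, 0.0, 2.0]        # hypothetical mu_i values
k = len(means)
lam = sum(mu ** 2 for mu in means)  # noncentrality lambda = 1 + 0.25 + 0 + 4 = 5.25
draws = [noncentral_chi2_draw(means, rng) for _ in range(100_000)]
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(f"mean {sample_mean:.3f} vs k + lambda = {k + lam}")
print(f"variance {sample_var:.3f} vs 2(k + 2 lambda) = {2 * (k + 2 * lam)}")
```

Only λ = Σ μᵢ² matters, not how it is split across the μᵢ.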
How does chi-squared connect to sample variance?
If X₁, ..., Xₙ are i.i.d. N(μ, σ²) and s² is the sample variance with n − 1 in the denominator, then (n − 1)s²/σ² ~ χ²(n − 1). The denominator n − 1 is the degrees of freedom: one is lost to estimating the mean. This is the basis of confidence intervals for σ². Writing χ²_p(ν) for the upper-p critical value, the interval [(n−1)s²/χ²_{α/2}(n−1), (n−1)s²/χ²_{1−α/2}(n−1)] covers the true variance with probability 1 − α. The interval is asymmetric because χ² is skewed.