Chi-Squared Distribution

Sum of k squared standard normals — mean k, variance 2k

The chi-squared distribution χ²(k) is the distribution of a sum of k independent squared standard-normal variables. Its mean is k and its variance is 2k. It underlies goodness-of-fit tests, contingency-table independence tests, and variance estimation.

  • Definition: χ²(k) = Z₁² + Z₂² + ... + Z_k², Zᵢ ~ N(0,1)
  • Mean: k
  • Variance: 2k
  • Density: x^(k/2−1) e^(−x/2) / (2^(k/2) Γ(k/2))
  • Special case of: Gamma(k/2, 1/2)
  • Authors: Helmert 1876, Pearson 1900

The definition

Let Z₁, Z₂, …, Z_k be independent standard normal random variables (mean 0, variance 1). The sum of their squares:

X = Z₁² + Z₂² + ... + Z_k²

X ~ χ²(k)    "chi-squared with k degrees of freedom"

k is the degrees of freedom. The density is:

f(x) = x^(k/2 − 1) e^(−x/2) / (2^(k/2) Γ(k/2))    for x > 0
     = 0                                            for x ≤ 0

This is exactly a Gamma distribution with shape α = k/2 and rate β = 1/2 — chi-squared is a special case of Gamma.
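
As a quick numerical sanity check on the density and the Gamma identity, the sketch below codes both formulas directly and compares them. The helper names (chi2_pdf, gamma_pdf, integrate) and the choice k = 4 are illustrative assumptions, and the midpoint rule is only a crude check, not a recommended integration method.

```python
import math

def chi2_pdf(x, k):
    """Density of chi-squared with k degrees of freedom (formula from the text)."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def gamma_pdf(x, shape, rate):
    """Density of Gamma(shape, rate) in the rate parameterization."""
    if x <= 0:
        return 0.0
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint rule; crude, but adequate for a sanity check."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

k = 4  # illustrative choice
total_mass = integrate(lambda x: chi2_pdf(x, k), 0.0, 100.0)
print(total_mass)                                     # should be very close to 1
print(chi2_pdf(3.0, k), gamma_pdf(3.0, k / 2, 0.5))   # the two densities should agree
```

The second line of output is the point of the exercise: evaluating Gamma(k/2, rate 1/2) at any x gives the same number as the chi-squared density.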

Mean and variance

For Z standard normal: E[Z²] = Var(Z) = 1. So summing k independent squared standards:

E[χ²(k)] = k · 1 = k
Var(χ²(k)) = k · Var(Z²) = k · 2 = 2k

(using E[Z⁴] = 3 ⇒ Var(Z²) = 3 − 1 = 2)

The standard deviation is √(2k), which grows more slowly than the mean. Concretely:

k      Mean   Variance   Mode (k≥2)   Std dev
1      1      2          0            1.41
2      2      4          0            2.00
5      5      10         3            3.16
10     10     20         8            4.47
30     30     60         28           7.75
100    100    200        98           14.14

For k ≥ 2 the mode is at k − 2 (positive only when k > 2). For k = 1 and k = 2 the density is monotone decreasing.
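
The mean and variance can also be checked by simulation. The following is a minimal sketch that builds χ²(k) draws straight from the definition; the seed, the sample size, and k = 5 are arbitrary assumptions chosen for illustration.

```python
import random
import statistics

def sample_chi2(k, rng):
    """One draw of chi-squared(k) as a sum of k squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))

rng = random.Random(12345)   # fixed seed so the run is repeatable
k = 5
draws = [sample_chi2(k, rng) for _ in range(20_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(sample_mean, sample_var)   # should land near k = 5 and 2k = 10
```

With 20,000 draws the sample mean and variance typically fall within a few percent of k and 2k; the match is statistical, not exact.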

Pearson's goodness-of-fit test

Suppose you observe counts O₁, O₂, …, O_k across k categories and want to test whether they match expected counts E₁, E₂, …, E_k under some hypothesis. The chi-squared statistic:

X² = Σ (Oᵢ − Eᵢ)² / Eᵢ

Under the null hypothesis, X² approximately follows χ²(k − 1 − p) where p is the number of free parameters estimated from the data.

Worked example — testing dice fairness. Roll a die 60 times, observe (8, 12, 9, 11, 7, 13) hits per face. Expected count under uniformity: 10 per face. Compute:

X² = (8-10)²/10 + (12-10)²/10 + (9-10)²/10 + (11-10)²/10 + (7-10)²/10 + (13-10)²/10
   = 4/10 + 4/10 + 1/10 + 1/10 + 9/10 + 9/10
   = 28/10 = 2.8

df = 6 - 1 = 5
χ² critical at α=0.05 with 5 df: 11.07
X² = 2.8 < 11.07 → fail to reject; consistent with fair die.

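If the observed X² had been 12, it would exceed 11.07 and we would reject at the 0.05 level. The corresponding p-value lies between 0.025 and 0.05 (the 0.025 critical value for 5 df is 12.83), so this would be moderate rather than overwhelming evidence that the die is biased.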
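
The dice calculation can be reproduced in a few lines. This sketch uses only the standard library, so the upper-tail probability is computed with a hand-written series for the regularized incomplete gamma function; chi2_sf is a hypothetical helper name, and in practice a statistics library would normally supply this function.

```python
import math

def chi2_sf(x, k):
    """Upper-tail probability P(chi-squared(k) > x).

    Uses the power series for the regularized lower incomplete gamma
    function; fine for moderate x, not tuned for extreme tails.
    """
    if x <= 0:
        return 1.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

observed = [8, 12, 9, 11, 7, 13]
n_rolls = sum(observed)
expected = [n_rolls / len(observed)] * len(observed)   # 10 per face

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(stat, df)

print(stat, df, p_value)     # statistic 2.8 on 5 df, p-value around 0.73
print(chi2_sf(12.0, df))     # the hypothetical X² = 12 case, around 0.035
```

A p-value near 0.73 says that a fair die produces a statistic at least this large about three times in four, so the data give no reason to doubt fairness.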

Contingency tables — testing independence

For a contingency table with r rows and c columns, expected count Eᵢⱼ = (row total)(col total)/(grand total). The statistic is the same:

X² = Σᵢ,ⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ

df = (r − 1)(c − 1)

For a 2×3 table, df = (2−1)(3−1) = 2. For a 5×5 table, df = 16. Each constraint (row sums, column sums) costs degrees of freedom.
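
A small sketch of the independence test follows. The 2×3 table of counts is made up purely for illustration, and independence_test is a hypothetical helper name. Because df = 2 here, the upper tail has the closed form exp(−x/2), which avoids needing a general chi-squared routine; that shortcut does not apply to other degrees of freedom.

```python
import math

def independence_test(table):
    """Pearson X² and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected_count = row_totals[i] * col_totals[j] / grand
            stat += (obs - expected_count) ** 2 / expected_count
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: 2 groups (rows) by 3 outcomes (columns).
table = [[10, 20, 30],
         [20, 20, 20]]

stat, df = independence_test(table)
p_value = math.exp(-stat / 2)   # closed-form upper tail, valid only for df = 2
print(stat, df, p_value)        # X² = 16/3 on 2 df, p-value just under 0.07
```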

Chi-squared vs related distributions

Distribution      Form                      Mean          Variance        Use case
χ²(k)             Σ Zᵢ², Zᵢ ~ N(0,1)        k             2k              Goodness-of-fit, variance estimation
χ²(k, λ)          Σ Zᵢ², Zᵢ ~ N(μᵢ, 1)      k + λ         2(k + 2λ)       Power calculations
Gamma(α, β)       n/a                       α/β           α/β²            Generic positive RV; chi² = Gamma(k/2, 1/2)
Exponential(λ)    n/a                       1/λ           1/λ²            χ²(2) ~ Exp(1/2)
t(ν)              Z/√(χ²(ν)/ν)              0 (ν>1)       ν/(ν−2) (ν>2)   Small-sample means
F(m, n)           (χ²(m)/m)/(χ²(n)/n)       n/(n−2) (n>2) not listed      Variance ratio, ANOVA

Chi-squared is the foundation; t and F are constructed from chi-squareds and standard normals.

Connection to sample variance

If X₁, …, Xₙ are i.i.d. N(μ, σ²) and s² = Σ(Xᵢ − X̄)²/(n − 1) is the sample variance, then:

(n − 1) s² / σ² ~ χ²(n − 1)

Why n − 1, not n? Because one degree of freedom is used estimating μ via X̄. This relationship is what makes chi-squared central to confidence intervals for σ²:

95% CI for σ²:  [(n−1)s²/χ²_{0.025}(n−1),  (n−1)s²/χ²_{0.975}(n−1)]

Here χ²_{p}(n−1) denotes the upper-tail critical value: the point that a χ²(n−1) variable exceeds with probability p. The interval is asymmetric because chi-squared is skewed. For n = 20 and s² = 4, with χ²_{0.025}(19) = 32.85 and χ²_{0.975}(19) = 8.91, the 95% CI for σ² is [19·4/32.85, 19·4/8.91] = [2.31, 8.53].
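
The n = 20, s² = 4 interval can be recomputed without a table. The sketch below finds the two chi-squared cut-offs by bisection on a hand-written CDF; chi2_cdf and chi2_quantile are hypothetical helper names, and the quantiles here are indexed by lower-tail probability (so the 0.975 quantile is the larger cut-off, about 32.85).

```python
import math

def chi2_cdf(x, k):
    """P(chi-squared(k) <= x) via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_quantile(p, k):
    """Value q with P(chi-squared(k) <= q) = p, found by bisection."""
    lo, hi = 0.0, k + 20.0 * math.sqrt(2.0 * k) + 20.0   # generous upper bracket
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

n, s2 = 20, 4.0
df = n - 1
q_low = chi2_quantile(0.025, df)    # lower 2.5% point, about 8.91
q_high = chi2_quantile(0.975, df)   # upper 2.5% point, about 32.85

# Dividing by the larger cut-off gives the lower end of the interval.
ci_lower = df * s2 / q_high
ci_upper = df * s2 / q_low
print(ci_lower, ci_upper)           # roughly 2.31 and 8.53
```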

Where chi-squared shows up

  • Pearson's goodness-of-fit. Test whether categorical data follows a hypothesized distribution. One of the most widely used statistical tests.
  • Contingency tables. Test independence of two categorical variables, in settings ranging from medicine to A/B testing.
  • Likelihood ratio tests (Wilks's theorem). Under the null, −2 log Λ (twice the gap in maximized log-likelihood between the full model and the nested one) is asymptotically χ²(df), where df is the difference in parameter counts.
  • Variance estimation. Confidence intervals for σ² use the chi-squared distribution of the scaled sample variance.
  • ANOVA. Sum-of-squares decomposition; with Normal errors and under the null, the treatment and error sums of squares, each divided by σ², are independent chi-squared variables.
  • Linear regression diagnostics. Residual sum of squares divided by σ² is chi-squared with n − p degrees of freedom (p is number of estimated coefficients).
  • Mahalanobis distance. For multivariate normal data, (x − μ)ᵀΣ⁻¹(x − μ) is χ²(d) — used for outlier detection.
  • Hidden Markov models. Likelihood ratio chi-squared tests compare nested HMM architectures.
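
To make the Mahalanobis item concrete, here is a minimal two-dimensional sketch. It assumes the mean and covariance are known exactly (the numbers below are invented for illustration) and that the data really are bivariate Normal. For d = 2 the χ²(2) upper tail is exp(−x/2), so the 95% cut-off is −2·ln(0.05), about 5.99.

```python
import math

def mahalanobis_sq_2d(point, mean, cov):
    """Squared Mahalanobis distance (x − μ)ᵀ Σ⁻¹ (x − μ) for 2-D inputs."""
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Inverse of a 2x2 matrix is (1/det) * [[d, -b], [-c, a]].
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

# Hypothetical known parameters of the bivariate Normal.
mean = (0.0, 0.0)
cov = ((2.0, 0.6),
       (0.6, 1.0))

alpha = 0.05
threshold = -2.0 * math.log(alpha)   # 95% point of chi-squared(2)

for point in [(0.5, 0.2), (4.0, -2.0)]:
    d2 = mahalanobis_sq_2d(point, mean, cov)
    print(point, d2, "outlier" if d2 > threshold else "ok")
```

When the mean and covariance are estimated from the same data, the χ²(d) reference distribution is only approximate, so a cut-off like this is best treated as a screening rule.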

Useful approximations

For large k, chi-squared approaches Normal by the central limit theorem:

(χ²(k) − k) / √(2k) → N(0, 1)    as k → ∞

A sharper approximation due to Fisher: √(2χ²(k)) − √(2k − 1) is approximately N(0, 1). This is more accurate for moderate k (say k ≥ 10) than the direct Normal approximation.
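
One way to see the difference between the two approximations is to compare tail probabilities at a single point. The sketch below takes k = 10 and x = 18.31 (the 0.05 critical value for 10 df) as an illustrative case. For even k the exact upper tail has a closed form as a finite Poisson-type sum, which is used here as the reference; the function names are assumptions.

```python
import math

def normal_sf(z):
    """Upper-tail probability of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def chi2_sf_even(x, k):
    """Exact P(chi-squared(k) > x) for even k: exp(-x/2) * sum_{j < k/2} (x/2)^j / j!."""
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k // 2))

k, x = 10, 18.31

exact = chi2_sf_even(x, k)
clt = normal_sf((x - k) / math.sqrt(2.0 * k))                      # direct Normal approximation
fisher = normal_sf(math.sqrt(2.0 * x) - math.sqrt(2.0 * k - 1.0))  # Fisher's approximation

print(exact, clt, fisher)   # Fisher's value should sit closer to the exact tail
```

At this point the direct Normal approximation understates the tail noticeably, while Fisher's version comes much closer. This is a single data point, not a general accuracy guarantee.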

Common pitfalls

  • Expected counts too small. The chi-squared approximation becomes unreliable when expected counts are small (rule of thumb: any Eᵢ < 5). Use Fisher's exact test for small contingency tables, or pool categories with sparse counts.
  • Forgetting to adjust df for estimated parameters. If you estimate p parameters from the data (e.g., fitting a Normal's μ and σ before testing fit), subtract p from the degrees of freedom.
  • Confusing one-sided and two-sided tests. Chi-squared goodness-of-fit is inherently one-sided (reject only for large X²); low X² values indicate good fit, not bad.
  • Using chi-squared for ordered categories. Chi-squared treats categories as nominal — ordered alternatives (e.g., Cochran-Armitage trend test) are more powerful when ordering is meaningful.
  • Yates's continuity correction. For 2×2 tables, Yates's correction subtracts 0.5 from |O − E| before squaring. It is aimed at small n, where it tends to be conservative (p-values somewhat too large); for large n it changes the statistic very little.
  • Mistaking p-value direction. Reject the null when X² is large (top tail). A small p-value comes from a large test statistic.
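
Several of these pitfalls can be illustrated on one 2×2 table. The sketch below uses invented counts, reports the smallest expected count, and computes the statistic with and without Yates's correction. With df = 1 the upper-tail p-value is erfc(√(x/2)), since χ²(1) is the square of a standard normal. The function names are hypothetical.

```python
import math

def expected_counts_2x2(table):
    """Expected counts under independence for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]

def pearson_2x2(table, yates=False):
    """Pearson X² for a 2x2 table, optionally with Yates's continuity correction."""
    expected = expected_counts_2x2(table)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            diff = abs(table[i][j] - expected[i][j])
            if yates:
                diff = max(diff - 0.5, 0.0)   # shrink |O − E| by 0.5, never below 0
            stat += diff ** 2 / expected[i][j]
    return stat

def chi2_sf_df1(x):
    """P(chi-squared(1) > x); only the upper tail is used for the test."""
    return math.erfc(math.sqrt(x / 2.0))

table = [[12, 5],
         [6, 14]]   # hypothetical counts

min_expected = min(min(row) for row in expected_counts_2x2(table))
plain = pearson_2x2(table)
corrected = pearson_2x2(table, yates=True)

print(min_expected)                      # rule of thumb wants this to be at least 5
print(plain, chi2_sf_df1(plain))         # uncorrected statistic and p-value
print(corrected, chi2_sf_df1(corrected)) # corrected statistic is smaller, p-value larger
```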

History

Friedrich Helmert derived the distribution of the sample variance from normal data in 1876 — the first appearance of chi-squared in the literature. Karl Pearson rediscovered and named it in his 1900 paper "On the criterion that a given system of deviations from the probable...", introducing chi-squared goodness-of-fit testing in the same paper. R. A. Fisher's later work (1922, 1925) clarified the degrees-of-freedom adjustment when parameters are estimated. Chi-squared is one of the few distributions named after a Greek letter rather than a person — the "chi" came from Pearson's notation, χ², not from any mathematician's surname.

Frequently asked questions

What does "degrees of freedom" mean for chi-squared?

Degrees of freedom k is the number of independent standard normals being squared and summed. For k = 1, χ² is the distribution of Z² for a single standard normal, with a density that is unbounded near 0. For k = 10, χ² is the sum of 10 squared standard normals, with mean 10 and variance 20, so it is much more spread out. In goodness-of-fit tests, degrees of freedom equal (number of categories) − 1 − (number of parameters estimated), reflecting the constraints that the fitted expected counts impose on the observed counts.

How is chi-squared used in goodness-of-fit testing?

Pearson's chi-squared statistic is X² = Σ (Oᵢ − Eᵢ)² / Eᵢ summed over categories, where Oᵢ is the observed count and Eᵢ is the expected count under the null hypothesis. Under the null, X² approximately follows χ²(k − 1 − p), where k is the number of categories and p is the number of parameters estimated. If X² exceeds the critical value (e.g., 11.07 at α = 0.05 for 5 df), reject the null. Classic example: testing whether a die is fair. If the statistic with 5 df exceeds 11.07, fairness is rejected at the 5% significance level, meaning a fair die would produce a value that large less than 5% of the time. It does not mean the die is biased with 95% probability.

Why mean k and variance 2k?

If Z ~ N(0,1) then E[Z²] = 1 and Var(Z²) = E[Z⁴] − (E[Z²])² = 3 − 1 = 2 (using the fact that the fourth moment of a standard normal is 3). Summing k independent squared standards: E[χ²ₖ] = k · 1 = k, Var(χ²ₖ) = k · 2 = 2k. Note that the standard deviation grows like √(2k), much slower than the mean — so as k grows, χ²(k) becomes relatively more concentrated around k. By the CLT, (χ²(k) − k)/√(2k) → N(0,1) as k → ∞.

How does chi-squared relate to the t and F distributions?

Student's t with ν degrees of freedom is Z / √(χ²(ν)/ν) where Z is standard normal independent of the chi-squared. Snedecor's F with (m, n) degrees of freedom is (χ²(m)/m) / (χ²(n)/n) — a ratio of independent chi-squareds divided by their degrees of freedom. So chi-squared is the building block: t for testing means with unknown variance, F for testing ratios of variances or comparing nested models. Chi-squared, t, and F together form the trinity of classical statistical testing.
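
The two constructions can be checked by simulation. In this sketch chi-squared draws come from random.gammavariate, using the fact that χ²(ν) is a Gamma with shape ν/2 and scale 2 (that is, rate 1/2). The degrees of freedom, seed, and sample size are arbitrary illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(7)   # fixed seed so the run is repeatable

def sample_chi2(nu, rng):
    """chi-squared(nu) drawn as Gamma(shape nu/2, scale 2)."""
    return rng.gammavariate(nu / 2.0, 2.0)

def sample_t(nu, rng):
    """Student's t: standard normal over sqrt(chi-squared(nu) / nu)."""
    z = rng.gauss(0.0, 1.0)
    return z / math.sqrt(sample_chi2(nu, rng) / nu)

def sample_f(m, n, rng):
    """Snedecor's F: ratio of independent chi-squareds, each divided by its df."""
    return (sample_chi2(m, rng) / m) / (sample_chi2(n, rng) / n)

nu, m, n = 10, 4, 12
t_draws = [sample_t(nu, rng) for _ in range(20_000)]
f_draws = [sample_f(m, n, rng) for _ in range(20_000)]

t_var = statistics.variance(t_draws)
f_mean = statistics.fmean(f_draws)
print(t_var, nu / (nu - 2))   # sample variance of t vs the theoretical 1.25
print(f_mean, n / (n - 2))    # sample mean of F vs the theoretical 1.2
```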

When does the chi-squared approximation break down?

When expected counts in any cell are too small (rule of thumb: Eᵢ < 5). With small expected counts, the multinomial cell counts are far from Normal, so their standardized squares don't sum to a true chi-squared. Use Fisher's exact test for small contingency tables, pool sparse cells, or use Monte Carlo simulation to approximate the exact p-value. The chi-squared approximation is asymptotic; n must be large enough for the CLT to kick in. For 2×2 tables, Yates's continuity correction subtracts 0.5 from |O − E| before squaring.

What is the noncentral chi-squared?

If Z₁, ..., Z_k are independent with Zᵢ ~ N(μᵢ, 1), then Σ Zᵢ² ~ χ²(k, λ) — noncentral chi-squared with noncentrality λ = Σ μᵢ². When all μᵢ = 0 it reduces to the standard (central) chi-squared. Used for power calculations in hypothesis testing: under the alternative hypothesis, the test statistic follows a noncentral chi-squared, and the noncentrality parameter measures effect size. Bigger λ = more power = easier to reject the null.
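
The noncentral moments (mean k + λ, variance 2(k + 2λ)) can be checked with a short simulation. The means μ = (1, 2, 0) below are an arbitrary illustrative choice giving k = 3 and λ = 5; the seed and sample size are likewise assumptions.

```python
import random
import statistics

rng = random.Random(2024)   # fixed seed so the run is repeatable

mus = [1.0, 2.0, 0.0]              # hypothetical means of the unit-variance normals
k = len(mus)
lam = sum(mu * mu for mu in mus)   # noncentrality λ = Σ μᵢ² = 5

# Each draw is a sum of squared N(μᵢ, 1) variables.
draws = [sum(rng.gauss(mu, 1.0) ** 2 for mu in mus) for _ in range(20_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(sample_mean, k + lam)            # theoretical mean 8
print(sample_var, 2 * (k + 2 * lam))   # theoretical variance 26
```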

How does chi-squared connect to sample variance?

If X₁, ..., Xₙ are i.i.d. N(μ, σ²) and s² is the sample variance with n − 1 in the denominator, then (n − 1)s²/σ² ~ χ²(n − 1). The n − 1 in the denominator matches the degrees of freedom: one is lost to estimating the mean. This is the basis of confidence intervals for σ²: with probability 1 − α the interval [(n−1)s²/χ²_{α/2}(n−1), (n−1)s²/χ²_{1−α/2}(n−1)] covers the true variance, where χ²_{p}(n−1) is the upper-tail critical value exceeded with probability p. The interval is asymmetric because χ² is skewed.