Question 1

Why use t instead of Normal for small samples?

Accepted Answer

Because you're estimating two things, not one. For Normal data, the standardized mean (X̄ − μ)/(σ/√n) is exactly standard Normal, but σ is unknown, so you replace it with the sample standard deviation s. The ratio (X̄ − μ)/(s/√n) now has TWO sources of randomness: the numerator and the denominator. The denominator s is itself a random variable; small n makes s wobbly; the wobbliness shows up as heavier tails in the resulting distribution. That's the t distribution. For ν = 5, the upper 2.5% critical value is 2.57 versus the Normal's 1.96, which gives noticeably wider intervals that reflect the genuine extra uncertainty.
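A quick simulation can make the "wobbly denominator" concrete. The sketch below is only an illustration under assumed settings (Normal data, n = 6, 20,000 trials, a fixed seed; the helper names `t_ratio` and `exceed_rate` are made up here). It counts how often the ratio lands outside the Normal cutoff 1.96 and outside the t(5) cutoff 2.571.

```python
import math
import random
import statistics

def t_ratio(n, rng, mu=0.0, sigma=1.0):
    """Draw one Normal sample of size n and return (xbar - mu) / (s / sqrt(n))."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (xbar - mu) / (s / math.sqrt(n))

def exceed_rate(n, cutoff, trials=20000, seed=0):
    """Fraction of simulated ratios whose absolute value exceeds the cutoff."""
    rng = random.Random(seed)
    hits = sum(abs(t_ratio(n, rng)) > cutoff for _ in range(trials))
    return hits / trials

# n = 6 gives 5 degrees of freedom.
rate_normal_cutoff = exceed_rate(n=6, cutoff=1.96)   # Normal cutoff: too narrow here
rate_t_cutoff = exceed_rate(n=6, cutoff=2.571)       # t(5) cutoff from tables
print(rate_normal_cutoff, rate_t_cutoff)
```

With the Normal cutoff the miss rate should come out at roughly double the nominal 5%, while the t(5) cutoff should bring it back to about 5%.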

Question 2

When does t become essentially Normal?

Accepted Answer

It never becomes exactly Normal, but the gap shrinks steadily. At df = 30 the 95% critical value is still about 4% larger than the Normal's; by df = 100 the gap is about 1% (1.98 vs 1.96). A common textbook rule of thumb: use t for n ≤ 30, switch to Normal (Z-tables) for n > 30, accepting an error of a few percent in interval width. The transition is gradual: at df = 10, the 95% critical value is 2.23 vs Normal's 1.96 (14% wider). At df = 30, it's 2.04 (4% wider). At df = 60, it's 2.00 (2% wider). The convergence is monotone: t tails are always thicker than the Normal's, but the excess shrinks as df grows.
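The critical values above can be reproduced without a statistics package. The following is a rough sketch, not a production routine: it integrates the t density with Simpson's rule and inverts the CDF by bisection. The function names, the step count, and the search bound `hi=50` are assumptions chosen for this illustration, and it only handles upper-half quantiles (p ≥ 0.5).

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of t(nu), computed on the log scale for stability."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def t_cdf(x, nu, steps=2000):
    """CDF for x >= 0: 0.5 plus composite Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_quantile(p, nu, hi=50.0):
    """Quantile for 0.5 <= p < 1 by bisection, assuming the answer lies below hi."""
    lo = 0.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)  # Normal critical value, about 1.96
for nu in (10, 30, 60, 100):
    t = t_quantile(0.975, nu)
    print(nu, round(t, 3), f"{100 * (t / z - 1):.1f}% wider")
```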

Question 3

Who was "Student" and why the pseudonym?

Accepted Answer

William Sealy Gosset (1876–1937), a chemist at the Guinness brewery in Dublin. He developed the method to handle small-sample quality control on beer ingredients, work that Guinness treated as a trade secret. Guinness forbade publication of trade-related research, so Gosset published under the pseudonym 'Student' in Biometrika (1908). Karl Pearson edited the journal and oversaw publication. Fisher generalized Gosset's work in the 1920s and gave us the modern t-test framework. The brewery's prohibition on publication is the reason the distribution and the test carry a pseudonym rather than their author's real name.

Question 4

What is the variance of the t-distribution?

Accepted Answer

Var(t(ν)) = ν/(ν − 2) for ν > 2; infinite for 1 < ν ≤ 2; undefined for ν ≤ 1. The variance blows up as ν → 2 from above, reflecting the heavier tails. For ν = 3, variance is 3; for ν = 5, variance is 5/3 ≈ 1.67; for ν = 30, variance is 30/28 ≈ 1.07 (close to Normal's 1). The mean is 0 only for ν > 1; for ν = 1 (the Cauchy distribution) the mean doesn't exist because the tails are too heavy. The distribution itself is well-defined for every ν > 0, but you need ν > 2 for both a finite mean and a finite variance.
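The three cases of the variance formula are easy to mix up, so here is a small sketch that encodes them directly. The function name `t_variance` is just a label chosen for this example.

```python
import math

def t_variance(nu):
    """Variance of t(nu): nu/(nu - 2) for nu > 2, infinite for 1 < nu <= 2, undefined otherwise."""
    if nu <= 1:
        return math.nan   # mean does not exist, so variance is undefined
    if nu <= 2:
        return math.inf   # mean exists, but the second moment diverges
    return nu / (nu - 2)

for nu in (1, 2, 3, 5, 30):
    print(nu, t_variance(nu))
```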

Question 5

What's the t-test, exactly?

Accepted Answer

One-sample t-test for mean μ₀: compute t = (X̄ − μ₀) / (s/√n). Under the null (true mean is μ₀), t follows t(n − 1). Reject the null if |t| exceeds the critical value (e.g., for n = 10 and α = 0.05 two-tailed, critical value is 2.26 from t(9) tables). Variants: paired t-test (differences for matched pairs), independent two-sample t-test (Welch's or Student's pooled-variance versions). The t-test is the workhorse of small-sample mean comparison — biology, psychology, A/B testing, clinical trials.
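As a worked illustration of the one-sample recipe, the sketch below runs the test on ten made-up measurements (the data, the null value μ₀ = 5.0, and the helper name `one_sample_t` are invented for this example). The critical value 2.262 is the tabulated t(9) value for a two-tailed test at α = 0.05.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return the t statistic (xbar - mu0) / (s / sqrt(n)) and its degrees of freedom."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    t_stat = (xbar - mu0) / (s / math.sqrt(n))
    return t_stat, n - 1

# Hypothetical measurements, for illustration only.
measurements = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2, 5.9, 5.5]
t_stat, df = one_sample_t(measurements, mu0=5.0)

critical = 2.262  # two-tailed, alpha = 0.05, t(9), taken from tables
reject = abs(t_stat) > critical
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject}")
```

Here the sample mean is 5.5 and s ≈ 0.365, so t ≈ 4.33 on 9 degrees of freedom, well beyond 2.262, and the null is rejected.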

Question 6

How is t(ν) constructed from Normal and chi-squared?

Accepted Answer

Let Z be standard normal and V be chi-squared with ν degrees of freedom, independent of Z. Then T = Z / √(V/ν) has the t(ν) distribution. Verify: for a Normal sample, Z = (X̄ − μ)/(σ/√n) ~ N(0,1), (n − 1)s²/σ² ~ χ²(n − 1), and X̄ and s² are independent. Dividing the numerator and denominator of (X̄ − μ)/(s/√n) by σ/√n gives Z / √(χ²(n−1)/(n−1)), which is exactly t(n − 1). The chi-squared in the denominator captures the wobbliness of the sample standard deviation; the larger ν, the more concentrated the denominator, and the closer T is to Normal.
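The construction can be checked by simulation. In this sketch (assumed settings: integer ν = 10, 50,000 draws, a fixed seed; `draw_t` and `tail_fraction` are hypothetical helper names) the chi-squared variable is built as a sum of ν squared standard normals, and the fraction of draws beyond the tabulated t(10) cutoff 2.228 should land near 5%.

```python
import math
import random

def draw_t(nu, rng):
    """One draw of T = Z / sqrt(V / nu) with Z ~ N(0,1) and V ~ chi-squared(nu), nu an integer."""
    z = rng.gauss(0.0, 1.0)
    # Sum of nu squared standard normals is chi-squared(nu), drawn independently of z.
    v = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(v / nu)

def tail_fraction(nu, cutoff, trials=50000, seed=1):
    """Fraction of draws with |T| above the cutoff."""
    rng = random.Random(seed)
    return sum(abs(draw_t(nu, rng)) > cutoff for _ in range(trials)) / trials

frac = tail_fraction(nu=10, cutoff=2.228)
print(frac)
```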

Question 7

What's the difference between t and Cauchy?

Accepted Answer

t(1) IS the Cauchy distribution. At ν = 1 the t-distribution has such heavy tails that even the mean doesn't exist: the integral ∫t · f(t) dt diverges. As ν increases the tails progressively lighten: ν = 2 has mean 0 but infinite variance; ν > 2 has finite variance; ν → ∞ converges to Normal. The t family is a continuous interpolation from Cauchy (ν = 1, fully heavy-tailed) to Normal (ν = ∞, fully light-tailed). The degrees of freedom parameter literally tunes the tail weight.
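Both ends of that interpolation can be checked numerically from the standard t density. The sketch below compares t(1) with the Cauchy density 1/(π(1 + x²)) and a very large ν (10⁶, an arbitrary stand-in for "infinity") with the standard Normal density. The function names are illustrative only.

```python
import math

def t_pdf(x, nu):
    """Density of t(nu), computed on the log scale for stability."""
    log_c = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def cauchy_pdf(x):
    """Standard Cauchy density."""
    return 1 / (math.pi * (1 + x * x))

def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Columns: x, t(1), Cauchy, t(1e6), Normal
for x in (0.0, 1.0, 3.0, 6.0):
    print(x, t_pdf(x, 1), cauchy_pdf(x), t_pdf(x, 1e6), normal_pdf(x))
```

At x = 6 the Cauchy density is still of order 10⁻², while the Normal density is of order 10⁻⁹, which is the tail-weight gap the ν parameter spans.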

Quantity	Formula	ν=3	ν=10	ν=30
Mean	0 (for ν > 1)	0	0	0
Variance	ν/(ν − 2) (for ν > 2)	3	1.25	1.07
Skewness	0 (symmetric)	0	0	0
Excess kurtosis	6/(ν − 4) (for ν > 4)	∞	1.0	0.23
97.5% quantile (95% two-sided critical value)	from tables	3.18	2.23	2.04

ν	Variance	97.5% critical	95% interval width vs Normal
1 (Cauchy)	undefined	12.71	6.5× wider
2	∞	4.30	2.2× wider
5	1.67	2.57	31% wider
10	1.25	2.23	14% wider
30	1.07	2.04	4% wider
100	1.02	1.98	1% wider
∞ (Normal)	1.00	1.96	—

Distribution	Tails	Mean	Variance	Use when
Normal N(0,1)	e^{−x²/2} (very light)	0	1	Large samples, known σ
t(30)	~Normal (slightly heavier)	0	30/28	n > 30, σ unknown
t(5)	polynomial decay, ν=5	0	5/3	n ≈ 6–10, σ unknown
t(2)	polynomial, infinite var	0	∞	Heavy-tailed test stat, robust inference
t(1) = Cauchy	1/(π(1+x²)), mean undefined	undefined	undefined	Pathological / sub-CLT models
Laplace	e^{−|x|} (medium-heavy)	0	2	L1 regression, robust estimators

Student's t-Distribution

The setup — and why we need t at all

Construction from Normal and chi-squared

Density and moments

t → Normal as ν → ∞

The t-test — by example

t vs Normal vs Cauchy

Confidence intervals — wider for small n

Where the t-distribution shows up

Common pitfalls

The Guinness story

Frequently asked questions