Normal Distribution

The bell curve N(μ, σ²) emerges from sums of many independent effects

The normal distribution (bell curve) is one of the most widely used probability distributions in statistics and a good model for many natural measurements. It is defined by its mean μ and variance σ², with density e^(−(x−μ)²/(2σ²)) / (σ√(2π)). The 68-95-99.7 rule says that about 68% of values fall within 1σ of μ, 95% within 2σ, and 99.7% within 3σ. The Central Limit Theorem explains why it appears so often: sums of many small independent effects are approximately normal.

  • PDF: f(x) = (1/(σ√(2π))) · exp(−(x−μ)²/(2σ²))
  • Notation: X ~ N(μ, σ²)
  • 68-95-99.7 rule: 68% within 1σ; 95% within 2σ; 99.7% within 3σ
  • Standard normal: μ = 0, σ = 1; Z = (X − μ) / σ
  • Why ubiquitous: Central Limit Theorem (sums approach normal)
  • First studied by: De Moivre (1733), Gauss (1809), Laplace

The probability density function

The normal distribution with mean μ and variance σ² has density:

f(x) = (1 / (σ · √(2π))) · exp(−(x − μ)² / (2σ²))

The bell curve is symmetric about μ, with width controlled by σ. The factor 1/(σ·√(2π)) ensures the integral over all x is 1 (a valid probability density).
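
The normalization claim can be checked numerically. The sketch below integrates the density with a simple trapezoidal rule; the parameter values (μ = 5, σ = 2), the ±10σ integration window, and the helper name integrate are illustrative assumptions, not part of the definition.

```javascript
// Density of N(mu, sigma^2), as given in the formula above.
function normalPDF(x, mu, sigma) {
  const z = (x - mu) / sigma;
  return Math.exp(-z * z / 2) / (sigma * Math.sqrt(2 * Math.PI));
}

// Composite trapezoidal rule on [a, b] with n equal steps.
function integrate(f, a, b, n) {
  const h = (b - a) / n;
  let sum = 0.5 * (f(a) + f(b));
  for (let i = 1; i < n; i++) sum += f(a + i * h);
  return sum * h;
}

// Illustrative parameters (assumed for this sketch): mu = 5, sigma = 2.
const mu = 5, sigma = 2;

// Integrate over mu ± 10 sigma; the tails beyond that window are negligible.
const area = integrate(x => normalPDF(x, mu, sigma), mu - 10 * sigma, mu + 10 * sigma, 20000);
console.log(area); // very close to 1
```

Changing mu shifts the curve and changing sigma stretches it, but the computed area should stay at 1 because the 1/(σ·√(2π)) factor compensates for the width.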

The standard normal distribution N(0, 1) — mean 0, variance 1 — has the simpler form:

φ(x) = (1/√(2π)) · exp(−x²/2)

Any normal X ~ N(μ, σ²) can be transformed to standard normal Z by:

Z = (X − μ) / σ

Z follows the standard normal distribution. This transformation is called standardization or computing a Z-score.
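
As a minimal sketch, standardization and its inverse can be written as two small functions. The names zScore and fromZScore and the sample values (μ = 100, σ = 15) are chosen here for illustration only.

```javascript
// Z-score: how many standard deviations x lies from the mean.
function zScore(x, mu, sigma) {
  if (!(sigma > 0)) throw new RangeError("sigma must be positive");
  return (x - mu) / sigma;
}

// Inverse transformation: map a Z-score back to the original scale.
function fromZScore(z, mu, sigma) {
  return mu + sigma * z;
}

console.log(zScore(130, 100, 15));    // 2
console.log(fromZScore(-1, 100, 15)); // 85
```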

The 68-95-99.7 rule

For a normal distribution, the proportion of values within k standard deviations of the mean is approximately:

Range                      Probability    Useful for
Within 1σ (μ − σ, μ + σ)   68.3%          "Typical" range
Within 2σ                  95.4%          "Almost all"; close to the two-sided alpha = 0.05 cutoff (exactly ±1.96σ)
Within 3σ                  99.7%          Classic control-chart limits (rare deviations)
Within 4σ                  99.9937%       Very rare: about 1 in 16,000 falls outside
Within 5σ                  99.99994%      Particle physics discovery threshold
Within 6σ                  99.9999998%    About 1 in 500 million falls outside; namesake of Six Sigma quality

Useful for quick mental statistics. If you know μ and σ, you can estimate the probability of any value range.
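
The first three rows of the table can be reproduced by integrating the standard normal density from −k to k. This is a sketch using Simpson's rule; the helper names phi, simpson, and coverage are assumptions made for the example.

```javascript
// Standard normal density.
const phi = z => Math.exp(-z * z / 2) / Math.sqrt(2 * Math.PI);

// Composite Simpson's rule on [a, b]; n must be even.
function simpson(f, a, b, n = 2000) {
  const h = (b - a) / n;
  let sum = f(a) + f(b);
  for (let i = 1; i < n; i++) sum += (i % 2 === 1 ? 4 : 2) * f(a + i * h);
  return (sum * h) / 3;
}

// P(|Z| < k): probability of landing within k standard deviations of the mean.
const coverage = k => simpson(phi, -k, k);

for (const k of [1, 2, 3]) {
  console.log(`within ${k}σ: ${(100 * coverage(k)).toFixed(2)}%`);
}
// within 1σ: 68.27%
// within 2σ: 95.45%
// within 3σ: 99.73%
```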

Why the normal appears everywhere

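The Central Limit Theorem: the suitably standardized sum of many independent random variables tends toward a normal distribution, regardless of the originating distributions (provided they have finite mean and variance; the classical statement also assumes they are identically distributed).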

Concrete examples:

  • Heights. A height is determined by many genes plus environmental factors. The combined effect, by CLT, is approximately normal. Adult human heights within a single population and sex follow a normal distribution closely.
  • Test scores. A score is a sum of many question results. Each is binary; the sum becomes normal for many questions. Standardized tests are designed to produce normal-shaped score distributions.
  • Measurement errors. Each measurement error is a sum of many small independent factors. Their cumulative effect is normal — Gauss derived the formula assuming this.
  • Brownian motion. A particle's displacement is the sum of many small random kicks (collisions with surrounding molecules). Its displacement after a fixed time is normally distributed, with variance growing in proportion to the elapsed time.
  • Sample means. Even from a non-normal underlying distribution, the average of n samples is approximately normal for n large enough (n ≥ 30 is the rule of thumb). This is why hypothesis testing for sample means uses normal-based math.
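
The effect described above can be seen in a small simulation. The sketch below sums 12 uniform random numbers (a distinctly non-normal source), standardizes each sum, and counts how often it lands within 1σ and 2σ. The seeded generator (mulberry32), the seed, and the choices n = 12 and 50,000 trials are assumptions made so that the run is reproducible.

```javascript
// Small seeded PRNG (mulberry32) so the run is reproducible; returns values in [0, 1).
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6D2B79F5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const rand = mulberry32(12345);
const n = 12;          // terms per sum
const trials = 50000;  // number of sums to draw

// Uniform(0, 1) has mean 1/2 and variance 1/12, so the sum of n terms
// has mean n/2 and standard deviation sqrt(n/12).
const sumMean = n / 2;
const sumSD = Math.sqrt(n / 12);

let within1 = 0, within2 = 0;
for (let i = 0; i < trials; i++) {
  let s = 0;
  for (let j = 0; j < n; j++) s += rand();
  const z = (s - sumMean) / sumSD; // standardized sum
  if (Math.abs(z) < 1) within1++;
  if (Math.abs(z) < 2) within2++;
}

// Expected to land near the normal values 0.68 and 0.95.
console.log(within1 / trials, within2 / trials);
```

Swapping the uniform source for another distribution with finite variance (and adjusting sumMean and sumSD to match) should give similar fractions once n is large enough.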

Worked examples

Example 1 — IQ scores (μ=100, σ=15)

P(IQ > 130)? IQ 130 is 2σ above mean. P(Z > 2) = (1 − P(Z ≤ 2)) ≈ (1 − 0.9772) = 0.0228 = 2.28%.

P(IQ < 70)? Symmetrically, 2σ below — same 2.28%.

P(IQ in [85, 115])? Within 1σ — 68.3%.
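
The three IQ probabilities can be checked with a short script. It is a sketch that uses a polynomial approximation to the error function (Abramowitz-Stegun formula 7.1.26, absolute error around 1.5e-7), so the results agree with tables only to about six decimals.

```javascript
// Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// P(X <= x) for X ~ N(mu, sigma^2).
function normalCDF(x, mu, sigma) {
  return 0.5 * (1 + erf((x - mu) / (sigma * Math.SQRT2)));
}

const mu = 100, sigma = 15;
const pAbove130 = 1 - normalCDF(130, mu, sigma);                       // upper tail beyond +2σ
const pBelow70 = normalCDF(70, mu, sigma);                             // lower tail beyond −2σ
const pBetween = normalCDF(115, mu, sigma) - normalCDF(85, mu, sigma); // within ±1σ

console.log(pAbove130.toFixed(3)); // 0.023
console.log(pBelow70.toFixed(3));  // 0.023
console.log(pBetween.toFixed(3));  // 0.683
```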

Example 2 — Quality control

A factory produces bolts with mean diameter 10 mm, σ = 0.1 mm. Bolts outside 9.7 to 10.3 mm fail inspection. What fraction fail?

9.7 is 3σ below; 10.3 is 3σ above. By the 99.7% rule, 99.7% pass. So 0.3% fail.

If you tightened to ±2σ (9.8 to 10.2), 95.4% pass — 4.6% fail. Tighter standards mean more rejects.
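
The trade-off between tolerance and reject rate can be sketched as a function of the inspection limits. The function name failFraction is hypothetical, the model assumes diameters are exactly normal, and the error function is the same polynomial approximation (Abramowitz-Stegun 7.1.26) used for quick calculations.

```javascript
// Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF.
const Phi = z => 0.5 * (1 + erf(z / Math.SQRT2));

// Fraction of parts falling outside [lower, upper] when diameter ~ N(mu, sigma^2).
function failFraction(lower, upper, mu, sigma) {
  const pass = Phi((upper - mu) / sigma) - Phi((lower - mu) / sigma);
  return 1 - pass;
}

console.log(failFraction(9.7, 10.3, 10, 0.1).toFixed(4)); // 0.0027 (±3σ limits)
console.log(failFraction(9.8, 10.2, 10, 0.1).toFixed(4)); // 0.0455 (±2σ limits)
```

On a run of 10,000 bolts this corresponds to roughly 27 rejects at ±3σ versus roughly 455 at ±2σ.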

Example 3 — Z-score in hypothesis testing

Suppose heart rate is normally distributed with mean 72 bpm and standard deviation 12 bpm. Someone has a heart rate of 90 bpm. How unusual is that?

Z = (90 − 72)/12 = 1.5, so 90 bpm is 1.5σ above the mean. P(Z > 1.5) ≈ 0.0668 = 6.68%. Not extreme: under this model about 1 in 15 people have heart rates above 90 bpm.

JavaScript — normal distribution functions

// Probability density function
function normalPDF(x, mu = 0, sigma = 1) {
  const z = (x - mu) / sigma;
  return Math.exp(-z*z/2) / (sigma * Math.sqrt(2 * Math.PI));
}

// Cumulative distribution function — using error function approximation
function normalCDF(x, mu = 0, sigma = 1) {
  const z = (x - mu) / sigma;
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

// Error function: Abramowitz-Stegun approximation 7.1.26 (absolute error below about 1.5e-7)
function erf(x) {
  const sign = x >= 0 ? 1 : -1;
  x = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * x);
  const a1 =  0.254829592;
  const a2 = -0.284496736;
  const a3 =  1.421413741;
  const a4 = -1.453152027;
  const a5 =  1.061405429;
  const y = 1 - (((((a5*t + a4)*t) + a3)*t + a2)*t + a1)*t * Math.exp(-x*x);
  return sign * y;
}

// Generate samples with the Box-Muller transform
function normalSample(mu = 0, sigma = 1) {
  // Math.random() returns values in [0, 1); using 1 - u keeps the log argument in (0, 1], never 0
  const u1 = 1 - Math.random();
  const u2 = Math.random();
  const z = Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
  return mu + sigma * z;
}

// Verify 68-95-99.7
const samples = Array.from({length: 100000}, () => normalSample(0, 1));
const within1 = samples.filter(z => Math.abs(z) < 1).length / samples.length;
const within2 = samples.filter(z => Math.abs(z) < 2).length / samples.length;
const within3 = samples.filter(z => Math.abs(z) < 3).length / samples.length;
console.log(within1, within2, within3);  // ≈ 0.683, 0.954, 0.997

When to use the normal distribution

  • Sums of many factors. Heights, weights, test scores, measurement errors, sample means — anything that's a sum of many small contributions tends toward normal.
  • Hypothesis testing. Z-tests, t-tests, ANOVA — all assume normality (or normality of the test statistic via the Central Limit Theorem).
  • Quality control. Process variation modeling, control charts, Six Sigma. Manufacturing aims for normal-distributed deviations from target.
  • Finance. Daily stock returns are approximately normal in central regions (though heavy tails complicate this; modern finance uses fat-tailed models).
  • Machine learning. Many algorithms (linear regression assumptions, Gaussian mixtures, Gaussian processes) are normal-based.

When NOT to use — heavy-tailed distributions (income, file sizes), bounded distributions (probabilities, percentages), counts (use Poisson), small sample sizes where the normality assumption is unclear.

Common mistakes

  • Assuming normality without checking. Many real distributions look "normal-ish" but aren't. Check with a histogram, Q-Q plot, or Shapiro-Wilk test before applying normal-based methods.
  • Confusing standard deviation with standard error. SD describes spread of individual values. SE describes spread of the sample mean (= SD/√n). For confidence intervals on means, you use SE; for individual prediction intervals, SD.
  • Treating the 68-95-99.7 rule as a 100% guarantee. For non-normal distributions, the rule doesn't hold. Heavy-tailed distributions have more extreme values than normal predicts.
  • Wrong interpretation of "outside 3σ is rare." Rare under normal assumption. Real-world distributions often have many more extreme values than normal predicts. Black Swan events (Taleb) are about the heavy tail of the actual distribution vs the normal idealization.
  • Using normal CDF without standardizing. The CDF table is for the standard normal N(0, 1). For X ~ N(μ, σ²), compute Z = (X−μ)/σ first, then look up Φ(Z).
  • Forgetting that variance is in squared units. Var(height in meters) is in m². For interpretation, use σ = √Var (in meters). Confusing the two leads to wrong magnitudes.
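
The standard deviation, standard error, and variance items above can be made concrete with a short sketch. The height values are made up for illustration, and the function names sampleSD and standardError are hypothetical.

```javascript
// Sample standard deviation (n - 1 in the denominator).
function sampleSD(values) {
  const n = values.length;
  const mean = values.reduce((a, b) => a + b, 0) / n;
  const ss = values.reduce((acc, v) => acc + (v - mean) ** 2, 0);
  return Math.sqrt(ss / (n - 1));
}

// Standard error of the mean: SD / sqrt(n).
function standardError(values) {
  return sampleSD(values) / Math.sqrt(values.length);
}

// Hypothetical heights in meters, invented for this example.
const heights = [1.62, 1.70, 1.75, 1.68, 1.80, 1.73, 1.66, 1.78, 1.71];
const sd = sampleSD(heights);
const se = standardError(heights);

console.log(sd.toFixed(4), "m   (spread of individual heights)");
console.log(se.toFixed(4), "m   (uncertainty of the sample mean)");
console.log((sd * sd).toFixed(5), "m^2 (variance, in squared units)");
```

With nine values the standard error is exactly one third of the standard deviation, which shows how quickly the two quantities diverge as n grows.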

Frequently asked questions

Why does the normal distribution appear so often?

The Central Limit Theorem. When you sum many independent random variables (with finite variance), the sum tends toward a normal distribution regardless of the original distributions. Heights, test scores, measurement errors, particle motion — all are sums of many small effects, and their distributions approach normal. The bell curve is the universal limit of "lots of small additive randomness."

What's the 68-95-99.7 rule?

For a normal distribution — 68.3% of values fall within 1 standard deviation of the mean; 95.4% within 2σ; 99.7% within 3σ. Useful for quick mental estimates. If your data is normal with μ = 100 and σ = 15, about 68% of values are between 85 and 115, 95% between 70 and 130. Outside 3σ is rare (0.27% probability).

What's a Z-score?

A standardized value: Z = (X − μ) / σ. It expresses how many standard deviations X is from the mean, which allows comparison across distributions with different scales. Z = 2 means "2σ above the mean" regardless of whether σ = 1 or σ = 100. In hypothesis testing, |Z| > 1.96 corresponds to a two-tailed p-value < 0.05.
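
The link between a Z-score and a two-tailed p-value can be sketched as follows. The function name twoTailedP is hypothetical, and the error function is a polynomial approximation (Abramowitz-Stegun 7.1.26), so the values are accurate to about six decimals rather than exact.

```javascript
// Error function via the Abramowitz-Stegun 7.1.26 polynomial approximation.
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
    - 0.284496736) * t + 0.254829592) * t;
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal CDF.
const Phi = z => 0.5 * (1 + erf(z / Math.SQRT2));

// Two-tailed p-value: probability of a result at least |z| away from 0 in either direction.
function twoTailedP(z) {
  return 2 * (1 - Phi(Math.abs(z)));
}

console.log(twoTailedP(1.96).toFixed(4)); // 0.0500
console.log(twoTailedP(2).toFixed(4));    // 0.0455
console.log(twoTailedP(-1.5).toFixed(4)); // 0.1336
```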

Why does the normal distribution have that exact formula?

Several independent characterizations lead to the same formula. Among all distributions on the real line with a given mean and variance, it is the unique one that maximizes entropy. It is the limit of the suitably standardized binomial distribution as n → ∞ (De Moivre 1733). It is the fundamental solution of the heat equation. The normal family is closed under convolution: the sum of independent normal variables is again normal. All of these yield the same bell curve, so there is no other "natural" choice for a maximum-entropy distribution with finite mean and variance on the real line.

When does the normal NOT apply?

Skewed distributions (income, file sizes, web traffic — heavy right tails). Bounded distributions (proportions between 0 and 1). Discrete distributions (counts of events). When sums aren't representative — financial returns have heavier tails than normal (Mandelbrot showed this). For these, log-normal, Poisson, beta, or t-distributions fit better.

What's the standard normal distribution?

N(0, 1): mean 0, standard deviation 1. Any normal X ~ N(μ, σ²) can be standardized to Z = (X − μ) / σ ~ N(0, 1). Printed tables and many software functions are given for the standard normal; convert back with X = μ + σZ as needed. A Z-score is a position on the standard normal scale.

How do you compute probabilities for normal distributions?

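For a normal X ~ N(μ, σ²) and a value x, compute Z = (x − μ) / σ, then look up Φ(Z), the cumulative standard normal up to Z, in a table, or compute it from the error function as Φ(Z) = ½ · (1 + erf(Z/√2)). JavaScript's built-in Math object has no erf, so an approximation such as the erf function in the code section above (or a statistics library) is needed. P(X < x) = Φ(Z). For ranges, P(a < X < b) = Φ((b−μ)/σ) − Φ((a−μ)/σ). The CDF Φ has no closed-form expression in elementary functions but is well-tabulated and easy to approximate.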