Probability

Gamma Distribution

The waiting time for the α-th event in a Poisson process

The Gamma distribution Gamma(α, β) is a continuous distribution on (0, ∞) with shape α and rate β. Mean α/β, variance α/β². Contains exponential, Erlang and chi-squared as special cases.

  • Densityf(x) = β^α x^(α−1) e^(−βx) / Γ(α)
  • Supportx > 0
  • Mean / Varianceα/β · α/β²
  • α = 1exponential(β)
  • α = k/2, β = 1/2chi-squared(k)
  • Conjugate toPoisson rate

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The density

Gamma(α, β) has probability density function:

f(x; α, β) = (β^α / Γ(α)) · x^(α − 1) · e^(−βx)     for x > 0
           = 0                                          for x ≤ 0

Here α > 0 is the shape parameter and β > 0 is the rate parameter. Γ(α) is the gamma function — the normalizing constant that makes the density integrate to 1. For positive integers Γ(n) = (n − 1)!.

Some textbooks use scale θ = 1/β instead of rate β. Same distribution; just a different parametrization. The choice usually depends on the application: Bayesian statisticians prefer rate (it's the natural parameter), while engineers often prefer scale (it's the mean time per event).

Interpretation — waiting time

If events arrive according to a Poisson process with rate β (so the expected wait between events is 1/β), then the time T until the α-th event satisfies:

T = T₁ + T₂ + ... + T_α   where each Tᵢ ~ Exp(β) independently

T ~ Gamma(α, β)

This works for integer α (Erlang case). For non-integer α the distribution still makes sense as a smooth extension via the gamma function. Concrete example: in a process with rate β = 2 events/minute, the wait for the third event is Gamma(3, 2) with mean 3/2 = 1.5 minutes.

Moments

QuantityFormulaNumerical (α = 3, β = 2)
Meanα / β3/2 = 1.5
Varianceα / β²3/4 = 0.75
Std deviation√α / β√3/2 ≈ 0.866
Mode (α > 1)(α − 1) / β2/2 = 1.0
Skewness2/√α2/√3 ≈ 1.155
Excess kurtosis6/α2
MGF(1 − t/β)^(−α), t < β

Skewness and kurtosis both decrease as α grows — large-shape Gammas approach a Normal distribution (consequence of the central limit theorem on the sum of α exponentials).

Special cases — what Gamma contains

Special caseParametersDomain it dominates
Exponential(λ)α = 1, β = λMemoryless waiting time; first-event timing
Erlang(k, λ)α = k (integer), β = λk-th event in Poisson process; queueing
Chi-squared(k)α = k/2, β = 1/2Sum of k squared standard Normals; hypothesis tests
Wishart(scalar)1D specializationCovariance matrix priors in Bayesian stats
Beta(α, β) (related)X/(X+Y) where X,Y ~ GammaProportions, fractions, ratios

Five named distributions under one roof. Whenever you see exponential, Erlang, or chi-squared, you're looking at a Gamma.

Gamma vs Chi-squared vs Exponential vs Erlang

DistributionPdf formMeanVarUse case
Gamma(α, β)x^(α−1) e^(−βx) · β^α / Γ(α)α/βα/β²Generic waiting time, scale-shape
Exponential(λ)λ e^(−λx)1/λ1/λ²Memoryless waiting; lifetime
Erlang(k, λ)λ^k x^(k−1) e^(−λx)/(k−1)!k/λk/λ²k-th event in Poisson; M/M/c queues
Chi-squared(k)x^(k/2−1) e^(−x/2)/(2^(k/2)Γ(k/2))k2kSum of k squared Normals; tests
Weibull(k, λ)(k/λ)(x/λ)^(k−1) e^(−(x/λ)^k)λΓ(1+1/k)Failure analysis; heavier tails
Lognormal(μ, σ)(1/(xσ√2π))exp(−(ln x − μ)²/2σ²)e^(μ+σ²/2)Multiplicative processes; income

Exponential, Erlang, and chi-squared are nested inside Gamma. Weibull and Lognormal are alternative right-skewed distributions with different tail behavior.

Conjugacy with Poisson — Bayesian update

Place a Gamma(α, β) prior on the rate λ of a Poisson process. Observe n events in time T (data X ~ Poisson(λT)):

Prior:      λ ~ Gamma(α, β)
Likelihood: X ~ Poisson(λT)
Posterior:  λ | X = n  ~  Gamma(α + n, β + T)

Posterior mean: (α + n) / (β + T)

The shape gains the count of observed events; the rate gains the total observation time. Closed-form Bayesian inference — no MCMC. The posterior mean interpolates between prior mean α/β and the MLE n/T, with weights determined by α relative to n and β relative to T.

Sum of independent Gammas

If X₁ ~ Gamma(α₁, β) and X₂ ~ Gamma(α₂, β) are independent (with the same rate β), then:

X₁ + X₂ ~ Gamma(α₁ + α₂, β)

The shapes add; the rate stays. This is "infinite divisibility" and makes Gamma natural for additive processes. It also gives the cleanest derivation that Gamma(n, β) is the sum of n independent Exp(β) variables for integer n — known as the Erlang special case.

Where the Gamma distribution shows up

  • Queueing theory. Total service time for k customers, each served exponentially: Erlang. M/E_k/1 queues model multi-stage processing pipelines.
  • Reliability engineering. Time to k-th failure in redundant systems; component lifetime distributions when failures are Poisson.
  • Bayesian rate inference. Conjugate prior for Poisson rates — count data in clinical trials, web analytics, server load.
  • Hypothesis testing. The chi-squared special case underlies goodness-of-fit, contingency tables, variance ratio tests.
  • Hierarchical models. Gamma distributions on variance parameters (precision = 1/variance) appear throughout Bayesian hierarchical regression.
  • Insurance. Claim size distributions are often Gamma-fitted; claim aggregation in compound Poisson models.
  • Meteorology. Rainfall amounts at fixed time intervals are often right-skewed positive — Gamma fits well empirically.
  • Telecommunications. Inter-arrival times in network packet flows; Gamma-mixed exponentials capture bursty traffic.

Parameter estimation

Method of moments — fast and simple:

α̂ = (sample mean)² / (sample variance)
β̂ = sample mean / sample variance

Maximum likelihood — more efficient but requires solving for α̂:

ψ(α̂) − ln α̂ = ln(geometric mean) − ln(sample mean)
β̂ = α̂ / sample mean

(ψ is the digamma function; solve for α̂ numerically — Newton-Raphson converges fast.)

In Python, scipy.stats.gamma.fit(data) handles MLE automatically. In R, MASS::fitdistr(data, "gamma") does the same.

Common pitfalls

  • Rate vs scale confusion. R's dgamma uses rate β by default but accepts scale = 1/β. Python's scipy uses scale by default. Always check the parametrization before passing parameters across tools.
  • Trying to fit Gamma to data including zeros. Gamma is supported on (0, ∞), strict. Zero-inflated or hurdle models handle "zero plus continuous positive" data more honestly.
  • Forgetting α > 1 mode. The mode is at (α − 1)/β only if α > 1. For α ≤ 1 the mode is at 0; the density is monotonically decreasing.
  • Sum of Gammas with different rates. The sum has no closed form unless rates match. Mixed-rate sums require generalized hypergeometric functions or numerical convolution.
  • Confusing Gamma distribution with Gamma function. The function Γ(α) is the normalizing constant in the distribution's pdf, but the function itself is a separate object — Γ(α) = ∫x^{α−1}e^{−x}dx.
  • Heavy-tailed alternative confusion. Gamma has exponential tail decay. Real-world "heavy-tailed" data (income, file sizes) often need lognormal or power-law fits instead — Gamma is too light-tailed for those.

Frequently asked questions

Why are there two parameters (shape and rate)?

α (shape) controls the form of the curve; β (rate) controls the horizontal scale. For α = 1 the density is purely decreasing (exponential); for α > 1 it's unimodal with a peak at (α − 1)/β; for α between 0 and 1 it's infinite at zero and decreasing. Increasing β squeezes the distribution toward zero; decreasing β stretches it out. The mean α/β shows the trade-off: bigger α extends the peak right, bigger β pulls it left. Some textbooks parametrize by scale θ = 1/β instead — same distribution, different parameter convention.

How does Gamma relate to the exponential distribution?

Gamma(1, β) = Exponential(β). At α = 1 the density β^1 x^0 e^{−βx}/Γ(1) reduces to βe^{−βx}, the exponential pdf. Interpretation: the time until the first event in a Poisson(β) process is exponential; the time until the α-th event is Gamma(α, β). A sum of α independent Exp(β) variables is exactly Gamma(α, β) when α is a positive integer — this is the Erlang special case, used in queueing theory and reliability engineering.

How does Gamma reduce to chi-squared?

Chi-squared with k degrees of freedom is exactly Gamma(α = k/2, β = 1/2). Verify: the chi-squared pdf is (1/2)^{k/2} x^{k/2 − 1} e^{−x/2} / Γ(k/2), which is Gamma(k/2, 1/2). For k = 2 this is Exp(1/2) — mean 2. For k = 4 it's Gamma(2, 1/2), mean 4. The connection runs deep: a chi-squared is a sum of squared standard normals, and Gamma(α, β) is the conjugate prior for the rate parameter of a Poisson — both perspectives point at sums of squared/transformed quantities.

What does "conjugate prior" mean for Gamma and Poisson?

If you place a Gamma(α, β) prior on the rate λ of a Poisson process and observe n events in time T (which is Poisson(λT) data), the posterior for λ is Gamma(α + n, β + T). Same family, updated parameters — conjugate. This makes Bayesian updating closed-form for Poisson rate inference: no MCMC, no numerical integration. The shape gains the count, the rate gains the observation time. Posterior mean (α + n)/(β + T) interpolates between prior mean α/β and MLE n/T.

What is the Gamma function Γ(α)?

The normalizing constant in the Gamma pdf. Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx. For positive integers Γ(n) = (n − 1)!, generalizing factorial to real (and complex) arguments. Γ(α + 1) = α · Γ(α) is the recursion. Important values: Γ(1) = 1, Γ(1/2) = √π, Γ(2) = 1, Γ(3) = 2. The Gamma function is the natural continuous extension of factorial — proven by Bohr-Mollerup characterization to be the unique log-convex extension. Computing Γ for non-integer α requires Stirling-like approximations or numerical libraries.

How do you fit a Gamma distribution to data?

Method of moments: α̂/β̂ = sample mean, α̂/β̂² = sample variance, so α̂ = (mean)²/var, β̂ = mean/var. Quick but biased for small samples. Maximum likelihood: no closed form for α̂; solve ψ(α) − ln α = ln(geometric mean) − ln(sample mean) numerically (ψ is the digamma function), then β̂ = α̂/sample mean. ML is more efficient than method of moments. The likelihood is unimodal in α, so Newton-Raphson converges quickly. Statistical libraries (scipy.stats.gamma.fit, R fitdistr) implement this automatically.

What real-world phenomena follow a Gamma distribution?

Service times in queueing (M/G/1 queues). Time-to-failure in reliability engineering when failures follow a Poisson process. Rainfall amounts in meteorology (positive, right-skewed). Insurance claim sizes. Latency distributions in network engineering (often Gamma-mixed with other parametric forms). Gene expression levels in single-cell sequencing. Anywhere you have a positive continuous quantity that's right-skewed with a clear scale and shape, Gamma is the first parametric family to try.