Probability

Gamma Distribution

The waiting time for the α-th event in a Poisson process

The Gamma distribution Gamma(α, β) is a continuous distribution on (0, ∞) with shape α and rate β. Mean α/β, variance α/β². Contains exponential, Erlang and chi-squared as special cases.

Densityf(x) = β^α x^(α−1) e^(−βx) / Γ(α)
Supportx > 0
Mean / Varianceα/β · α/β²
α = 1exponential(β)
α = k/2, β = 1/2chi-squared(k)
Conjugate toPoisson rate

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The density

Gamma(α, β) has probability density function:

f(x; α, β) = (β^α / Γ(α)) · x^(α − 1) · e^(−βx)     for x > 0
           = 0                                          for x ≤ 0

Here α > 0 is the shape parameter and β > 0 is the rate parameter. Γ(α) is the gamma function — the normalizing constant that makes the density integrate to 1. For positive integers Γ(n) = (n − 1)!.

Some textbooks use scale θ = 1/β instead of rate β. Same distribution; just a different parametrization. The choice usually depends on the application: Bayesian statisticians prefer rate (it's the natural parameter), while engineers often prefer scale (it's the mean time per event).

Interpretation — waiting time

If events arrive according to a Poisson process with rate β (so the expected wait between events is 1/β), then the time T until the α-th event satisfies:

T = T₁ + T₂ + ... + T_α   where each Tᵢ ~ Exp(β) independently

T ~ Gamma(α, β)

This works for integer α (Erlang case). For non-integer α the distribution still makes sense as a smooth extension via the gamma function. Concrete example: in a process with rate β = 2 events/minute, the wait for the third event is Gamma(3, 2) with mean 3/2 = 1.5 minutes.

Moments

Quantity	Formula	Numerical (α = 3, β = 2)
Mean	α / β	3/2 = 1.5
Variance	α / β²	3/4 = 0.75
Std deviation	√α / β	√3/2 ≈ 0.866
Mode (α > 1)	(α − 1) / β	2/2 = 1.0
Skewness	2/√α	2/√3 ≈ 1.155
Excess kurtosis	6/α	2
MGF	(1 − t/β)^(−α), t < β	—

Skewness and kurtosis both decrease as α grows — large-shape Gammas approach a Normal distribution (consequence of the central limit theorem on the sum of α exponentials).

Special cases — what Gamma contains

Special case	Parameters	Domain it dominates
Exponential(λ)	α = 1, β = λ	Memoryless waiting time; first-event timing
Erlang(k, λ)	α = k (integer), β = λ	k-th event in Poisson process; queueing
Chi-squared(k)	α = k/2, β = 1/2	Sum of k squared standard Normals; hypothesis tests
Wishart(scalar)	1D specialization	Covariance matrix priors in Bayesian stats
Beta(α, β) (related)	X/(X+Y) where X,Y ~ Gamma	Proportions, fractions, ratios

Five named distributions under one roof. Whenever you see exponential, Erlang, or chi-squared, you're looking at a Gamma.

Gamma vs Chi-squared vs Exponential vs Erlang

Distribution	Pdf form	Mean	Var	Use case
Gamma(α, β)	x^(α−1) e^(−βx) · β^α / Γ(α)	α/β	α/β²	Generic waiting time, scale-shape
Exponential(λ)	λ e^(−λx)	1/λ	1/λ²	Memoryless waiting; lifetime
Erlang(k, λ)	λ^k x^(k−1) e^(−λx)/(k−1)!	k/λ	k/λ²	k-th event in Poisson; M/M/c queues
Chi-squared(k)	x^(k/2−1) e^(−x/2)/(2^(k/2)Γ(k/2))	k	2k	Sum of k squared Normals; tests
Weibull(k, λ)	(k/λ)(x/λ)^(k−1) e^(−(x/λ)^k)	λΓ(1+1/k)	—	Failure analysis; heavier tails
Lognormal(μ, σ)	(1/(xσ√2π))exp(−(ln x − μ)²/2σ²)	e^(μ+σ²/2)	—	Multiplicative processes; income

Exponential, Erlang, and chi-squared are nested inside Gamma. Weibull and Lognormal are alternative right-skewed distributions with different tail behavior.

Conjugacy with Poisson — Bayesian update

Place a Gamma(α, β) prior on the rate λ of a Poisson process. Observe n events in time T (data X ~ Poisson(λT)):

Prior:      λ ~ Gamma(α, β)
Likelihood: X ~ Poisson(λT)
Posterior:  λ | X = n  ~  Gamma(α + n, β + T)

Posterior mean: (α + n) / (β + T)

The shape gains the count of observed events; the rate gains the total observation time. Closed-form Bayesian inference — no MCMC. The posterior mean interpolates between prior mean α/β and the MLE n/T, with weights determined by α relative to n and β relative to T.

Sum of independent Gammas

If X₁ ~ Gamma(α₁, β) and X₂ ~ Gamma(α₂, β) are independent (with the same rate β), then:

X₁ + X₂ ~ Gamma(α₁ + α₂, β)

The shapes add; the rate stays. This is "infinite divisibility" and makes Gamma natural for additive processes. It also gives the cleanest derivation that Gamma(n, β) is the sum of n independent Exp(β) variables for integer n — known as the Erlang special case.

Where the Gamma distribution shows up

Queueing theory. Total service time for k customers, each served exponentially: Erlang. M/E_k/1 queues model multi-stage processing pipelines.
Reliability engineering. Time to k-th failure in redundant systems; component lifetime distributions when failures are Poisson.
Bayesian rate inference. Conjugate prior for Poisson rates — count data in clinical trials, web analytics, server load.
Hypothesis testing. The chi-squared special case underlies goodness-of-fit, contingency tables, variance ratio tests.
Hierarchical models. Gamma distributions on variance parameters (precision = 1/variance) appear throughout Bayesian hierarchical regression.
Insurance. Claim size distributions are often Gamma-fitted; claim aggregation in compound Poisson models.
Meteorology. Rainfall amounts at fixed time intervals are often right-skewed positive — Gamma fits well empirically.
Telecommunications. Inter-arrival times in network packet flows; Gamma-mixed exponentials capture bursty traffic.

Parameter estimation

Method of moments — fast and simple:

α̂ = (sample mean)² / (sample variance)
β̂ = sample mean / sample variance

Maximum likelihood — more efficient but requires solving for α̂:

ψ(α̂) − ln α̂ = ln(geometric mean) − ln(sample mean)
β̂ = α̂ / sample mean

(ψ is the digamma function; solve for α̂ numerically — Newton-Raphson converges fast.)

In Python, scipy.stats.gamma.fit(data) handles MLE automatically. In R, MASS::fitdistr(data, "gamma") does the same.

Common pitfalls

Rate vs scale confusion. R's dgamma uses rate β by default but accepts scale = 1/β. Python's scipy uses scale by default. Always check the parametrization before passing parameters across tools.
Trying to fit Gamma to data including zeros. Gamma is supported on (0, ∞), strict. Zero-inflated or hurdle models handle "zero plus continuous positive" data more honestly.
Forgetting α > 1 mode. The mode is at (α − 1)/β only if α > 1. For α ≤ 1 the mode is at 0; the density is monotonically decreasing.
Sum of Gammas with different rates. The sum has no closed form unless rates match. Mixed-rate sums require generalized hypergeometric functions or numerical convolution.
Confusing Gamma distribution with Gamma function. The function Γ(α) is the normalizing constant in the distribution's pdf, but the function itself is a separate object — Γ(α) = ∫x^{α−1}e^{−x}dx.
Heavy-tailed alternative confusion. Gamma has exponential tail decay. Real-world "heavy-tailed" data (income, file sizes) often need lognormal or power-law fits instead — Gamma is too light-tailed for those.

Frequently asked questions

Why are there two parameters (shape and rate)?

α (shape) controls the form of the curve; β (rate) controls the horizontal scale. For α = 1 the density is purely decreasing (exponential); for α > 1 it's unimodal with a peak at (α − 1)/β; for α between 0 and 1 it's infinite at zero and decreasing. Increasing β squeezes the distribution toward zero; decreasing β stretches it out. The mean α/β shows the trade-off: bigger α extends the peak right, bigger β pulls it left. Some textbooks parametrize by scale θ = 1/β instead — same distribution, different parameter convention.

How does Gamma relate to the exponential distribution?

Gamma(1, β) = Exponential(β). At α = 1 the density β^1 x^0 e^{−βx}/Γ(1) reduces to βe^{−βx}, the exponential pdf. Interpretation: the time until the first event in a Poisson(β) process is exponential; the time until the α-th event is Gamma(α, β). A sum of α independent Exp(β) variables is exactly Gamma(α, β) when α is a positive integer — this is the Erlang special case, used in queueing theory and reliability engineering.

How does Gamma reduce to chi-squared?

Chi-squared with k degrees of freedom is exactly Gamma(α = k/2, β = 1/2). Verify: the chi-squared pdf is (1/2)^{k/2} x^{k/2 − 1} e^{−x/2} / Γ(k/2), which is Gamma(k/2, 1/2). For k = 2 this is Exp(1/2) — mean 2. For k = 4 it's Gamma(2, 1/2), mean 4. The connection runs deep: a chi-squared is a sum of squared standard normals, and Gamma(α, β) is the conjugate prior for the rate parameter of a Poisson — both perspectives point at sums of squared/transformed quantities.

What does "conjugate prior" mean for Gamma and Poisson?

If you place a Gamma(α, β) prior on the rate λ of a Poisson process and observe n events in time T (which is Poisson(λT) data), the posterior for λ is Gamma(α + n, β + T). Same family, updated parameters — conjugate. This makes Bayesian updating closed-form for Poisson rate inference: no MCMC, no numerical integration. The shape gains the count, the rate gains the observation time. Posterior mean (α + n)/(β + T) interpolates between prior mean α/β and MLE n/T.

What is the Gamma function Γ(α)?

The normalizing constant in the Gamma pdf. Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx. For positive integers Γ(n) = (n − 1)!, generalizing factorial to real (and complex) arguments. Γ(α + 1) = α · Γ(α) is the recursion. Important values: Γ(1) = 1, Γ(1/2) = √π, Γ(2) = 1, Γ(3) = 2. The Gamma function is the natural continuous extension of factorial — proven by Bohr-Mollerup characterization to be the unique log-convex extension. Computing Γ for non-integer α requires Stirling-like approximations or numerical libraries.

How do you fit a Gamma distribution to data?

Method of moments: α̂/β̂ = sample mean, α̂/β̂² = sample variance, so α̂ = (mean)²/var, β̂ = mean/var. Quick but biased for small samples. Maximum likelihood: no closed form for α̂; solve ψ(α) − ln α = ln(geometric mean) − ln(sample mean) numerically (ψ is the digamma function), then β̂ = α̂/sample mean. ML is more efficient than method of moments. The likelihood is unimodal in α, so Newton-Raphson converges quickly. Statistical libraries (scipy.stats.gamma.fit, R fitdistr) implement this automatically.

What real-world phenomena follow a Gamma distribution?

Service times in queueing (M/G/1 queues). Time-to-failure in reliability engineering when failures follow a Poisson process. Rainfall amounts in meteorology (positive, right-skewed). Insurance claim sizes. Latency distributions in network engineering (often Gamma-mixed with other parametric forms). Gene expression levels in single-cell sequencing. Anywhere you have a positive continuous quantity that's right-skewed with a clear scale and shape, Gamma is the first parametric family to try.