Question 1

Why are there two parameters (shape and rate)?

Accepted Answer

α (shape) controls the form of the curve; β (rate) controls the horizontal scale. For α = 1 the density is strictly decreasing (the exponential case); for α > 1 it is unimodal with its peak at (α − 1)/β; for 0 < α < 1 it is unbounded as x → 0 and strictly decreasing. Increasing β squeezes the distribution toward zero; decreasing β stretches it out. The mean α/β shows the trade-off: a larger α moves the mass (and the peak) to the right, a larger β pulls it back to the left. Some textbooks parametrize by scale θ = 1/β instead: same distribution, different parameter convention.
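The effect of the two parameters can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `gamma_pdf` and `gamma_mode` are illustrative, not from any particular library.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Density of Gamma(shape=alpha, rate=beta) at x; zero for x <= 0."""
    if x <= 0:
        return 0.0
    # Work in log space so large alpha does not overflow Gamma(alpha).
    log_pdf = (alpha * math.log(beta) + (alpha - 1) * math.log(x)
               - beta * x - math.lgamma(alpha))
    return math.exp(log_pdf)

def gamma_mode(alpha, beta):
    """Peak location (alpha - 1) / beta, valid only for alpha > 1."""
    if alpha <= 1:
        raise ValueError("mode formula requires alpha > 1")
    return (alpha - 1) / beta

# Same shape, two rates: doubling beta halves both the mode and the mean.
alpha = 3.0
for beta in (1.0, 2.0):
    print(f"beta={beta}: mode={gamma_mode(alpha, beta)}, mean={alpha / beta}")
```

Evaluating `gamma_pdf` on a grid for α = 0.5, 1 and 3 reproduces the three regimes described above: unbounded near zero, exponential, and unimodal.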

Question 2

How does Gamma relate to the exponential distribution?

Accepted Answer

Gamma(1, β) = Exponential(β). At α = 1 the density β^1 x^0 e^{−βx}/Γ(1) reduces to βe^{−βx}, the exponential pdf. Interpretation: in a Poisson process with rate β, the time until the first event is Exponential(β), and the time until the α-th event (α a positive integer) is Gamma(α, β). Equivalently, a sum of α independent Exp(β) variables is exactly Gamma(α, β). This integer-shape case is the Erlang distribution, used in queueing theory and reliability engineering.
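A quick simulation makes the sum-of-exponentials statement concrete. This is a sketch under assumed values (k = 3, β = 2, a fixed seed); `erlang_sample` is a hypothetical helper name. Note that `random.expovariate` takes the rate, not the mean.

```python
import random
import statistics

def erlang_sample(k, beta, rng):
    """Time of the k-th event: sum of k independent Exp(beta) gaps."""
    return sum(rng.expovariate(beta) for _ in range(k))

rng = random.Random(0)  # fixed seed so the run is repeatable
k, beta = 3, 2.0
draws = [erlang_sample(k, beta, rng) for _ in range(50_000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)

# Compare against the Gamma(k, beta) values k/beta and k/beta^2.
print("mean:", sample_mean, "theory:", k / beta)
print("var: ", sample_var, "theory:", k / beta ** 2)
```

The sample mean and variance should land close to 1.5 and 0.75, the Gamma(3, 2) values, up to Monte Carlo noise.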

Question 3

How does Gamma reduce to chi-squared?

Accepted Answer

Chi-squared with k degrees of freedom is exactly Gamma(α = k/2, β = 1/2). Verify: the chi-squared pdf is (1/2)^{k/2} x^{k/2 − 1} e^{−x/2} / Γ(k/2), which is the Gamma(k/2, 1/2) pdf. For k = 2 this is Exp(1/2), with mean 2. For k = 4 it's Gamma(2, 1/2), with mean 4. The reason is additivity: a chi-squared with k degrees of freedom is a sum of k independent squared standard normals, each squared standard normal is Gamma(1/2, 1/2), and independent Gammas with a common rate add their shapes, giving Gamma(k/2, 1/2).
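The identity can be verified pointwise. The sketch below writes the chi-squared density in its usual textbook form and compares it with a Gamma(k/2, 1/2) density; both function names are illustrative and only the standard library is used.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Density of Gamma(shape=alpha, rate=beta) at x > 0."""
    return math.exp(alpha * math.log(beta) + (alpha - 1) * math.log(x)
                    - beta * x - math.lgamma(alpha))

def chi2_pdf(x, k):
    """Chi-squared density with k degrees of freedom at x > 0."""
    return (x ** (k / 2 - 1) * math.exp(-x / 2)
            / (2 ** (k / 2) * math.gamma(k / 2)))

# Largest relative gap between the two formulas over a small grid.
max_rel_gap = max(
    abs(chi2_pdf(x, k) - gamma_pdf(x, k / 2, 0.5)) / chi2_pdf(x, k)
    for k in (1, 2, 4, 7)
    for x in (0.5, 1.0, 3.0, 10.0)
)
print("max relative gap:", max_rel_gap)
```

The gap should be at the level of floating-point rounding, since the two expressions are algebraically the same function.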

Question 4

What does "conjugate prior" mean for Gamma and Poisson?

Accepted Answer

If you place a Gamma(α, β) prior on the rate λ of a Poisson process and observe n events in time T (so n ~ Poisson(λT)), the posterior for λ is Gamma(α + n, β + T). Same family, updated parameters: that is what conjugate means. This makes Bayesian updating closed-form for Poisson rate inference: no MCMC, no numerical integration. The shape gains the count, the rate gains the observation time. The posterior mean (α + n)/(β + T) is a weighted average of the prior mean α/β and the MLE n/T, with weights β/(β + T) and T/(β + T).
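The update rule is short enough to write out directly. In this sketch the prior (α = 2, β = 1) and the data (12 events in 4 time units) are made-up illustrative numbers, and `poisson_gamma_update` is a hypothetical helper name.

```python
def poisson_gamma_update(alpha, beta, n_events, exposure):
    """Gamma(alpha, beta) prior + n_events in time `exposure` -> posterior parameters."""
    return alpha + n_events, beta + exposure

# Illustrative numbers only (not from any dataset).
alpha0, beta0 = 2.0, 1.0   # prior mean alpha0 / beta0 = 2 events per unit time
n, T = 12, 4.0             # observed rate n / T = 3 events per unit time

alpha1, beta1 = poisson_gamma_update(alpha0, beta0, n, T)
posterior_mean = alpha1 / beta1

# The posterior mean is a weighted average of prior mean and MLE.
w_prior = beta0 / (beta0 + T)
blend = w_prior * (alpha0 / beta0) + (1 - w_prior) * (n / T)

print("posterior:", (alpha1, beta1))
print("posterior mean:", posterior_mean, "weighted average:", blend)
```

With these numbers the posterior is Gamma(14, 5), and both expressions for the posterior mean give 14/5 = 2.8, sitting between the prior mean 2 and the observed rate 3.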

Question 5

What is the Gamma function Γ(α)?

Accepted Answer

Γ(α) is the integral that normalizes the Gamma pdf: Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx. For positive integers Γ(n) = (n − 1)!, so it generalizes the factorial to real (and complex) arguments. The recursion is Γ(α + 1) = α · Γ(α). Important values: Γ(1) = 1, Γ(1/2) = √π, Γ(2) = 1, Γ(3) = 2. The Gamma function is the natural continuous extension of the factorial: by the Bohr-Mollerup theorem it is the unique log-convex function on the positive reals that satisfies Γ(1) = 1 and the recursion above. For general non-integer α there is no elementary closed form, so Γ is computed with numerical approximations (Stirling or Lanczos type), which standard math libraries provide.
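Python's standard library exposes the Gamma function as `math.gamma` and its logarithm as `math.lgamma`, which is enough to check the values and the recursion quoted above. A minimal sketch:

```python
import math

# Factorial connection: Gamma(n) = (n - 1)! for positive integers.
factorial_matches = all(
    math.isclose(math.gamma(n), math.factorial(n - 1)) for n in range(1, 11)
)

# Half-integer value: Gamma(1/2) = sqrt(pi).
half_value = math.gamma(0.5)

# Recursion Gamma(a + 1) = a * Gamma(a) at an arbitrary non-integer point.
a = 2.7
recursion_gap = abs(math.gamma(a + 1) - a * math.gamma(a))

# For large arguments, work with log Gamma to stay inside floating-point range.
log_gamma_big = math.lgamma(1000.0)

print(factorial_matches, half_value, math.sqrt(math.pi))
print(recursion_gap, log_gamma_big)
```

Using `math.lgamma` rather than `math.gamma` is the usual choice inside density and likelihood code, because Γ(α) itself grows too fast to represent for large α.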

Question 6

How do you fit a Gamma distribution to data?

Accepted Answer

Method of moments: α̂/β̂ = sample mean, α̂/β̂² = sample variance, so α̂ = (mean)²/var, β̂ = mean/var. Quick but biased for small samples. Maximum likelihood: no closed form for α̂; solve ψ(α) − ln α = ln(geometric mean) − ln(sample mean) numerically (ψ is the digamma function), then β̂ = α̂/sample mean. ML is more efficient than method of moments. The profile log-likelihood is concave in α, so Newton-Raphson converges quickly from a reasonable starting value. Statistical libraries implement maximum likelihood fitting: scipy.stats.gamma.fit (it also fits a location parameter unless you pass floc=0, and it reports scale = 1/β rather than the rate) and fitdistr in R's MASS package.
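Both estimators can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not a reference implementation: `digamma` and `trigamma` are hand-rolled from the usual recurrence plus asymptotic series, the Newton start value is a commonly used closed-form approximation, and the data are simulated from Gamma(3, 2) with a fixed seed. Note that `random.gammavariate` takes a scale, so the rate β enters as 1/β.

```python
import math
import random

def digamma(x):
    """psi(x) for x > 0: shift x above 6 with psi(x) = psi(x+1) - 1/x, then use the asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def trigamma(x):
    """psi'(x) for x > 0, same shift-then-series approach."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))

def fit_gamma_moments(xs):
    """Method of moments: alpha = mean^2 / var, beta = mean / var."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean * mean / var, mean / var

def fit_gamma_mle(xs, tol=1e-10, max_iter=50):
    """Newton-Raphson on ln(alpha) - psi(alpha) = s, with s = ln(mean) - mean(ln x)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.log(mean) - sum(math.log(x) for x in xs) / n  # positive unless all x are equal
    alpha = (3 - s + math.sqrt((s - 3) ** 2 + 24 * s)) / (12 * s)  # approximate start value
    for _ in range(max_iter):
        f = math.log(alpha) - digamma(alpha) - s
        f_prime = 1.0 / alpha - trigamma(alpha)
        new_alpha = alpha - f / f_prime
        if new_alpha <= 0:          # guard against an overshoot below zero
            new_alpha = alpha / 2
        converged = abs(new_alpha - alpha) < tol * alpha
        alpha = new_alpha
        if converged:
            break
    return alpha, alpha / mean

# Simulated data from Gamma(shape=3, rate=2); gammavariate's second argument is the SCALE.
rng = random.Random(1)
data = [rng.gammavariate(3.0, 1 / 2.0) for _ in range(20_000)]

alpha_mom, beta_mom = fit_gamma_moments(data)
alpha_mle, beta_mle = fit_gamma_mle(data)
print("moments:", alpha_mom, beta_mom)
print("MLE:    ", alpha_mle, beta_mle)
```

Both fits should recover values near α = 3 and β = 2, up to sampling noise. In practice a library routine is usually preferable to hand-written special functions; the point here is only to show the equation being solved.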

Question 7

What real-world phenomena follow a Gamma distribution?

Accepted Answer

Service times in queueing (M/G/1 queues). Time-to-failure in reliability engineering when failures follow a Poisson process. Rainfall amounts in meteorology (positive, right-skewed). Insurance claim sizes. Latency distributions in network engineering (often Gamma-mixed with other parametric forms). Gene expression levels in single-cell sequencing. Anywhere you have a positive continuous quantity that's right-skewed with a clear scale and shape, Gamma is the first parametric family to try.

Quantity	Formula	Numerical (α = 3, β = 2)
Mean	α / β	3/2 = 1.5
Variance	α / β²	3/4 = 0.75
Std deviation	√α / β	√3/2 ≈ 0.866
Mode (α > 1)	(α − 1) / β	2/2 = 1.0
Skewness	2/√α	2/√3 ≈ 1.155
Excess kurtosis	6/α	2
MGF	(1 − t/β)^(−α), t < β	—
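The closed-form entries in the table above are easy to reproduce. This sketch collects them in one hypothetical helper, `gamma_summary`, and also checks that the first derivative of the MGF at t = 0 recovers the mean, using a central finite difference (an approximation, with the step size chosen by hand).

```python
import math

def gamma_summary(alpha, beta):
    """Closed-form summaries of Gamma(shape=alpha, rate=beta)."""
    return {
        "mean": alpha / beta,
        "variance": alpha / beta ** 2,
        "std": math.sqrt(alpha) / beta,
        "mode": (alpha - 1) / beta if alpha > 1 else None,
        "skewness": 2 / math.sqrt(alpha),
        "excess_kurtosis": 6 / alpha,
    }

def gamma_mgf(t, alpha, beta):
    """Moment generating function (1 - t/beta)^(-alpha), defined for t < beta."""
    if t >= beta:
        raise ValueError("MGF requires t < beta")
    return (1 - t / beta) ** (-alpha)

summary = gamma_summary(3.0, 2.0)
for name, value in summary.items():
    print(f"{name}: {value}")

# M'(0) should equal the mean; approximate it with a central difference.
h = 1e-5
mgf_slope = (gamma_mgf(h, 3.0, 2.0) - gamma_mgf(-h, 3.0, 2.0)) / (2 * h)
print("M'(0) approx:", mgf_slope)
```

For α = 3, β = 2 the printed values should agree with the numerical column of the table.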

Special case	Parameters	Typical use
Exponential(λ)	α = 1, β = λ	Memoryless waiting time; first-event timing
Erlang(k, λ)	α = k (integer), β = λ	k-th event in Poisson process; queueing
Chi-squared(k)	α = k/2, β = 1/2	Sum of k squared standard Normals; hypothesis tests
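Wishart (1 × 1 case, related)	α = n/2, β = 1/(2σ²) for n degrees of freedom and scale σ²	Gamma is the one-dimensional Wishart; the matrix version is a prior for covariance/precision matrices in Bayesian stats
Beta(α, β) (related)	X/(X+Y) for independent X ~ Gamma(α, r), Y ~ Gamma(β, r) with a common rate r	Proportions, fractions, ratios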
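The Gamma-to-Beta relationship in the last row can be checked by simulation. In this sketch the shapes a = 2 and b = 5 and the common rate 3 are arbitrary illustrative choices, and `beta_from_gammas` is a hypothetical helper. The two Gamma draws must be independent and share the same rate; `random.gammavariate` takes a scale, so the rate is passed as its reciprocal.

```python
import random
import statistics

def beta_from_gammas(a, b, rate, rng):
    """Draw X/(X+Y) with independent X ~ Gamma(a, rate), Y ~ Gamma(b, rate)."""
    x = rng.gammavariate(a, 1.0 / rate)  # second argument is the scale = 1 / rate
    y = rng.gammavariate(b, 1.0 / rate)
    return x / (x + y)

rng = random.Random(42)  # fixed seed so the run is repeatable
a, b, rate = 2.0, 5.0, 3.0
ratios = [beta_from_gammas(a, b, rate, rng) for _ in range(50_000)]

ratio_mean = statistics.fmean(ratios)
ratio_var = statistics.variance(ratios)

# Beta(a, b) has mean a/(a+b) and variance ab / ((a+b)^2 (a+b+1)).
beta_mean = a / (a + b)
beta_var = a * b / ((a + b) ** 2 * (a + b + 1))
print("mean:", ratio_mean, "theory:", beta_mean)
print("var: ", ratio_var, "theory:", beta_var)
```

The common rate cancels in the ratio, so changing `rate` should leave the distribution of the ratios unchanged.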

Distribution	Pdf form	Mean	Var	Use case
Gamma(α, β)	x^(α−1) e^(−βx) · β^α / Γ(α)	α/β	α/β²	Generic waiting time, scale-shape
Exponential(λ)	λ e^(−λx)	1/λ	1/λ²	Memoryless waiting; lifetime
Erlang(k, λ)	λ^k x^(k−1) e^(−λx)/(k−1)!	k/λ	k/λ²	k-th event in Poisson; M/M/c queues
Chi-squared(k)	x^(k/2−1) e^(−x/2)/(2^(k/2)Γ(k/2))	k	2k	Sum of k squared Normals; tests
Weibull(k, λ)	(k/λ)(x/λ)^(k−1) e^(−(x/λ)^k)	λΓ(1+1/k)	—	Failure analysis; heavier tails
Lognormal(μ, σ)	(1/(xσ√(2π))) exp(−(ln x − μ)²/(2σ²))	e^(μ+σ²/2)	—	Multiplicative processes; income

Gamma Distribution

The density

Interpretation — waiting time

Moments

Special cases — what Gamma contains

Gamma vs Chi-squared vs Exponential vs Erlang

Conjugacy with Poisson — Bayesian update

Sum of independent Gammas

Where the Gamma distribution shows up

Parameter estimation

Common pitfalls

Frequently asked questions