Asymptotic Analysis

Stirling's Approximation

n! ≈ √(2πn)(n/e)ⁿ — relative error 0.83% at n = 10, 0.083% at n = 100

Stirling's approximation states n! ≈ √(2πn)(n/e)ⁿ as n → ∞ — an asymptotic formula whose relative error drops as 1/(12n) and is already below 1% by n = 10. Abraham de Moivre proved the leading exponential factor (1733); James Stirling identified the constant √(2π) (1730, Methodus Differentialis). The full Stirling series log n! = n log n − n + ½ log(2πn) + 1/(12n) − 1/(360n³) + … is a divergent asymptotic expansion — partial sums improve up to an optimal truncation, then blow up. Stirling is the asymptotic engine of combinatorics (binomial coefficients in the large-N limit), information theory (Shannon entropy from multinomials), and statistical mechanics (Boltzmann S = k log W, partition functions, ideal-gas thermodynamics).

  • Formulan! ≈ √(2πn)(n/e)ⁿ
  • Discoverersde Moivre 1733, Stirling 1730
  • Error at n = 100.83%
  • Error at n = 1000.083%
  • Series typedivergent asymptotic
  • Used inentropy, partition fn, large-N combinatorics

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How it works

The factorial n! = 1·2·3·…·n grows faster than any polynomial and faster than any simple exponential. Stirling's approximation reveals its true growth rate: a power factor (n/e)ⁿ that dominates, multiplied by a square-root correction √(2πn) that smooths the answer to the right order. Once you accept the formula, the reason it works becomes intuitive — it is exactly the saddle-point approximation to the gamma-function integral.

The most useful form for analytical work is the logarithm:

log n! = n log n − n + ½ log(2πn) + 1/(12n) − 1/(360n³) + 1/(1260n⁵) − …

The first two terms n log n − n are the workhorse: that is the form used in every entropy calculation, every Boltzmann derivation, and every large-N combinatorial estimate. The ½ log(2πn) correction is enough for almost any numerical purpose. The 1/(12n) term is the first algebraic correction; beyond it the coefficients grow factorially and the series diverges, but the partial sums still give superb numerical approximations if truncated optimally.

Proof sketch — Laplace's method on Γ(n + 1)

Start from Euler's gamma integral n! = Γ(n + 1) = ∫₀^∞ tⁿ e^(−t) dt. Write the integrand as exp(n log t − t) and look for its maximum: differentiating n log t − t and setting to zero gives t = n. The integrand is sharply peaked there, so Taylor-expand the exponent around t = n with the substitution t = n(1 + u/√n):

n log t − t = n log n − n − u²/2 + O(u³/√n)

Plug this in, change variable, and the integral becomes:

n! ≈ nⁿ e^(−n) √n · ∫_{−∞}^∞ e^(−u²/2) du = nⁿ e^(−n) √n · √(2π)

which is exactly Stirling's formula. The √(2π) is the Gaussian normalization constant. Carrying higher-order terms in the Taylor expansion gives the 1/(12n) and subsequent corrections — the formal asymptotic series. This is the canonical example of Laplace's method or saddle-point integration in one dimension; the same technique gives asymptotics for Bessel functions, the central limit theorem, and a host of other integrals dominated by a sharp maximum.

Variants and refinements

  • One-term form. n! ≈ (n/e)ⁿ — captures the dominant growth rate but is off by a factor of √(2πn), so the relative error grows with n. Useful for showing log n! ~ n log n − n, useless for numerics.
  • Standard Stirling. n! ≈ √(2πn)(n/e)ⁿ. The default. Error 1/(12n) for the ratio, 1/(12n) for log n!.
  • One-correction form. n! ≈ √(2πn)(n/e)ⁿ · (1 + 1/(12n)). Error 1/(288n²). Good enough for nearly anything; one extra multiplication.
  • Two-correction form. n! ≈ √(2πn)(n/e)ⁿ · (1 + 1/(12n) + 1/(288n²) − 139/(51840n³)). Error ~ 1/n⁴. Used in libraries that need full double precision for moderate n.
  • Nemes' form. n! ≈ √(2πn) · ((n + 1/(12n − 1/(10n)))/e)ⁿ. Closed form with similar accuracy to two-correction but only one expression.
  • Ramanujan's form. n! ≈ √π · (n/e)ⁿ · (8n³ + 4n² + n + 1/30)^(1/6). Closed form, no division, error 10⁻⁵ at n = 5.
  • Lanczos approximation. A rational approximation to Γ(z + 1) accurate to double precision for all moderate z. Used internally by most numerical libraries instead of plain Stirling because it converges, doesn't oscillate, and gives ≈ 15 correct digits with ≈ 10 terms.

Numerical examples

Computing the relative error of the leading Stirling formula for several values of n:

n          n!                          Stirling                    relative error
 1         1                           0.92214                     7.79%
 2         2                           1.91900                     4.05%
 5         120                         118.019                     1.65%
10         3,628,800                   3,598,696                   0.83%
20         2.43290 × 10¹⁸              2.42279 × 10¹⁸              0.416%
50         3.04141 × 10⁶⁴              3.03635 × 10⁶⁴              0.166%
100        9.33262 × 10¹⁵⁷             9.32485 × 10¹⁵⁷             0.0833%
1000       4.02387 × 10²⁵⁶⁷            4.02354 × 10²⁵⁶⁷            0.00833%

The relative error closely tracks 1/(12n), which is the leading term of the Stirling series. Adding the (1 + 1/(12n)) correction kills the 1/n term and leaves error ~ 1/(288n²) — at n = 10 the error drops from 0.83% to 0.0035%.

Entropy example: the multinomial 100! / (60! 30! 10!) counts ways to split 100 items into groups of sizes 60, 30, 10. Direct computation is fine here, but with Stirling:

log(100!/(60! 30! 10!)) ≈ −100 [0.6 log 0.6 + 0.3 log 0.3 + 0.1 log 0.1] − ½ log(2π·100·0.6·0.3·0.1)
                        ≈ 89.74 − 1.65 = 88.09 nats
(exact: 88.090…)

The Stirling estimate is correct to four significant figures and used as the leading term in Shannon's mutual information bounds, the Sanov large-deviation theorem, and free-energy expansions in statistical mechanics.

Common pitfalls

  • Quoting it as exact. Stirling is an asymptotic approximation — the symbol is ≈ or ~, never =. At n = 1 it gives 0.92214, not 1. The error is small but never zero.
  • Forgetting √(2πn). Many textbooks abbreviate to n! ≈ (n/e)ⁿ for the log form. That's fine when computing log n! − n log n + n (used in entropy), but if you need a numerical estimate of n!, you must include the √(2πn) — its omission is a factor-of-2.5 error at n = 10.
  • Truncating the series too far. The Stirling series diverges. Adding terms past the smallest one makes the approximation worse. The optimal truncation is k* ≈ 2πn for given n; past that, the asymptotic series is no longer asymptotic.
  • Using it for small n. The formula is meant for n ≫ 1. For n = 0 it gives nonsense (the formula doesn't define 0! = 1; you have to extend by the gamma function). For n = 1 the error is ~8%. For finite combinatorics with n ≤ 20, just multiply out the factorial.
  • Confusing log n! with log Γ(n). n! = Γ(n + 1), so log n! and log Γ(n + 1) are equal. Some sources write Stirling for log Γ(z) which is the same formula with z replacing n + 1; an off-by-one shift trips up first-time readers.
  • Misapplying in entropy of small samples. Sanov-type large-deviation bounds use Stirling to convert log(N!/Πnᵢ!) into −N Σ pᵢ log pᵢ. For small N the correction terms matter; ignoring them leads to over-confident estimates of rare-event probabilities.

Where Stirling shows up

  • Combinatorics in the large-N limit. Binomial coefficient (n choose k) for k = αn has Stirling form ≈ 2^(nH(α)) where H is the binary entropy. Used in coding theory, random graphs, and Erdős-Ko-Rado style extremal arguments.
  • Information theory. Shannon entropy of a discrete distribution is the Stirling limit of log multinomial / N. The asymptotic equipartition property, the Sanov large-deviation theorem, and rate-distortion theorems all use Stirling at their core.
  • Statistical mechanics. Boltzmann entropy S = k log W counts microstates W; for an N-particle system, W ~ N! / Π Nᵢ!, and Stirling gives the thermodynamic limit S/N ≈ −Σ pᵢ log pᵢ. Same calculation derives the Boltzmann distribution by Lagrange multipliers, the Sackur-Tetrode formula for ideal-gas entropy, and the free energy of mixing.
  • Quantum many-body systems. Partition functions for fermions and bosons involve products and sums that, in the thermodynamic limit, reduce to Gaussian integrals via Stirling. The Hartree-Fock and BCS mean-field theories use this.
  • Random matrix theory. Wigner's semicircle law for the spectral density of large random matrices follows from a saddle-point evaluation of a determinantal expression that produces n! terms, simplified by Stirling.
  • Number theory. The prime counting function π(x) ~ x / log x has a sharper form involving log of Stirling-style sums; the Selberg-Erdős elementary proof of the prime number theorem uses log n! asymptotics directly.
  • Asymptotic analysis broadly. Stirling is the prototypical Laplace/saddle-point example. Any integral of the form ∫ e^(N S(x)) dx admits a similar Gaussian expansion around the saddle point of S — Stirling is the canonical worked example, and the techniques generalize to multi-dimensional and complex saddles.

Frequently asked questions

How accurate is Stirling's approximation in practice?

The leading-order formula n! ≈ √(2πn)(n/e)ⁿ has relative error ≈ 1/(12n) — small, fast, and a one-line drop-in for the factorial when n is even modestly large. Concrete numbers: at n = 5 the error is 1.65%; at n = 10 it is 0.83%; at n = 50 it is 0.166%; at n = 100 it is 0.083%; at n = 1000 it is 0.0083%. The error halves whenever n doubles only because 1/(12n) decays slowly — but the relative accuracy is already enough for nearly every physical or statistical use by n = 20. For logarithms (log n!) the error is far better — comparing log(n!) to n log n − n + ½ log(2πn) gives error 1/(12n) which is tiny even at n = 4.

What is the Stirling series and why does it diverge?

The full asymptotic series is log n! = n log n − n + ½ log(2πn) + 1/(12n) − 1/(360n³) + 1/(1260n⁵) − … . It is a divergent series — the coefficients grow factorially — but it is an excellent asymptotic series in the sense of Poincaré: for fixed n, partial sums get more accurate up to an optimal truncation k* ≈ 2πn, then start diverging. In practice you stop adding terms before the smallest term and your error is bounded by that smallest term. This is the same behaviour as the Stirling series for the gamma function and is typical of asymptotic expansions arising from saddle-point integration.

How is Stirling derived from the gamma integral?

Use the Euler representation n! = Γ(n + 1) = ∫₀^∞ tⁿ e^(−t) dt and substitute t = n(1 + u/√n) to centre the integrand on its maximum at t = n. Expand the exponent and apply Laplace's method (saddle-point integration): the integrand is sharply peaked, the leading-order Gaussian gives √(2πn), the dominant exponential factor is (n/e)ⁿ, and higher-order corrections give the 1/(12n) and beyond series. This is the rigorous derivation and explains both where √(2π) comes from (Gaussian integral) and why the corrections form an asymptotic — not convergent — series.

Why does Stirling appear in entropy and statistical mechanics?

Entropy of a multinomial S = log(N! / (n₁! n₂! … n_k!)) reduces, via Stirling, to S ≈ −N Σ pᵢ log pᵢ where pᵢ = nᵢ/N. The Shannon entropy formula falls out cleanly only because Stirling converts log n! into n log n − n, killing the linear terms when you take a difference. The same calculation underlies Boltzmann's H-theorem (S = k log W), the derivation of the Boltzmann distribution by Lagrange multipliers, and every "counting microstates" argument in statistical mechanics. Without Stirling, you cannot get from microstate counting to thermodynamic entropy.

What's the difference between Stirling's formula and Stirling numbers?

Two unrelated objects sharing James Stirling's name. Stirling's approximation is the asymptotic n! ≈ √(2πn)(n/e)ⁿ. Stirling numbers (first and second kind) are combinatorial coefficients counting cycle structures of permutations and partitions of sets, respectively. They appear in falling factorials (xⁿ = Σ S(n, k) x_(k)) and ordinary factorials, but the link to the asymptotic formula is incidental — both bear Stirling's name because he wrote about both in Methodus Differentialis (1730).

How do programming languages compute log(n!) for large n?

Direct computation of n! overflows at n = 21 in 64-bit integers and at n ≈ 170 in IEEE 754 doubles. For large n, libraries use log Γ(n + 1) via the Stirling series — e.g., scipy.special.gammaln, math.lgamma in Python, lgamma in C's math.h. Internally these usually evaluate the Lanczos approximation (a closed-form rational approximation accurate to double precision) for moderate n, and switch to the explicit Stirling asymptotic for very large n. The cost is O(1), independent of n, in dramatic contrast to the O(n) cost of multiplying out the factorial.

Stirling variants compared

FormExpressionRelative errorUse case
One-term(n/e)ⁿO(√n) — gets worse with nOrder-of-magnitude only
Standard Stirling√(2πn)(n/e)ⁿ1/(12n) ≈ 0.83% at n=10Default, most uses
+ first correction·(1 + 1/(12n))1/(288n²) ≈ 0.0035% at n=10Double-precision moderate n
Two corrections·(1 + 1/12n + 1/288n²)O(1/n³)Full double precision n ≥ 5
Ramanujan√π (n/e)ⁿ (8n³+4n²+n+1/30)^(1/6)~10⁻⁵ at n=5Closed form, no series
Lanczosrational approximation~10⁻¹⁵ for all moderate zLibrary implementations