Inequalities
AM-GM Inequality
(Σxᵢ)/n ≥ (Πxᵢ)^(1/n) — arithmetic mean ≥ geometric mean
For non-negative real numbers x₁, …, xₙ, the arithmetic mean is always at least the geometric mean: (x₁ + … + xₙ)/n ≥ (x₁ · … · xₙ)^(1/n). Equality holds if and only if all the xᵢ are equal. For n = 2 it is the classic (a + b)/2 ≥ √(ab). Known to Euclid (Elements VI.27, geometric form: among rectangles of fixed perimeter, the square has the largest area). The standard proof is via Jensen's inequality applied to log: log is concave, so the log of the AM ≥ AM of the logs = log of the GM. Foundation of optimization (fixed sum ⇒ max product when equal), the proof of Young's inequality, and many bounds in information theory and probability. Lifts to the weighted AM-GM: Σwᵢ xᵢ ≥ Πxᵢ^{wᵢ} for weights summing to 1.
- Statement: AM ≥ GM
- For n = 2: (a + b)/2 ≥ √(ab)
- Equality iff: all xᵢ equal
- Proof: Jensen applied to log
- Slack (n = 2): (√a − √b)²/2
- First stated: Euclid (geometric), ~300 BC
The statement and the special case n = 2
For non-negative real numbers x₁, …, xₙ:
AM(x) = (x₁ + x₂ + … + xₙ) / n
GM(x) = (x₁ · x₂ · … · xₙ)^(1/n)
AM(x) ≥ GM(x), with equality iff x₁ = x₂ = … = xₙ.
For n = 2, the statement becomes (a + b)/2 ≥ √(ab), or equivalently a + b ≥ 2√(ab). The proof in one line: from (√a − √b)² ≥ 0 (any squared real is non-negative) expand and rearrange. The slack — how much AM exceeds GM — is exactly
(a + b)/2 − √(ab) = (√a − √b)² / 2 for a, b ≥ 0
This is the cleanest closed form of the AM-GM gap. The squared term is zero only when √a = √b, i.e., a = b; otherwise it is strictly positive. Geometrically: place a and b on a number line; AM is their midpoint, while GM is the side length of the square with the same area as the a × b rectangle. The geometric mean answers the question "what side length would a square need to match this rectangle's area?".
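The slack identity can be checked numerically. The following is a minimal sketch; the function names `am_gm_slack` and `slack_closed_form` are illustrative and not part of the original text.

```python
import math

def am_gm_slack(a: float, b: float) -> float:
    """Return AM - GM for two non-negative numbers a and b."""
    return (a + b) / 2 - math.sqrt(a * b)

def slack_closed_form(a: float, b: float) -> float:
    """Return (sqrt(a) - sqrt(b))^2 / 2, the closed form of the same gap."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2 / 2

# Sample pairs chosen for illustration, including an equality case and a zero.
for a, b in [(4, 9), (7, 7), (0, 5), (2.5, 40)]:
    direct = am_gm_slack(a, b)
    closed = slack_closed_form(a, b)
    # The two expressions agree up to floating-point rounding.
    assert math.isclose(direct, closed, abs_tol=1e-12)
    print(f"a={a}, b={b}: slack={direct:.6f}")
```

The pair (7, 7) shows the equality case, where both expressions vanish.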
Proof via Jensen's inequality (concavity of log)
The function log : (0, ∞) → ℝ is strictly concave (its second derivative −1/x² is negative). By Jensen's inequality, for any non-negative weights wᵢ summing to 1 and any positive xᵢ:
log(Σ wᵢ xᵢ) ≥ Σ wᵢ log(xᵢ) = log(Π xᵢ^{wᵢ})
Exponentiate: Σ wᵢ xᵢ ≥ Π xᵢ^{wᵢ}. The equal-weight case wᵢ = 1/n gives AM ≥ GM. Because log is strictly concave, equality in Jensen requires all xᵢ with wᵢ > 0 to be equal. This proof handles the weighted version simultaneously, which is essential for downstream applications like Young's inequality.
To handle xᵢ = 0: if any xᵢ = 0 the GM is 0 and the AM is non-negative, so the inequality is trivial.
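The Jensen step can be illustrated in the log domain. This is a sketch under the assumptions stated in the text (weights non-negative and summing to 1); the helper names and the sample values are hypothetical.

```python
import math
from typing import Sequence

def weighted_am(x: Sequence[float], w: Sequence[float]) -> float:
    """Weighted arithmetic mean: sum of w_i * x_i."""
    return sum(wi * xi for wi, xi in zip(w, x))

def weighted_gm(x: Sequence[float], w: Sequence[float]) -> float:
    """Weighted geometric mean: exp of the weighted mean of logs."""
    # A zero entry with positive weight forces the product to zero.
    if any(xi == 0 and wi > 0 for wi, xi in zip(w, x)):
        return 0.0
    return math.exp(sum(wi * math.log(xi) for wi, xi in zip(w, x) if wi > 0))

x = (2.0, 8.0)
w = (0.25, 0.75)

log_of_mean = math.log(weighted_am(x, w))                       # log(sum w_i x_i)
mean_of_logs = sum(wi * math.log(xi) for wi, xi in zip(w, x))   # sum w_i log x_i

# Jensen for the concave log: log of the mean is at least the mean of the logs.
assert log_of_mean >= mean_of_logs
print(weighted_am(x, w), weighted_gm(x, w))
```

Computing the geometric mean through logarithms also avoids overflow in long products, which is one reason to prefer this form in practice.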
Proof via Cauchy's forward-backward induction
An elegant induction proof avoids Jensen. Step 1 (base): n = 2 from (√a − √b)² ≥ 0. Step 2 (doubling): if AM-GM holds for n = k, it holds for n = 2k by pairing:
AM_{2k}(x₁, …, x₂ₖ) = (AM_k(x₁, …, x_k) + AM_k(x_{k+1}, …, x_{2k})) / 2
≥ √(AM_k(left) · AM_k(right)) [by n = 2 case]
≥ √(GM_k(left) · GM_k(right)) [by induction]
= GM_{2k}(x₁, …, x_{2k}).
Step 3 (descent): assuming AM-GM holds for n, prove it for n − 1. Take any x₁, …, x_{n−1}, let A = AM(x₁, …, x_{n−1}), and set xₙ = A. Then AM(x₁, …, xₙ) = A, and AM-GM for n gives A ≥ (x₁ · … · x_{n−1} · A)^(1/n). Raising both sides to the n-th power and dividing by A (the case A = 0 is trivial) gives A^(n−1) ≥ x₁ · … · x_{n−1}, which is AM-GM for n − 1. This is Cauchy's 1821 trick: prove the result on powers of 2 first, then descend.
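The descent idea can be traced numerically. The sketch below uses a common variant of the argument: instead of descending one step at a time, it pads the list with copies of its own mean up to the next power of two. The function names are illustrative.

```python
import math

def pad_to_power_of_two(xs):
    """Pad xs with copies of its arithmetic mean up to the next power of two."""
    m = len(xs)
    mean = sum(xs) / m
    n = 1
    while n < m:
        n *= 2
    return list(xs) + [mean] * (n - m)

def gm(xs):
    """Geometric mean of non-negative numbers."""
    return math.prod(xs) ** (1 / len(xs))

xs = [1.0, 4.0, 8.0]
a = sum(xs) / len(xs)
padded = pad_to_power_of_two(xs)   # three values plus one copy of the mean

# Padding with the mean leaves the arithmetic mean unchanged.
assert math.isclose(sum(padded) / len(padded), a)
# AM-GM on the padded list (length 4, a power of two) ...
assert a >= gm(padded)
# ... implies AM-GM on the original list, as the descent step shows.
assert a >= gm(xs)
```

Padding works because the extra entries equal A, so the padded product is the original product times a power of A, and that power cancels after rearranging.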
Worked examples with numbers
Example 1 (n = 2):
a = 4, b = 9
AM = (4 + 9)/2 = 6.5
GM = √(4 · 9) = 6
Slack = (√4 − √9)² / 2 = (2 − 3)² / 2 = 0.5
6 ≤ 6.5 ✓ (strict; slack 0.5)
Example 2 (equality):
a = b = 7
AM = 7, GM = 7
7 = 7 EQUALITY (as expected)
Example 3 (n = 3):
x = (1, 4, 8)
AM = 13/3 ≈ 4.333
GM = (32)^(1/3) ≈ 3.175
4.333 ≥ 3.175 ✓
Example 4 (n = 4, equality):
x = (5, 5, 5, 5)
AM = 5, GM = 5 EQUALITY
Example 5 (optimization — fixed sum 6, three terms):
maximize xyz subject to x + y + z = 6, x, y, z ≥ 0
AM-GM: (x + y + z)/3 = 2 ≥ (xyz)^(1/3)
so xyz ≤ 8, attained iff x = y = z = 2.
Example 6 (weighted AM-GM):
weights w = (1/3, 2/3), values x = (8, 1)
weighted AM = (1/3)·8 + (2/3)·1 = 8/3 + 2/3 = 10/3 ≈ 3.333
weighted GM = 8^(1/3) · 1^(2/3) = 2
3.333 ≥ 2 ✓
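The worked examples above can be recomputed with a few lines of Python. This is a minimal sketch; the helpers `am` and `gm` are illustrative, and the grid search for Example 5 only samples multiples of 0.25, so it is a sanity check rather than a proof.

```python
import math

def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean of non-negative numbers."""
    return math.prod(xs) ** (1 / len(xs))

# Examples 1 and 3: unweighted AM and GM.
for xs in [(4, 9), (1, 4, 8)]:
    print(xs, "AM =", round(am(xs), 3), "GM =", round(gm(xs), 3))

# Example 6: weighted means with w = (1/3, 2/3), x = (8, 1).
w, x = (1 / 3, 2 / 3), (8, 1)
weighted_am = sum(wi * xi for wi, xi in zip(w, x))
weighted_gm = math.prod(xi ** wi for wi, xi in zip(w, x))
print("weighted AM =", round(weighted_am, 3), "weighted GM =", round(weighted_gm, 3))

# Example 5: brute-force search over x + y + z = 6 on a grid of step 0.25.
best = max(
    ((i / 4) * (j / 4) * (6 - i / 4 - j / 4), (i / 4, j / 4, 6 - i / 4 - j / 4))
    for i in range(25)
    for j in range(25 - i)
)
print("best product and point:", best)
```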
The weighted AM-GM inequality
For non-negative weights w₁, …, wₙ summing to 1, and non-negative xᵢ:
w₁ x₁ + w₂ x₂ + … + wₙ xₙ ≥ x₁^{w₁} · x₂^{w₂} · … · xₙ^{wₙ}
This is exactly Jensen's inequality applied to log on the convex combination. It generalizes the equal-weight case wᵢ = 1/n. The two-variable weighted form with w₁ = 1/p, w₂ = 1/q (where 1/p + 1/q = 1), x₁ = a^p, x₂ = b^q yields:
(1/p) a^p + (1/q) b^q ≥ (a^p)^{1/p} · (b^q)^{1/q} = ab
This is Young's inequality — the engine that drives Hölder's inequality. AM-GM is the grandfather of the L^p hierarchy.
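Young's inequality can be spot-checked numerically. The sketch below assumes a, b ≥ 0 and p > 1, with q the conjugate exponent; `young_gap` is a hypothetical helper name.

```python
import math

def young_gap(a: float, b: float, p: float) -> float:
    """Return a^p/p + b^q/q - a*b, where q satisfies 1/p + 1/q = 1."""
    q = p / (p - 1)  # conjugate exponent
    return a ** p / p + b ** q / q - a * b

# The gap is non-negative on a small illustrative grid of inputs.
for p in (1.5, 2.0, 3.0):
    for a in (0.0, 0.5, 1.0, 2.0, 5.0):
        for b in (0.0, 0.5, 1.0, 2.0, 5.0):
            assert young_gap(a, b, p) >= -1e-12

# Equality corresponds to x1 = x2 in weighted AM-GM, i.e. a^p = b^q.
# With p = 3 (so q = 1.5) and a = 2, that means b = 8^(1/1.5) = 4.
print(young_gap(2.0, 4.0, 3.0))
```

The equality condition a^p = b^q is the all-equal condition of AM-GM translated through the substitution x₁ = a^p, x₂ = b^q.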
AM-GM in the power-mean hierarchy
The power mean of order p is defined for positive xᵢ (non-negative xᵢ suffice when p > 0) as
M_p(x) = ((1/n) Σ xᵢ^p)^(1/p) for p ≠ 0
M_0(x) = lim_{p → 0} M_p(x) = (Π xᵢ)^(1/n) (geometric mean)
The power-mean inequality says M_p is non-decreasing in p:
M_{−∞}(x) = min(xᵢ) ≤ HM = M_{−1} ≤ GM = M_0 ≤ AM = M_1 ≤ QM = M_2 ≤ … ≤ M_{∞} = max(xᵢ)
AM ≥ GM is the M_1 ≥ M_0 step in this chain. The general power-mean inequality is proved by combining AM-GM with Jensen applied to x ↦ x^(p/q). Equality holds throughout the chain iff all xᵢ are equal.
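The chain can be observed directly by evaluating the power mean at several orders. This is a minimal sketch for positive inputs; `power_mean` is an illustrative name, and the p = 0 and p = ±∞ cases are handled as the limits described above.

```python
import math

def power_mean(xs, p):
    """Power mean M_p of positive numbers, with the limiting cases built in."""
    n = len(xs)
    if p == 0:
        # Limit p -> 0 is the geometric mean.
        return math.exp(sum(math.log(x) for x in xs) / n)
    if p == math.inf:
        return float(max(xs))
    if p == -math.inf:
        return float(min(xs))
    return (sum(x ** p for x in xs) / n) ** (1 / p)

xs = [1.0, 4.0, 8.0]
orders = [-math.inf, -1, 0, 1, 2, math.inf]
values = [power_mean(xs, p) for p in orders]

for p, v in zip(orders, values):
    print(f"M_{p} = {v:.4f}")

# The power-mean inequality: values are non-decreasing in p.
assert values == sorted(values)
```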
Common pitfalls
- Forgetting non-negativity. AM-GM requires all xᵢ ≥ 0. With negative inputs, the geometric mean may not be real, and the inequality breaks. For (−1, 1) the AM is 0 and the GM is undefined (or imaginary).
- Assuming equality without all-equal. AM = GM iff every xᵢ takes the same value. Two values being equal but a third different still gives strict AM > GM.
- Confusing weighted with unweighted. The standard AM-GM is the equal-weight case wᵢ = 1/n. Generalizing to unequal weights requires the weighted form Σwᵢ xᵢ ≥ Πxᵢ^{wᵢ}; the unweighted form is a special case.
- Using AM-GM where Cauchy-Schwarz is sharper. When the target bound involves inner products or squared norms, Cauchy-Schwarz often gives a tighter constant. AM-GM is a workhorse but not always the sharpest tool.
- Treating GM as interchangeable with AM on real data. The geometric mean collapses to zero if any single xᵢ is zero, because the whole product vanishes, while the arithmetic mean only shifts slightly. The two means also respond differently to outliers.
- Reading "equality iff" as the typical case. The set where AM = GM is the diagonal x₁ = … = xₙ, a measure-zero subset of the domain. For generic data, AM > GM strictly.
Where AM-GM shows up
- Optimization with sum-constraint. Maximizing the product of variables subject to a fixed sum is solved at the symmetric point by AM-GM. Used in inventory, pricing, and economic equilibrium models.
- Isoperimetric problems. Among all rectangles of fixed perimeter 2p, the square (sides p/2 each) has the maximum area p²/4 — direct AM-GM application. Generalizes to the continuous isoperimetric inequality.
- Young's inequality and L^p theory. Weighted AM-GM with weights 1/p, 1/q gives Young; Young drives Hölder; Hölder drives Minkowski. AM-GM is the foundational tier of L^p analysis.
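- Cauchy-Schwarz. The inequality |⟨u, v⟩|² ≤ ‖u‖² · ‖v‖² follows from two-term AM-GM: for unit vectors, |uᵢvᵢ| ≤ (uᵢ² + vᵢ²)/2, and summing over i gives |⟨u, v⟩| ≤ 1; rescaling handles general u and v. Jensen's inequality is the general framework behind AM-GM itself.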
- Information theory. Entropy bounds, Gibbs' inequality, and KL divergence non-negativity all rest on AM-GM (or equivalent log-concavity arguments).
- Finance — Kelly criterion. The Kelly betting strategy maximizes geometric mean of growth (long-run wealth) rather than arithmetic mean (single-period expected value). AM-GM quantifies the cost of volatility in compounding returns.
- Algorithmic mean computation. The arithmetic-geometric mean (AGM) algorithm iterates aₙ₊₁ = (aₙ + bₙ)/2, bₙ₊₁ = √(aₙ bₙ); convergence is quadratic, used in fast computation of π via Gauss-Legendre.
- Quantitative analysis. Volatility drag, expected vs. realized portfolio returns, and geometric vs. arithmetic Sharpe ratios all involve the gap between the arithmetic and geometric averages of returns over time.
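The AGM iteration from the list above can be sketched as follows. The function names, the iteration cap, and the tolerance are illustrative choices; the π routine follows the standard Gauss-Legendre recurrence in double precision, so it cannot go beyond roughly 15 to 16 significant digits.

```python
import math

def agm(a: float, b: float, max_iter: int = 60) -> float:
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(max_iter):
        if math.isclose(a, b, rel_tol=1e-15, abs_tol=0.0):
            break
        # One AGM step: replace (a, b) by their arithmetic and geometric means.
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def gauss_legendre_pi(iterations: int = 3) -> float:
    """Approximate pi with the Gauss-Legendre recurrence built on the AGM step."""
    a, b, t, p = 1.0, 1.0 / math.sqrt(2.0), 0.25, 1.0
    for _ in range(iterations):
        a_next = (a + b) / 2
        b = math.sqrt(a * b)
        t -= p * (a - a_next) ** 2
        a = a_next
        p *= 2
    return (a + b) ** 2 / (4 * t)

print(agm(24.0, 6.0))
print(gauss_legendre_pi(3), math.pi)
```

By AM-GM, every step keeps the geometric iterate at or below the arithmetic iterate, so the limit is squeezed between the two sequences.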
Frequently asked questions
What does the AM-GM inequality say?
For non-negative real numbers x₁, …, xₙ, the arithmetic mean is at least the geometric mean: (x₁ + x₂ + … + xₙ)/n ≥ (x₁ · x₂ · … · xₙ)^(1/n). Equality holds if and only if x₁ = x₂ = … = xₙ, so the inequality is strict whenever at least two values differ. For n = 2: (a + b)/2 ≥ √(ab). AM-GM is one of the most-used inequalities in elementary optimization.
Why does AM-GM hold for n = 2?
Start from (√a − √b)² ≥ 0, since a square is non-negative. Expand: a − 2√(ab) + b ≥ 0, rearrange to a + b ≥ 2√(ab), and divide by 2: (a + b)/2 ≥ √(ab). Equality holds iff √a = √b, i.e., a = b. The two-variable case is a one-line argument resting on "the square of any real number is non-negative", the same idea as completing the square. For all a, b ≥ 0: (a + b)/2 − √(ab) = (√a − √b)²/2, so the slack equals one half the squared difference of the square roots. This is the cleanest closed-form expression of how much AM exceeds GM.
How is AM-GM proved for general n?
Two clean proofs. (1) Jensen's inequality: log is concave on (0, ∞), so log((Σxᵢ)/n) ≥ Σ(log xᵢ)/n = log((Πxᵢ)^(1/n)). Exponentiating gives AM ≥ GM. (2) Cauchy's forward-backward induction: prove AM-GM for n = 2 directly, lift to n = 2^k by pairing, then descend from n to n − 1 by appending the mean of the n − 1 given values as an n-th term. A third route uses Schur-concavity: the product x₁ · … · xₙ is Schur-concave on non-negative vectors, and the constant vector (A, …, A) with A = AM(x) is majorized by x, so A^n ≥ x₁ · … · xₙ. All three give the same equality condition: all xᵢ equal.
How does AM-GM solve optimization problems?
Whenever you fix the sum and want to maximize the product, AM-GM says the maximum is attained when all variables are equal. Example: maximize xyz subject to x + y + z = 6, x, y, z > 0. AM-GM gives 2 = 6/3 ≥ (xyz)^(1/3), so xyz ≤ 8, attained at x = y = z = 2. Dually, when the product is fixed and you want to minimize the sum, AM-GM gives a tight lower bound: x₁ + … + xₙ ≥ n · (x₁ · … · xₙ)^(1/n). It is used in isoperimetric problems (among rectangles of fixed area, the square has the smallest perimeter), in inventory and pricing problems, and in convergence analyses that compare the geometric mean of step sizes or contraction factors with their arithmetic mean.
What is the weighted AM-GM inequality?
For non-negative weights w₁, …, wₙ summing to 1 and non-negative xᵢ: Σwᵢ xᵢ ≥ Πxᵢ^{wᵢ}. The standard AM-GM is the equal-weight case wᵢ = 1/n. Weighted AM-GM is exactly Jensen's inequality applied to log: log(Σwᵢ xᵢ) ≥ Σwᵢ log xᵢ = log(Πxᵢ^{wᵢ}). It underlies the proof of Young's inequality ab ≤ a^p/p + b^q/q for a, b ≥ 0 and p, q > 1 with 1/p + 1/q = 1 (set w₁ = 1/p, w₂ = 1/q, x₁ = a^p, x₂ = b^q), which in turn drives Hölder's inequality. AM-GM is therefore an ancestor of much of L^p theory.
How does AM-GM compare with harmonic and quadratic means?
For positive reals, the chain HM ≤ GM ≤ AM ≤ QM holds: harmonic mean = n/(Σ 1/xᵢ), geometric mean = (Πxᵢ)^(1/n), arithmetic mean = (Σxᵢ)/n, quadratic mean = √(Σxᵢ²/n). All four agree iff all xᵢ are equal. Each inequality is a special case of the power-mean inequality, which says that M_p(x) = ((Σxᵢ^p)/n)^(1/p) is non-decreasing in p, with HM = M₋₁, GM = M₀ (limit), AM = M₁, QM = M₂. The wider the spread of the xᵢ, the wider the gaps between successive means.
Where does AM-GM appear in machine learning and probability?
Entropy bounds: for a probability distribution p on n outcomes, weighted AM-GM with weights pᵢ and values 1/pᵢ gives Π(1/pᵢ)^{pᵢ} ≤ Σpᵢ · (1/pᵢ) = n, which is the bound H(p) = −Σpᵢ log pᵢ ≤ log n. The KL divergence D(p ‖ q) ≥ 0 follows the same way from weighted AM-GM applied to the ratios qᵢ/pᵢ with weights pᵢ. AM-GM also bounds products pointwise: Πxᵢ ≤ (Σxᵢ/n)^n for non-negative xᵢ, so E[Πxᵢ] ≤ E[(Σxᵢ/n)^n]. The Cauchy-Schwarz inequality |⟨u, v⟩|² ≤ ‖u‖² · ‖v‖² can be derived from two-term AM-GM applied coordinatewise to normalized vectors. And the trade-off between geometric and arithmetic returns in finance, as in the Kelly criterion, is an AM-GM comparison.
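Two of the information-theoretic consequences can be checked numerically. The sketch below takes entropy as H(p) = −Σpᵢ log pᵢ in nats and assumes q has full support wherever p does; the function names and sample distributions are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]
q = [0.2, 0.3, 0.5]
uniform = [1 / 3] * 3

# Entropy is bounded by log n, with equality for the uniform distribution.
assert entropy(p) <= math.log(len(p))
assert math.isclose(entropy(uniform), math.log(3))

# KL divergence is non-negative and vanishes when the distributions coincide.
assert kl_divergence(p, q) >= 0
assert math.isclose(kl_divergence(p, p), 0.0, abs_tol=1e-12)

print(entropy(p), kl_divergence(p, q))
```

Both checks mirror the all-equal condition of AM-GM: the entropy bound is tight when every pᵢ is the same, and the divergence is zero when every ratio qᵢ/pᵢ equals 1.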
Means hierarchy and inequality chain
| Mean | Order p | Formula | Position | Equality iff | Used in |
|---|---|---|---|---|---|
| min | p → −∞ | min(xᵢ) | smallest | — | Worst-case analysis |
| Harmonic mean (HM) | p = −1 | n / Σ(1/xᵢ) | ≤ GM | all xᵢ equal | Average rates, parallel resistance |
| Geometric mean (GM) | p = 0 (limit) | (Πxᵢ)^(1/n) | ≤ AM | all xᵢ equal | Compounded growth, Kelly criterion |
| Arithmetic mean (AM) | p = 1 | (Σxᵢ)/n | ≤ QM | all xᵢ equal | Average value, expectation |
| Quadratic mean (QM, RMS) | p = 2 | √(Σxᵢ²/n) | between AM and max | all xᵢ equal | Signal magnitudes, statistics |
| max | p → +∞ | max(xᵢ) | largest | — | Upper bound, sup-norm |