Inequalities

Young's Inequality

ab ≤ a^p/p + b^q/q for 1/p + 1/q = 1 — the scalar engine of Hölder

Young's inequality (scalar form) states that for non-negative reals a, b ≥ 0 and conjugate exponents p, q > 1 with 1/p + 1/q = 1: ab ≤ a^p/p + b^q/q. Equality holds iff a^p = b^q. The case p = q = 2 reduces to ab ≤ a²/2 + b²/2, which is equivalent to (a − b)² ≥ 0 by rearrangement. Young's inequality is the scalar engine that powers Hölder's inequality (apply Young pointwise, integrate), and through Hölder it underlies Minkowski's inequality and the theory of L^p spaces. The standard proof uses concavity of log, or equivalently weighted AM-GM with weights 1/p, 1/q. Young's convolution inequality ‖f * g‖_r ≤ ‖f‖_p · ‖g‖_q (with 1/p + 1/q = 1 + 1/r) carries the idea over to L^p smoothing estimates. Named after William Henry Young (1912).

Scalar form: ab ≤ a^p/p + b^q/q
Conjugate: 1/p + 1/q = 1
p = q = 2: ab ≤ a²/2 + b²/2
Equality iff: a^p = b^q
Named after: W. H. Young, 1912
Convolution form: ‖f*g‖_r ≤ ‖f‖_p ‖g‖_q

The statement and the case p = q = 2

For a, b ≥ 0 and conjugate exponents p, q > 1 with 1/p + 1/q = 1:

ab ≤ a^p / p + b^q / q

Equality holds iff a^p = b^q. The case p = q = 2 (1/2 + 1/2 = 1) gives the cleanest specialisation:

ab ≤ a²/2 + b²/2          (rearranges to (a − b)² ≥ 0)
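
The statement is easy to spot-check numerically. The following is a minimal sketch, not a proof: the helper names `conjugate` and `young_slack`, the random seed, and the sampling ranges are all illustrative assumptions rather than anything fixed by the text.

```python
import random


def conjugate(p: float) -> float:
    """Conjugate exponent q with 1/p + 1/q = 1; requires p > 1."""
    if p <= 1:
        raise ValueError("p must be greater than 1")
    return p / (p - 1)


def young_slack(a: float, b: float, p: float) -> float:
    """Return a^p/p + b^q/q - ab, which Young's inequality says is >= 0."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    q = conjugate(p)
    return a**p / p + b**q / q - a * b


# Random spot-check; the seed and the ranges are arbitrary choices.
rng = random.Random(0)
smallest_slack = min(
    young_slack(rng.uniform(0, 10), rng.uniform(0, 10), rng.uniform(1.1, 6))
    for _ in range(10_000)
)
print("smallest slack found:", smallest_slack)
print("slack for p = q = 2, a = 3, b = 4:", young_slack(3, 4, 2))
```

The slack should never drop below zero beyond floating-point rounding, and for p = q = 2 it equals (a − b)²/2.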

And the parameterized version with ε > 0 (also commonly called "Young's inequality with ε"):

ab ≤ ε a^p / p + ε^(−q/p) b^q / q       (rescale a → ε^(1/p) a, b → ε^(−1/p) b)

The ε version is the everyday workhorse in PDE estimates: it lets you absorb a small piece of a cross-term ab into εa^p/p (a small fraction of the dominant energy term) while paying with the rest as ε^(−q/p) b^q/q.
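
To see the trade-off concretely, here is a small sketch of the ε version. The function names are hypothetical. The "balancing" value ε = b / a^(p−1) is derived from the equality condition a^p = b^q applied to the rescaled pair, and is offered as an illustration rather than a recipe from the text.

```python
def young_eps_rhs(a: float, b: float, p: float, eps: float) -> float:
    """Right-hand side eps * a^p/p + eps^(-q/p) * b^q/q of Young with eps."""
    q = p / (p - 1)
    return eps * a**p / p + eps ** (-q / p) * b**q / q


def balancing_eps(a: float, b: float, p: float) -> float:
    """The eps for which the bound is tight: eps * a^p = eps^(-q/p) * b^q."""
    return b / a ** (p - 1)


a, b, p = 2.0, 3.0, 3.0
for eps in (0.01, 0.1, balancing_eps(a, b, p), 1.0, 10.0):
    rhs = young_eps_rhs(a, b, p, eps)
    print(f"eps = {eps:7.3f}   ab = {a * b:.3f}   bound = {rhs:.3f}")
```

Shrinking ε makes the a-term small but inflates the b-term; only at the balancing value does the bound collapse to ab.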

Proof via concavity of log

For a, b > 0 (the case a = 0 or b = 0 is trivial), use the weighted average:

log(ab) = log a + log b = (1/p) · p log a + (1/q) · q log b
        = (1/p) log(a^p) + (1/q) log(b^q)

Since log : (0, ∞) → ℝ is strictly concave, Jensen gives

(1/p) log(a^p) + (1/q) log(b^q) ≤ log( (1/p) a^p + (1/q) b^q )

Combine and exponentiate: ab ≤ (1/p) a^p + (1/q) b^q. Equality holds iff a^p = b^q (the two arguments of log agree). The proof is exactly weighted AM-GM with weights 1/p, 1/q applied to x₁ = a^p, x₂ = b^q. Young is weighted AM-GM in disguise.
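
The two steps of the proof can be traced numerically. This is a sketch under the assumption a, b > 0; the names `log_jensen_gap` and `young_via_jensen` are made up for the illustration.

```python
import math


def log_jensen_gap(x1: float, x2: float, w: float) -> float:
    """log(w*x1 + (1-w)*x2) - (w*log(x1) + (1-w)*log(x2)); >= 0 since log is concave."""
    return math.log(w * x1 + (1 - w) * x2) - (w * math.log(x1) + (1 - w) * math.log(x2))


def young_via_jensen(a: float, b: float, p: float):
    """Return (ab, a^p/p + b^q/q), computed the way the proof does for a, b > 0."""
    q = p / (p - 1)
    x1, x2, w = a**p, b**q, 1 / p  # 1 - w equals 1/q
    lhs = math.exp(w * math.log(x1) + (1 - w) * math.log(x2))  # this is ab
    rhs = w * x1 + (1 - w) * x2  # weighted arithmetic mean of a^p and b^q
    return lhs, rhs


print(young_via_jensen(1.0, 2.0, 4.0))   # off the equality case
print(young_via_jensen(2.0, 4.0, 3.0))   # a^p = b^q, so both sides should agree
print(log_jensen_gap(8.0, 8.0, 1 / 3))   # equal arguments give zero gap
```

The first returned value reproduces ab through the logarithm, which is exactly the rewriting used in the first display of the proof.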

Geometric proof — the rectangle and the curve

Consider the curve y = x^(p−1) in the first quadrant. Inverting: x = y^(1/(p−1)) = y^(q−1), where q = p/(p−1). The curve passes through the origin, is increasing, and divides the first quadrant into two regions.

Pick any point (a, b) in the first quadrant. The rectangle [0, a] × [0, b] has area ab. The area under the curve from 0 to a is:

∫₀ᵃ x^(p−1) dx = a^p / p

The area to the left of the curve from 0 to b (integrating the inverse) is:

∫₀ᵇ y^(q−1) dy = b^q / q

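If (a, b) lies on the curve, i.e., b = a^(p−1) ⟺ a^p = b^q, the rectangle is tiled exactly by these two regions; their areas sum to ab. If (a, b) lies off the curve, the two regions still do not overlap (one lies below the curve, the other above it) and together they still cover the rectangle, but one of them sticks out beyond it: past the right edge x = a when b > a^(p−1), past the top edge y = b when b < a^(p−1). Either way their combined area strictly exceeds ab. So: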

ab ≤ a^p / p + b^q / q

with equality exactly when (a, b) is on the curve. This is the picture every textbook draws.
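
The two areas can be approximated directly, which makes the picture checkable without the closed-form integrals. The sketch below uses a plain midpoint rule; the function names and the number of subintervals are arbitrary choices for the illustration.

```python
def midpoint_integral(f, lo: float, hi: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


def young_areas(a: float, b: float, p: float, n: int = 20_000):
    """Return (area under y = x^(p-1) on [0, a], area left of the curve on [0, b])."""
    q = p / (p - 1)
    under = midpoint_integral(lambda x: x ** (p - 1), 0.0, a, n)
    left = midpoint_integral(lambda y: y ** (q - 1), 0.0, b, n)
    return under, left


a, b, p = 1.0, 2.0, 4.0
under, left = young_areas(a, b, p)
excess = under + left - a * b  # area of the piece sticking out of the rectangle
print(f"under = {under:.4f}, left = {left:.4f}, rectangle = {a * b:.4f}, excess = {excess:.4f}")
```

For a point on the curve, such as (a, b) = (2, 4) with p = 3, the same computation should return two areas that sum to the rectangle's area up to discretisation error.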

Worked numerical examples

Example 1 (p = q = 2):
  a = 3, b = 4
  ab = 12
  a²/2 + b²/2 = 4.5 + 8 = 12.5
  12 ≤ 12.5            ✓  (slack = (3−4)²/2 = 0.5)

Example 2 (p = q = 2, equality):
  a = b = 5
  ab = 25
  a²/2 + b²/2 = 12.5 + 12.5 = 25
  25 = 25              EQUALITY (a² = b²)

Example 3 (p = 3, q = 3/2):
  a = 2, b = 4
  ab = 8
  a^p/p = 8/3 ≈ 2.667
  b^q/q = 4^(3/2) / (3/2) = 8 / 1.5 ≈ 5.333
  a^p/p + b^q/q ≈ 8.000
  8 ≤ 8.000            EQUALITY (since a^p = 2³ = 8 = 4^(3/2) = b^q)

Example 4 (p = 4, q = 4/3):
  a = 1, b = 2
  ab = 2
  a^p/p = 1/4 = 0.25
  b^q/q = 2^(4/3) / (4/3) ≈ 1.890
  a^p/p + b^q/q ≈ 2.140
  2 ≤ 2.140            ✓  (slack ≈ 0.140)

Example 5 (parameterized Young, ε = 0.1, p = q = 2):
  a = 5, b = 5
  ab = 25
  ε a²/2 + ε^(−1) b²/2 = 0.1·12.5 + 10·12.5 = 1.25 + 125 = 126.25
  25 ≤ 126.25          ✓  (the ε trick gives a very loose bound when ε is small —
                             but it lets you put almost all the cost in one term)
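
The five examples can be reproduced in a few lines. This is a sketch with a hypothetical helper `young_sides`; passing eps = 1 gives the plain scalar form.

```python
def young_sides(a: float, b: float, p: float, eps: float = 1.0):
    """Return (ab, eps * a^p/p + eps^(-q/p) * b^q/q); eps = 1 is plain Young."""
    q = p / (p - 1)
    return a * b, eps * a**p / p + eps ** (-q / p) * b**q / q


# (label, a, b, p, eps) taken from the worked examples above
examples = [
    ("Example 1", 3, 4, 2, 1.0),
    ("Example 2", 5, 5, 2, 1.0),
    ("Example 3", 2, 4, 3, 1.0),
    ("Example 4", 1, 2, 4, 1.0),
    ("Example 5", 5, 5, 2, 0.1),
]

for label, a, b, p, eps in examples:
    lhs, rhs = young_sides(a, b, p, eps)
    print(f"{label}: ab = {lhs:.3f}, bound = {rhs:.3f}, slack = {rhs - lhs:.3f}")
```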

Variants and generalizations

Scalar Young. ab ≤ a^p/p + b^q/q for 1/p + 1/q = 1.
Parameterized Young with ε. ab ≤ ε a^p/p + ε^(−q/p) b^q/q. Standard in PDE estimates — tune ε to absorb cross-terms into the principal energy.
Fenchel-Young (general convex). For a convex function φ with Legendre dual φ*, ab ≤ φ(a) + φ*(b). Setting φ(x) = x^p/p recovers scalar Young. The general statement is the duality inequality of convex analysis.
Young's convolution inequality. ‖f * g‖_r ≤ ‖f‖_p · ‖g‖_q for 1 ≤ p, q, r ≤ ∞ with 1/p + 1/q = 1 + 1/r. Sharp constant (Beckner 1975; Brascamp-Lieb 1976) attained by Gaussians.
Matrix Young. For positive semi-definite matrices A, B: ‖AB‖ ≤ (‖A‖^p)/p + (‖B‖^q)/q in operator norm, with appropriate non-commutative subtleties.
Reverse Young. For 0 < p < 1 (so q < 0) and a, b > 0, the inequality reverses: ab ≥ a^p/p + b^q/q. Used in reverse Hölder / Gehring's lemma machinery.
Generalized n-term Young. For weights wᵢ summing to 1 and aᵢ ≥ 0: Πaᵢ^{wᵢ} ≤ Σwᵢ aᵢ (this is exactly weighted AM-GM).
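
Two of the variants above, the n-term form and the reverse form, are simple enough to check numerically. The following sketch uses hypothetical helper names and arbitrary sample values.

```python
import math


def weighted_means(values, weights):
    """Return (weighted geometric mean, weighted arithmetic mean)."""
    if any(v < 0 for v in values) or any(w < 0 for w in weights):
        raise ValueError("values and weights must be non-negative")
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to 1")
    geometric = math.prod(v**w for v, w in zip(values, weights))
    arithmetic = sum(w * v for v, w in zip(values, weights))
    return geometric, arithmetic


def reverse_young_sides(a: float, b: float, p: float):
    """Return (ab, a^p/p + b^q/q) for 0 < p < 1, where q = p/(p-1) is negative."""
    if not (0 < p < 1) or a <= 0 or b <= 0:
        raise ValueError("need 0 < p < 1 and a, b > 0")
    q = p / (p - 1)
    return a * b, a**p / p + b**q / q


# n-term form: geometric mean <= arithmetic mean
print(weighted_means([1.0, 4.0, 9.0], [0.2, 0.3, 0.5]))

# Two-term case recovers scalar Young: values (a^p, b^q), weights (1/p, 1/q)
a, b, p = 1.0, 2.0, 4.0
q = p / (p - 1)
print(weighted_means([a**p, b**q], [1 / p, 1 / q]))

# Reverse form: here the product dominates
print(reverse_young_sides(4.0, 1.0, 0.5))
```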

Young proves Hölder (one application)

Assume 0 < ‖f‖_p, ‖g‖_q < ∞ (the remaining cases are trivial). Apply Young pointwise to a = |f(x)| / ‖f‖_p, b = |g(x)| / ‖g‖_q. Then

|f(x) g(x)| / (‖f‖_p · ‖g‖_q) ≤ (1/p) |f(x)|^p / ‖f‖_p^p + (1/q) |g(x)|^q / ‖g‖_q^q

Integrate both sides over the measure space. The right side becomes 1/p + 1/q = 1. Multiplying both sides by ‖f‖_p · ‖g‖_q:

∫ |f g| dμ ≤ ‖f‖_p · ‖g‖_q          (Hölder's inequality)

One pointwise application of Young, one integration. The hierarchy convexity ⇒ Young ⇒ Hölder ⇒ Minkowski is tight: each step is short, and the cumulative power produces all of L^p theory.
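
The same argument can be run on finite sequences, where the integral is a sum (counting measure). This is a sketch under that assumption; `lp_norm`, `holder_sides` and `normalized_young_sum` are illustrative names, and the sample vectors are arbitrary.

```python
def lp_norm(xs, p: float) -> float:
    """l^p norm of a finite sequence, for finite p >= 1."""
    return sum(abs(x) ** p for x in xs) ** (1 / p)


def holder_sides(f, g, p: float):
    """Return (sum |f_i g_i|, ||f||_p * ||g||_q) with q conjugate to p."""
    q = p / (p - 1)
    lhs = sum(abs(x * y) for x, y in zip(f, g))
    return lhs, lp_norm(f, p) * lp_norm(g, q)


def normalized_young_sum(f, g, p: float) -> float:
    """Sum of the pointwise Young bounds after normalizing; should equal 1/p + 1/q = 1."""
    q = p / (p - 1)
    nf, ng = lp_norm(f, p), lp_norm(g, q)
    return sum((abs(x) / nf) ** p / p + (abs(y) / ng) ** q / q for x, y in zip(f, g))


f = [1.0, -2.0, 3.0]
g = [4.0, 0.5, -1.0]
print(holder_sides(f, g, 3.0))
print(normalized_young_sum(f, g, 3.0))
```

The second printed value is the "right side becomes 1" step of the proof, evaluated on concrete data.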

The Legendre/Fenchel duality view

For a convex function φ : ℝ → ℝ ∪ {+∞}, its Legendre conjugate is φ*(y) = sup_x (xy − φ(x)). When φ is differentiable and strictly convex, the derivatives φ′ and (φ*)′ are inverse functions of each other: y = φ′(x) ⟺ x = (φ*)′(y). The Fenchel-Young inequality says, for all a, b:

ab ≤ φ(a) + φ*(b)

with equality iff b ∈ ∂φ(a) (the subdifferential of φ at a, i.e., b is a subgradient of φ at a). For φ(x) = x^p/p on [0, ∞), the Legendre dual is φ*(y) = y^q/q on [0, ∞), with q = p/(p−1) and the derivative relation y = x^(p−1) ⟺ x = y^(q−1). Substituting recovers scalar Young.
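
The conjugate pair can be checked by brute force: approximate the supremum in the definition of φ* by a maximum over a grid and compare with y^q/q. This is only a sketch; the grid range, the spacing, and the name `conjugate_on_grid` are assumptions made for the illustration.

```python
def conjugate_on_grid(phi, y: float, xs) -> float:
    """Approximate phi*(y) = sup_x (x*y - phi(x)) by a maximum over the grid xs."""
    return max(x * y - phi(x) for x in xs)


p = 3.0
q = p / (p - 1)
xs = [i * 0.001 for i in range(5001)]  # grid on [0, 5]; an arbitrary choice


def phi(x: float) -> float:
    return x**p / p


rows = []
for y in (0.5, 1.0, 2.0, 4.0):
    approx = conjugate_on_grid(phi, y, xs)
    exact = y**q / q
    rows.append((y, approx, exact))
    print(f"y = {y:.1f}   grid sup = {approx:.6f}   y^q/q = {exact:.6f}")
```

The grid maximum can only undershoot the true supremum, so the approximation approaches y^q/q from below as the grid is refined.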

This view explains why Young's inequality appears across thermodynamics (energy/entropy duality), large-deviations theory (Cramér's theorem), and convex optimization (Lagrangian duality, primal/dual gap). Young is the prototypical Fenchel-Young inequality.

Common pitfalls

Forgetting non-negativity. Young requires a, b ≥ 0 (or absolute values for general reals). With negative values, the powers may not be defined or the inequality may flip.
Forgetting the conjugate condition. 1/p + 1/q = 1 is mandatory. With arbitrary p, q the bound can fail: for p = q = 3 and a = b = 1, ab = 1 but a^p/p + b^q/q = 2/3.
Confusing scalar and convolutional Young. The scalar form bounds ab pointwise; the convolutional form bounds the L^r norm of f * g. Both are called "Young", different statements.
Using the p = q = 2 case as a general pattern. The case p = q = 2 is the simplest but not the most useful. The non-Cauchy-Schwarz cases (p = 3, q = 3/2; p = 4, q = 4/3) are where Young's flexibility matters.
Forgetting the ε version's purpose. Use it to absorb part of a cross-term into a term you already control, by making that term's coefficient ε small. The cost is that the coefficient ε^(−q/p) on the other term blows up as ε → 0; there is no free lunch.
Believing the Legendre proof is "harder". Once you know the Legendre dual of x^p/p, the inequality is the defining property of the conjugate. The unfamiliarity comes from the language, not the depth.
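
The first two pitfalls are easy to reproduce. The sketch below deliberately skips all input checks (the function name `naive_rhs` is hypothetical) so that the failures are visible; the specific numbers are just convenient counterexamples.

```python
def naive_rhs(a: float, b: float, p: float, q: float) -> float:
    """a^p/p + b^q/q with no checks on signs or on 1/p + 1/q = 1."""
    return a**p / p + b**q / q


# Non-conjugate exponents: 1/3 + 1/3 != 1, and the bound fails.
print("ab =", 1 * 1, " bound =", naive_rhs(1, 1, 3, 3))

# Negative input with conjugate exponents p = 3, q = 3/2: the bound fails.
print("ab =", -3 * 1, " bound =", naive_rhs(-3, 1, 3, 1.5))

# Taking absolute values first restores the inequality.
print("|ab| =", abs(-3 * 1), " bound =", naive_rhs(abs(-3), abs(1), 3, 1.5))
```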

Where Young's inequality shows up

L^p theory. Young → Hölder → Minkowski → triangle inequality for ‖·‖_p → L^p is a normed space. Half of functional analysis rests on this chain.
PDE energy estimates. Bound a quadratic cross-term in an energy identity by Young's inequality with ε, absorb the εa^p/p part into the principal energy, control the rest. Routine in Navier-Stokes, NLS, KdV, and parabolic regularity theory.
Sobolev embedding. Bounds like ‖f‖_{p*} ≤ C(n, p) ‖∇f‖_p on ℝⁿ, with p* = np/(n − p) and 1 ≤ p < n (Gagliardo-Nirenberg-Sobolev), use Hölder, which rests on Young.
Optimal transport and Wasserstein. Kantorovich-Rubinstein duality, optimal cost-density tradeoffs, and the Brenier theorem all use Legendre/Fenchel-Young.
Statistical mechanics / large deviations. Cramér's theorem expresses the rate function I as the Legendre dual of the log-moment-generating function Λ. The Fenchel-Young inequality θx ≤ Λ(θ) + I(x) is the duality bound linking the two, and I is convex because it is a supremum of affine functions.
Information theory. Donsker-Varadhan variational formula for KL divergence; chi-squared and KL bounds on hypothesis testing all use Young/Fenchel-Young.
Optimization — duality. Lagrange duality: weak-duality bound and strong-duality conditions (Slater, KKT) rest on the Fenchel-Young inequality applied to the Lagrangian.
Harmonic analysis. Young's convolution inequality is the standard tool for bounding convolutional smoothing operators. Sharp constants involve Gaussians (Beckner, Brascamp-Lieb).

Frequently asked questions

What is Young's inequality (scalar form)?

For non-negative real numbers a, b and conjugate exponents p, q > 1 with 1/p + 1/q = 1: ab ≤ a^p/p + b^q/q. Equality iff a^p = b^q. The case p = q = 2 reduces to ab ≤ a²/2 + b²/2, which is just (a − b)² ≥ 0 rearranged. The inequality bounds a product by a sum of weighted powers — the duality that underlies the L^p–L^q pairing in Hölder's inequality.

How is Young's inequality proved?

Cleanest proof via concavity of log. For a, b > 0: log(ab) = log a + log b = (1/p)(p log a) + (1/q)(q log b) = (1/p) log a^p + (1/q) log b^q. Since log is concave, log((1/p) a^p + (1/q) b^q) ≥ (1/p) log a^p + (1/q) log b^q = log(ab). Exponentiating gives ab ≤ a^p/p + b^q/q. Equality requires equality in Jensen, hence a^p = b^q. An equivalent proof: weighted AM-GM with weights 1/p, 1/q gives the same result.

What is the geometric picture of Young's inequality?

Consider the rectangle [0, a] × [0, b] of area ab. Draw the curve y = x^(p−1) (equivalently x = y^(1/(p−1)) = y^(q−1)) through the origin. This curve splits the first quadrant into two regions. The integral ∫₀ᵃ x^(p−1) dx = a^p/p is the area under the curve up to x = a. The integral ∫₀ᵇ y^(q−1) dy = b^q/q is the area to the left of the curve up to y = b. If the point (a, b) lies on the curve (i.e., b = a^(p−1), equivalently a^p = b^q), these two regions exactly tile the rectangle of area ab; that is the equality case. If (a, b) is off the curve, the two regions still cover the rectangle without overlapping, but one of them sticks out beyond it, so their combined area a^p/p + b^q/q strictly exceeds ab. That is the geometric proof of the inequality.

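How does Young's inequality prove Hölder's inequality?

Normalize and apply Young pointwise. Assuming 0 < ‖f‖_p, ‖g‖_q < ∞, set a = |f(x)| / ‖f‖_p and b = |g(x)| / ‖g‖_q, so that |f(x) g(x)| / (‖f‖_p · ‖g‖_q) ≤ (1/p) |f(x)|^p / ‖f‖_p^p + (1/q) |g(x)|^q / ‖g‖_q^q. Integrating over the measure space turns the right side into 1/p + 1/q = 1, and multiplying through by ‖f‖_p · ‖g‖_q gives ∫ |f g| dμ ≤ ‖f‖_p · ‖g‖_q. One pointwise application of Young, one integration.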

What is Young's convolution inequality?

For exponents 1 ≤ p, q, r ≤ ∞ with 1/p + 1/q = 1 + 1/r and functions f ∈ L^p(ℝⁿ), g ∈ L^q(ℝⁿ), the convolution f * g satisfies ‖f * g‖_r ≤ ‖f‖_p · ‖g‖_q. Cases: p = q = 2, r = ∞ gives ‖f * g‖_∞ ≤ ‖f‖₂ ‖g‖₂ (a Hölder-Cauchy bound on point values). p = 1, q = r gives the convolution-with-L¹ estimate. The sharp constant (Beckner's theorem, 1975) involves Gaussians as extremizers. This extension of the scalar Young is foundational for smoothing estimates, Sobolev embeddings, and harmonic analysis.
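
A discrete analogue is easy to test. The sketch below works with finitely supported sequences on ℤ under counting measure rather than functions on ℝⁿ, which is an assumption of the example; the inequality with constant 1 holds in that setting as well. All function names are illustrative.

```python
import math


def convolve(f, g):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, x in enumerate(f):
        for j, y in enumerate(g):
            out[i + j] += x * y
    return out


def lp_norm(xs, p: float) -> float:
    """l^p norm of a finite sequence; p may be math.inf."""
    if math.isinf(p):
        return max(abs(x) for x in xs)
    return sum(abs(x) ** p for x in xs) ** (1 / p)


def young_convolution_sides(f, g, p: float, q: float):
    """Return (||f*g||_r, ||f||_p * ||g||_q) with 1/p + 1/q = 1 + 1/r."""
    inv_r = 1 / p + 1 / q - 1
    if inv_r < -1e-12:
        raise ValueError("need 1/p + 1/q >= 1")
    r = math.inf if abs(inv_r) < 1e-12 else 1 / inv_r
    return lp_norm(convolve(f, g), r), lp_norm(f, p) * lp_norm(g, q)


f = [1.0, 2.0, 3.0]
g = [0.5, -1.0, 2.0]
for p, q in ((2.0, 2.0), (1.0, 2.0), (1.5, 1.5)):
    lhs, rhs = young_convolution_sides(f, g, p, q)
    print(f"p = {p}, q = {q}: ||f*g||_r = {lhs:.4f} <= {rhs:.4f}")
```

The pair p = q = 2 exercises the r = ∞ case mentioned above, and p = 1, q = 2 is an instance of the convolution-with-L¹ estimate.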

How is Young's inequality related to the Legendre transform?

The Legendre transform of f(x) = x^p/p (defined on x ≥ 0) is f*(y) = y^q/q with 1/p + 1/q = 1 — they are Legendre duals. Young's inequality is exactly the defining inequality of the Legendre transform: for any convex function φ with conjugate φ*, ab ≤ φ(a) + φ*(b). Setting φ(x) = x^p/p reproduces Young. So Young is the prototypical Fenchel-Young inequality, and it generalizes to any Legendre pair. This duality view explains why Young appears throughout convex analysis, optimization, and statistical mechanics (Legendre transforms swap energy and entropy).

Where does Young's inequality show up in everyday analysis?

It is the workhorse for splitting a product into a controllable sum. In PDE energy estimates: bound a cross-term uv by εu^p/p + (1/ε)^(q−1) v^q/q (parameterized Young with a tunable ε; note that q − 1 = q/p). In machine learning regularization: weight-decay arguments use Young to bound terms in the loss. In Sobolev embedding: control ‖fg‖_1 by ‖f‖_p ‖g‖_q via Hölder, which rests on Young. In probability: moment generating function bounds and Cramér's large deviation theorem use the Legendre-transform form of Young.