Hölder's Inequality

Q: What does Hölder's inequality say in one sentence?

If 1/p + 1/q = 1 with 1 ≤ p, q ≤ ∞, then the integral of the product is bounded by the product of the L^p norm of f and the L^q norm of g: ∫|fg| dμ ≤ ‖f‖_p · ‖g‖_q. The two exponents are called conjugate. The same inequality holds for sums: Σ|aₖbₖ| ≤ (Σ|aₖ|^p)^(1/p) · (Σ|bₖ|^q)^(1/q). It bounds a pairing (an integral or a sum) by the natural norms of the two factors, and it generalizes Cauchy-Schwarz, which is the case p = q = 2.

Q: How does Hölder's inequality generalize Cauchy-Schwarz?

Cauchy-Schwarz is the case p = q = 2: ∫|fg| ≤ (∫|f|²)^(1/2) · (∫|g|²)^(1/2). Hölder lets you trade off — give more weight to f at the price of less weight for g. The extreme case p = 1, q = ∞ gives ∫|fg| ≤ ‖g‖_∞ · ∫|f|, the obvious bound. Cases like p = 3, q = 3/2 are the new content: you can pair an L^3 function with an L^(3/2) function. The trade-off is fixed by the conjugate condition 1/p + 1/q = 1.

Q: When does equality hold in Hölder's inequality?

For 1 < p < ∞ and finite norms, equality holds in ∫|fg| ≤ ‖f‖_p · ‖g‖_q exactly when there are non-negative constants α, β, not both zero, with α|f|^p = β|g|^q almost everywhere; that is, |f|^p and |g|^q are proportional. For p = q = 2 this recovers the Cauchy-Schwarz equality case for ∫|fg|: |f| and |g| are proportional. The proportionality comes from chasing equality back through Young's inequality ab ≤ a^p/p + b^q/q, which is the engine of the standard proof.

Q: Why is Hölder the foundation of L^p spaces?

Two reasons. First, Hölder is the key step in the standard proof of Minkowski's inequality ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p, which is the triangle inequality for the L^p norm, and the triangle inequality is required for ‖·‖_p to be a norm at all. Second, Hölder underlies the identification of the dual space (L^p)* with L^q (1 ≤ p < ∞, with μ σ-finite when p = 1): the bounded linear functional T_g(f) = ∫fg has operator norm exactly ‖g‖_q. So Hölder both makes L^p a normed space and tells you what its dual is.

Q: How is Hölder used in PDE and harmonic analysis?

Constantly. The Sobolev embedding theorem uses Hölder to interpolate between L^p spaces; the energy method for parabolic PDE uses Hölder to control nonlinear terms by linear ones; Young's convolution inequality ‖f * g‖_r ≤ ‖f‖_p · ‖g‖_q (with 1/p + 1/q = 1 + 1/r) extends Hölder to convolution. In harmonic analysis, the Hausdorff-Young inequality bounds the Fourier transform across L^p spaces by Hölder-type interpolation. In PDE estimates for Navier-Stokes, Hölder bounds turn a quadratic nonlinearity uᵢ∂ᵢuⱼ into a product of norms — the difference between proving regularity and proving nothing.

Q: What is the generalized Hölder inequality for n functions?

For exponents p₁, p₂, …, pₙ ≥ 1 with 1/p₁ + 1/p₂ + … + 1/pₙ = 1, ∫|f₁ f₂ … fₙ| ≤ Π‖fᵢ‖_{pᵢ}. The standard two-function Hölder is n = 2. Induction extends to any finite n. The case n = 3 with p₁ = p₂ = p₃ = 3 gives ∫|fgh| ≤ ‖f‖₃ · ‖g‖₃ · ‖h‖₃ — used routinely in trilinear estimates for nonlinear PDE. The generalization preserves the proof structure: split the integrand into the right pieces, apply Young's inequality with multiple terms, and integrate.

Q: What is the reverse Hölder inequality?

For 0 < p < 1 (so q = p/(p−1) is negative) the inequality flips: ∫|fg| ≥ (∫|f|^p)^(1/p) · (∫|g|^q)^(1/q), with the q-norm interpreted in its extended sense. Reverse Hölder bounds an integral from below by a product of norms — useful in self-improving estimates for solutions of elliptic PDE (Gehring's lemma), in Muckenhoupt A_p weight theory, and in the geometric measure theory of fractals. The proof is again via Young's inequality, but with the roles of upper and lower bounds reversed.

∫|fg| ≤ ‖f‖_p · ‖g‖_q whenever 1/p + 1/q = 1

For measurable functions f, g and conjugate exponents p, q with 1/p + 1/q = 1, Hölder's inequality says ∫|fg| dμ ≤ (∫|f|^p)^(1/p) · (∫|g|^q)^(1/q). The integral of a product is bounded by the product of the L^p norm of one factor and the L^q norm of the other. The case p = q = 2 is Cauchy-Schwarz; p = 1, q = ∞ is the essential-sup bound. Hölder is the cornerstone of L^p spaces: it proves Minkowski's inequality (triangle inequality for ‖·‖_p), identifies (L^p)* = L^q via g ↦ (f ↦ ∫fg), and seeds Young's convolution inequality. First stated by Leonard James Rogers (1888), reproved by Otto Hölder (1889). For 1 < p < ∞ and finite norms, equality holds iff |f|^p and |g|^q are proportional almost everywhere.

Statement: ∫|fg| ≤ ‖f‖_p · ‖g‖_q
Conjugate: 1/p + 1/q = 1
p = q = 2: Cauchy-Schwarz
First proved: Rogers 1888, Hölder 1889
Dual: (L^p)* ≅ L^q (1 ≤ p < ∞)
Equality iff: |f|^p ∝ |g|^q a.e.

The statement, precisely

Let (X, Σ, μ) be a measure space and let p, q ∈ [1, ∞] satisfy the conjugate-exponent condition 1/p + 1/q = 1 (with the conventions 1/∞ = 0 and 1/1 + 1/∞ = 1). For any measurable functions f, g : X → ℂ:

∫_X |f(x) g(x)| dμ(x) ≤ (∫_X |f|^p dμ)^(1/p) · (∫_X |g|^q dμ)^(1/q)

In norm notation, ‖fg‖_1 ≤ ‖f‖_p · ‖g‖_q. For finite sums (counting measure on {1, …, n}) this becomes the discrete Hölder inequality:

Σ_{k=1}^n |a_k b_k| ≤ (Σ |a_k|^p)^(1/p) · (Σ |b_k|^q)^(1/q)

Two cases are worth singling out. p = q = 2 is the classical Cauchy-Schwarz inequality |∫fg| ≤ ‖f‖₂ · ‖g‖₂. The boundary case p = 1, q = ∞ gives the trivial bound ∫|fg| ≤ ‖g‖_∞ · ∫|f|. The new content of Hölder lives in the remaining cases, such as p = 3, q = 3/2; p = 4, q = 4/3; p = 5/3, q = 5/2, where neither factor needs to be in L² and yet the product is integrable.
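The discrete statement is easy to check numerically. The following is a minimal sketch, not a reference implementation: the helper names `p_norm`, `conjugate`, and `holder_sides` and the sample vectors are illustrative choices, not part of the statement above.

```python
import math

def p_norm(v, p):
    """l^p norm of a finite sequence; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def conjugate(p):
    """Conjugate exponent q with 1/p + 1/q = 1, using the convention 1/inf = 0."""
    if p == 1:
        return math.inf
    if p == math.inf:
        return 1.0
    return p / (p - 1.0)

def holder_sides(a, b, p):
    """Return (sum |a_k b_k|, ||a||_p * ||b||_q) where q is conjugate to p."""
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    rhs = p_norm(a, p) * p_norm(b, conjugate(p))
    return lhs, rhs

# Arbitrary sample vectors (an assumption made for illustration only).
a = [1.0, -2.0, 0.5, 3.0]
b = [2.0, 1.0, -4.0, 0.25]
for p in (1, 5 / 3, 2, 3, 4, math.inf):
    lhs, rhs = holder_sides(a, b, p)
    print(f"p = {p}: {lhs:.4f} <= {rhs:.4f}")
```

The left side is the same for every p; only the bound on the right changes as weight shifts between the two factors.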

Proof sketch — Young's inequality is the engine

The standard proof rests on Young's inequality for non-negative real numbers a, b:

ab ≤ a^p/p + b^q/q                  whenever 1/p + 1/q = 1, a, b ≥ 0

This is a one-variable inequality that follows from the concavity of log: log(a^p/p + b^q/q) ≥ (1/p) log a^p + (1/q) log b^q = log(ab). Apply Young pointwise to a = |f(x)| / ‖f‖_p and b = |g(x)| / ‖g‖_q:

|f(x) g(x)| / (‖f‖_p · ‖g‖_q) ≤ (1/p) · |f(x)|^p / ‖f‖_p^p + (1/q) · |g(x)|^q / ‖g‖_q^q

Now integrate over X, assuming 1 < p < ∞ and 0 < ‖f‖_p, ‖g‖_q < ∞ (in the remaining cases the inequality is immediate). The right side becomes 1/p + 1/q = 1. Multiplying both sides by ‖f‖_p · ‖g‖_q gives Hölder's inequality. The proof is two lines once Young is established, and Young is one application of concavity.

Equality in Young holds exactly when a^p = b^q (which is not the same as a = b unless p = q = 2). Tracing this back through the scaling, equality in Hölder holds iff |f|^p / ‖f‖_p^p = |g|^q / ‖g‖_q^q almost everywhere, equivalently iff |f|^p and |g|^q are proportional a.e.
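Young's inequality and its equality case can be probed on a grid. This is a small numerical sketch under assumed values (p = 3 and a coarse grid); the function name `young_gap` is a hypothetical helper.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b for the conjugate exponent q = p/(p-1)."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b

p = 3.0

# The gap is non-negative on a grid of non-negative a, b.
grid = [0.25 * k for k in range(0, 21)]
min_gap = min(young_gap(a, b, p) for a in grid for b in grid)
print(min_gap >= -1e-12)

# Equality occurs when a^p = b^q, i.e. b = a^(p/q) = a^(p-1).
a = 2.0
print(young_gap(a, a ** (p - 1.0), p))   # zero up to rounding
print(young_gap(a, a, p))                # positive: a = b does not give equality when p != 2
```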

Worked examples with numbers

Concrete checks on small sums and integrals make the inequality vivid.

Example 1 (p = q = 2, Cauchy-Schwarz):
  a = (1, 2, 3),  b = (4, 5, 6)
  Σ a·b = 4 + 10 + 18 = 32
  ‖a‖₂ = √14 ≈ 3.7417,  ‖b‖₂ = √77 ≈ 8.7750
  ‖a‖₂ · ‖b‖₂ ≈ 32.8329
  32 ≤ 32.8329     ✓  (close but not equal; a, b not proportional)

Example 2 (p = 3, q = 3/2):
  a = (1, 1, 1),  b = (2, 3, 4)
  Σ a·b = 9
  ‖a‖₃ = 3^(1/3) ≈ 1.4422
  ‖b‖_{3/2} = (2^(3/2) + 3^(3/2) + 4^(3/2))^(2/3) = (2.828 + 5.196 + 8.0)^(2/3) ≈ 6.356
  ‖a‖₃ · ‖b‖_{3/2} ≈ 9.167
  9 ≤ 9.167        ✓

Example 3 (p = ∞, q = 1):
  a = (1, 5, 2),  b = (3, 4, 6)
  Σ |a·b| = 3 + 20 + 12 = 35
  ‖a‖_∞ = 5,  ‖b‖_1 = 13
  5 · 13 = 65
  35 ≤ 65          ✓  (considerable slack: the sup-norm bound ignores how a varies)

Example 4 (equality case):
  a = (1, 2, 3),  b = (1, 2, 3)   (b = a, so |a|^p = |b|^q for p = q = 2)
  Σ a·b = 1 + 4 + 9 = 14
  ‖a‖₂ · ‖b‖₂ = √14 · √14 = 14
  14 = 14          ✓  EQUALITY
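The four discrete examples can be recomputed in a few lines. This sketch reuses the vectors and exponents stated in the examples; the helper `p_norm` and the `results` dictionary are illustrative names.

```python
import math

def p_norm(v, p):
    """l^p norm of a finite sequence; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

# (label, a, b, p, q) with 1/p + 1/q = 1 in every row.
examples = [
    ("Example 1", (1, 2, 3), (4, 5, 6), 2, 2),
    ("Example 2", (1, 1, 1), (2, 3, 4), 3, 1.5),
    ("Example 3", (1, 5, 2), (3, 4, 6), math.inf, 1),
    ("Example 4", (1, 2, 3), (1, 2, 3), 2, 2),
]

results = {}
for name, a, b, p, q in examples:
    lhs = sum(abs(x * y) for x, y in zip(a, b))
    rhs = p_norm(a, p) * p_norm(b, q)
    results[name] = (lhs, rhs)
    print(f"{name}: {lhs} <= {rhs:.4f}")
```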

For a continuous example, take f(x) = x and g(x) = x² on [0, 1] with p = 3, q = 3/2:

∫_0^1 |fg| dx = ∫_0^1 x³ dx = 1/4
‖f‖₃ = (∫_0^1 x³ dx)^(1/3) = (1/4)^(1/3) ≈ 0.6300
‖g‖_{3/2} = (∫_0^1 x³ dx)^(2/3) = (1/4)^(2/3) ≈ 0.3969
‖f‖₃ · ‖g‖_{3/2} = (1/4)^(1/3 + 2/3) = 1/4          1/4 = 1/4: EQUALITY (because |f|³ = x³ = |g|^{3/2})

The equality case is informative: whenever |f|^p and |g|^q agree (after scaling), Hölder is tight.
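The continuous example can be confirmed by numerical quadrature. This is a sketch that assumes a composite midpoint rule is accurate enough here; `midpoint_integral` is a hypothetical helper, not a library function.

```python
def midpoint_integral(func, lo, hi, n=100_000):
    """Composite midpoint rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    return h * sum(func(lo + (k + 0.5) * h) for k in range(n))

p, q = 3.0, 1.5            # conjugate: 1/3 + 2/3 = 1
f = lambda x: x
g = lambda x: x * x

lhs = midpoint_integral(lambda x: abs(f(x) * g(x)), 0.0, 1.0)
norm_f = midpoint_integral(lambda x: abs(f(x)) ** p, 0.0, 1.0) ** (1.0 / p)
norm_g = midpoint_integral(lambda x: abs(g(x)) ** q, 0.0, 1.0) ** (1.0 / q)

# Both sides should agree with 1/4 up to quadrature error.
print(lhs, norm_f * norm_g)
```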

Variants and generalizations

Discrete Hölder. Σ|aₖbₖ| ≤ (Σ|aₖ|^p)^(1/p) · (Σ|bₖ|^q)^(1/q). Used to compare ℓ^p sequence norms.
Generalized n-function Hölder. Σ 1/pᵢ = 1 ⇒ ∫|f₁ … fₙ| ≤ Π‖fᵢ‖_{pᵢ}. Routine in nonlinear PDE estimates.
Interpolation form (Hölder with three exponents). If 1/r = θ/p + (1−θ)/q with θ ∈ [0, 1], then ‖f‖_r ≤ ‖f‖_p^θ · ‖f‖_q^(1−θ). A direct consequence of two-function Hölder applied to |f|^(rθ) and |f|^(r(1−θ)) with the conjugate exponents p/(rθ) and q/(r(1−θ)).
Reverse Hölder. For 0 < p < 1 (so q < 0), the inequality flips: ∫|fg| ≥ ‖f‖_p · ‖g‖_q. Used in Gehring's lemma and Muckenhoupt A_p weight theory.
Young's convolution inequality. ‖f * g‖_r ≤ ‖f‖_p · ‖g‖_q when 1/p + 1/q = 1 + 1/r. Reduces to Hölder when r = ∞.
Hausdorff-Young inequality. For 1 ≤ p ≤ 2 with conjugate q, the Fourier transform satisfies ‖f̂‖_q ≤ ‖f‖_p. A Hölder-type estimate adapted to Fourier duality.
Hölder on Lorentz spaces. ‖fg‖_{L^{1,1}} ≤ ‖f‖_{L^{p, r}} · ‖g‖_{L^{q, r′}}, refining the L^p form using Lorentz exponents.
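Three of the variants listed above (generalized Hölder for three functions, the interpolation form, and reverse Hölder) can be spot-checked on finite sequences. This is a sketch on randomly generated positive data; the seed, the sample size, and the exponents p = 2, q = 6, θ = 1/4 are assumptions made for illustration.

```python
import random

def p_norm(v, p):
    """(sum |v_k|^p)^(1/p); also used for 0 < p < 1 and p < 0 (entries must be nonzero)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(0)
n = 50
f = [random.uniform(0.1, 2.0) for _ in range(n)]
g = [random.uniform(0.1, 2.0) for _ in range(n)]
h = [random.uniform(0.1, 2.0) for _ in range(n)]

# Generalized Hölder with p1 = p2 = p3 = 3 (1/3 + 1/3 + 1/3 = 1).
tri_lhs = sum(x * y * z for x, y, z in zip(f, g, h))
tri_rhs = p_norm(f, 3) * p_norm(g, 3) * p_norm(h, 3)

# Interpolation form: 1/r = theta/p + (1 - theta)/q.
p, q, theta = 2.0, 6.0, 0.25
r = 1.0 / (theta / p + (1.0 - theta) / q)
interp_lhs = p_norm(f, r)
interp_rhs = p_norm(f, p) ** theta * p_norm(f, q) ** (1.0 - theta)

# Reverse Hölder: 0 < p < 1 with negative conjugate q = p/(p-1).
p_rev = 0.5
q_rev = p_rev / (p_rev - 1.0)
rev_lhs = sum(x * y for x, y in zip(f, g))
rev_rhs = p_norm(f, p_rev) * p_norm(g, q_rev)

print(tri_lhs <= tri_rhs, interp_lhs <= interp_rhs, rev_lhs >= rev_rhs)
```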

Hölder proves Minkowski (the triangle inequality)

This is the most important corollary. To show ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p for 1 < p < ∞ (the case p = 1 is just the pointwise triangle inequality, integrated), write:

∫ |f + g|^p ≤ ∫ |f + g|^(p−1) (|f| + |g|)
            = ∫ |f| · |f + g|^(p−1) + ∫ |g| · |f + g|^(p−1)

Apply Hölder to each term with conjugate exponent q = p/(p−1):

∫ |f| · |f + g|^(p−1) ≤ ‖f‖_p · ‖ |f + g|^(p−1) ‖_q = ‖f‖_p · (∫|f + g|^p)^((p−1)/p)

Combining the two terms and dividing by (∫|f + g|^p)^((p−1)/p), when that quantity is nonzero and finite, leaves (∫|f + g|^p)^(1/p) on the left, which is exactly ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p. In this standard proof, Hölder is logically prior to Minkowski.
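The individual steps of this argument can be checked on finite sequences. The sketch below assumes p = 3 and random real data; the variable names are illustrative.

```python
import random

def p_norm(v, p):
    """(sum |v_k|^p)^(1/p) for a finite sequence."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

random.seed(1)
p = 3.0
q = p / (p - 1.0)                      # conjugate exponent
f = [random.uniform(-1.0, 1.0) for _ in range(100)]
g = [random.uniform(-1.0, 1.0) for _ in range(100)]
s = [x + y for x, y in zip(f, g)]      # f + g

# Hölder step: sum |f| |f+g|^(p-1) <= ||f||_p * (sum |f+g|^p)^((p-1)/p)
w = [abs(z) ** (p - 1.0) for z in s]
holder_lhs = sum(abs(x) * wz for x, wz in zip(f, w))
holder_rhs = p_norm(f, p) * sum(abs(z) ** p for z in s) ** ((p - 1.0) / p)

# Identity used in that step: || |f+g|^(p-1) ||_q = ||f+g||_p^(p-1)
identity_gap = abs(p_norm(w, q) - p_norm(s, p) ** (p - 1.0))

# Conclusion: the triangle inequality for the p-norm.
minkowski_ok = p_norm(s, p) <= p_norm(f, p) + p_norm(g, p)
print(holder_lhs <= holder_rhs, identity_gap, minkowski_ok)
```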

Hölder and L^p–L^q duality

For 1 ≤ p < ∞ with q its conjugate, every g ∈ L^q defines a bounded linear functional T_g : L^p → ℂ by T_g(f) = ∫ fg dμ. Hölder bounds its norm:

‖T_g‖ = sup {|T_g(f)| : ‖f‖_p = 1} ≤ ‖g‖_q

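The reverse inequality (‖T_g‖ ≥ ‖g‖_q) follows for 1 < p < ∞ by testing against f = |g|^(q−1) sgn(g) / ‖g‖_q^(q−1) (with sgn(g) replaced by its complex conjugate when g is complex-valued), which has ‖f‖_p = 1 and T_g(f) = ‖g‖_q. So ‖T_g‖ = ‖g‖_q exactly, and the map g ↦ T_g is an isometric embedding of L^q into (L^p)*. That this map is also onto, so that L^q ≅ (L^p)*, is the Riesz representation theorem for L^p (for p = 1 it needs μ to be σ-finite), and Hölder is half its proof.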
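The extremal test function can be written out for a finite real sequence. This is a sketch under assumed data (p = 3 and a four-entry g); it only illustrates that the Hölder bound on ‖T_g‖ is attained.

```python
import math

def p_norm(v, p):
    """(sum |v_k|^p)^(1/p) for a finite sequence."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)
g = [2.0, -1.0, 0.5, -3.0]             # illustrative real-valued g

# Extremal f = |g|^(q-1) sgn(g) / ||g||_q^(q-1), normalized so that ||f||_p = 1.
norm_g = p_norm(g, q)
f = [math.copysign(abs(y) ** (q - 1.0), y) / norm_g ** (q - 1.0) for y in g]

pairing = sum(x * y for x, y in zip(f, g))   # T_g(f) for counting measure
print(p_norm(f, p), pairing, norm_g)
```

The normalization works because (q − 1)p = q, so Σ|f_k|^p = Σ|g_k|^q / ‖g‖_q^q = 1.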

The case p = q = 2 is the self-dual one: (L²)* = L² via the inner product. The case p = 1 is special: (L¹)* = L^∞ (for σ-finite μ), but in general (L^∞)* ⊋ L¹, so L¹ is not reflexive (and therefore neither is L^∞): the dual of L^∞ contains finitely additive measures that do not come from L¹ functions.

Common pitfalls

Forgetting the conjugate condition. Hölder requires 1/p + 1/q = 1. Plugging in arbitrary exponents (e.g. p = 2, q = 3) gives a bound that is false in general: for a = b = (1, 1), Σ|aₖbₖ| = 2 but ‖a‖₂ · ‖b‖₃ = 2^(5/6) ≈ 1.78.
Using Hölder when one of the norms is infinite. If ‖f‖_p = ∞ the inequality is trivially true; it gives no information. The useful regime is when both norms are finite.
Forgetting the absolute values. The statement controls ∫|fg|, not ∫fg. For complex-valued or sign-changing f, g you take absolute values throughout.
Equality requires |f|^p ∝ |g|^q, not f ∝ g. For p = q = 2 these coincide, but for other (p, q) they differ — easy to misremember.
Confusing Hölder with Minkowski. Hölder bounds ∫|fg| by a product of norms; Minkowski bounds the norm of a sum by a sum of norms. Different statements with different roles.
Believing Hölder fails for p = ∞. The case p = ∞, q = 1 is included via the conventions ‖f‖_∞ = ess sup |f| and gives the elementary bound ∫|fg| ≤ ‖f‖_∞ · ‖g‖_1.
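The first pitfall is easy to demonstrate. The sketch below uses the two-entry vectors a = b = (1, 1), chosen for illustration, to show that a non-conjugate pair of exponents does not give a valid bound.

```python
def p_norm(v, p):
    """(sum |v_k|^p)^(1/p) for a finite sequence."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

a = [1.0, 1.0]
b = [1.0, 1.0]
lhs = sum(abs(x * y) for x, y in zip(a, b))

bad = p_norm(a, 2) * p_norm(b, 3)      # 1/2 + 1/3 != 1: not a conjugate pair
good = p_norm(a, 2) * p_norm(b, 2)     # 1/2 + 1/2 == 1

print(lhs <= bad)    # False: the would-be bound with p = 2, q = 3 fails
print(lhs <= good)   # True
```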

Where Hölder shows up

L^p space theory. Hölder proves Minkowski's inequality (triangle for ‖·‖_p), identifies (L^p)* = L^q, bounds the inclusion L^p ⊂ L^r on a finite-measure space (for r < p), and underlies every L^p estimate in functional analysis.
PDE energy estimates. Bounding a nonlinear term like ‖u² ∂_x u‖_{L^1} by Hölder gives ‖u‖_{L^4}² · ‖∂_x u‖_{L^2}, which couples to Sobolev embedding to close the estimate. Standard machinery for Navier-Stokes, KdV, NLS.
Probability theory. E|XY| ≤ (E|X|^p)^(1/p) · (E|Y|^q)^(1/q) is the probabilistic Hölder. Consequences include Jensen-type bounds, moment inequalities, and Lyapunov's inequality (‖X‖_p is non-decreasing in p on a probability space).
Statistics and machine learning. Bias-variance decompositions in high dimensions use Hölder to bound cross-terms. Generalization-error bounds for L^p-regularized models use it to relate empirical to population norms.
Harmonic analysis. The Hausdorff-Young inequality (‖f̂‖_q ≤ ‖f‖_p, 1 ≤ p ≤ 2) is Hölder-flavoured; Strichartz estimates for dispersive PDE chain Hölder with mixed space-time L^q_t L^r_x norms.
Number theory. Discrete Hölder bounds character sums, exponential sums (Vinogradov's mean-value theorem), and partial sums in analytic number theory.
Optimization. Cauchy-Schwarz (p = q = 2) is the foundational duality in linear regression, principal components, and gradient methods. General Hölder extends to ℓ^p regularization and L^p-norm minimization.
Information theory. Rényi entropies of conjugate orders interact via Hölder-type identities; Bregman divergences include Hölder as a building block.
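The probabilistic form from the list above can be illustrated on an empirical distribution, where expectations are sample averages and the inequalities hold exactly. This is a sketch only: the distributions, the seed, and the exponents p = 4, q = 4/3 are assumptions, and `moment_norm` is a hypothetical helper.

```python
import random

random.seed(2)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
ys = [random.expovariate(1.0) for _ in range(1000)]

def moment_norm(sample, p):
    """(E|X|^p)^(1/p) under the empirical (uniform) measure on the sample."""
    return (sum(abs(x) ** p for x in sample) / len(sample)) ** (1.0 / p)

# Probabilistic Hölder with p = 4, q = 4/3 (1/4 + 3/4 = 1).
e_xy = sum(abs(x * y) for x, y in zip(xs, ys)) / len(xs)
bound = moment_norm(xs, 4.0) * moment_norm(ys, 4.0 / 3.0)

# Lyapunov: p -> ||X||_p is non-decreasing on a probability space.
norms = [moment_norm(xs, p) for p in (1.0, 1.5, 2.0, 3.0, 4.0)]
print(e_xy <= bound, norms == sorted(norms))
```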

Hölder, Cauchy-Schwarz, Young, Minkowski — at a glance

Inequality	Statement	Equality iff	Role
Young (scalar)	ab ≤ a^p/p + b^q/q (1/p + 1/q = 1)	a^p = b^q	Engine that drives Hölder's proof
Cauchy-Schwarz	|∫fg| ≤ ‖f‖₂ · ‖g‖₂	f ∝ g (a.e.)	The p = q = 2 case; L² is self-dual
Hölder	∫|fg| ≤ ‖f‖_p · ‖g‖_q (1/p + 1/q = 1)	|f|^p ∝ |g|^q (a.e.)	L^p–L^q duality; foundation of L^p
Minkowski	‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p	f = λg (a.e., λ ≥ 0)	Triangle inequality for L^p norm
Generalized Hölder (n functions)	∫|Πfᵢ| ≤ Π‖fᵢ‖_{pᵢ} (Σ 1/pᵢ = 1)	|fᵢ|^{pᵢ} all proportional	Multilinear PDE estimates
Young's convolution	‖f*g‖_r ≤ ‖f‖_p · ‖g‖_q (1/p + 1/q = 1 + 1/r)	f, g Gaussian (extremal)	Smoothing estimates; Sobolev embeddings
