Minkowski's Inequality
‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p — the triangle inequality for L^p norms
Minkowski's inequality states that for any measurable functions f, g and exponent 1 ≤ p ≤ ∞, the L^p norm of the sum is bounded by the sum of the L^p norms: ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p. The discrete version: (Σ|aₖ + bₖ|^p)^(1/p) ≤ (Σ|aₖ|^p)^(1/p) + (Σ|bₖ|^p)^(1/p). This is precisely the triangle inequality for the L^p norm, the axiom that makes ‖·‖_p a norm and L^p a normed vector space (it is also complete, by the Riesz-Fischer theorem, and hence a Banach space). Hermann Minkowski stated it in his 1896 Geometrie der Zahlen. The standard proof reduces to Hölder's inequality applied to |f + g|^p. For 1 < p < ∞ and finite norms, equality holds iff one of f, g is a non-negative multiple of the other a.e.; for p = 1 and p = ∞ the equality conditions are weaker. For 0 < p < 1 the inequality reverses for non-negative f, g; ‖·‖_p is then a quasi-norm, not a norm, and L^p is in general not a Banach space.
- Statement: ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p
- Range: 1 ≤ p ≤ ∞
- Proof: Hölder applied to |f + g|^p
- Named after: Hermann Minkowski, 1896
- Equality iff: f = λg a.e., λ ≥ 0 (for 1 < p < ∞)
- For 0 < p < 1: reverses (for f, g ≥ 0); ‖·‖_p not a norm
The statement and what it means
Let (X, Σ, μ) be a measure space and 1 ≤ p ≤ ∞. Define the L^p norm by
‖f‖_p = (∫_X |f|^p dμ)^(1/p) for 1 ≤ p < ∞
‖f‖_∞ = ess sup |f| for p = ∞
Minkowski's inequality says that for any measurable f, g:
‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p
Equivalently, on sequence spaces (counting measure on ℕ):
(Σ |a_k + b_k|^p)^(1/p) ≤ (Σ |a_k|^p)^(1/p) + (Σ |b_k|^p)^(1/p)
And for vectors in ℝⁿ (counting measure on {1, …, n}), the same formula. The case p = 2 is the familiar Euclidean triangle inequality ‖a + b‖ ≤ ‖a‖ + ‖b‖.
The geometric content is simple: the diagonal of a parallelogram with sides f and g is no longer than the sum of the two sides. Minkowski's contribution is to make this statement quantitatively precise for any exponent p ≥ 1, in any dimension, on any measure space.
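The finite-dimensional statement is easy to probe numerically. The following is a minimal sketch, not a proof: `p_norm` and `minkowski_gap` are illustrative names introduced here, and the random test vectors and the list of exponents are arbitrary choices.

```python
import math
import random

def p_norm(v, p):
    """l^p norm of a non-empty finite sequence, for 1 <= p <= infinity."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def minkowski_gap(a, b, p):
    """||a||_p + ||b||_p - ||a + b||_p; non-negative whenever p >= 1."""
    total = [x + y for x, y in zip(a, b)]
    return p_norm(a, p) + p_norm(b, p) - p_norm(total, p)

random.seed(0)  # fixed seed so the run is reproducible
smallest_gap = math.inf
for _ in range(1000):
    a = [random.uniform(-1, 1) for _ in range(5)]
    b = [random.uniform(-1, 1) for _ in range(5)]
    for p in (1, 1.5, 2, 3, math.inf):
        smallest_gap = min(smallest_gap, minkowski_gap(a, b, p))

# Up to floating-point rounding, the smallest gap seen should not be negative.
print(smallest_gap >= -1e-12)
```

A gap close to zero signals a near-equality case; a clearly negative gap would indicate a bug in the norm, not a counterexample.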
Proof sketch — Hölder is the engine
For p = 1 and p = ∞ the proof is direct. For p = 1: |f + g| ≤ |f| + |g| pointwise, integrate. For p = ∞: take essential sup of both sides.
For 1 < p < ∞, write |f + g|^p = |f + g| · |f + g|^(p−1), apply the pointwise triangle inequality |f + g| ≤ |f| + |g|, and split:
∫ |f + g|^p ≤ ∫ |f| · |f + g|^(p−1) + ∫ |g| · |f + g|^(p−1)
Now apply Hölder's inequality to each term with conjugate exponent q = p / (p − 1) (so 1/p + 1/q = 1):
∫ |f| · |f + g|^(p−1) ≤ ‖f‖_p · ‖|f + g|^(p−1)‖_q
= ‖f‖_p · (∫ |f + g|^p)^((p−1)/p)
= ‖f‖_p · ‖f + g‖_p^(p−1)
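An identical bound holds for the g term. Summing gives ‖f + g‖_p^p ≤ (‖f‖_p + ‖g‖_p) · ‖f + g‖_p^(p−1). If ‖f + g‖_p = 0 there is nothing to prove. Otherwise ‖f + g‖_p is finite whenever ‖f‖_p and ‖g‖_p are, since |f + g|^p ≤ 2^(p−1)(|f|^p + |g|^p), so we may divide both sides by ‖f + g‖_p^(p−1):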
‖f + g‖_p = ‖f + g‖_p^p / ‖f + g‖_p^(p−1) ≤ ‖f‖_p + ‖g‖_p □
The chain is short. Apart from the pointwise triangle inequality, Hölder is the only inequality used; everything else is algebra. Minkowski rests on Hölder, which rests on Young's inequality, which rests on the concavity of log (equivalently, the convexity of exp).
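The two facts the proof leans on can be checked on finite vectors: the Hölder bound for one term, and the identity ‖|f + g|^(p−1)‖_q = ‖f + g‖_p^(p−1). This is a sketch with counting measure standing in for μ; the function names and sample vectors are illustrative assumptions.

```python
def p_norm(v, p):
    """l^p norm of a finite sequence for 1 <= p < infinity."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def holder_step(f, g, p):
    """Return (lhs, rhs) of  sum |f| |f+g|^(p-1)  <=  ||f||_p * ||f+g||_p^(p-1)."""
    s = [x + y for x, y in zip(f, g)]
    lhs = sum(abs(x) * abs(t) ** (p - 1) for x, t in zip(f, s))
    rhs = p_norm(f, p) * p_norm(s, p) ** (p - 1)
    return lhs, rhs

def conjugate_identity(f, g, p):
    """Return both sides of  || |f+g|^(p-1) ||_q  =  ||f+g||_p^(p-1),  q = p/(p-1)."""
    q = p / (p - 1)
    s = [x + y for x, y in zip(f, g)]
    left = p_norm([abs(t) ** (p - 1) for t in s], q)
    right = p_norm(s, p) ** (p - 1)
    return left, right

f = [1.0, -2.0, 0.5, 3.0]
g = [2.0, 1.0, -1.5, 0.25]
p = 3.0
lhs, rhs = holder_step(f, g, p)
left, right = conjugate_identity(f, g, p)
print(lhs <= rhs, abs(left - right) < 1e-9)
```

The identity works because (p − 1)q = p, which is exactly why q is chosen as the conjugate exponent.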
Numerical examples
Example 1 (Euclidean, p = 2):
a = (3, 4), b = (0, 5)
a + b = (3, 9)
‖a‖₂ = 5, ‖b‖₂ = 5, ‖a + b‖₂ = √90 ≈ 9.487
‖a‖₂ + ‖b‖₂ = 10
9.487 ≤ 10 ✓
Example 2 (Manhattan, p = 1):
a = (3, 4), b = (0, 5)
a + b = (3, 9)
‖a‖₁ = 7, ‖b‖₁ = 5, ‖a + b‖₁ = 12
‖a‖₁ + ‖b‖₁ = 12
12 ≤ 12 EQUALITY (a, b have same-sign coordinates)
Example 3 (Chebyshev, p = ∞):
a = (3, 4), b = (0, 5)
a + b = (3, 9)
‖a‖_∞ = 4, ‖b‖_∞ = 5, ‖a + b‖_∞ = 9
‖a‖_∞ + ‖b‖_∞ = 9
9 ≤ 9 EQUALITY (max coordinate in same slot)
Example 4 (p = 3):
a = (1, 2), b = (2, 1)
a + b = (3, 3)
‖a‖₃ = (1+8)^(1/3) = 9^(1/3) ≈ 2.080
‖b‖₃ ≈ 2.080
‖a+b‖₃ = (27+27)^(1/3) = 54^(1/3) ≈ 3.780
‖a‖₃ + ‖b‖₃ ≈ 4.160
3.780 ≤ 4.160 ✓
Example 5 (equality at p = 2):
a = (3, 4), b = (6, 8) (b = 2a)
‖a‖₂ = 5, ‖b‖₂ = 10, ‖a + b‖₂ = 15
‖a‖₂ + ‖b‖₂ = 15
15 = 15 EQUALITY (a, b proportional, same direction)
The geometric pattern is consistent: equality in the L² triangle inequality holds when the vectors are non-negatively parallel; for p = 1 and p = ∞ equality is easier to achieve.
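The five examples above can be reproduced in a few lines. This is a sketch; `p_norm` and the `examples` table are names introduced here for illustration only.

```python
import math

def p_norm(v, p):
    """l^p norm of a non-empty finite sequence, for 1 <= p <= infinity."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

examples = [
    ("Example 1", (3, 4), (0, 5), 2),
    ("Example 2", (3, 4), (0, 5), 1),
    ("Example 3", (3, 4), (0, 5), math.inf),
    ("Example 4", (1, 2), (2, 1), 3),
    ("Example 5", (3, 4), (6, 8), 2),
]

results = {}
for name, a, b, p in examples:
    total = tuple(x + y for x, y in zip(a, b))
    lhs = p_norm(total, p)              # ||a + b||_p
    rhs = p_norm(a, p) + p_norm(b, p)   # ||a||_p + ||b||_p
    results[name] = (lhs, rhs)
    # Treat the two sides as equal when they agree up to rounding error.
    relation = "=" if math.isclose(lhs, rhs) else "<"
    print(f"{name}: {lhs:.3f} {relation} {rhs:.3f}")
```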
Variants and generalizations
- Triangle inequality on ℝⁿ. The classical ‖a + b‖₂ ≤ ‖a‖₂ + ‖b‖₂ is Minkowski with p = 2 and counting measure on {1, …, n}.
- Sequences (ℓ^p). (Σ|aₖ + bₖ|^p)^(1/p) ≤ (Σ|aₖ|^p)^(1/p) + (Σ|bₖ|^p)^(1/p). Makes ℓ^p a normed space; completeness is a separate fact, and together they make ℓ^p a Banach space.
- Functions (L^p). (∫|f + g|^p)^(1/p) ≤ (∫|f|^p)^(1/p) + (∫|g|^p)^(1/p). Makes L^p a normed space; it is already complete (Riesz-Fischer theorem), hence a Banach space.
- Minkowski's integral inequality. ‖∫ f(·, y) dy‖_p ≤ ∫ ‖f(·, y)‖_p dy. Continuous-parameter version of triangle inequality. Used for bounding integral operators.
- Reverse Minkowski (0 < p < 1). The inequality reverses: ‖f + g‖_p ≥ ‖f‖_p + ‖g‖_p for non-negative f, g. So ‖·‖_p is not a norm; the natural translation-invariant metric is d(f, g) = ‖f − g‖_p^p, which is subadditive (an F-space metric).
- Minkowski functional / gauge. For a convex, symmetric, absorbing set A (so in particular 0 ∈ A), p_A(x) = inf{t > 0 : x ∈ tA} is the Minkowski gauge; it is subadditive because A is convex. Every norm is the Minkowski gauge of its unit ball.
- Weighted Minkowski. With a positive weight w(x), ‖f + g‖_{p, w} ≤ ‖f‖_{p, w} + ‖g‖_{p, w} where ‖·‖_{p, w} = (∫|·|^p w)^(1/p). Same proof.
- Anisotropic / mixed-norm Minkowski. ‖f‖_{L^p_x L^q_y} = (∫(∫|f|^q dy)^(p/q) dx)^(1/p) satisfies Minkowski in each slot separately.
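The gauge item in the list above can be made concrete: taking A to be the closed unit ball of ℓ^p, a bisection on t recovers the ℓ^p norm. This is a rough sketch under the assumption that A is closed, convex, symmetric and absorbing; `gauge` and `in_lp_unit_ball` are hypothetical helper names.

```python
def in_lp_unit_ball(x, p):
    """Membership test for A = {x : sum |x_k|^p <= 1}, the closed unit ball of l^p."""
    return sum(abs(c) ** p for c in x) <= 1.0

def gauge(x, contains, iterations=60):
    """Approximate p_A(x) = inf{t > 0 : x in tA} by bisection.

    `contains` is a membership test for A. A is assumed closed, convex,
    symmetric and absorbing, so x lies in tA exactly when t >= p_A(x).
    """
    hi = 1.0
    while not contains([c / hi for c in x]):  # grow hi until x lies in hi*A
        hi *= 2.0
    lo = 0.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if contains([c / mid for c in x]):
            hi = mid
        else:
            lo = mid
    return hi

ball3 = lambda v: in_lp_unit_ball(v, 3)
g_a = gauge([1.0, 2.0], ball3)
g_b = gauge([2.0, 1.0], ball3)
g_sum = gauge([3.0, 3.0], ball3)
print(g_sum <= g_a + g_b)  # subadditivity of the gauge
```

For the Euclidean ball, `gauge([3.0, 4.0], lambda v: in_lp_unit_ball(v, 2))` should land within rounding error of 5.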
Why Minkowski fails for p < 1
The proof above uses Hölder, which requires p ≥ 1 for conjugate q ≥ 1 to make sense. More fundamentally, for 0 < p < 1 the function t ↦ t^p is concave on [0, ∞), not convex; the unit ball of ‖·‖_p in ℝ² becomes a non-convex "astroid" shape rather than a convex disk. A counter-example for p = 1/2 on ℝ² with the natural definition:
a = (1, 0), b = (0, 1)
‖a‖_{1/2} = 1, ‖b‖_{1/2} = 1
‖a + b‖_{1/2} = (1^{1/2} + 1^{1/2})^2 = 4
4 > 2 = ‖a‖_{1/2} + ‖b‖_{1/2} triangle inequality FAILS
So for p < 1, ‖·‖_p is not even subadditive. The remedy is to use d(f, g) = ‖f − g‖_p^p (without the outer 1/p root) as a metric: this is subadditive and translation-invariant, and makes L^p (0 < p < 1) an F-space (a complete metric vector space). In general it is not a Banach space: for example, L^p([0, 1]) with 0 < p < 1 is not locally convex, so no norm generates its topology.
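Both halves of this section can be checked directly: the p = 1/2 counterexample, and the triangle inequality for the metric d. This is a sketch on finite vectors; the function names and the random test data are illustrative assumptions.

```python
import random

def quasi_norm(v, p):
    """(sum |v_k|^p)^(1/p); only a quasi-norm when 0 < p < 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def f_metric(u, v, p):
    """d(u, v) = sum |u_k - v_k|^p, the p-th power of the quasi-norm of u - v."""
    return sum(abs(x - y) ** p for x, y in zip(u, v))

p = 0.5
a, b = (1.0, 0.0), (0.0, 1.0)
total = (1.0, 1.0)  # a + b

# The triangle inequality fails for the quasi-norm: left value exceeds the right.
print(quasi_norm(total, p), quasi_norm(a, p) + quasi_norm(b, p))

# The metric d satisfies d(u, w) <= d(u, v) + d(v, w) on random triples.
random.seed(1)
metric_ok = True
for _ in range(1000):
    u, v, w = ([random.uniform(-1, 1) for _ in range(4)] for _ in range(3))
    if f_metric(u, w, p) > f_metric(u, v, p) + f_metric(v, w, p) + 1e-12:
        metric_ok = False
print(metric_ok)
```

The metric works because |s + t|^p ≤ |s|^p + |t|^p for 0 < p ≤ 1, which is applied coordinate by coordinate.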
When is the bound attained?
For 1 < p < ∞, ‖f + g‖_p = ‖f‖_p + ‖g‖_p holds iff there exist non-negative constants α, β not both zero with αf = βg almost everywhere. Geometrically: f and g point in the same direction. The proof traces equality through Hölder: each Hölder application requires |f|^p ∝ |f + g|^p, i.e. f ∝ f + g, i.e. f ∝ g; and the sign-condition forces both proportionality constants to be non-negative.
For p = 1: equality holds iff f and g have the same sign almost everywhere, that is, f·g ≥ 0 a.e. for real-valued f, g with finite norms, because this is exactly when |f + g| = |f| + |g| pointwise a.e. This is much easier to attain.
For p = ∞: equality means ess sup |f + g| = ess sup |f| + ess sup |g|, which requires the two functions to attain their essential suprema at the same place, with the same sign.
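The equality conditions can be seen on small vectors. The sketch below uses p = 3 for the strictly convex case and p = 1 for the sign-only case; the vectors and helper names are illustrative choices, and equality is tested only up to rounding error.

```python
def p_norm(v, p):
    """l^p norm of a finite sequence for 1 <= p < infinity."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def gap(a, b, p):
    """||a||_p + ||b||_p - ||a + b||_p; zero exactly in the equality case."""
    return p_norm(a, p) + p_norm(b, p) - p_norm([x + y for x, y in zip(a, b)], p)

p = 3
a = [1.0, 2.0]
same_direction = [2.0, 4.0]     # b = 2a: equality expected
opposite = [-2.0, -4.0]         # b = -2a: strict inequality
same_magnitudes = [2.0, -4.0]   # |b| = 2|a| but one sign flipped: strict

print(abs(gap(a, same_direction, p)) < 1e-12)
print(gap(a, opposite, p) > 0.1)
print(gap(a, same_magnitudes, p) > 0.1)

# For p = 1, matching signs coordinate by coordinate already give equality,
# even though (3, 4) and (0, 5) are not proportional.
print(abs(gap([3.0, 4.0], [0.0, 5.0], 1)) < 1e-12)
```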
Common pitfalls
- Forgetting that p ≥ 1 is required. Minkowski fails for 0 < p < 1. The triangle inequality on the "fractional L^p" reverses.
- Equality requires f = λg, not just |f| = λ|g|. For p > 1, f and g must point the same way — opposite signs break equality even with proportional magnitudes.
- Confusing Minkowski with Hölder. Minkowski bounds a sum-norm by sum of norms. Hölder bounds a product-integral by product of norms. Different statements with different uses.
- "Minkowski applies to any norm". Triangle inequality is a defining axiom of any norm, so trivially yes — but "Minkowski" specifically refers to the L^p form. Calling the general triangle inequality "Minkowski" is sloppy.
- Treating L^p as an inner-product space. Minkowski makes L^p a normed space and nothing more. Only L² has a compatible inner product; the other L^p spaces are normed but not Hilbert (no inner product, no native Cauchy-Schwarz).
- Believing the integral inequality is "obvious". Minkowski's integral inequality ‖∫f(·,y)dy‖_p ≤ ∫‖f(·,y)‖_p dy interchanges the norm and the integral; the proof is a careful application of duality (the discrete triangle inequality being the warm-up).
Where Minkowski shows up
- L^p spaces and Banach spaces. Minkowski is the triangle inequality that makes ‖·‖_p a norm. Together with completeness (Riesz-Fischer), this makes L^p a Banach space for 1 ≤ p ≤ ∞, the basic setting for much of functional analysis.
- Probability theory. ‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p, with ‖X‖_p = (E|X|^p)^(1/p), bounds the p-th root of the p-th absolute moment of a sum by the sum of the corresponding roots (Minkowski for random variables). The p = 2 case, applied to centered variables: the standard deviation of a sum is at most the sum of the standard deviations, for any X, Y with finite variance. For uncorrelated variables the sharper Pythagorean identity Var(X + Y) = Var X + Var Y holds.
- Functional analysis. Bounded linear operators between L^p spaces satisfy ‖Tf + Tg‖_p ≤ ‖Tf‖_p + ‖Tg‖_p by linearity and Minkowski. Combined with Hölder, Minkowski gives the L^p–L^q duality.
- PDE estimates. Splitting a solution as u = u_lin + u_nonlin and bounding ‖u‖_p ≤ ‖u_lin‖_p + ‖u_nonlin‖_p is a standard first step in a priori estimates.
- Image and signal processing. Decomposing a signal into low- and high-frequency parts and bounding the L^p norm of the sum by Minkowski is fundamental to wavelet and Fourier denoising bounds.
- Geometry of numbers. Minkowski's original setting: counting lattice points in convex bodies. The Minkowski sum A + B of convex sets in ℝⁿ has volume bounded below by the Brunn-Minkowski inequality, vol(A + B)^(1/n) ≥ vol(A)^(1/n) + vol(B)^(1/n), a superadditive relative of the triangle inequality.
- Statistics. The bias-variance style bound ‖estimator − target‖_p ≤ ‖estimator − E[estimator]‖_p + ‖E[estimator] − target‖_p is Minkowski applied to the split estimator − target = (estimator − E[estimator]) + (E[estimator] − target).
- Optimization. Convexity of the L^p norm follows from Minkowski + homogeneity. Convexity makes L^p-regularized problems tractable (LASSO is p = 1; ridge is p = 2).
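The probability item in the list above can be illustrated on sample data. An empirical distribution is itself a probability measure, so the p = 2 bound on standard deviations holds for the samples, not just in the limit. This is a sketch; the sample size, seed and the way `y` is built are arbitrary assumptions.

```python
import random
import statistics

random.seed(2)
n = 10_000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
# y is deliberately correlated with x; the bound needs no independence.
y = [0.7 * xi + random.gauss(0.0, 0.5) for xi in x]
total = [xi + yi for xi, yi in zip(x, y)]

# Population standard deviation = L^2 norm of the centered sample
# with respect to the uniform probability measure on the n points.
sd_x = statistics.pstdev(x)
sd_y = statistics.pstdev(y)
sd_sum = statistics.pstdev(total)

print(sd_sum <= sd_x + sd_y)
```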
Frequently asked questions
What does Minkowski's inequality say?
It says the L^p norm of a sum is at most the sum of the L^p norms: ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p for any 1 ≤ p ≤ ∞. For Euclidean ℝⁿ (p = 2) this is the familiar triangle inequality ‖a + b‖ ≤ ‖a‖ + ‖b‖. For general p it states the same geometric truth — the diagonal of the parallelogram is no longer than the sum of two sides — but measured with the L^p length function. It is what makes ‖·‖_p satisfy the triangle inequality axiom of a norm.
Why is Minkowski equivalent to the triangle inequality for L^p?
A norm on a vector space must satisfy three axioms: positive definiteness, homogeneity, and the triangle inequality. Positive definiteness and homogeneity are easy for ‖·‖_p; the triangle inequality is the only hard one. Minkowski is exactly that triangle inequality. Without Minkowski, L^p would not be a normed space — just a quasi-normed one. For 0 < p < 1 there is no Minkowski (the triangle inequality fails) and ‖·‖_p is a quasi-norm but not a norm; L^p with 0 < p < 1 is an F-space but not a Banach space.
How is Minkowski proved from Hölder?
Write |f + g|^p = |f + g| · |f + g|^(p−1) ≤ (|f| + |g|) · |f + g|^(p−1). Split into two pieces ∫|f| · |f + g|^(p−1) + ∫|g| · |f + g|^(p−1), and apply Hölder to each with conjugate exponent q = p/(p − 1). Each piece is bounded by ‖f‖_p · ‖|f + g|^(p−1)‖_q = ‖f‖_p · (∫|f+g|^p)^((p−1)/p). Sum and divide both sides by the common factor (∫|f+g|^p)^((p−1)/p) — the resulting exponent on the left is 1 − (p−1)/p = 1/p, and you get ‖f+g‖_p ≤ ‖f‖_p + ‖g‖_p.
When does equality hold in Minkowski?
For 1 < p < ∞, equality ‖f + g‖_p = ‖f‖_p + ‖g‖_p holds iff f and g are non-negatively proportional almost everywhere, that is, there exist constants α, β ≥ 0, not both zero, with αf = βg a.e. For p = 1 the condition is much weaker: equality holds whenever f and g have the same sign almost everywhere (because then |f + g| = |f| + |g| pointwise). For p = ∞ equality requires f and g to approach their essential suprema in absolute value at the same place, with the same sign there; no proportionality is needed. The vector intuition for 1 < p < ∞ is identical to the Euclidean case: equality when the two vectors point in the same direction.
Does Minkowski hold for p < 1?
No. For 0 < p < 1 the inequality reverses for non-negative functions: ‖f + g‖_p ≥ ‖f‖_p + ‖g‖_p when f, g ≥ 0, and ‖·‖_p is not a norm. Instead the metric d(f, g) = ∫|f − g|^p makes L^p (for 0 < p < 1) a complete translation-invariant metric vector space, an F-space. The unit ball is non-convex (it looks like an astroid in ℝ²), which is the geometric reason convex-analysis techniques (Hahn-Banach, separation theorems) fail. Most of analysis lives in p ≥ 1 because Minkowski lives there.
How is Minkowski used in everyday analysis?
Anywhere you want to bound ‖A + B‖ by ‖A‖ + ‖B‖ in an L^p setting. When a signal is decomposed as f = (low frequency) + (high frequency), Minkowski lets you bound ‖f‖_p by estimating each piece's L^p norm separately. Splitting a PDE solution into linear plus nonlinear parts and estimating each with L^p techniques relies on Minkowski. The variance bound Var(X + Y) ≤ 2(Var X + Var Y) is a Minkowski-flavoured bound: apply Minkowski in L² to the centered variables, square, and use (s + t)² ≤ 2(s² + t²). Minkowski's integral inequality ‖∫f(·, y) dy‖_p ≤ ∫‖f(·, y)‖_p dy extends the discrete triangle inequality to a continuous parameter and is used constantly in Fourier analysis and integral operator theory.
What is Minkowski's integral inequality?
It generalizes Minkowski to integrals over a parameter: ‖∫ f(x, y) dy‖_{L^p_x} ≤ ∫ ‖f(·, y)‖_{L^p_x} dy. The L^p norm of an integrated family of functions is at most the integrated L^p norm. The discrete Minkowski Σ_k ‖fₖ‖_p ≥ ‖Σ_k fₖ‖_p is the case of a sum (counting measure on the y-variable). Indispensable for bounding solution operators: if u = ∫ K(x, y) f(y) dy then ‖u‖_p ≤ ∫ ‖K(·, y)‖_p · |f(y)| dy — used in heat-kernel, wave, and Green-function estimates.
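A discretized version of the integral inequality can be checked with Riemann sums, where it reduces to the finite triangle inequality with weights. This is a sketch only: the family `f(x, y)`, the grids and the exponent are hypothetical choices made for illustration.

```python
import math

def p_norm_x(values, dx, p):
    """Riemann-sum approximation of (integral |h(x)|^p dx)^(1/p)."""
    return (sum(abs(h) ** p for h in values) * dx) ** (1.0 / p)

def f(x, y):
    """Hypothetical test family: a Gaussian bump centred at y, with oscillating sign."""
    return math.exp(-(x - y) ** 2) * math.cos(3.0 * y)

p = 3.0
nx, ny = 200, 100
dx, dy = 10.0 / nx, 4.0 / ny
xs = [-5.0 + (i + 0.5) * dx for i in range(nx)]  # midpoints on [-5, 5]
ys = [-2.0 + (j + 0.5) * dy for j in range(ny)]  # midpoints on [-2, 2]

# Left side: integrate in y first, then take the L^p norm in x.
integrated = [sum(f(x, y) for y in ys) * dy for x in xs]
lhs = p_norm_x(integrated, dx, p)

# Right side: take the L^p norm in x for each y, then integrate in y.
rhs = sum(p_norm_x([f(x, y) for x in xs], dx, p) for y in ys) * dy

print(lhs <= rhs)
```

The sign changes from the cosine factor cause cancellation inside the y-integral on the left, so the gap between the two sides is visible rather than marginal.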
L^p triangle inequality across exponents
| p | Form of inequality | L^p is... | Equality condition | Geometric ball | Used in |
|---|---|---|---|---|---|
| p = 1 | ‖a + b‖₁ ≤ ‖a‖₁ + ‖b‖₁ | Banach (L¹) | aₖ, bₖ same sign for every k | Diamond (rotated square) | Total-variation, LASSO regression |
| 1 < p < 2 | ‖a + b‖_p ≤ ‖a‖_p + ‖b‖_p | Banach, not Hilbert | a = λb, λ ≥ 0 | Rounded diamond | Robust statistics, sparse signal recovery |
| p = 2 | ‖a + b‖₂ ≤ ‖a‖₂ + ‖b‖₂ | Hilbert (L²) | a = λb, λ ≥ 0 | Disk (Euclidean) | Fourier, QM, least squares, ridge regression |
| 2 < p < ∞ | ‖a + b‖_p ≤ ‖a‖_p + ‖b‖_p | Banach, not Hilbert | a = λb, λ ≥ 0 | Rounded square | Sobolev embedding, extremal lengths |
| p = ∞ | ‖a + b‖_∞ ≤ ‖a‖_∞ + ‖b‖_∞ | Banach (L^∞) | Maxes attained at same index, same sign | Square (Chebyshev) | Uniform bounds, worst-case analysis |
| 0 < p < 1 | Reverses for a, b ≥ 0: ‖a + b‖_p ≥ ‖a‖_p + ‖b‖_p (‖·‖_p is only a quasi-norm) | F-space (not Banach) | n/a | Astroid (non-convex) | Compressed sensing (p < 1 quasi-norm) |