Minkowski's Inequality

‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p — the triangle inequality for L^p norms

Minkowski's inequality states that for any measurable functions f, g and exponent 1 ≤ p ≤ ∞, the L^p norm of the sum is bounded by the sum of the L^p norms: ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p. The discrete version: (Σ|aₖ + bₖ|^p)^(1/p) ≤ (Σ|aₖ|^p)^(1/p) + (Σ|bₖ|^p)^(1/p). This is precisely the triangle inequality for the L^p norm, the axiom that makes ‖·‖_p a norm and L^p a normed vector space (L^p is also complete, so it is a Banach space). Hermann Minkowski stated it in his 1896 Geometrie der Zahlen. The standard proof reduces to Hölder's inequality applied to |f + g|^p. For 1 < p < ∞, equality holds iff g = 0 a.e. or f = λg a.e. for some λ ≥ 0. For 0 < p < 1 the inequality reverses for non-negative f, g; ‖·‖_p is then a quasi-norm, not a norm, and L^p is not a Banach space.

Statement: ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p
Range: 1 ≤ p ≤ ∞
Proof: Hölder applied to |f + g|^p
Named after: Hermann Minkowski, 1896
Equality iff: f = λg a.e., λ ≥ 0
For p < 1: Reverses; ‖·‖_p not a norm

The statement and what it means

Let (X, Σ, μ) be a measure space and 1 ≤ p ≤ ∞. Define the L^p norm by

‖f‖_p = (∫_X |f|^p dμ)^(1/p)             for 1 ≤ p < ∞
‖f‖_∞ = ess sup |f|                        for p = ∞

Minkowski's inequality says that for any measurable f, g:

‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p

Equivalently, on sequence spaces (counting measure on ℕ):

(Σ |a_k + b_k|^p)^(1/p) ≤ (Σ |a_k|^p)^(1/p) + (Σ |b_k|^p)^(1/p)

And for vectors in ℝⁿ (counting measure on {1, …, n}), the same formula. The case p = 2 is the familiar Euclidean triangle inequality ‖a + b‖ ≤ ‖a‖ + ‖b‖.

The geometric content is simple: the diagonal of a parallelogram with sides f and g is no longer than the sum of the two sides. Minkowski's contribution is to make this statement quantitatively precise for any exponent p ≥ 1, in any dimension, on any measure space.
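The finite-dimensional statement (counting measure on {1, …, n}) can be sketched in a few lines of Python. The helper names lp_norm and minkowski_gap are illustrative choices, not part of any library, and the sample vectors are arbitrary.

```python
import math

def lp_norm(v, p):
    """L^p norm of a finite sequence v under counting measure, 1 <= p <= inf."""
    if p < 1:
        raise ValueError("need p >= 1 for a norm")
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def minkowski_gap(a, b, p):
    """Slack ||a||_p + ||b||_p - ||a + b||_p; Minkowski says it is >= 0."""
    total = [x + y for x, y in zip(a, b)]
    return lp_norm(a, p) + lp_norm(b, p) - lp_norm(total, p)

# Arbitrary sample vectors (an assumption for illustration only).
a, b = [3.0, -4.0, 1.0], [-1.0, 2.0, 5.0]
for p in (1, 1.5, 2, 3, math.inf):
    print(p, minkowski_gap(a, b, p) >= 0)
```

The gap is the quantity to watch: it is zero exactly in the equality cases and positive otherwise.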

Proof sketch — Hölder is the engine

For p = 1 and p = ∞ the proof is direct. For p = 1: |f + g| ≤ |f| + |g| pointwise, integrate. For p = ∞: |f + g| ≤ |f| + |g| ≤ ‖f‖_∞ + ‖g‖_∞ almost everywhere, so the essential supremum of |f + g| obeys the same bound.

For 1 < p < ∞, write |f + g|^p = |f + g| · |f + g|^(p−1), apply the pointwise triangle inequality |f + g| ≤ |f| + |g|, and split:

∫ |f + g|^p ≤ ∫ |f| · |f + g|^(p−1) + ∫ |g| · |f + g|^(p−1)

Now apply Hölder's inequality to each term with conjugate exponent q = p / (p − 1) (so 1/p + 1/q = 1):

∫ |f| · |f + g|^(p−1) ≤ ‖f‖_p · ‖|f + g|^(p−1)‖_q
                       = ‖f‖_p · (∫ |f + g|^p)^((p−1)/p)
                       = ‖f‖_p · ‖f + g‖_p^(p−1)

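An identical bound holds for the g term. Summing gives ‖f + g‖_p^p ≤ (‖f‖_p + ‖g‖_p) · ‖f + g‖_p^(p−1). If ‖f + g‖_p = 0 there is nothing to prove; otherwise it is finite whenever f, g ∈ L^p (because |f + g|^p ≤ 2^(p−1)(|f|^p + |g|^p)), so dividing both sides by ‖f + g‖_p^(p−1) gives:

‖f + g‖_p = ‖f + g‖_p^p / ‖f + g‖_p^(p−1) ≤ ‖f‖_p + ‖g‖_p     □

The chain is short. Apart from the pointwise triangle inequality, Hölder is the only inequality used; everything else is algebra. Minkowski rests on Hölder, which rests on Young's inequality, which rests on the concavity of log (equivalently, the convexity of exp). The hierarchy is tight.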
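To see the three steps of the argument on concrete numbers, the sketch below evaluates each quantity in the chain ‖f + g‖_p^p ≤ (split sum) ≤ (‖f‖_p + ‖g‖_p) · ‖|f + g|^(p−1)‖_q for finite sequences. The function name proof_chain and the sample data are assumptions made for illustration.

```python
def norm(v, r):
    """L^r norm of a finite sequence (counting measure), 1 <= r < inf."""
    return sum(abs(x) ** r for x in v) ** (1.0 / r)

def proof_chain(f, g, p):
    """Return (||f+g||_p^p, split sum, Hoelder bound) for 1 < p < inf."""
    q = p / (p - 1)                          # conjugate exponent: 1/p + 1/q = 1
    s = [x + y for x, y in zip(f, g)]
    w = [abs(t) ** (p - 1) for t in s]       # |f + g|^(p-1)
    lhs = sum(abs(t) ** p for t in s)        # integral of |f + g|^p
    split = (sum(abs(x) * wt for x, wt in zip(f, w))
             + sum(abs(y) * wt for y, wt in zip(g, w)))
    holder = (norm(f, p) + norm(g, p)) * norm(w, q)
    return lhs, split, holder

f, g = [1.0, -2.0, 3.0], [2.0, 1.0, -1.0]   # arbitrary sample data
lhs, split, holder = proof_chain(f, g, p=3)
print(lhs, split, holder)                    # each value bounds the one before it
```

Here the first gap comes from the pointwise triangle inequality and the second from Hölder; the factor norm(w, q) equals ‖f + g‖_p^(p−1), which is what gets divided out at the end.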

Numerical examples

Example 1 (Euclidean, p = 2):
  a = (3, 4),   b = (0, 5)
  a + b = (3, 9)
  ‖a‖₂ = 5,  ‖b‖₂ = 5,  ‖a + b‖₂ = √90 ≈ 9.487
  ‖a‖₂ + ‖b‖₂ = 10
  9.487 ≤ 10            ✓

Example 2 (Manhattan, p = 1):
  a = (3, 4),   b = (0, 5)
  a + b = (3, 9)
  ‖a‖₁ = 7,  ‖b‖₁ = 5,  ‖a + b‖₁ = 12
  ‖a‖₁ + ‖b‖₁ = 12
  12 ≤ 12               EQUALITY (a, b have same-sign coordinates)

Example 3 (Chebyshev, p = ∞):
  a = (3, 4),   b = (0, 5)
  a + b = (3, 9)
  ‖a‖_∞ = 4,  ‖b‖_∞ = 5,  ‖a + b‖_∞ = 9
  ‖a‖_∞ + ‖b‖_∞ = 9
  9 ≤ 9                 EQUALITY (max coordinate in same slot)

Example 4 (p = 3):
  a = (1, 2),  b = (2, 1)
  a + b = (3, 3)
  ‖a‖₃ = (1+8)^(1/3) = 9^(1/3) ≈ 2.080
  ‖b‖₃ ≈ 2.080
  ‖a+b‖₃ = (27+27)^(1/3) = 54^(1/3) ≈ 3.780
  ‖a‖₃ + ‖b‖₃ ≈ 4.160
  3.780 ≤ 4.160         ✓

Example 5 (equality at p = 2):
  a = (3, 4),   b = (6, 8)   (b = 2a)
  ‖a‖₂ = 5,  ‖b‖₂ = 10,  ‖a + b‖₂ = 15
  ‖a‖₂ + ‖b‖₂ = 15
  15 = 15               EQUALITY (a, b proportional, same direction)

The geometric pattern is consistent: equality in the L² triangle inequality holds when the vectors are non-negatively parallel; for p = 1 and p = ∞ equality is easier to achieve.
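The five examples above can be reproduced with a short script. This is a minimal sketch; the helper lp_norm and the results dictionary are illustrative names, and equality is detected with a floating-point tolerance rather than exactly.

```python
import math

def lp_norm(v, p):
    """L^p norm of a finite sequence; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

examples = [
    ("Example 1", (3, 4), (0, 5), 2),
    ("Example 2", (3, 4), (0, 5), 1),
    ("Example 3", (3, 4), (0, 5), math.inf),
    ("Example 4", (1, 2), (2, 1), 3),
    ("Example 5", (3, 4), (6, 8), 2),
]

results = {}
for name, a, b, p in examples:
    total = tuple(x + y for x, y in zip(a, b))
    lhs = lp_norm(total, p)                  # ||a + b||_p
    rhs = lp_norm(a, p) + lp_norm(b, p)      # ||a||_p + ||b||_p
    status = "equality" if math.isclose(lhs, rhs) else "strict"
    results[name] = (lhs, rhs, status)
    print(f"{name}: {lhs:.3f} <= {rhs:.3f} ({status})")
```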

Variants and generalizations

Triangle inequality on ℝⁿ. The classical ‖a + b‖₂ ≤ ‖a‖₂ + ‖b‖₂ is Minkowski with p = 2 and counting measure on {1, …, n}.
Sequences (ℓ^p). (Σ|aₖ + bₖ|^p)^(1/p) ≤ (Σ|aₖ|^p)^(1/p) + (Σ|bₖ|^p)^(1/p). Makes ℓ^p a normed space; since ℓ^p is also complete, it is a Banach space.
Functions (L^p). (∫|f + g|^p)^(1/p) ≤ (∫|f|^p)^(1/p) + (∫|g|^p)^(1/p). Makes L^p a normed space; L^p is already complete, with no separate completion step needed, so it is a Banach space.
Minkowski's integral inequality. ‖∫ f(·, y) dy‖_p ≤ ∫ ‖f(·, y)‖_p dy. Continuous-parameter version of triangle inequality. Used for bounding integral operators.
Reverse Minkowski (0 < p < 1). The inequality reverses: ‖f + g‖_p ≥ ‖f‖_p + ‖g‖_p for non-negative f, g. So ‖·‖_p is not a norm; the natural translation-invariant metric is d(f, g) = ‖f − g‖_p^p, which is subadditive (an F-space metric).
Minkowski-functional / gauge. For a convex symmetric set A containing the origin, p_A(x) = inf{t > 0 : x ∈ tA} is the Minkowski gauge — subadditive by the same convexity argument. Every norm is a Minkowski gauge of its unit ball.
Weighted Minkowski. With a positive weight w(x), ‖f + g‖_{p, w} ≤ ‖f‖_{p, w} + ‖g‖_{p, w} where ‖·‖_{p, w} = (∫|·|^p w)^(1/p). Same proof.
Anisotropic / mixed-norm Minkowski. ‖f‖_{L^p_x L^q_y} = (∫(∫|f|^q dy)^(p/q) dx)^(1/p) satisfies the triangle inequality for p, q ≥ 1: apply Minkowski in the y slot for each fixed x, then in the x slot.
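The gauge construction in the list above can be sketched numerically. The example below assumes a specific convex symmetric set, the ellipse A = {x₁²/4 + x₂² ≤ 1}, chosen only for illustration, and approximates p_A(x) = inf{t > 0 : x ∈ tA} by bisection using nothing but a membership test. The names in_ellipse and gauge are hypothetical.

```python
import math

def in_ellipse(x):
    """Membership test for A = {(x1, x2): x1^2/4 + x2^2 <= 1}."""
    return x[0] ** 2 / 4.0 + x[1] ** 2 <= 1.0

def gauge(x, member, iters=60):
    """Approximate the Minkowski gauge inf{t > 0 : x in tA} by bisection on t."""
    hi = 1.0
    while not member((x[0] / hi, x[1] / hi)):    # grow hi until x lies in hi*A
        hi *= 2.0
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if member((x[0] / mid, x[1] / mid)):     # x in mid*A: the gauge is <= mid
            hi = mid
        else:
            lo = mid
    return hi

x, y = (2.0, 0.0), (0.0, 3.0)
x_plus_y = (x[0] + y[0], x[1] + y[1])
print(gauge(x, in_ellipse), gauge(y, in_ellipse), gauge(x_plus_y, in_ellipse))
```

For this particular set the gauge has the closed form sqrt(x₁²/4 + x₂²), which gives a way to check the bisection; subadditivity shows up as gauge(x + y) being no larger than gauge(x) + gauge(y).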

Why Minkowski fails for p < 1

The proof above uses Hölder, which needs a conjugate pair p, q ≥ 1; for 0 < p < 1 the exponent q = p/(p − 1) is negative. More fundamentally, for 0 < p < 1 the function t ↦ t^p is concave on [0, ∞), not convex; the unit ball of ‖·‖_p in ℝ² becomes a non-convex, astroid-like shape rather than a convex disk. A counter-example for p = 1/2 on ℝ² with the natural definition:

a = (1, 0),  b = (0, 1)
‖a‖_{1/2} = 1,  ‖b‖_{1/2} = 1
‖a + b‖_{1/2} = (1^{1/2} + 1^{1/2})^2 = 4
4 > 2 = ‖a‖_{1/2} + ‖b‖_{1/2}              triangle inequality FAILS

So for p < 1, ‖·‖_p is not even subadditive. The remedy is to use d(f, g) = ‖f − g‖_p^p (without the outer 1/p root) as a metric: it is subadditive and translation-invariant, and makes L^p (0 < p < 1) an F-space (a complete metric vector space). It is not a Banach space in general: on a non-atomic measure space such as [0, 1] with Lebesgue measure, L^p with 0 < p < 1 is not locally convex, so no norm induces its topology.
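Both halves of this section can be checked numerically: the failure of the triangle inequality for the quasi-norm, and the triangle inequality for d(f, g) = Σ|fₖ − gₖ|^p. The sketch below uses p = 1/2, the counter-example vectors from the text, and random test vectors with a fixed seed; the function names are illustrative.

```python
import random

def quasi_norm(v, p):
    """(sum |v_k|^p)^(1/p); only a quasi-norm when 0 < p < 1."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def d(u, v, p):
    """d(u, v) = sum |u_k - v_k|^p, with no outer 1/p root."""
    return sum(abs(x - y) ** p for x, y in zip(u, v))

p = 0.5
a, b = (1.0, 0.0), (0.0, 1.0)
norm_of_sum = quasi_norm((a[0] + b[0], a[1] + b[1]), p)
sum_of_norms = quasi_norm(a, p) + quasi_norm(b, p)
print(norm_of_sum, sum_of_norms)     # the "norm" of the sum exceeds the sum of "norms"

# d satisfies the triangle inequality on random triples (fixed seed for repeatability).
rng = random.Random(0)
metric_ok = True
for _ in range(1000):
    u = [rng.uniform(-1, 1) for _ in range(4)]
    v = [rng.uniform(-1, 1) for _ in range(4)]
    w = [rng.uniform(-1, 1) for _ in range(4)]
    metric_ok = metric_ok and d(u, w, p) <= d(u, v, p) + d(v, w, p) + 1e-12
print(metric_ok)
```

The random check is evidence, not a proof; the underlying reason is that t ↦ t^p is subadditive on [0, ∞) for 0 < p ≤ 1.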

When is the bound attained?

For 1 < p < ∞, ‖f + g‖_p = ‖f‖_p + ‖g‖_p holds iff there exist non-negative constants α, β not both zero with αf = βg almost everywhere. Geometrically: f and g point in the same direction. The proof traces equality through Hölder: each Hölder application requires |f|^p ∝ |f + g|^p, i.e. f ∝ f + g, i.e. f ∝ g; and the sign-condition forces both proportionality constants to be non-negative.

For p = 1: equality holds iff f and g have the same sign almost everywhere, that is, f·g ≥ 0 a.e. (this is exactly when |f + g| = |f| + |g| pointwise a.e.). Much easier to attain.

For p = ∞: equality means ess sup |f + g| = ess sup |f| + ess sup |g|, which requires, for every ε > 0, a set of positive measure on which |f| and |g| are both within ε of their essential suprema and f, g have the same sign.
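The equality conditions above are easy to probe on vectors in ℝ². In this sketch (helper names and test vectors are illustrative assumptions) the gap ‖a‖_p + ‖b‖_p − ‖a + b‖_p vanishes only in the cases the text describes.

```python
def lp_norm(v, p):
    """L^p norm of a finite sequence, 1 <= p < inf."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def gap(a, b, p):
    """||a||_p + ||b||_p - ||a + b||_p; zero exactly in the equality case."""
    total = [x + y for x, y in zip(a, b)]
    return lp_norm(a, p) + lp_norm(b, p) - lp_norm(total, p)

a = (1.0, 2.0)
gap_parallel = gap(a, (2.0, 4.0), 3)       # b = 2a: zero up to rounding
gap_opposite = gap(a, (-2.0, -4.0), 3)     # b = -2a: proportional magnitudes, opposite sign
gap_same_sign_p3 = gap(a, (5.0, 0.5), 3)   # same signs, not proportional, p = 3
gap_same_sign_p1 = gap(a, (5.0, 0.5), 1)   # same signs, p = 1
print(gap_parallel, gap_opposite, gap_same_sign_p3, gap_same_sign_p1)
```

The last two values show the contrast between exponents: sharing signs is enough for equality at p = 1 but not at p = 3.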

Common pitfalls

Forgetting that p ≥ 1 is required. Minkowski fails for 0 < p < 1; on the "fractional L^p" the inequality reverses for non-negative functions.
Equality requires f = λg, not just |f| = λ|g|. For p > 1, f and g must point the same way — opposite signs break equality even with proportional magnitudes.
Confusing Minkowski with Hölder. Minkowski bounds a sum-norm by sum of norms. Hölder bounds a product-integral by product of norms. Different statements with different uses.
"Minkowski applies to any norm". Triangle inequality is a defining axiom of any norm, so trivially yes — but "Minkowski" specifically refers to the L^p form. Calling the general triangle inequality "Minkowski" is sloppy.
Forgetting that Minkowski makes L^p a normed space, not an inner-product space. Only L² has a compatible inner product; other L^p spaces are normed but not Hilbert (no inner product, no Cauchy-Schwarz natively).
Believing the integral inequality is "obvious". Minkowski's integral inequality ‖∫f(·,y)dy‖_p ≤ ∫‖f(·,y)‖_p dy moves the norm inside the integral, interchanging the order of two operations; the proof is a careful application of duality (the discrete triangle inequality being the warm-up).

Where Minkowski shows up

L^p spaces and Banach spaces. Minkowski is the triangle inequality that makes ‖·‖_p a norm. Together with completeness, this makes L^p a Banach space for 1 ≤ p ≤ ∞. A large part of functional analysis is L^p theory.
Probability theory. ‖X + Y‖_p ≤ ‖X‖_p + ‖Y‖_p, with ‖X‖_p = (E|X|^p)^(1/p), bounds the p-th root of the p-th absolute moment of a sum by the sum of those roots (Minkowski for random variables). The p = 2 case, applied to centred variables: the standard deviation of a sum is at most the sum of the standard deviations. For uncorrelated variables the stronger identity Var(X + Y) = Var X + Var Y holds (Pythagoras in L²).
Functional analysis. Bounded linear operators between L^p spaces satisfy ‖T(f + g)‖_p = ‖Tf + Tg‖_p ≤ ‖Tf‖_p + ‖Tg‖_p by linearity and Minkowski. Combined with Hölder, Minkowski gives the L^p–L^q duality.
PDE estimates. Splitting a solution as u = u_lin + u_nonlin and bounding ‖u‖_p ≤ ‖u_lin‖_p + ‖u_nonlin‖_p is a standard first step in L^p estimates.
Image and signal processing. Decomposing a signal into low- and high-frequency parts and bounding the L^p norm of the sum by Minkowski is fundamental to wavelet and Fourier denoising bounds.
Geometry of numbers. Minkowski's original setting: counting lattice points in convex bodies. For the Minkowski sum A + B of convex bodies in ℝⁿ, the Brunn-Minkowski inequality vol(A + B)^(1/n) ≥ vol(A)^(1/n) + vol(B)^(1/n) bounds the volume from below, a deep relative of the inequality with the direction reversed.
Statistics. Bias-variance decomposition ‖estimator − target‖_p ≤ ‖estimator − E[estimator]‖_p + ‖E[estimator] − target‖_p is Minkowski applied to expectation differences.
Optimization. Convexity of the L^p norm follows from Minkowski + homogeneity. Convexity makes L^p-regularized problems tractable (LASSO is p = 1; ridge is p = 2).
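The probability item in the list above can be illustrated on simulated data. This is a sketch under stated assumptions: the samples, the seed, and the dependence between x and y are invented for the demonstration, and population standard deviations of the empirical sample play the role of L² norms of centred variables.

```python
import random
import statistics

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(500)]
y = [0.5 * xi + rng.gauss(0.0, 2.0) for xi in x]   # deliberately correlated with x
s = [xi + yi for xi, yi in zip(x, y)]

# Population standard deviation = L^2 norm of the centred sample, divided by sqrt(n).
sd_x = statistics.pstdev(x)
sd_y = statistics.pstdev(y)
sd_s = statistics.pstdev(s)

print(sd_s <= sd_x + sd_y)                          # Minkowski with p = 2
print(sd_s ** 2 <= 2 * (sd_x ** 2 + sd_y ** 2))     # the cruder variance bound
```

The first comparison is Minkowski itself; the second follows from it by squaring and using (a + b)² ≤ 2(a² + b²).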

Frequently asked questions

What does Minkowski's inequality say?

It says the L^p norm of a sum is at most the sum of the L^p norms: ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p for any 1 ≤ p ≤ ∞. For Euclidean ℝⁿ (p = 2) this is the familiar triangle inequality ‖a + b‖ ≤ ‖a‖ + ‖b‖. For general p it states the same geometric truth — the diagonal of the parallelogram is no longer than the sum of two sides — but measured with the L^p length function. It is what makes ‖·‖_p satisfy the triangle inequality axiom of a norm.

Why is Minkowski equivalent to the triangle inequality for L^p?

A norm on a vector space must satisfy three axioms: positive definiteness, homogeneity, and the triangle inequality. Positive definiteness and homogeneity are easy for ‖·‖_p; the triangle inequality is the only hard one. Minkowski is exactly that triangle inequality. Without Minkowski, L^p would not be a normed space — just a quasi-normed one. For 0 < p < 1 there is no Minkowski (the triangle inequality fails) and ‖·‖_p is a quasi-norm but not a norm; L^p with 0 < p < 1 is an F-space but not a Banach space.

How is Minkowski proved from Hölder?

Write |f + g|^p = |f + g| · |f + g|^(p−1) ≤ (|f| + |g|) · |f + g|^(p−1). Split into two pieces ∫|f| · |f + g|^(p−1) + ∫|g| · |f + g|^(p−1), and apply Hölder to each with conjugate exponent q = p/(p − 1). The first piece is bounded by ‖f‖_p · ‖|f + g|^(p−1)‖_q = ‖f‖_p · (∫|f+g|^p)^((p−1)/p), and the second by the same expression with g in place of f. Sum and, when the common factor (∫|f+g|^p)^((p−1)/p) is nonzero and finite, divide both sides by it: the resulting exponent on the left is 1 − (p−1)/p = 1/p, and you get ‖f+g‖_p ≤ ‖f‖_p + ‖g‖_p.

When does equality hold in Minkowski?

For 1 < p < ∞, equality ‖f + g‖_p = ‖f‖_p + ‖g‖_p holds iff f and g are non-negatively proportional almost everywhere: there exist constants α, β ≥ 0, not both zero, with αf = βg a.e. For p = 1 the condition is much weaker: equality holds iff f and g have the same sign almost everywhere (so that |f + g| = |f| + |g| pointwise). For p = ∞ equality means |f| and |g| approach their essential suprema on a common set, with the same sign there. The vector intuition is identical to the Euclidean case: equality when the two vectors point in the same direction.

Does Minkowski hold for p < 1?

No. For 0 < p < 1 the inequality reverses for non-negative functions: ‖f + g‖_p ≥ ‖f‖_p + ‖g‖_p when f, g ≥ 0, and ‖·‖_p is not a norm. Instead the metric d(f, g) = ∫|f − g|^p makes L^p (for 0 < p < 1) a complete translation-invariant metric vector space, an F-space. The unit ball is non-convex (astroid-like in ℝ²), which is the geometric reason convex-analysis techniques (Hahn-Banach, separation theorems) fail. Most of analysis lives in p ≥ 1 because Minkowski lives there.

How is Minkowski used in everyday analysis?

Anywhere you want to bound ‖A + B‖ by ‖A‖ + ‖B‖ in an L^p setting. Decomposing a signal as f = (low frequency) + (high frequency), Minkowski lets you bound each piece's L^p norm separately. Splitting a PDE solution into linear and nonlinear parts and estimating each with L^p techniques relies on Minkowski. The variance bound Var(X + Y) ≤ 2(Var X + Var Y) is a Minkowski-flavoured bound for the L² norm of centred random variables: square sd(X + Y) ≤ sd X + sd Y and use (a + b)² ≤ 2(a² + b²). Minkowski's integral inequality ‖∫f(·, y) dy‖_p ≤ ∫‖f(·, y)‖_p dy extends the discrete triangle inequality to a continuous parameter and is used constantly in Fourier analysis and integral operator theory.

What is Minkowski's integral inequality?

It generalizes Minkowski to integrals over a parameter: ‖∫ f(x, y) dy‖_{L^p_x} ≤ ∫ ‖f(·, y)‖_{L^p_x} dy. The L^p norm of an integrated family of functions is at most the integrated L^p norm. The discrete Minkowski Σ_k ‖fₖ‖_p ≥ ‖Σ_k fₖ‖_p is the case of a sum (counting measure on the y-variable). Indispensable for bounding solution operators: if u = ∫ K(x, y) f(y) dy then ‖u‖_p ≤ ∫ ‖K(·, y)‖_p · |f(y)| dy — used in heat-kernel, wave, and Green-function estimates.
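The operator bound ‖u‖_p ≤ ∫ ‖K(·, y)‖_p · |f(y)| dy can be checked on a grid, where both integrals become Riemann sums. The sketch below assumes a Gaussian kernel and a sine input on [−2, 2]; these choices, the grid size, and the helper names are all illustrative and not taken from the text.

```python
import math

def grid_norm(values, p, h):
    """Riemann-sum L^p norm on a uniform grid: (sum |v_i|^p * h)^(1/p)."""
    return (sum(abs(v) ** p for v in values) * h) ** (1.0 / p)

def kernel(x, y):
    """Hypothetical Gaussian kernel K(x, y) = exp(-(x - y)^2)."""
    return math.exp(-(x - y) ** 2)

n, p = 60, 3
h = 4.0 / n
grid = [-2.0 + (i + 0.5) * h for i in range(n)]    # midpoints of [-2, 2]
f = [math.sin(3.0 * y) for y in grid]              # hypothetical input function

# u(x) = integral of K(x, y) f(y) dy, approximated by a midpoint sum in y.
u = [sum(kernel(x, y) * fy for y, fy in zip(grid, f)) * h for x in grid]

lhs = grid_norm(u, p, h)                            # ||u||_p
rhs = sum(grid_norm([kernel(x, y) for x in grid], p, h) * abs(fy)
          for y, fy in zip(grid, f)) * h            # integral of ||K(., y)||_p |f(y)| dy
print(lhs, rhs, lhs <= rhs)
```

On the grid the inequality is just the finite-sum Minkowski inequality applied to the columns of the kernel, so it holds exactly for the discretized quantities; the oscillating input makes the gap visible because cancellation inside u is lost once absolute values are taken.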

L^p triangle inequality across exponents

p	Form of inequality	L^p is...	Equality condition	Geometric ball	Used in
p = 1	Σ|aₖ + bₖ| ≤ Σ|aₖ| + Σ|bₖ|	Banach (L¹)	a, b same sign a.e.	Diamond (rotated square)	Total-variation, LASSO regression
1 < p < 2	(Σ|aₖ + bₖ|^p)^(1/p) ≤ (Σ|aₖ|^p)^(1/p) + (Σ|bₖ|^p)^(1/p)	Banach, not Hilbert	a = λb, λ ≥ 0	Rounded diamond	Robust statistics, sparse signal recovery
p = 2	‖a + b‖₂ ≤ ‖a‖₂ + ‖b‖₂	Hilbert (L²)	a = λb, λ ≥ 0	Disk (Euclidean)	Fourier, QM, least squares, ridge regression
2 < p < ∞	‖a + b‖_p ≤ ‖a‖_p + ‖b‖_p	Banach, not Hilbert	a = λb, λ ≥ 0	Rounded square	Sobolev embedding, extremal lengths
p = ∞	max|aₖ + bₖ| ≤ max|aₖ| + max|bₖ|	Banach (L^∞)	Maxes attained at same index, same sign	Square (Chebyshev)	Uniform bounds, worst-case analysis
0 < p < 1	Reverses for a, b ≥ 0: ‖a + b‖_p ≥ ‖a‖_p + ‖b‖_p; only Σ|aₖ + bₖ|^p ≤ Σ|aₖ|^p + Σ|bₖ|^p survives (so ‖·‖_p is a quasi-norm)	F-space (not Banach)	n/a	Astroid-like (non-convex)	Compressed sensing (p < 1 quasi-norm)
