Triangle Inequality
d(x, z) ≤ d(x, y) + d(y, z) — detours never shorten the journey
The triangle inequality is the statement d(x, z) ≤ d(x, y) + d(y, z) for any three points x, y, z in a metric space — the distance from x to z is at most the distance from x to y plus the distance from y to z. Equivalently: any side of a triangle is no longer than the sum of the other two. For Euclidean vectors it reads ‖a + b‖ ≤ ‖a‖ + ‖b‖; for real numbers |a + b| ≤ |a| + |b|; for L^p norms it is Minkowski's inequality. Together with identity-of-indiscernibles and symmetry, the triangle inequality is one of the three defining axioms of a metric space. The reverse form |d(x, y) − d(y, z)| ≤ d(x, z) says the difference of two distances is bounded by the third. Strengthening to d(x, z) ≤ max(d(x, y), d(y, z)) gives the ultrametric condition of p-adic and rooted-tree distances. It is the silent workhorse of every ε/2 splitting argument in analysis.
- Statement: d(x, z) ≤ d(x, y) + d(y, z)
- Vectors: ‖a + b‖ ≤ ‖a‖ + ‖b‖
- Reverse: |d(x, y) − d(y, z)| ≤ d(x, z)
- Role: Metric space axiom 3
- Ultrametric: d(x, z) ≤ max(d(x, y), d(y, z))
- Equality (ℝⁿ): y between x and z (collinear)
Three points, one inequality
Let (X, d) be a metric space. For any three points x, y, z ∈ X:
d(x, z) ≤ d(x, y) + d(y, z)
In words: going from x to z directly cannot exceed going via y. The intermediate point y can be anywhere in X; the bound holds for every choice. Geometrically, in a triangle with vertices x, y, z, any one side is at most the sum of the other two. Picking y to be an endpoint gives only a trivial bound: with y = x the inequality reads d(x, z) ≤ d(x, x) + d(x, z) = d(x, z). Taking z = x instead gives 0 = d(x, x) ≤ d(x, y) + d(y, x) = 2 d(x, y), which is non-negativity of d. The real content is when y is a genuinely different point.
Specializations of the same statement, written in the language of the problem:
Real numbers: |a + b| ≤ |a| + |b| (d(x, y) = |x − y|, set a = x − y, b = y − z)
Complex numbers: |z + w| ≤ |z| + |w| (modulus, same proof)
ℝⁿ Euclidean: ‖a + b‖₂ ≤ ‖a‖₂ + ‖b‖₂ (from Cauchy-Schwarz)
General normed: ‖a + b‖ ≤ ‖a‖ + ‖b‖ (axiom of a norm)
L^p (Minkowski): ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p, 1 ≤ p ≤ ∞ (from Hölder)
Inner product: ‖a + b‖ ≤ ‖a‖ + ‖b‖ (from Cauchy-Schwarz)
Each line is the triangle inequality wearing a different mathematical costume.
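The p-norm rows above can be spot-checked numerically. The following is a minimal sketch, not a proof: the helper names `p_norm` and `triangle_holds` and the tolerance are illustrative assumptions. It samples random vectors for several p ≥ 1, and also shows a case with p = 1/2, where the formula is no longer a norm and the inequality fails.

```python
import random

def p_norm(v, p):
    """p-'norm' of a sequence of numbers; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(t) for t in v)
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def triangle_holds(a, b, p, tol=1e-12):
    """True if ||a + b||_p <= ||a||_p + ||b||_p, up to a rounding tolerance."""
    s = [x + y for x, y in zip(a, b)]
    return p_norm(s, p) <= p_norm(a, p) + p_norm(b, p) + tol

rng = random.Random(0)  # fixed seed so the run is repeatable
results = []
for p in (1, 1.5, 2, 3, float("inf")):
    ok = all(
        triangle_holds(
            [rng.uniform(-1, 1) for _ in range(4)],
            [rng.uniform(-1, 1) for _ in range(4)],
            p,
        )
        for _ in range(1000)
    )
    results.append((p, ok))
    print(f"p = {p}: held on all samples: {ok}")

# For p < 1 the same formula is not a norm: ||(1,1)|| = 4 but 1 + 1 = 2.
fails_below_one = not triangle_holds([1, 0], [0, 1], 0.5)
print("p = 0.5 counterexample found:", fails_below_one)
```

Random sampling only fails to find a counterexample; the actual guarantee for 1 ≤ p ≤ ∞ is Minkowski's inequality.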
Proof sketches in three settings
Real numbers. For any a, b ∈ ℝ, the elementary identity (a + b)² = a² + 2ab + b² ≤ a² + 2|a||b| + b² = (|a| + |b|)² gives, taking square roots, |a + b| ≤ |a| + |b|. The key step is the bound 2ab ≤ 2|ab| = 2|a||b|.
Euclidean ℝⁿ. ‖a + b‖² = ⟨a + b, a + b⟩ = ‖a‖² + 2⟨a, b⟩ + ‖b‖². By Cauchy-Schwarz, ⟨a, b⟩ ≤ |⟨a, b⟩| ≤ ‖a‖ · ‖b‖, so ‖a + b‖² ≤ ‖a‖² + 2‖a‖‖b‖ + ‖b‖² = (‖a‖ + ‖b‖)². Square roots: ‖a + b‖ ≤ ‖a‖ + ‖b‖. Cauchy-Schwarz is the engine of the Euclidean triangle inequality.
General metric space. The triangle inequality is taken as an axiom; it must be verified separately for each concrete metric you propose. For example, the discrete metric (d(x, y) = 1 if x ≠ y, and 0 if x = y) trivially satisfies it; the p-adic metric satisfies the stronger ultrametric form; the L^p norms satisfy it via Minkowski. The axiom cannot be derived from the other two.
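Since the axiom has to be checked for each candidate distance, a brute-force check over a small finite sample can at least catch failures early. The sketch below is illustrative only: `violates_triangle` and the three sample distances are hypothetical helpers, and passing on a finite sample proves nothing about the whole space.

```python
from itertools import product

def violates_triangle(points, d, tol=1e-12):
    """Return the first triple (x, y, z) with d(x, z) > d(x, y) + d(y, z), else None."""
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return (x, y, z)
    return None

def discrete(x, y):
    """Discrete metric: 0 for equal points, 1 otherwise."""
    return 0 if x == y else 1

def absolute(x, y):
    """Usual distance on the real line."""
    return abs(x - y)

def squared(x, y):
    """Squared distance on the real line (not a metric)."""
    return (x - y) ** 2

points = [0, 1, 2, 5]
print(violates_triangle(points, discrete))  # None
print(violates_triangle(points, absolute))  # None
print(violates_triangle(points, squared))   # (0, 1, 2), since 4 > 1 + 1
```

A returned triple is a genuine counterexample; `None` only means the sample did not expose one.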
Numerical examples
Example 1 (real numbers):
a = 3, b = −7
|a + b| = |−4| = 4
|a| + |b| = 3 + 7 = 10
4 ≤ 10 ✓ (strict)
Example 2 (real numbers, equality):
a = 3, b = 4 (both positive, "same direction")
|a + b| = 7
|a| + |b| = 7
7 = 7 EQUALITY
Example 3 (vectors in ℝ²):
a = (3, 0), b = (0, 4)
a + b = (3, 4)
‖a + b‖₂ = 5
‖a‖₂ + ‖b‖₂ = 3 + 4 = 7
5 ≤ 7 ✓ (strict; not collinear)
Example 4 (vectors in ℝ², equality):
a = (3, 4), b = (6, 8) (b = 2a)
‖a‖₂ = 5, ‖b‖₂ = 10
‖a + b‖₂ = ‖(9, 12)‖₂ = 15
‖a‖₂ + ‖b‖₂ = 15
15 = 15 EQUALITY (parallel, same direction)
Example 5 (reverse triangle):
x = 0, y = 3, z = 7 on ℝ
d(x, y) = 3, d(y, z) = 4, d(x, z) = 7
|d(x, y) − d(y, z)| = 1
1 ≤ 7 ✓
Example 6 (ultrametric, 5-adic on ℚ):
x = 0, y = 25, z = 30
d_5(0, 25) = 1/25 (since 25 = 5² · 1)
d_5(25, 30) = 1/5 (since 30 − 25 = 5)
d_5(0, 30) = 1/5 (since 30 = 5 · 6)
max(1/25, 1/5) = 1/5
1/5 ≤ 1/5 EQUALITY (ultrametric)
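Examples 3, 4 and 6 can be reproduced in a few lines. This is a sketch under the assumption that the p-adic distance is only needed for integers; the helper names `p_adic_distance`, `norm2` and `add` are illustrative.

```python
from fractions import Fraction
import math

def p_adic_distance(x, y, p):
    """p-adic distance |x - y|_p between two integers, as an exact Fraction."""
    diff = abs(x - y)
    if diff == 0:
        return Fraction(0)
    k = 0
    while diff % p == 0:  # count how many times p divides the difference
        diff //= p
        k += 1
    return Fraction(1, p ** k)

def norm2(v):
    """Euclidean length of a 2-D vector."""
    return math.hypot(v[0], v[1])

def add(a, b):
    """Componentwise sum of two vectors."""
    return tuple(s + t for s, t in zip(a, b))

# Example 3: perpendicular vectors, strict inequality.
ex3 = (norm2(add((3, 0), (0, 4))), norm2((3, 0)) + norm2((0, 4)))
# Example 4: parallel vectors, equality.
ex4 = (norm2(add((3, 4), (6, 8))), norm2((3, 4)) + norm2((6, 8)))
# Example 6: 5-adic distances between 0, 25 and 30.
d_xy = p_adic_distance(0, 25, 5)
d_yz = p_adic_distance(25, 30, 5)
d_xz = p_adic_distance(0, 30, 5)

print(ex3, ex4)
print(d_xy, d_yz, d_xz, d_xz <= max(d_xy, d_yz))  # 1/25 1/5 1/5 True
```

Using `Fraction` keeps the p-adic values exact, so the equality in Example 6 is not blurred by floating-point rounding.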
The reverse triangle inequality
From the standard triangle d(x, y) ≤ d(x, z) + d(z, y), rearrange:
d(x, y) − d(z, y) ≤ d(x, z)
d(z, y) − d(x, y) ≤ d(x, z) (by swapping x ↔ z)
|d(x, y) − d(z, y)| ≤ d(x, z)
The reverse triangle inequality says: the difference of two distances to a fixed reference point is bounded by the distance between the two starting points. Equivalently, the distance function d(·, p) : X → ℝ is 1-Lipschitz, and the constant 1 is attained (take one of the two points to be p itself). This is why the distance to a fixed point or set is continuous, and why the open balls B(p, ε) are open sets in the metric topology.
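The 1-Lipschitz reading can be illustrated numerically in Euclidean ℝ³. The sketch below is an illustration only; the names `random_point`, `worst_ratio` and `attained` are assumptions. It measures the ratio |d(x, p) − d(y, p)| / d(x, y) on random pairs, then shows the ratio 1 being reached when y = p.

```python
import math
import random

rng = random.Random(1)  # fixed seed for repeatability

def random_point(dim=3):
    """Random point in the cube [-10, 10]^dim."""
    return [rng.uniform(-10, 10) for _ in range(dim)]

p = random_point()  # the fixed reference point

# Largest observed value of |d(x, p) - d(y, p)| / d(x, y); should not exceed 1.
worst_ratio = 0.0
for _ in range(10_000):
    x, y = random_point(), random_point()
    gap = abs(math.dist(x, p) - math.dist(y, p))
    worst_ratio = max(worst_ratio, gap / math.dist(x, y))

# Taking y = p makes the bound tight: |d(x, p) - 0| / d(x, p) = 1.
x = random_point()
attained = abs(math.dist(x, p) - math.dist(p, p)) / math.dist(x, p)

print("worst sampled ratio:", worst_ratio)
print("ratio with y = p:", attained)
```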
The ultrametric strengthening
An ultrametric satisfies the stronger condition
d(x, z) ≤ max(d(x, y), d(y, z))
This is strictly stronger than the ordinary triangle inequality (since max(a, b) ≤ a + b for a, b ≥ 0). Spaces satisfying it have surprising geometry:
- Every triangle is isosceles. If d(x, y) < d(y, z), then d(x, z) ≤ max(d(x, y), d(y, z)) = d(y, z). Conversely d(y, z) ≤ max(d(x, y), d(x, z)), and the maximum cannot be d(x, y) because d(x, y) < d(y, z), so d(y, z) ≤ d(x, z). Hence d(x, z) = d(y, z): the two longer sides are equal.
- Every point is the centre of every ball it lies in. If y ∈ B(x, r), then B(y, r) = B(x, r).
- Balls are clopen. Every open ball is also closed, every closed ball is also open. The space is totally disconnected.
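The canonical examples: the p-adic metric on ℚ (where x is "close" to y iff x − y is divisible by a high power of p), the first-difference metric on infinite binary strings (d(x, y) = 2^(−n), where n is the first position at which x and y differ), and the lowest-common-ancestor height distance between the leaves of a rooted tree whose leaves all sit at the same depth. The Hamming metric and the ordinary path-length distance on a tree are metrics but not ultrametrics. Ultrametrics of this kind are central in number theory, phylogenetics, and theoretical computer science.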
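The isosceles property can be checked exhaustively on a small ultrametric space. The sketch below uses a first-difference metric on binary words of length 4, a finite stand-in chosen here for illustration; `prefix_distance`, `is_ultrametric` and `all_triangles_isosceles` are hypothetical helper names. The usual distance on the integers is included as a contrast.

```python
from itertools import product

def prefix_distance(s, t):
    """2**(-n), where n is the first index at which s and t differ; 0 if equal."""
    for n, (a, b) in enumerate(zip(s, t)):
        if a != b:
            return 2.0 ** (-n)
    return 0.0

def is_ultrametric(points, d):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for every triple."""
    return all(
        d(x, z) <= max(d(x, y), d(y, z))
        for x, y, z in product(points, repeat=3)
    )

def all_triangles_isosceles(points, d):
    """Check that the two largest sides of every triangle are equal."""
    for x, y, z in product(points, repeat=3):
        sides = sorted([d(x, y), d(y, z), d(x, z)])
        if sides[1] != sides[2]:
            return False
    return True

words = ["".join(bits) for bits in product("01", repeat=4)]
print(is_ultrametric(words, prefix_distance))           # True
print(all_triangles_isosceles(words, prefix_distance))  # True
# The usual distance on integers is a metric but not an ultrametric:
print(is_ultrametric([0, 1, 2], lambda a, b: abs(a - b)))  # False
```

Distances here are exact powers of two, so the equality test on floats is safe in this particular sketch.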
When is the bound tight?
In Euclidean ℝⁿ (and any inner product space), equality d(x, z) = d(x, y) + d(y, z) holds iff y lies on the segment from x to z — that is, x, y, z are collinear and y is between x and z. The geodesic from x to z passes through y exactly when the triangle "collapses" to a line.
For vectors: ‖a + b‖ = ‖a‖ + ‖b‖ iff a and b are non-negatively proportional (one is a non-negative scalar multiple of the other). Trace through the proof: equality requires both ⟨a, b⟩ = |⟨a, b⟩|, that is ⟨a, b⟩ ≥ 0, and equality in Cauchy-Schwarz, |⟨a, b⟩| = ‖a‖‖b‖, that is a ∥ b. Together these say a and b point in the same direction.
For real numbers: |a + b| = |a| + |b| iff a and b have the same sign (or one is zero). Different signs cancel; same signs add — the triangle becomes a flat segment.
For the Manhattan metric on ℝ²: equality holds when y is in the axis-aligned rectangle spanned by x and z. The "betweenness" geometry depends on the metric.
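The difference between the Euclidean and Manhattan equality cases shows up on concrete points. In this sketch the points and the helper names `manhattan` and `is_tight` are chosen for illustration: one intermediate point lies on the segment from x to z, one lies in the axis-aligned rectangle but off the segment, and one lies outside the rectangle.

```python
import math

def manhattan(u, v):
    """Manhattan (L1) distance between two points."""
    return sum(abs(a - b) for a, b in zip(u, v))

def is_tight(d, x, y, z):
    """True if d(x, z) equals d(x, y) + d(y, z) up to rounding."""
    return math.isclose(d(x, z), d(x, y) + d(y, z))

x, z = (0.0, 0.0), (4.0, 3.0)
on_segment = (2.0, 1.5)    # midpoint of the segment from x to z
in_rectangle = (1.0, 2.0)  # inside [0, 4] x [0, 3], off the segment
outside = (5.0, 0.0)       # outside the rectangle

for name, y in [("on segment", on_segment),
                ("in rectangle", in_rectangle),
                ("outside", outside)]:
    print(name,
          "| Euclidean tight:", is_tight(math.dist, x, y, z),
          "| Manhattan tight:", is_tight(manhattan, x, y, z))
```

Only the midpoint is tight for the Euclidean distance, while both points inside the rectangle are tight for the Manhattan distance.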
Common pitfalls
- Forgetting absolute values. |a + b| ≤ |a| + |b| has absolute values throughout. a + b ≤ a + b is trivially equality and tells you nothing.
- Assuming equality means "y on the segment" in non-Euclidean metrics. The betweenness geometry differs across metrics — Manhattan, Chebyshev, p-adic all have their own equality criteria.
- Confusing triangle with reverse triangle. Triangle bounds above by a sum; reverse bounds the difference by the third side. Both are equivalent up to algebra, but they answer different questions.
- Treating the ultrametric as a generic triangle. Many statements (like "every point of a ball is its centre") need the stronger ultrametric form and fail under the ordinary triangle inequality alone.
- Believing the triangle inequality is "obvious". For Euclidean ℝⁿ it relies on Cauchy-Schwarz; for L^p it relies on Hölder. For any concrete norm or distance it is a claim that has to be proved, not assumed.
- Forgetting that some "distances" are not metrics. The squared Euclidean distance d²(x, y) = ‖x − y‖² is not a metric: it fails the triangle inequality (on ℝ, take 0, 1, 2: 4 > 1 + 1). KL divergence is not a metric: it fails symmetry and the triangle inequality. Cosine similarity is not a metric: it equals 1 rather than 0 when x = y, and the derived cosine distance 1 − cos θ still fails the triangle inequality and vanishes for distinct parallel vectors.
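Two of the non-metrics in the last pitfall can be exposed with small counterexamples. This is a sketch: the distributions and vectors are chosen for illustration, and `kl_divergence` and `cosine_distance` are plain hand-written helpers rather than any library's API.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_distance(u, v):
    """1 - cos(angle between u and v) for non-zero 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

# KL divergence is not symmetric.
P, Q = (0.5, 0.5), (0.9, 0.1)
print(kl_divergence(P, Q), kl_divergence(Q, P))

# Cosine distance breaks the triangle inequality: a and c are orthogonal,
# b sits at 45 degrees between them.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
direct = cosine_distance(a, c)
detour = cosine_distance(a, b) + cosine_distance(b, c)
print(direct, detour, direct > detour)

# It also gives distance 0 to distinct parallel vectors.
print(cosine_distance((1.0, 0.0), (2.0, 0.0)))
```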
Where triangle inequality shows up
- Defining metric spaces. The third axiom alongside identity and symmetry. Without it the metric framework collapses.
- Continuity proofs (the ε/2-trick). |f(x) − f(x₀)| = |f(x) − f(p) + f(p) − f(x₀)| ≤ |f(x) − f(p)| + |f(p) − f(x₀)|. Standard "split and bound" pattern in every ε-δ proof.
- Cauchy sequence proofs. Showing |xₘ − xₙ| ≤ |xₘ − a| + |a − xₙ| reduces a 2-point question to two 1-point questions against a fixed reference.
- Numerical error analysis. Total error ≤ truncation error + roundoff error — a triangle bound on the gap between computed and true values.
- Probabilistic bounds. Total variation distance and Wasserstein distance satisfy the triangle inequality, as do genuine metrics built from KL divergence such as the square root of the Jensen-Shannon divergence (KL divergence itself does not). Useful for stitching together multiple bounds.
- Algorithm correctness. Nearest-neighbor search uses the triangle inequality to prune candidates: for a pivot p with d(p, c) precomputed, if |d(query, p) − d(p, c)| > threshold then d(query, c) > threshold, so c can be discarded without computing d(query, c).
- Geometric data structures. Metric tree indices (BK-trees, vantage-point trees, M-trees) exploit triangle to bound distances without exhaustive computation.
- Norm-based optimization. Convergence of gradient descent, Nesterov acceleration, and primal-dual methods all use triangle to bound error accumulation.
- Functional analysis. Lipschitz constants, operator norms, dual-space estimates — every quantitative bound on a function uses triangle somewhere.
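The pruning idea behind metric indices can be sketched with a single pivot. This is a simplified illustration, not how BK-trees, vantage-point trees or M-trees are actually implemented. The function `range_search`, the `slack` parameter and the random data are assumptions made for the example. It relies on the reverse triangle inequality d(q, c) ≥ |d(q, p) − d(p, c)| to skip candidates.

```python
import math
import random

def range_search(points, pivot_dists, pivot, query, radius, slack=1e-12):
    """Find points within `radius` of `query`, skipping candidates that the
    lower bound |d(q, p) - d(p, c)| already rules out.
    Returns (hits, number_of_exact_distances_computed)."""
    d_qp = math.dist(query, pivot)
    hits, computed = [], 0
    for c, d_pc in zip(points, pivot_dists):
        # Reverse triangle inequality: d(q, c) >= |d(q, p) - d(p, c)|.
        # `slack` guards against pruning on a floating-point tie.
        if abs(d_qp - d_pc) > radius + slack:
            continue
        computed += 1
        if math.dist(query, c) <= radius:
            hits.append(c)
    return hits, computed

rng = random.Random(42)
points = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(2000)]
pivot = points[0]
# Distances to the pivot are computed once, at index-build time.
pivot_dists = [math.dist(pivot, c) for c in points]

query = (30.0, 70.0)
hits, computed = range_search(points, pivot_dists, pivot, query, radius=5.0)
brute = [c for c in points if math.dist(query, c) <= 5.0]

print("same answer as brute force:", hits == brute)
print("exact distances computed:", computed, "of", len(points))
```

The saving matters when `dist` is expensive (edit distance, for example); real indices use many pivots arranged in a tree.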
Frequently asked questions
What is the triangle inequality?
d(x, z) ≤ d(x, y) + d(y, z) — the distance from x to z is at most the distance from x to y plus the distance from y to z. Geometrically: in any triangle with vertices x, y, z, the length of any one side is at most the sum of the other two. For real numbers it reads |a + b| ≤ |a| + |b|; for vectors ‖a + b‖ ≤ ‖a‖ + ‖b‖; for complex numbers |z + w| ≤ |z| + |w|. The same statement, three notations.
Why is triangle inequality the defining axiom of metric spaces?
Without it, "distance" has no transitive structure: knowing d(x, y) and d(y, z) are small tells you nothing about d(x, z). Convergence and continuity break down. Concretely: if xₙ → x and we want xₙ also close to a fixed point p, we use d(xₙ, p) ≤ d(xₙ, x) + d(x, p) — pure triangle. Every analytic estimate involving "splitting" or "inserting an intermediate point" relies on triangle. The other metric axioms (identity, symmetry) are bookkeeping; triangle is the structural backbone.
What is the reverse triangle inequality?
|d(x, y) − d(y, z)| ≤ d(x, z). It says the difference of two distances is bounded by the third. Proof: by triangle, d(x, y) ≤ d(x, z) + d(z, y), so d(x, y) − d(y, z) ≤ d(x, z); swapping x and z gives d(y, z) − d(x, y) ≤ d(x, z); take absolute values. Useful in continuity proofs of the distance function itself: d(·, p) : X → ℝ is 1-Lipschitz, with the Lipschitz constant exactly 1 — and this is the reverse triangle inequality.
When does equality hold in the triangle inequality?
In Euclidean ℝⁿ (and any inner product space), d(x, z) = d(x, y) + d(y, z) iff y lies on the line segment from x to z — that is, x, y, z are collinear and y is between x and z. For vectors: ‖a + b‖ = ‖a‖ + ‖b‖ iff a and b point in the same direction (one is a non-negative scalar multiple of the other). For real numbers |a + b| = |a| + |b| iff a and b have the same sign (or one is zero). In an arbitrary metric space the equality case depends on the metric — for the Manhattan metric on ℝ², equality means y is in the axis-aligned box spanned by x and z.
Where does the proof come from on ℝⁿ?
For Euclidean ℝⁿ the cleanest proof uses the Cauchy-Schwarz inequality. ‖a + b‖² = ⟨a + b, a + b⟩ = ‖a‖² + 2⟨a, b⟩ + ‖b‖² ≤ ‖a‖² + 2‖a‖‖b‖ + ‖b‖² = (‖a‖ + ‖b‖)², by Cauchy-Schwarz on the middle term. Take square roots. For L^p norms (p ≠ 2) the proof is Minkowski's inequality, which uses Hölder. For the ordinary absolute value on ℝ: |a + b|² = (a + b)² ≤ |a|² + 2|a||b| + |b|² = (|a| + |b|)², the same algebra in the scalar case.
What is the ultrametric (strong triangle) inequality?
d(x, z) ≤ max(d(x, y), d(y, z)), a strictly stronger condition. Spaces satisfying this are ultrametric. Examples: p-adic numbers (where "close" means the difference is divisible by a high power of p), the first-difference metric on infinite binary strings, and the lowest-common-ancestor height distance between the leaves of a rooted cluster tree (dendrogram). Consequences: every triangle is isosceles with the two longer sides equal; every point of an open ball is its centre; open and closed balls of any positive radius are clopen. This non-Archimedean geometry is fundamental in arithmetic geometry, phylogenetics, and Galois theory.
How is triangle inequality used in everyday analysis?
Constantly. The ε/2-trick: to show |f(x) − f(x₀)| < ε, split as |f(x) − f(p)| + |f(p) − f(x₀)| < ε/2 + ε/2 by triangle, given each piece is bounded. Showing a Cauchy sequence: |xₘ − xₙ| ≤ |xₘ − a| + |a − xₙ|. Bounding error in a numerical algorithm: total error ≤ truncation error + roundoff error. The triangle inequality is the silent partner in every estimate. Almost no analysis proof avoids it.
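As a worked instance of the split-and-bound pattern (a standard textbook example, added here for illustration): suppose aₙ → a and bₙ → b, and choose N₁, N₂ (names assumed for this sketch) so that |aₙ − a| < ε/2 for n ≥ N₁ and |bₙ − b| < ε/2 for n ≥ N₂. Then the sum converges to a + b:

```latex
\begin{align*}
\bigl|(a_n + b_n) - (a + b)\bigr|
  &= \bigl|(a_n - a) + (b_n - b)\bigr| \\
  &\le |a_n - a| + |b_n - b|
     && \text{(triangle inequality)} \\
  &< \tfrac{\varepsilon}{2} + \tfrac{\varepsilon}{2} = \varepsilon
     && \text{for all } n \ge \max(N_1, N_2).
\end{align*}
```

The only non-trivial step is the second line; everything else is bookkeeping on ε.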
Triangle inequality across settings
| Setting | Statement | Proof tool | Equality iff | Strength | Common use |
|---|---|---|---|---|---|
| Real numbers | \|a + b\| ≤ \|a\| + \|b\| | (a + b)² ≤ (\|a\| + \|b\|)² | same sign, or one is zero | + | Error bounds, every ε/2-trick |
| Euclidean ℝⁿ | ‖a + b‖₂ ≤ ‖a‖₂ + ‖b‖₂ | Cauchy-Schwarz | a ∥ b, same direction | ++ | Geometry, Hilbert spaces |
| L^p (Minkowski) | ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p | Hölder applied to \|f + g\|^p | f = λg, λ ≥ 0 (for 1 < p < ∞) | ++ | Functional analysis, PDE estimates |
| General metric space | d(x, z) ≤ d(x, y) + d(y, z) | Axiom (not derived) | depends on d | +++ | Topology, fixed-point theorems |
| Ultrametric | d(x, z) ≤ max(d(x, y), d(y, z)) | Non-Archimedean valuation | d(x, z) is a longest side | ++++ | p-adics, phylogenetics, trees |
| Reverse triangle | \|d(x, y) − d(y, z)\| ≤ d(x, z) | Rearrange triangle | in ℝⁿ: collinear, y not strictly between x and z | + | d(·, p) is 1-Lipschitz, continuity of dist |