Inequalities

Triangle Inequality

d(x, z) ≤ d(x, y) + d(y, z) — detours never shorten the journey

The triangle inequality is the statement d(x, z) ≤ d(x, y) + d(y, z) for any three points x, y, z in a metric space — the distance from x to z is at most the distance from x to y plus the distance from y to z. Equivalently: any side of a triangle is no longer than the sum of the other two. For Euclidean vectors it reads ‖a + b‖ ≤ ‖a‖ + ‖b‖; for real numbers |a + b| ≤ |a| + |b|; for L^p norms it is Minkowski's inequality. Together with identity-of-indiscernibles and symmetry, the triangle inequality is one of the three defining axioms of a metric space. The reverse form |d(x, y) − d(y, z)| ≤ d(x, z) says the difference of two distances is bounded by the third. Strengthening to d(x, z) ≤ max(d(x, y), d(y, z)) gives the ultrametric condition of p-adic and rooted-tree distances. It is the silent workhorse of every ε/2 splitting argument in analysis.

Statement:       d(x, z) ≤ d(x, y) + d(y, z)
Vectors:         ‖a + b‖ ≤ ‖a‖ + ‖b‖
Reverse:         |d(x, y) − d(y, z)| ≤ d(x, z)
Role:            Metric space axiom 3
Ultrametric:     d(x, z) ≤ max(d(x, y), d(y, z))
Equality (ℝⁿ):   y between x and z (collinear)

Three points, one inequality

Let (X, d) be a metric space. For any three points x, y, z ∈ X:

d(x, z) ≤ d(x, y) + d(y, z)

In words: the direct distance from x to z cannot exceed the distance travelled via y. The intermediate point y can be anywhere in X; the bound holds uniformly. Geometrically, in a triangle with vertices x, y, z, any one side is at most the sum of the other two. Picking y to be one of the endpoints, say y = x, gives only the trivial bound d(x, z) ≤ d(x, x) + d(x, z) = d(x, z); the content is when y is somewhere genuinely else.

Specializations of the same statement, written in the language of the problem:

Real numbers:      |a + b| ≤ |a| + |b|                          (d(x, y) = |x − y|, set a = x − y, b = y − z)
Complex numbers:   |z + w| ≤ |z| + |w|                          (modulus, same proof)
ℝⁿ Euclidean:      ‖a + b‖₂ ≤ ‖a‖₂ + ‖b‖₂                        (from Cauchy-Schwarz)
General normed:    ‖a + b‖ ≤ ‖a‖ + ‖b‖                          (axiom of a norm)
L^p (Minkowski):   ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p, 1 ≤ p ≤ ∞          (from Hölder)
Inner product:     ‖a + b‖ ≤ ‖a‖ + ‖b‖                          (from Cauchy-Schwarz)

Each line is the triangle inequality wearing a different mathematical costume.
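The specializations above can be spot-checked numerically. The following is a minimal sketch, not a proof: the helper names `p_norm` and `triangle_holds` are introduced here for illustration, and the small tolerance only absorbs floating-point rounding.

```python
import math
import random

def p_norm(v, p):
    """p-norm of a sequence of floats; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(c) for c in v)
    return sum(abs(c) ** p for c in v) ** (1.0 / p)

def triangle_holds(a, b, p, tol=1e-9):
    """Check ||a + b||_p <= ||a||_p + ||b||_p up to rounding error."""
    s = [x + y for x, y in zip(a, b)]
    return p_norm(s, p) <= p_norm(a, p) + p_norm(b, p) + tol

random.seed(0)
trials = [
    ([random.uniform(-5, 5) for _ in range(4)],
     [random.uniform(-5, 5) for _ in range(4)])
    for _ in range(1000)
]

# Test p = 1 (Manhattan), 2 (Euclidean), 3, and infinity (Chebyshev).
all_hold = all(
    triangle_holds(a, b, p) for a, b in trials for p in (1, 2, 3, math.inf)
)
print(all_hold)  # True
```

A random search like this can only fail to find a counterexample; the guarantee itself comes from Minkowski's inequality.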

Proof sketches in three settings

Real numbers. For any a, b ∈ ℝ, the elementary identity (a + b)² = a² + 2ab + b² ≤ a² + 2|a||b| + b² = (|a| + |b|)² gives, taking square roots, |a + b| ≤ |a| + |b|. The key step is the bound 2ab ≤ 2|ab| = 2|a||b|.

Euclidean ℝⁿ. ‖a + b‖² = ⟨a + b, a + b⟩ = ‖a‖² + 2⟨a, b⟩ + ‖b‖². By Cauchy-Schwarz, ⟨a, b⟩ ≤ |⟨a, b⟩| ≤ ‖a‖ · ‖b‖, so ‖a + b‖² ≤ ‖a‖² + 2‖a‖‖b‖ + ‖b‖² = (‖a‖ + ‖b‖)². Square roots: ‖a + b‖ ≤ ‖a‖ + ‖b‖. Cauchy-Schwarz is the engine of the Euclidean triangle inequality.

General metric space. The triangle inequality is taken as an axiom, so it must be verified separately for each concrete metric you propose. For example, the discrete metric (d(x, y) = 1 if x ≠ y, and 0 if x = y) trivially satisfies it; the p-adic metric satisfies the stronger ultrametric form; the L^p norms satisfy it via Minkowski. The axiom is not derived from the others.
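On a finite set of points, "verifying the axiom" can be done by brute force over all triples. The sketch below assumes a hypothetical helper `first_violation` (not from any library) and tries it on the discrete metric, the usual distance on ℝ, and the squared distance as a candidate that fails.

```python
from itertools import product

def first_violation(points, d, tol=1e-12):
    """Return the first triple (x, y, z) with d(x, z) > d(x, y) + d(y, z),
    or None if every triple satisfies the triangle inequality."""
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:
            return (x, y, z)
    return None

def discrete(x, y):
    return 0 if x == y else 1

def abs_diff(x, y):
    return abs(x - y)

def squared(x, y):
    return (x - y) ** 2

points = [0, 1, 2, 3]
print(first_violation(points, discrete))  # None
print(first_violation(points, abs_diff))  # None
print(first_violation(points, squared))   # (0, 1, 2)
```

Passing on a finite sample proves nothing about the whole space, but a single violating triple is enough to show a proposed distance is not a metric.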

Numerical examples

Example 1 (real numbers):
  a = 3,  b = −7
  |a + b| = |−4| = 4
  |a| + |b| = 3 + 7 = 10
  4 ≤ 10                            ✓  (strict)

Example 2 (real numbers, equality):
  a = 3,  b = 4    (both positive, "same direction")
  |a + b| = 7
  |a| + |b| = 7
  7 = 7                             EQUALITY

Example 3 (vectors in ℝ²):
  a = (3, 0),   b = (0, 4)
  a + b = (3, 4)
  ‖a + b‖₂ = 5
  ‖a‖₂ + ‖b‖₂ = 3 + 4 = 7
  5 ≤ 7                             ✓  (strict; not collinear)

Example 4 (vectors in ℝ², equality):
  a = (3, 4),  b = (6, 8)  (b = 2a)
  ‖a‖₂ = 5,  ‖b‖₂ = 10
  ‖a + b‖₂ = ‖(9, 12)‖₂ = 15
  ‖a‖₂ + ‖b‖₂ = 15
  15 = 15                           EQUALITY (parallel, same direction)

Example 5 (reverse triangle):
  x = 0, y = 3, z = 7 on ℝ
  d(x, y) = 3,  d(y, z) = 4,  d(x, z) = 7
  |d(x, y) − d(y, z)| = 1
  1 ≤ 7                             ✓

Example 6 (ultrametric, 5-adic on ℚ):
  x = 0, y = 25, z = 30
  d_5(0, 25) = 1/25 (since 25 = 5² · 1)
  d_5(25, 30) = 1/5 (since 30 − 25 = 5)
  d_5(0, 30) = 1/5 (since 30 = 5 · 6)
  max(1/25, 1/5) = 1/5
  1/5 ≤ 1/5                         EQUALITY (ultrametric)
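Example 6 can be reproduced with a few lines of code. This is a minimal sketch restricted to integers (the full p-adic metric on ℚ also handles denominators); `padic_valuation` and `padic_distance` are illustrative names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def padic_valuation(n, p):
    """Largest k such that p**k divides the nonzero integer n."""
    if n == 0:
        raise ValueError("valuation of 0 is infinite")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def padic_distance(x, y, p):
    """d_p(x, y) = p**(-v_p(x - y)) for integers x, y; 0 if x == y."""
    if x == y:
        return Fraction(0)
    return Fraction(1, p ** padic_valuation(x - y, p))

d_xy = padic_distance(0, 25, 5)   # 1/25
d_yz = padic_distance(25, 30, 5)  # 1/5
d_xz = padic_distance(0, 30, 5)   # 1/5

print(d_xz <= max(d_xy, d_yz))    # True (here with equality)
```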

The reverse triangle inequality

From the standard triangle d(x, y) ≤ d(x, z) + d(z, y), rearrange:

d(x, y) − d(z, y) ≤ d(x, z)
d(z, y) − d(x, y) ≤ d(x, z)    (by swapping x ↔ z)
|d(x, y) − d(z, y)| ≤ d(x, z)

The reverse triangle inequality says: the difference of two distances to a fixed reference is bounded by the distance between the two starting points. Equivalent statement: for each fixed p, the distance function d(·, p) : X → ℝ is 1-Lipschitz. This is precisely why the distance function to a fixed point is continuous, and it follows that every open ball B(p, ε) is an open set in the metric topology, being the preimage of (−∞, ε) under the continuous map d(·, p).
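The 1-Lipschitz claim can be observed numerically in the Euclidean plane. The sketch below assumes an arbitrary reference point `p` and random sample points; it tracks the largest observed ratio |d(x, p) − d(z, p)| / d(x, z), which should never exceed 1.

```python
import math
import random

random.seed(1)

def random_point():
    return (random.uniform(-10, 10), random.uniform(-10, 10))

p = (2.0, -3.0)  # fixed reference point, chosen arbitrarily for the sketch
max_ratio = 0.0
for _ in range(10_000):
    x, z = random_point(), random_point()
    d_xz = math.dist(x, z)
    if d_xz == 0.0:
        continue  # identical points carry no information about the ratio
    ratio = abs(math.dist(x, p) - math.dist(z, p)) / d_xz
    max_ratio = max(max_ratio, ratio)

print(max_ratio <= 1.0 + 1e-12)  # True
```

The ratio gets close to 1 when x, z, and p are nearly collinear with p outside the segment from x to z, which is why the constant 1 cannot be improved in this setting.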

The ultrametric strengthening

An ultrametric satisfies the stronger condition

d(x, z) ≤ max(d(x, y), d(y, z))

This is strictly stronger than the ordinary triangle inequality (since max(a, b) ≤ a + b for a, b ≥ 0). Spaces satisfying it have surprising geometry:

Every triangle is isosceles. If d(x, y) < d(y, z), then d(x, z) ≤ max(d(x, y), d(y, z)) = d(y, z). In the other direction, d(y, z) ≤ max(d(x, y), d(x, z)), and the maximum cannot be d(x, y) because d(x, y) < d(y, z), so d(y, z) ≤ d(x, z). Hence d(x, z) = d(y, z): the two longer sides are equal.
Every point is the centre of every ball it lies in. If y ∈ B(x, r), then B(y, r) = B(x, r).
Balls are clopen. Every open ball is also closed, every closed ball is also open. The space is totally disconnected.

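The canonical examples: the p-adic metric on ℚ (where x is "close" to y iff x − y is divisible by a high power of p), the longest-common-prefix metric on infinite binary strings (d = 2^(−k), where k is the length of the longest common prefix), and the distance between leaves of a rooted tree measured by the height of their lowest common ancestor. The Hamming metric, by contrast, is an ordinary metric but not an ultrametric. These spaces are important in number theory, phylogenetics, and theoretical computer science.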
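The isosceles property is easy to confirm by brute force on a small ultrametric space. The sketch below uses a longest-common-prefix distance on binary strings of length 4, chosen here as an illustration: two distinct strings are at distance 2^(−k), where k is the length of their common prefix.

```python
from itertools import product

def prefix_distance(s, t):
    """2**(-k), where k is the length of the longest common prefix of the
    equal-length strings s and t; 0 if the strings are identical."""
    if s == t:
        return 0.0
    k = 0
    while s[k] == t[k]:
        k += 1
    return 2.0 ** (-k)

strings = ["".join(bits) for bits in product("01", repeat=4)]
triples = list(product(strings, repeat=3))

# Strong (ultrametric) triangle inequality on every triple.
strong_ok = all(
    prefix_distance(x, z) <= max(prefix_distance(x, y), prefix_distance(y, z))
    for x, y, z in triples
)

def two_longest_equal(x, y, z):
    """True if the two largest of the three pairwise distances coincide."""
    sides = sorted(
        [prefix_distance(x, y), prefix_distance(y, z), prefix_distance(x, z)]
    )
    return sides[1] == sides[2]

isosceles_ok = all(two_longest_equal(x, y, z) for x, y, z in triples)
print(strong_ok, isosceles_ok)  # True True
```

All distances here are exact powers of two, so the equality test on floats is safe in this particular sketch.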

When is the bound tight?

In Euclidean ℝⁿ (and any inner product space), equality d(x, z) = d(x, y) + d(y, z) holds iff y lies on the segment from x to z — that is, x, y, z are collinear and y is between x and z. The geodesic from x to z passes through y exactly when the triangle "collapses" to a line.

For vectors: ‖a + b‖ = ‖a‖ + ‖b‖ iff a and b are non-negatively proportional (one is a non-negative scalar multiple of the other). Trace through the Cauchy-Schwarz step: equality requires ⟨a, b⟩ = ‖a‖‖b‖, which needs both equality in CS, |⟨a, b⟩| = ‖a‖‖b‖ (hence a ∥ b), and ⟨a, b⟩ ≥ 0 (hence the same direction rather than opposite directions).

For real numbers: |a + b| = |a| + |b| iff a and b have the same sign (or one is zero). Different signs cancel; same signs add — the triangle becomes a flat segment.

For the Manhattan metric on ℝ²: equality holds when y is in the axis-aligned rectangle spanned by x and z. The "betweenness" geometry depends on the metric.
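The Manhattan equality criterion can be checked exhaustively on a small integer grid. In this sketch the endpoints `x` and `z` and the grid bounds are arbitrary choices; the check confirms that equality holds at exactly those grid points y lying in the axis-aligned rectangle spanned by x and z.

```python
def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def in_box(y, x, z):
    """True if y lies in the axis-aligned rectangle spanned by x and z."""
    return (min(x[0], z[0]) <= y[0] <= max(x[0], z[0])
            and min(x[1], z[1]) <= y[1] <= max(x[1], z[1]))

x, z = (0, 0), (3, 2)
grid = [(i, j) for i in range(-2, 6) for j in range(-2, 5)]

# Equality in the triangle inequality should coincide with box membership.
criterion_matches = all(
    (manhattan(x, z) == manhattan(x, y) + manhattan(y, z)) == in_box(y, x, z)
    for y in grid
)
print(criterion_matches)  # True

# The corner (3, 0) gives equality although it is off the straight segment.
print(manhattan(x, (3, 0)) + manhattan((3, 0), z) == manhattan(x, z))  # True
```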

Common pitfalls

Forgetting absolute values. |a + b| ≤ |a| + |b| has absolute values throughout. a + b ≤ a + b is trivially equality and tells you nothing.
Assuming equality means "y on the segment" in non-Euclidean metrics. The betweenness geometry differs across metrics — Manhattan, Chebyshev, p-adic all have their own equality criteria.
Confusing triangle with reverse triangle. Triangle bounds above by a sum; reverse bounds the difference by the third side. Both are equivalent up to algebra, but they answer different questions.
Treating the ultrametric as a generic triangle. Many statements (like "every point of a ball is its centre") need the stronger ultrametric form, not just the ordinary triangle inequality.
Believing the triangle inequality is "obvious". For Euclidean ℝⁿ it relies on Cauchy-Schwarz; for L^p it relies on Hölder. For any non-trivial norm, the triangle inequality is a theorem that has to be proved, not a free fact.
Forgetting that some "distances" are not metrics. The squared Euclidean distance d²(x, y) = ‖x − y‖² is not a metric: it fails triangle (on ℝ, the points 0, 1, 2 give 4 > 1 + 1). KL divergence is not a metric: it fails symmetry and triangle. Cosine similarity is not a metric: it is a similarity, equal to 1 rather than 0 when x = y, and the derived cosine distance 1 − cos θ still fails triangle.
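Two of the non-metrics in the last pitfall can be exhibited with concrete numbers. The sketch below assumes the common definitions KL(p ‖ q) = Σ pᵢ log(pᵢ / qᵢ) and cosine distance 1 − cos θ; the specific distributions and vectors are arbitrary illustrations.

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions with q_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cosine_distance(u, v):
    """1 - cos(angle between u and v), for nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))

# KL divergence is not symmetric.
p, q = (0.5, 0.5), (0.9, 0.1)
kl_asymmetric = not math.isclose(kl(p, q), kl(q, p))

# Cosine distance fails triangle: directions at 0, 45 and 90 degrees.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)
cosine_fails = cosine_distance(a, c) > cosine_distance(a, b) + cosine_distance(b, c)

print(kl_asymmetric, cosine_fails)  # True True
```

In the cosine case the direct distance is 1, while the two-step route through the 45-degree vector costs about 0.59, so the detour is shorter than the direct path.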

Where triangle inequality shows up

Defining metric spaces. The third axiom alongside identity and symmetry. Without it the metric framework collapses.
Continuity proofs (the ε/2-trick). |f(x) − f(x₀)| = |f(x) − f(p) + f(p) − f(x₀)| ≤ |f(x) − f(p)| + |f(p) − f(x₀)|. Standard "split and bound" pattern in every ε-δ proof.
Cauchy sequence proofs. Showing |xₘ − xₙ| ≤ |xₘ − a| + |a − xₙ| reduces a 2-point question to two 1-point questions against a fixed reference.
Numerical error analysis. Total error ≤ truncation error + roundoff error — a triangle bound on the gap between computed and true values.
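Probabilistic bounds. Total variation distance, Wasserstein distance, and metrics derived from KL divergence (such as the square root of the Jensen-Shannon divergence) all satisfy triangle. Useful for stitching together multiple bounds.
Algorithm correctness. Nearest-neighbor search algorithms use the reverse triangle inequality to prune candidates: given a pivot p with d(p, c) precomputed, |d(query, p) − d(p, c)| ≤ d(query, c), so if |d(query, p) − d(p, c)| > threshold then c is too far to be of interest and d(query, c) never needs to be computed.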
Geometric data structures. Metric tree indices (BK-trees, vantage-point trees, M-trees) exploit triangle to bound distances without exhaustive computation.
Norm-based optimization. Convergence of gradient descent, Nesterov acceleration, and primal-dual methods all use triangle to bound error accumulation.
Functional analysis. Lipschitz constants, operator norms, dual-space estimates: almost every quantitative bound on a function uses triangle somewhere.
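The pruning idea behind metric indices can be sketched with a single pivot. This is a simplified illustration, not the algorithm of any particular BK-tree, vantage-point tree, or M-tree implementation: `range_search` is a hypothetical helper, and the pivot distances that a real index would store at build time are computed inline here. The reverse triangle inequality gives the lower bound |d(q, p) − d(p, c)| ≤ d(q, c), so any candidate whose lower bound already exceeds the radius is skipped.

```python
import math

def range_search(points, pivot, query, radius):
    """Return (hits, distance_calls): all points within `radius` of `query`,
    and how many query-to-candidate distances were actually computed."""
    pivot_dist = [math.dist(pivot, c) for c in points]  # index build step
    d_qp = math.dist(query, pivot)
    hits, calls = [], 0
    for c, d_pc in zip(points, pivot_dist):
        # Lower bound on d(query, c) from the reverse triangle inequality.
        if abs(d_qp - d_pc) > radius:
            continue  # c cannot be within radius; skip the distance call
        calls += 1
        if math.dist(query, c) <= radius:
            hits.append(c)
    return hits, calls

points = [(float(i), float(j)) for i in range(10) for j in range(10)]
hits, calls = range_search(points, pivot=(0.0, 0.0), query=(5.0, 5.0), radius=1.5)
brute = [c for c in points if math.dist((5.0, 5.0), c) <= 1.5]

print(sorted(hits) == sorted(brute))  # True
print(calls < len(points))            # True
```

The result is identical to the exhaustive scan because pruning only discards candidates that provably lie outside the radius; the saving is in distance evaluations, which matters when the metric is expensive to compute.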

Frequently asked questions

What is the triangle inequality?

d(x, z) ≤ d(x, y) + d(y, z) — the distance from x to z is at most the distance from x to y plus the distance from y to z. Geometrically: in any triangle with vertices x, y, z, the length of any one side is at most the sum of the other two. For real numbers it reads |a + b| ≤ |a| + |b|; for vectors ‖a + b‖ ≤ ‖a‖ + ‖b‖; for complex numbers |z + w| ≤ |z| + |w|. The same statement, three notations.

Why is triangle inequality the defining axiom of metric spaces?

Without it, "distance" has no transitive structure: knowing d(x, y) and d(y, z) are small tells you nothing about d(x, z). Convergence and continuity break down. Concretely: if xₙ → x and we want xₙ also close to a fixed point p, we use d(xₙ, p) ≤ d(xₙ, x) + d(x, p) — pure triangle. Every analytic estimate involving "splitting" or "inserting an intermediate point" relies on triangle. The other metric axioms (identity, symmetry) are bookkeeping; triangle is the structural backbone.

What is the reverse triangle inequality?

|d(x, y) − d(y, z)| ≤ d(x, z). It says the difference of two distances is bounded by the third. Proof: by triangle, d(x, y) ≤ d(x, z) + d(z, y), so d(x, y) − d(y, z) ≤ d(x, z); swapping x and z gives d(y, z) − d(x, y) ≤ d(x, z); take absolute values. Useful in continuity proofs of the distance function itself: d(·, p) : X → ℝ is 1-Lipschitz, with the Lipschitz constant exactly 1 — and this is the reverse triangle inequality.

When does equality hold in the triangle inequality?

In Euclidean ℝⁿ (and any inner product space), d(x, z) = d(x, y) + d(y, z) iff y lies on the line segment from x to z — that is, x, y, z are collinear and y is between x and z. For vectors: ‖a + b‖ = ‖a‖ + ‖b‖ iff a and b point in the same direction (one is a non-negative scalar multiple of the other). For real numbers |a + b| = |a| + |b| iff a and b have the same sign (or one is zero). In an arbitrary metric space the equality case depends on the metric — for the Manhattan metric on ℝ², equality means y is in the axis-aligned box spanned by x and z.

Where does the proof come from on ℝⁿ?

For Euclidean ℝⁿ the cleanest proof uses the Cauchy-Schwarz inequality. ‖a + b‖² = ⟨a + b, a + b⟩ = ‖a‖² + 2⟨a, b⟩ + ‖b‖² ≤ ‖a‖² + 2‖a‖‖b‖ + ‖b‖² = (‖a‖ + ‖b‖)², by Cauchy-Schwarz on the middle term. Take square roots. For L^p norms (p ≠ 2) the proof is Minkowski's inequality, which uses Hölder. For the ordinary absolute value on ℝ: |a + b|² = (a + b)² ≤ |a|² + 2|a||b| + |b|² = (|a| + |b|)², the same algebra in the scalar case.

What is the ultrametric (strong triangle) inequality?

d(x, z) ≤ max(d(x, y), d(y, z)) is a strictly stronger condition. Spaces satisfying this are ultrametric. Examples: p-adic numbers (where "close" means the difference is divisible by a high power of p), the longest-common-prefix distance on infinite strings, and lowest-common-ancestor height distances between leaves of rooted (cluster) trees. Consequences: every triangle is isosceles with the two longer sides equal; every point of an open ball is its centre; the open and closed balls of any positive radius are clopen. This non-Archimedean geometry is fundamental in arithmetic geometry, phylogenetics, and Galois theory.

How is triangle inequality used in everyday analysis?