Cauchy-Schwarz Inequality
In any inner product space, |⟨u, v⟩| ≤ ‖u‖·‖v‖, with equality exactly when u and v are collinear (linearly dependent). It is the algebraic shadow of |cos θ| ≤ 1.
- Statement: |⟨u, v⟩| ≤ ‖u‖ · ‖v‖
- Equality: u, v linearly dependent (collinear)
- Holds in: Every inner product space, any dimension
- Sum form: (Σaᵢbᵢ)² ≤ (Σaᵢ²)(Σbᵢ²)
- Integral form: |∫fg| ≤ √(∫f²) · √(∫g²)
- Discovered: Cauchy 1821, Bunyakovsky 1859, Schwarz 1888
The statement and its three faces
The Cauchy-Schwarz inequality (also called the CBS inequality, for Cauchy-Bunyakovsky-Schwarz, in fuller credit) says that in any inner product space (V, ⟨·,·⟩), every pair u, v ∈ V satisfies
|⟨u, v⟩| ≤ ‖u‖ · ‖v‖
where ‖x‖ = √⟨x, x⟩ is the induced norm. The inequality has three concrete forms:
- Sum form. For real or complex n-tuples a = (a₁, …, a_n), b = (b₁, …, b_n): |Σ aᵢ b̄ᵢ| ≤ (Σ|aᵢ|²)^(1/2) · (Σ|bᵢ|²)^(1/2). Squaring: (Σ aᵢ bᵢ)² ≤ (Σ aᵢ²)(Σ bᵢ²) for reals.
- Integral form. For square-integrable functions f, g on a measure space (Ω, μ): |∫ f ḡ dμ| ≤ (∫|f|² dμ)^(1/2) · (∫|g|² dμ)^(1/2). This is the Cauchy-Schwarz inequality in L²(Ω).
- Probability form. For random variables X, Y with finite second moments: |E[XY]| ≤ √(E[X²]) · √(E[Y²]). This drops out of the inner product ⟨X, Y⟩ = E[XY] on the Hilbert space L²(Ω, ℱ, P).
The three are not separate theorems — they are the same inequality stated in three concrete inner-product spaces. Once you prove the abstract version, all three follow by choosing the inner product.
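As a minimal sketch of that last point, the snippet below uses a single checking function and swaps in three inner products. The helper names (`cs_gap`, `dot`, `weighted_inner`) and the sample data are illustrative assumptions, and the integral is only approximated, by a midpoint rule on [0, 1].

```python
import math

def cs_gap(inner, u, v):
    """Return ||u||*||v|| - |<u, v>|; Cauchy-Schwarz says this is never negative."""
    norm_u = math.sqrt(abs(inner(u, u)))
    norm_v = math.sqrt(abs(inner(v, v)))
    return norm_u * norm_v - abs(inner(u, v))

def dot(a, b):
    # Sum form: <a, b> = sum of a_i * conj(b_i), valid for real or complex tuples.
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))

def weighted_inner(weights):
    # Inner product <x, y> = sum of w_k * x_k * y_k with positive weights w_k.
    # Probabilities give E[XY]; quadrature weights give an approximate integral.
    def inner(x, y):
        return sum(w * s * t for w, s, t in zip(weights, x, y))
    return inner

# Sum form on complex 3-tuples.
gap_sum = cs_gap(dot, (1 + 2j, 3, -1j), (2, 1j, 4))

# Probability form on a three-point sample space.
probs = [0.2, 0.5, 0.3]
gap_prob = cs_gap(weighted_inner(probs), [1, -2, 4], [0.5, 3, -1])

# Integral form: f(x) = x, g(x) = x^2 on [0, 1], midpoint rule with n cells.
n = 1000
mids = [(k + 0.5) / n for k in range(n)]
gap_int = cs_gap(weighted_inner([1 / n] * n), mids, [x * x for x in mids])

print(gap_sum, gap_prob, gap_int)
```

All three gaps come out non-negative; only the `inner` argument changes between the calls.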
The one-line proof
Almost every standard proof of Cauchy-Schwarz rests on a single observation: a vector dotted with itself is non-negative. Take any real t and form the vector u − tv. Then
0 ≤ ⟨u − tv, u − tv⟩
= ⟨u, u⟩ − 2t⟨u, v⟩ + t²⟨v, v⟩
= ‖u‖² − 2t⟨u, v⟩ + t² ‖v‖².
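This is a non-negative quadratic in t (if v = 0 both sides of the inequality vanish and there is nothing to prove, so assume v ≠ 0). A real quadratic that is non-negative everywhere has discriminant ≤ 0: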
(2⟨u, v⟩)² − 4‖u‖² ‖v‖² ≤ 0
⟨u, v⟩² ≤ ‖u‖² ‖v‖².
Taking square roots gives |⟨u, v⟩| ≤ ‖u‖ · ‖v‖. For complex inner products, replace t by a complex scalar and the modulus appears on the left.
Equality. The discriminant is zero iff the quadratic has a real root t*, i.e. ⟨u − t*v, u − t*v⟩ = 0, i.e. u − t*v = 0 (because the inner product is positive-definite). So equality holds iff u = t* v — u and v are collinear. The geometric collinearity condition appears mechanically from the algebra.
A second proof, even shorter, uses the cosine formula in 2D: ⟨u, v⟩ = ‖u‖ ‖v‖ cos θ, and |cos θ| ≤ 1. That proof is correct in ℝ² and ℝ³ but does not lift to abstract inner-product spaces without first proving the abstract inequality itself — which is exactly the role of the quadratic argument above.
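The quadratic argument can be checked mechanically. The sketch below, for real vectors only, builds the coefficients of q(t) = ⟨u − tv, u − tv⟩ and evaluates the discriminant; the function names are hypothetical and the two test pairs are chosen for illustration.

```python
def quadratic_coeffs(u, v):
    """Coefficients (a, b, c) of q(t) = <u - t*v, u - t*v> = a*t^2 + b*t + c."""
    uu = sum(x * x for x in u)
    uv = sum(x * y for x, y in zip(u, v))
    vv = sum(y * y for y in v)
    return vv, -2 * uv, uu

def discriminant(u, v):
    a, b, c = quadratic_coeffs(u, v)
    return b * b - 4 * a * c

def q(u, v, t):
    # Direct evaluation of <u - t*v, u - t*v>, used to cross-check the coefficients.
    return sum((x - t * y) ** 2 for x, y in zip(u, v))

# Non-collinear pair: the discriminant is strictly negative.
disc_generic = discriminant((1, 2, 2), (2, 1, 0))

# Collinear pair u = 2*v: the discriminant vanishes and q has the real root t* = 2.
disc_collinear = discriminant((2, 4, 4), (1, 2, 2))
root_value = q((2, 4, 4), (1, 2, 2), 2)

print(disc_generic, disc_collinear, root_value)  # -116 0 0
```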
Worked examples
Two-vector example in ℝ³
Take u = (1, 2, 2) and v = (2, 1, 0). Compute ⟨u, v⟩ = 1·2 + 2·1 + 2·0 = 4. The norms are ‖u‖ = √(1 + 4 + 4) = 3 and ‖v‖ = √(4 + 1 + 0) = √5. Then |⟨u, v⟩| = 4 and ‖u‖ · ‖v‖ = 3√5 ≈ 6.71. So 4 ≤ 6.71, and equality is far from holding: the vectors are not collinear, and the angle between them is arccos(4/(3√5)) ≈ 53.4°.
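The same arithmetic as a short script, using only the standard library (variable names are illustrative):

```python
import math

u = (1.0, 2.0, 2.0)
v = (2.0, 1.0, 0.0)

inner_uv = sum(x * y for x, y in zip(u, v))   # 1*2 + 2*1 + 2*0
norm_u = math.sqrt(sum(x * x for x in u))     # sqrt(9)
norm_v = math.sqrt(sum(y * y for y in v))     # sqrt(5)

bound = norm_u * norm_v                       # 3 * sqrt(5)
angle_deg = math.degrees(math.acos(inner_uv / bound))

print(inner_uv, round(bound, 2), round(angle_deg, 1))
```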
Sum form on four numbers
For a = (1, 2, 3, 4), b = (1, 1, 1, 1) the squared sum form gives (Σ aᵢ)² = 10² = 100 versus (Σ aᵢ²)(Σ 1²) = 30 · 4 = 120. So 100 ≤ 120, with strict inequality because (a₁, a₂, a₃, a₄) is not collinear with (1, 1, 1, 1). The Cauchy-Schwarz gap 120 − 100 = 20 equals n² · Var(a) = 16 · 1.25, where n = 4 and Var(a) is the population variance: the deviation from collinearity with the constant vector measures variability.
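A quick check of the link between the gap and the variance. The identity used here, n·Σaᵢ² − (Σaᵢ)² = n²·Var(a) with Var the population variance, is standard; the variable names are illustrative.

```python
import statistics

a = [1, 2, 3, 4]
n = len(a)

lhs = sum(a) ** 2                    # (sum a_i)^2
rhs = sum(x * x for x in a) * n      # (sum a_i^2) * (sum 1^2)
gap = rhs - lhs

# Population variance (divide by n, not n - 1).
pop_var = statistics.pvariance(a)

print(lhs, rhs, gap, n * n * pop_var)
```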
Correlation coefficient bounded by 1
For random variables X, Y with means μ_X, μ_Y, variances σ²_X, σ²_Y, define X' = X − μ_X and Y' = Y − μ_Y. The correlation is ρ = E[X'Y']/(σ_X σ_Y). By Cauchy-Schwarz applied to the inner product ⟨X', Y'⟩ = E[X'Y'] on L²(P): |E[X'Y']| ≤ √(E[X'²]) · √(E[Y'²]) = σ_X σ_Y. So |ρ| ≤ 1, with equality iff Y' = λ X' for some constant λ — Y is an affine function of X, the random-variable analogue of collinearity.
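A sample version of the same bound, as a sketch: the sample Pearson correlation is a centred inner product divided by the product of centred norms, so it stays in [−1, 1] and reaches −1 for an exactly affine, decreasing relationship. The data are synthetic (seeded Gaussian draws) and the function name is hypothetical.

```python
import math
import random

def correlation(xs, ys):
    """Sample Pearson correlation: centred inner product over product of centred norms."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    xc = [x - mean_x for x in xs]
    yc = [y - mean_y for y in ys]
    inner = sum(a * b for a, b in zip(xc, yc))
    return inner / math.sqrt(sum(a * a for a in xc) * sum(b * b for b in yc))

random.seed(0)
xs = [random.gauss(0, 1) for _ in range(500)]
noisy = [2 * x + random.gauss(0, 1) for x in xs]   # linear trend plus noise
affine = [3 - 2 * x for x in xs]                   # exact affine function of x

rho_noisy = correlation(xs, noisy)
rho_affine = correlation(xs, affine)

print(rho_noisy, rho_affine)
```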
Discrete probability: a bound on a joint probability
Two indicator random variables X = 1_A and Y = 1_B with P(A) = 0.3, P(B) = 0.4, P(A ∩ B) = 0.15. Then E[XY] = 0.15, E[X²] = 0.3, E[Y²] = 0.4. Cauchy-Schwarz: 0.15 ≤ √(0.3 · 0.4) = √0.12 ≈ 0.346. The inequality bounds the joint probability by the geometric mean of marginals, a free constraint with no model assumption.
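The numbers above in code, with one extra comparison that is not in the original text: for indicators the elementary bound P(A ∩ B) ≤ min(P(A), P(B)) is never larger than the geometric mean, so here Cauchy-Schwarz is valid but not the tightest statement available.

```python
import math

p_a, p_b, p_ab = 0.3, 0.4, 0.15

# Cauchy-Schwarz: E[XY] <= sqrt(E[X^2] * E[Y^2]) with X = 1_A, Y = 1_B.
cs_bound = math.sqrt(p_a * p_b)

# Elementary bound from A ∩ B being a subset of both A and B.
elementary_bound = min(p_a, p_b)

print(p_ab, elementary_bound, round(cs_bound, 3))
```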
What Cauchy-Schwarz buys you
Almost every analytic inequality in an inner-product space goes through Cauchy-Schwarz at least once. A short tour:
- Triangle inequality. ‖u + v‖² = ‖u‖² + 2 Re⟨u, v⟩ + ‖v‖² ≤ ‖u‖² + 2‖u‖·‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)². Take square roots.
- Hölder's inequality. The case p = q = 2 of Hölder's inequality |Σ aᵢbᵢ| ≤ (Σ|aᵢ|^p)^(1/p)(Σ|bᵢ|^q)^(1/q) (with 1/p + 1/q = 1) IS Cauchy-Schwarz. General Hölder is the natural generalisation to L^p spaces.
- Cosine inequality. For real vectors, ⟨u, v⟩ / (‖u‖ · ‖v‖) ∈ [−1, 1], so this ratio can be interpreted as cos of an angle θ. The very notion of angle in a high-dimensional inner-product space rests on Cauchy-Schwarz.
- Cauchy-Schwarz on operators. For a bounded linear operator A on a Hilbert space, |⟨Au, v⟩| ≤ ‖A‖ ‖u‖ ‖v‖. This is the operator-norm version, used throughout functional analysis.
- Variance and covariance. |Cov(X, Y)| ≤ √(Var X) · √(Var Y), the random-variable form already proven above.
- Schwarz inequality in PDE. Energy estimates for PDEs (Poisson, heat, Schrödinger) routinely use |∫ ∇u · ∇v dx| ≤ (∫|∇u|²)^(1/2) (∫|∇v|²)^(1/2), which is Cauchy-Schwarz in L² applied to the gradients and is the estimate that makes the Dirichlet form bounded on the Sobolev space H¹.
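The triangle-inequality item in the list above can be traced step by step on a concrete pair of complex vectors. This is a sketch with arbitrarily chosen vectors; `inner` is taken linear in the first argument, matching the sum form.

```python
import math

def inner(u, v):
    # <u, v> = sum of u_i * conj(v_i), linear in the first argument.
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u).real)

u = (1 + 1j, 2, -3j)
v = (0.5j, -1, 2 + 2j)
w = tuple(x + y for x, y in zip(u, v))

lhs = norm(w) ** 2                                              # ||u + v||^2
expanded = norm(u) ** 2 + 2 * inner(u, v).real + norm(v) ** 2   # exact expansion
after_cs = (norm(u) + norm(v)) ** 2                             # after Re<u,v> <= ||u||*||v||

print(lhs, expanded, after_cs)
```

The first two numbers agree up to rounding, and the third is at least as large, which is the whole proof in miniature.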
Cauchy-Schwarz vs other inequalities
| Inequality | Statement | Setting | Relation to other results |
|---|---|---|---|
| Cauchy-Schwarz | \|⟨u, v⟩\| ≤ ‖u‖ ‖v‖ | Inner-product space | Hölder with p = q = 2 |
| Hölder | \|Σ aᵢ bᵢ\| ≤ (Σ\|aᵢ\|^p)^(1/p) (Σ\|bᵢ\|^q)^(1/q) | L^p spaces, 1/p + 1/q = 1 | Follows from concave Jensen on log (Young's inequality) |
| Minkowski | ‖u + v‖_p ≤ ‖u‖_p + ‖v‖_p | L^p spaces, p ≥ 1 | Triangle inequality in L^p |
| Triangle | ‖u + v‖ ≤ ‖u‖ + ‖v‖ | Any normed space | Minkowski with p = 2 in inner-product spaces; uses CS |
| AM-QM | (Σ aᵢ²/n)^(1/2) ≥ Σ aᵢ/n | Real numbers | Cauchy-Schwarz with b = (1,...,1) |
| Jensen | f(E[X]) ≤ E[f(X)] for convex f | Probability space | Convex analysis |
| Correlation bound | \|ρ(X, Y)\| ≤ 1 | L²(P) | CS in L² |
Cauchy-Schwarz is the workhorse at the inner-product level; Hölder and Minkowski extend it to L^p spaces (p ≠ 2); Jensen lives in a parallel hierarchy of expectation-of-convex-function inequalities. They overlap: AM ≥ GM, for instance, falls out of both Jensen (with f = −log) and Cauchy-Schwarz (the two-term case √(ab) ≤ (a + b)/2 is Cauchy-Schwarz for the vectors (√a, √b) and (√b, √a), and Cauchy's forward-backward induction extends it to n terms).
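To see two rows of the table side by side, the sketch below evaluates the Hölder bound for several exponents on one pair of vectors and reads off AM-QM from the p = 2 case. The function name `holder_bound` and the chosen exponents are assumptions for illustration.

```python
import math

def holder_bound(a, b, p):
    """Right-hand side of Hölder's inequality with conjugate exponent q = p / (p - 1)."""
    q = p / (p - 1)
    return (sum(abs(x) ** p for x in a) ** (1 / p)
            * sum(abs(y) ** q for y in b) ** (1 / q))

a = (1, 2, 3, 4)
b = (1, 1, 1, 1)

lhs = abs(sum(x * y for x, y in zip(a, b)))
bounds = {p: holder_bound(a, b, p) for p in (1.5, 2, 3, 4)}

# AM-QM is the p = 2 bound divided by n: (sum a_i)/n <= sqrt(sum a_i^2 / n).
am = sum(a) / len(a)
qm = math.sqrt(sum(x * x for x in a) / len(a))

print(lhs, bounds, am, qm)
```

Every exponent gives a valid upper bound on the same left-hand side; p = 2 is the Cauchy-Schwarz value √120.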
Three names, three proofs
Augustin-Louis Cauchy first wrote down the finite sum form (Σ aᵢ bᵢ)² ≤ (Σ aᵢ²)(Σ bᵢ²) in his 1821 textbook Cours d'Analyse, in a note on algebraic inequalities. Cauchy's own proof was algebraic, in the spirit of Lagrange's identity; the discriminant-of-a-quadratic argument that appears above is usually credited to Schwarz. Viktor Yakovlevich Bunyakovsky, working in Saint Petersburg, extended the inequality to integrals in his 1859 monograph on inequalities, and the integral form |∫f g| ≤ √(∫f²)·√(∫g²) bears his name in Russian textbooks. Hermann Schwarz, unaware of Bunyakovsky's work, independently rediscovered the integral form in 1888 in his work on minimal surfaces; Schwarz's proof, organised around the abstract structure of the inner product, was what eventually propagated through 20th-century analysis under the name Schwarz's inequality. The full attribution Cauchy-Bunyakovsky-Schwarz (CBS) is standard in Eastern European texts; Western texts more often say Cauchy-Schwarz, occasionally Schwarz alone.
Where Cauchy-Schwarz earns its keep
- Statistics. The correlation coefficient ρ = Cov(X, Y) / (σ_X σ_Y) lies in [−1, 1] by Cauchy-Schwarz, with ±1 only when one variable is an affine function of the other. Every Pearson correlation in every dataset is implicitly using Cauchy-Schwarz to know its bound.
- Quantum mechanics. The Heisenberg uncertainty principle is proved by applying Cauchy-Schwarz to the inner product of the position and momentum operators acting on a state vector: |⟨Aψ, Bψ⟩| ≤ ‖Aψ‖ · ‖Bψ‖ becomes a lower bound on the product of standard deviations Δx · Δp ≥ ℏ/2.
- Signal processing — matched filter. Given a signal s(t) and a filter h(t), the output |∫ s(t) h(t) dt| ≤ ‖s‖₂ · ‖h‖₂ by Cauchy-Schwarz. The output is maximised (with equality) precisely when h is a scaled copy of s — the matched filter. Used in radar, sonar, gravitational-wave detection.
- Numerical linear algebra. Bounds on how errors propagate through matrix-vector products, including those stated in terms of the condition number, combine the operator norm with Cauchy-Schwarz, for example |⟨Ax, y⟩| ≤ ‖A‖ ‖x‖ ‖y‖. Convergence analyses of the conjugate gradient method rely on inner-product estimates of the same kind.
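- Probability: moment and tail bounds. Second-moment arguments such as the Paley-Zygmund inequality, P(Z > θ E[Z]) ≥ (1 − θ)² E[Z]² / E[Z²] for non-negative Z and 0 ≤ θ ≤ 1, come directly from Cauchy-Schwarz. Exponential concentration inequalities (Bernstein, Hoeffding, McDiarmid) are instead derived from Markov's inequality applied to the moment generating function, with Cauchy-Schwarz appearing at most in auxiliary steps. In Monte Carlo variance reduction, an optimally scaled control variate with correlation ρ to the target cuts the variance by the factor 1 − ρ², and Cauchy-Schwarz guarantees that this factor lies in [0, 1].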
- Functional analysis: Bessel's inequality and Parseval's identity. Bessel's inequality Σ|⟨v, e_n⟩|² ≤ ‖v‖² for an orthonormal sequence {e_n} generalises Cauchy-Schwarz: with the single unit vector e = w/‖w‖ it reads |⟨v, w⟩|² ≤ ‖v‖² ‖w‖². Parseval's identity is Bessel's inequality saturated to equality on a complete orthonormal basis.
- Machine learning — kernel methods. Reproducing kernel Hilbert spaces (RKHS) use the kernel function K(x, y) as an inner product. Generalisation bounds for kernel SVMs and Gaussian processes rely on Cauchy-Schwarz applied to the RKHS inner product to relate kernel evaluations to function norms.
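- Theoretical computer science: a name to keep apart. The Schwartz-Zippel lemma used in polynomial identity testing bounds the probability that a non-zero polynomial vanishes at a random point; it is named after Jacob Schwartz and is a root-counting argument, not an application of Cauchy-Schwarz. Cauchy-Schwarz itself does appear in analyses of multi-prover interactive proofs and PCP constructions, where inner products of the provers' answers are bounded.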
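The matched-filter item in the list above lends itself to a small experiment. In this discrete sketch the signal is a made-up damped sinusoid, filters are constrained to unit ℓ² norm, and the output is the absolute value of the sum Σ s[k]·h[k]; by Cauchy-Schwarz no unit-norm filter can beat the normalised copy of the signal.

```python
import math
import random

def normalise(h):
    length = math.sqrt(sum(x * x for x in h))
    return [x / length for x in h]

def output(s, h):
    # Discrete analogue of |∫ s(t) h(t) dt|.
    return abs(sum(a * b for a, b in zip(s, h)))

random.seed(1)
s = [math.sin(0.3 * k) * math.exp(-0.05 * k) for k in range(40)]

matched = normalise(s)
best = output(s, matched)            # equals ||s||_2, the Cauchy-Schwarz bound

# 1000 random unit-norm filters for comparison.
random_best = max(
    output(s, normalise([random.gauss(0, 1) for _ in s]))
    for _ in range(1000)
)

print(best, random_best)
```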
Common mistakes
- Forgetting the absolute value on complex inner products. For complex inner products, ⟨u, v⟩ can be complex and you need |⟨u, v⟩|, not ⟨u, v⟩, on the left. Real-only proofs slip on this when ported to ℂⁿ.
- Using Cauchy-Schwarz on non-inner-product norms. The inequality lives in inner-product spaces. The ℓ^∞ max-norm and the ℓ¹ taxicab norm are not inner-product norms (they fail the parallelogram law), so Cauchy-Schwarz does not apply. Use Hölder's inequality in L^p for p ≠ 2.
- Trying to use Cauchy-Schwarz to prove independence. Equality |ρ| = 1 means perfect linear dependence, not independence. Strict inequality |ρ| < 1 does not imply independence — it only rules out an affine relationship. The dependent-but-uncorrelated example is Y = X² with X uniform on [−1, 1].
- Ignoring measurability and integrability hypotheses. The integral form requires f, g to be square integrable (in L²). For functions whose L² norm is infinite, the inequality is technically vacuous (∞ ≥ anything) and using it loses information.
- Mistaking Cauchy-Schwarz for an equality. Cauchy-Schwarz is a one-way bound; it tells you the maximum possible size of the inner product, not its value. A heuristic argument that treats ≤ as = is off by exactly the gap ‖u‖·‖v‖ − |⟨u, v⟩|.
- Mixing up the squared and un-squared forms. Some texts state Cauchy-Schwarz in the squared form ⟨u, v⟩² ≤ ‖u‖² ‖v‖². The two forms are equivalent, but combining the un-squared left side with the squared right side, |⟨u, v⟩| ≤ ‖u‖² ‖v‖², gives a bound that is weaker when ‖u‖·‖v‖ > 1 and can fail outright when ‖u‖·‖v‖ < 1. Be explicit about which form you are using.
- Forgetting the inequality is sharp. Cauchy-Schwarz attains equality on collinear vectors, so the bound cannot be improved without additional structure. Searches for 'tighter' Cauchy-Schwarz are searches for inequalities that exploit extra information (orthogonal complement, positive components, etc.), not improvements to the universal bound.
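The independence pitfall above can be verified exactly. As an assumption for the sake of exact arithmetic, the sketch replaces the continuous uniform distribution on [−1, 1] with a uniform distribution on the five-point grid {−1, −1/2, 0, 1/2, 1}; the conclusion (zero covariance, yet Y = X² is determined by X) is the same.

```python
from fractions import Fraction

xs = [Fraction(k, 2) for k in range(-2, 3)]   # -1, -1/2, 0, 1/2, 1
p = Fraction(1, len(xs))                      # each point equally likely

def expect(f):
    return sum(p * f(x) for x in xs)

# Cov(X, Y) with Y = X^2 is E[X^3] - E[X] * E[X^2].
cov = expect(lambda x: x ** 3) - expect(lambda x: x) * expect(lambda x: x ** 2)

# Dependence: compare P(X = 1, Y = 1) with P(X = 1) * P(Y = 1).
p_x1 = sum(p for x in xs if x == 1)
p_y1 = sum(p for x in xs if x * x == 1)
p_joint = sum(p for x in xs if x == 1 and x * x == 1)

print(cov, p_joint, p_x1 * p_y1)   # 0 1/5 2/25
```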
Three alternative proofs
Cauchy-Schwarz has so many proofs that mathematicians collect them like trading cards. Three favourites alongside the discriminant argument above, followed by one widely used reformulation:
- Lagrange's identity. In two variables, direct expansion gives (a₁b₂ − a₂b₁)² + (a₁b₁ + a₂b₂)² = (a₁² + a₂²)(b₁² + b₂²); for general n the identity reads Σ_{i<j} (aᵢbⱼ − aⱼbᵢ)² + (Σ aᵢbᵢ)² = (Σ aᵢ²)(Σ bᵢ²). The first term (the squared norm of the cross product when n = 3) is manifestly non-negative; dropping it gives Cauchy-Schwarz.
- AM-GM proof. For positive aᵢ, bᵢ, write A² = Σ aᵢ² and B² = Σ bᵢ², and apply AM ≥ GM to each normalised pair (aᵢ²/A², bᵢ²/B²): aᵢbᵢ/(AB) ≤ ½(aᵢ²/A² + bᵢ²/B²). Summing over i gives Σ aᵢbᵢ/(AB) ≤ 1, that is (Σ aᵢbᵢ)² ≤ (Σ aᵢ²)(Σ bᵢ²). This proof exposes Cauchy-Schwarz as a relative of AM-GM and Jensen, all sharing convex/concave structure.
- Projection proof. In an inner-product space, decompose u into u_∥ (parallel to v) and u_⊥ (orthogonal to v). Then ‖u‖² = ‖u_∥‖² + ‖u_⊥‖² ≥ ‖u_∥‖² = |⟨u, v⟩|² / ‖v‖². Rearranging gives Cauchy-Schwarz. This proof makes the equality condition transparent: u_⊥ = 0, so u is parallel to v.
- Engel form (Titu's lemma). For positive reals: Σ aᵢ² / bᵢ ≥ (Σ aᵢ)² / Σ bᵢ. A rearrangement of Cauchy-Schwarz that is the workhorse of competition mathematics — IMO solutions deploy it constantly. The Engel form makes Cauchy-Schwarz immediately applicable to sums of fractions.
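Two of the items above are exact algebraic statements, so they can be checked in integer and rational arithmetic. The sketch verifies the general-n Lagrange identity, Σ_{i<j} (aᵢbⱼ − aⱼbᵢ)² + (Σ aᵢbᵢ)² = (Σ aᵢ²)(Σ bᵢ²), and one instance of the Engel form; the vectors and function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def lagrange_sides(a, b):
    """Return (sum of squared 2x2 minors, (sum a_i b_i)^2, (sum a_i^2)(sum b_i^2))."""
    cross = sum((a[i] * b[j] - a[j] * b[i]) ** 2
                for i, j in combinations(range(len(a)), 2))
    inner_sq = sum(x * y for x, y in zip(a, b)) ** 2
    product = sum(x * x for x in a) * sum(y * y for y in b)
    return cross, inner_sq, product

cross, inner_sq, product = lagrange_sides((1, 2, 3, 4), (2, -1, 0, 5))

def engel_sides(a, b):
    """Return (sum a_i^2 / b_i, (sum a_i)^2 / sum b_i) for positive b_i."""
    left = sum(Fraction(x * x, y) for x, y in zip(a, b))
    right = Fraction(sum(a) ** 2, sum(b))
    return left, right

engel_left, engel_right = engel_sides((1, 2, 3), (2, 3, 5))

print(cross, inner_sq, product)      # 500 400 900
print(engel_left, engel_right)       # 109/30 18/5
```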
Schwarz lemma and the Hardy-Schwarz mass
Hermann Schwarz's 1888 proof of Cauchy-Schwarz was a byproduct of his work on minimal surfaces and conformal mappings. In complex analysis Schwarz also proved the Schwarz lemma: if f is holomorphic on the unit disk with f(0) = 0 and |f| ≤ 1, then |f(z)| ≤ |z| with equality only when f is a rotation. The Schwarz lemma is unrelated to Cauchy-Schwarz in content, but it is a useful reminder that the same Schwarz produced two of the most-cited results bearing a single name.
The Hardy-Schwarz mass — a related estimate from Schwarz's work — bounds the L²-mass of harmonic functions on a domain in terms of boundary integrals using Cauchy-Schwarz in an inner-product space of Dirichlet integrals. This estimate appears in the proof of the existence of conformal maps (Riemann mapping theorem) and in the calculus of variations.
Strengthened forms
Several refinements of Cauchy-Schwarz appear in different corners of analysis.
- Aczél's inequality. For real numbers with a₁² > Σ_{i>1} aᵢ² and b₁² > Σ_{i>1} bᵢ², (a₁b₁ − Σ_{i>1} aᵢbᵢ)² ≥ (a₁² − Σ_{i>1} aᵢ²)(b₁² − Σ_{i>1} bᵢ²). This reverse Cauchy-Schwarz appears in special relativity (Minkowski inner product) and in information geometry.
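- Hardy-Littlewood rearrangement (with Pólya-Szegő). For non-negative functions f, g, passing to their symmetric-decreasing rearrangements f*, g* can only increase the integral inner product: ∫ f g ≤ ∫ f* g*. Since rearrangement preserves L² norms, this sits between the two sides of Cauchy-Schwarz. The companion Pólya-Szegő inequality says rearrangement does not increase the Dirichlet energy ∫|∇f|²; together they power isoperimetric inequalities.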
- Wirtinger. For continuously differentiable f vanishing at the endpoints of [0, π], ∫₀^π f² dx ≤ ∫₀^π (f')² dx, a Poincaré-type bound with sharp constant (equality for f = sin x), fundamental in PDE.
- Operator Cauchy-Schwarz. For a positive semi-definite operator A on a Hilbert space, |⟨Au, v⟩|² ≤ ⟨Au, u⟩ · ⟨Av, v⟩, which is Cauchy-Schwarz for the semi-inner product (u, v) ↦ ⟨Au, v⟩. Used in quantum information theory and the analysis of channel capacities.
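Two of the refinements above are easy to test numerically. The sketch checks Aczél's inequality on one hand-picked pair of vectors and Wirtinger's inequality for the trial function f(x) = x(π − x), using a midpoint rule; the function names, the trial function and the grid size are assumptions made for illustration.

```python
import math

def aczel_sides(a, b):
    """Return both sides of Aczél's inequality, with a[0], b[0] the dominant entries."""
    lhs = (a[0] * b[0] - sum(x * y for x, y in zip(a[1:], b[1:]))) ** 2
    rhs = ((a[0] ** 2 - sum(x * x for x in a[1:]))
           * (b[0] ** 2 - sum(y * y for y in b[1:])))
    return lhs, rhs

# 5^2 > 1^2 + 2^2 and 4^2 > 2^2 + 1^2, so the hypotheses hold.
aczel_lhs, aczel_rhs = aczel_sides((5, 1, 2), (4, 2, 1))

def midpoint_integral(f, lo, hi, n=20000):
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def f(x):
    return x * (math.pi - x)     # vanishes at 0 and pi

def df(x):
    return math.pi - 2 * x       # derivative of f

wirt_lhs = midpoint_integral(lambda x: f(x) ** 2, 0, math.pi)    # exact value pi^5 / 30
wirt_rhs = midpoint_integral(lambda x: df(x) ** 2, 0, math.pi)   # exact value pi^3 / 3

print(aczel_lhs, aczel_rhs, wirt_lhs, wirt_rhs)
```

Note the direction: Aczél's left side is the larger one, the reverse of ordinary Cauchy-Schwarz.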
Cauchy-Schwarz in Hilbert spaces and quantum mechanics
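In an abstract Hilbert space H over ℂ, Cauchy-Schwarz takes the form |⟨ψ, φ⟩|² ≤ ⟨ψ, ψ⟩ · ⟨φ, φ⟩. For self-adjoint observables A, B and a unit state ψ, first centre them (replace A by A − ⟨ψ, Aψ⟩ and B by B − ⟨ψ, Bψ⟩, which leaves the commutator unchanged). Then |⟨Aψ, Bψ⟩|² ≤ ⟨Aψ, Aψ⟩ · ⟨Bψ, Bψ⟩ = (Δ_ψ A)² (Δ_ψ B)² where Δ_ψ A is the standard deviation of observable A in state ψ. Combined with the commutator identity ⟨Aψ, Bψ⟩ − ⟨Bψ, Aψ⟩ = ⟨ψ, [A, B] ψ⟩, whose left side is 2i Im⟨Aψ, Bψ⟩ and so has modulus at most 2|⟨Aψ, Bψ⟩|, this yields the Robertson-Schrödinger uncertainty relation in its commutator-only (Robertson) form: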
Δ_ψ A · Δ_ψ B ≥ ½ |⟨ψ, [A, B] ψ⟩|.
For position X and momentum P with [X, P] = iℏ, the right-hand side becomes ℏ/2, giving Heisenberg's Δx · Δp ≥ ℏ/2. The famous uncertainty principle is, structurally, a single application of Cauchy-Schwarz in the Hilbert space of quantum states.
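Position and momentum need an infinite-dimensional space, so as a finite-dimensional stand-in (an assumption of this sketch, not an example from the text) the code below checks the commutator bound for the Pauli matrices σ_x and σ_y on a spin-1/2 state, in units where ℏ = 1. Since [σ_x, σ_y] = 2iσ_z, the right-hand side is |⟨σ_z⟩|.

```python
import cmath
import math

SIGMA_X = [[0, 1], [1, 0]]
SIGMA_Y = [[0, -1j], [1j, 0]]

def matvec(m, x):
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]

def inner(x, y):
    # Physics convention: conjugate-linear in the first argument.
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))

def std_dev(m, psi):
    """Standard deviation of the Hermitian matrix m in the unit state psi."""
    m_psi = matvec(m, psi)
    mean = inner(psi, m_psi).real
    centred = [a - mean * b for a, b in zip(m_psi, psi)]
    return math.sqrt(inner(centred, centred).real)

def robertson_sides(a, b, psi):
    """Return (std(a) * std(b), 0.5 * |<psi, [a, b] psi>|)."""
    a_psi, b_psi = matvec(a, psi), matvec(b, psi)
    commutator_mean = inner(a_psi, b_psi) - inner(b_psi, a_psi)
    return std_dev(a, psi) * std_dev(b, psi), 0.5 * abs(commutator_mean)

# A generic unit state on the Bloch sphere.
theta, phi = 1.0, 0.7
psi = [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]
lhs, rhs = robertson_sides(SIGMA_X, SIGMA_Y, psi)

# The spin-up state saturates the bound.
lhs_up, rhs_up = robertson_sides(SIGMA_X, SIGMA_Y, [1, 0])

print(lhs, rhs, lhs_up, rhs_up)
```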
Cauchy-Schwarz and the Hilbert projection theorem
One of the cleanest results built on Cauchy-Schwarz is the Hilbert projection theorem: for any nonempty closed convex subset C of a Hilbert space H and any point x ∈ H, there exists a unique closest point P_C(x) ∈ C. The projection P_C is non-expansive: ‖P_C(x) − P_C(y)‖ ≤ ‖x − y‖. The non-expansiveness is proved via Cauchy-Schwarz on the difference of projections.
This single theorem powers Gram-Schmidt orthonormalisation (projecting onto each basis vector in turn), the conjugate gradient method (residual orthogonality is projection onto the Krylov subspace), Fourier series (best L²-approximation by trigonometric polynomials is projection onto their span), and the entire theory of orthogonal polynomial approximation in numerical analysis.
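Non-expansiveness is easy to observe on a simple closed convex set. As an illustrative choice, the sketch below uses the box [−1, 1]⁵ in ℝ⁵, where the Euclidean projection is componentwise clipping, and records the largest ratio ‖P(x) − P(y)‖ / ‖x − y‖ over random pairs.

```python
import math
import random

def project_box(x, lo=-1.0, hi=1.0):
    """Closest point to x in the closed convex box [lo, hi]^n (componentwise clipping)."""
    return [min(max(t, lo), hi) for t in x]

def dist(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

random.seed(2)
worst_ratio = 0.0
for _ in range(1000):
    x = [random.uniform(-3, 3) for _ in range(5)]
    y = [random.uniform(-3, 3) for _ in range(5)]
    ratio = dist(project_box(x), project_box(y)) / dist(x, y)
    worst_ratio = max(worst_ratio, ratio)

print(worst_ratio)   # never exceeds 1
```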
Costed claims
Cauchy-Schwarz: equality iff u, v collinear, proven by the discriminant-of-a-non-negative-quadratic argument. The Robertson-Schrödinger uncertainty relation (and hence Heisenberg's Δx · Δp ≥ ℏ/2) is a short consequence of Cauchy-Schwarz in Hilbert space. Jensen: log E[X] ≥ E[log X] (concave version), the cousin inequality for expectations of convex functions. Cauchy-Schwarz is the special case p = q = 2 of Hölder's inequality and the foundation of correlation bounds and the triangle inequality in inner-product spaces.
Frequently asked questions
What does Cauchy-Schwarz say exactly?
In any inner product space (V, ⟨·,·⟩), for all u, v ∈ V, |⟨u, v⟩| ≤ ‖u‖ · ‖v‖, where ‖x‖ = √⟨x, x⟩. Equality holds iff u and v are linearly dependent, that is, collinear. In a real space the weaker one-sided bound ⟨u, v⟩ ≤ ‖u‖ · ‖v‖ follows at once; in a complex space the modulus is required on the left because ⟨u, v⟩ can be complex.
What's the cleanest proof?
Consider the quadratic in real t: ⟨u − tv, u − tv⟩ = ‖u‖² − 2t⟨u, v⟩ + t² ‖v‖² ≥ 0 for every t (inner products of a vector with itself are non-negative). This is a non-negative quadratic in t, so its discriminant must be non-positive: 4⟨u, v⟩² − 4‖u‖² ‖v‖² ≤ 0, which rearranges to ⟨u, v⟩² ≤ ‖u‖² ‖v‖². Take square roots. Equality in the discriminant requires the quadratic to have a real root, which means u − tv = 0 for some t, i.e. u and v are linearly dependent.
When does equality hold?
Equality |⟨u, v⟩| = ‖u‖ · ‖v‖ holds if and only if u and v are linearly dependent — i.e. one is a scalar multiple of the other (collinear in the geometric picture). For real vectors with positive inner product, equality means u = λv with λ > 0. For real vectors with negative inner product, λ < 0. For complex inner products, λ is a complex scalar.
How does Cauchy-Schwarz give the triangle inequality?
Compute ‖u + v‖² = ⟨u + v, u + v⟩ = ‖u‖² + 2 Re⟨u, v⟩ + ‖v‖² ≤ ‖u‖² + 2|⟨u, v⟩| + ‖v‖² ≤ ‖u‖² + 2‖u‖·‖v‖ + ‖v‖² = (‖u‖ + ‖v‖)². Take square roots to get ‖u + v‖ ≤ ‖u‖ + ‖v‖. Cauchy-Schwarz is the second inequality in the chain (the first is Re z ≤ |z|); the rest is just expanding the norm squared. The standard proof of the triangle inequality in an inner product space uses Cauchy-Schwarz exactly once.
Why is correlation bounded between −1 and +1?
The correlation of two random variables X, Y with finite second moments is ρ = E[(X−μ_X)(Y−μ_Y)] / (σ_X σ_Y). The numerator is an inner product on the space of centred square-integrable random variables; the denominator is the product of induced norms. So ρ = ⟨X − μ_X, Y − μ_Y⟩ / (‖X − μ_X‖ · ‖Y − μ_Y‖). Cauchy-Schwarz says the absolute value of this ratio is at most 1, so −1 ≤ ρ ≤ 1. Equality at ±1 means X and Y are affinely related, the geometric collinearity condition for random variables.
Who actually proved it?
Augustin-Louis Cauchy proved the finite-dimensional version for sums in his 1821 Cours d'Analyse. Viktor Bunyakovsky generalised it to integrals in 1859. Hermann Schwarz independently proved the integral version in 1888 in his work on minimal surfaces, and his proof was the one that propagated through 20th-century mathematics. Many Eastern European texts call it the 'Cauchy-Bunyakovsky-Schwarz' inequality to credit all three; English-language texts usually drop Bunyakovsky.
Does Cauchy-Schwarz work in infinite dimensions?
Yes, in any inner product space, including infinite-dimensional Hilbert spaces like ℓ² (square-summable sequences) and L²(Ω) (square-integrable functions). The same quadratic-discriminant proof works because the only ingredients are linearity in one argument, conjugate symmetry, and positive-definiteness of the inner product. In L²(Ω) the inequality reads |∫ f ḡ| ≤ (∫ |f|²)^(1/2) (∫ |g|²)^(1/2), the integral Cauchy-Schwarz.
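For a concrete ℓ² instance, take the sequences aₙ = 1/n and bₙ = 1/n² (an example chosen here, not taken from the text above). Every partial sum satisfies the finite sum form, and in the limit the inequality reads ζ(3) ≤ √(ζ(2) ζ(4)) = √(π²/6 · π⁴/90). The sketch evaluates the partial sums for a few truncation lengths.

```python
import math

def partial_sides(n_terms):
    """Both sides of Cauchy-Schwarz for a_n = 1/n, b_n = 1/n^2, truncated at n_terms."""
    a = [1 / k for k in range(1, n_terms + 1)]
    b = [1 / k ** 2 for k in range(1, n_terms + 1)]
    lhs = sum(x * y for x, y in zip(a, b))
    rhs = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return lhs, rhs

sides = {n: partial_sides(n) for n in (10, 1000, 100000)}

# Limiting value of the right-hand side: sqrt(zeta(2) * zeta(4)).
limit_rhs = math.sqrt(math.pi ** 2 / 6 * math.pi ** 4 / 90)

for n, (lhs, rhs) in sides.items():
    print(n, lhs, rhs)
print(limit_rhs)
```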