Linear Algebra

Orthogonal Projection

The closest point in a subspace, found by dropping a perpendicular

An orthogonal projection drops a vector perpendicularly onto a line, plane, or higher-dimensional subspace, returning the closest point inside that subspace. The formula proj_v(u) = (u·v / v·v) v gives the projection of u onto the line spanned by v. It is the foundation of least-squares regression, Fourier series, principal component analysis, and the Gram-Schmidt process. Geometrically, the residual u − proj_v(u) is perpendicular to v; algebraically, the projection is the unique closest point in the subspace under the Euclidean norm.

Onto a line: proj_v(u) = (u·v / v·v) v
Onto a subspace: P = A(AᵀA)⁻¹Aᵀ
Result: A vector inside the target subspace
Residual: u − proj(u) is orthogonal to the subspace
Idempotent: P² = P
Closest-point property: Minimizes |u − w| over w in W

How orthogonal projection works

Imagine a vector u with its tail at the origin, and a line through the origin in the direction of v. From the tip of u, drop a perpendicular straight onto the line. The point where it lands, the foot of the perpendicular, is the projection of u onto v.

The projection is a vector along v, so write it as cv for some scalar c. The defining condition is that the residual u − cv is perpendicular to v:

(u − cv) · v = 0
   u · v − c (v · v) = 0
   c = (u · v) / (v · v)

Substituting back, the projection of u onto v is:

proj_v(u) = ( u · v / v · v ) v

The numerator measures how much u points along v; the denominator scales for v's length. If v happens to be a unit vector, v·v = 1 and the formula simplifies to (u·v) v.

Geometrically, the projection is the closest point on the line to u. Any other point c′v on the line is farther away, because the perpendicular drop is the shortest path from a point to a line.
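The closest-point claim follows from Pythagoras: for any other coefficient c′, u − c′v = (u − cv) + (c − c′)v, and the two pieces are perpendicular, so |u − c′v|² = |u − cv|² + (c − c′)²|v|². The sketch below checks this numerically on a coarse grid of coefficients. The vectors and the helper names are illustrative assumptions, not taken from the text.

```python
def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def distance_to_point_on_line(u, v, c):
    """Distance from u to the point c*v on the line spanned by v."""
    diff = [ui - c * vi for ui, vi in zip(u, v)]
    return dot(diff, diff) ** 0.5


# Illustrative vectors (assumed for this sketch).
u = (2.0, 3.0)
v = (4.0, 1.0)

c_star = dot(u, v) / dot(v, v)   # coefficient from the derivation above
best = distance_to_point_on_line(u, v, c_star)

# No other coefficient on the grid gives a closer point on the line.
grid = [k / 10 for k in range(-30, 31)]
closest_on_grid = all(
    distance_to_point_on_line(u, v, c) >= best - 1e-12 for c in grid
)
```

A grid search is only a spot check, of course; the identity above is what actually proves the minimum.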

Worked example: projecting numerically

Project u = (3, 4) onto v = (1, 0) — that is, onto the x-axis.

Compute u·v = 3·1 + 4·0 = 3.
Compute v·v = 1·1 + 0·0 = 1.
Apply the formula: proj_v(u) = (3 / 1)·(1, 0) = (3, 0).

The projection lands on the x-axis at (3, 0), which is exactly the foot of the perpendicular from (3, 4). The residual u − proj_v(u) = (0, 4) is perpendicular to (1, 0). Sanity check: (0, 4)·(1, 0) = 0.

A less trivial case: project u = (4, 5) onto v = (3, 1).

u·v = 4·3 + 5·1 = 17.
v·v = 9 + 1 = 10.
proj_v(u) = (17/10)·(3, 1) = (5.1, 1.7).
Residual: u − proj = (4 − 5.1, 5 − 1.7) = (−1.1, 3.3).
Verify perpendicularity: (−1.1, 3.3)·(3, 1) = −3.3 + 3.3 = 0.
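Both examples can be reproduced in a few lines. This is a minimal sketch that uses exact rational arithmetic so the perpendicularity check comes out as exactly zero; it assumes integer inputs, and the function names are hypothetical.

```python
from fractions import Fraction


def dot(a, b):
    """Euclidean dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def project_onto_line(u, v):
    """Return (u.v / v.v) v exactly; assumes integer components."""
    vv = dot(v, v)
    if vv == 0:
        raise ValueError("cannot project onto the zero vector")
    c = Fraction(dot(u, v), vv)
    return tuple(c * vi for vi in v)


def residual(u, p):
    """Component of u left over after subtracting the projection p."""
    return tuple(ui - pi for ui, pi in zip(u, p))


# First example: u = (3, 4) onto the x-axis.
p1 = project_onto_line((3, 4), (1, 0))

# Second example: u = (4, 5) onto v = (3, 1).
p2 = project_onto_line((4, 5), (3, 1))
r2 = residual((4, 5), p2)
check = dot(r2, (3, 1))   # zero when the residual is perpendicular to v
```

Here p2 holds the fractions 51/10 and 17/10, matching (5.1, 1.7) above.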

Projection onto a line vs plane vs subspace

	Onto a line	Onto a plane	Onto a general subspace W
Defining data	One direction vector v	Two basis vectors v₁, v₂	k-dimensional basis or matrix A
Formula (orthogonal basis)	(u·v / v·v) v	Σ (u·vᵢ / vᵢ·vᵢ) vᵢ	Σ (u·vᵢ / vᵢ·vᵢ) vᵢ over k vectors
Formula (general basis)	Same: a single nonzero vector is trivially an orthogonal basis	P = A(AᵀA)⁻¹Aᵀ	P = A(AᵀA)⁻¹Aᵀ
Residual	Perpendicular to v	Perpendicular to entire plane	In the orthogonal complement W⊥
Image dimension	1	2	k = dim(W)
Projection matrix rank	1	2	k
Closest-point property	Minimizes |u − cv|	Minimizes |u − w| over w in plane	Minimizes |u − w| over w in W

The pattern is identical at every dimension — sum one term per orthogonal basis vector. The only complication is that real-world bases are rarely orthogonal, so either you orthogonalize first (Gram-Schmidt) or you use the matrix form P = A(AᵀA)⁻¹Aᵀ, which silently solves the cross-terms via the AᵀA inversion.
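To see why the cross-terms matter, the sketch below projects onto the xy-plane in ℝ³ using a deliberately non-orthogonal basis. It assumes NumPy is available, and the vectors are made up for illustration. The naive sum of one-dimensional projections lands on the wrong point, while the matrix form recovers the expected answer.

```python
import numpy as np

# Hypothetical tilted basis for the xy-plane: v1 . v2 = 1, not 0.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 1.0, 0.0])
u = np.array([2.0, 3.0, 5.0])

# Naive sum of line projections (valid only for orthogonal bases).
naive = (u @ v1) / (v1 @ v1) * v1 + (u @ v2) / (v2 @ v2) * v2

# Matrix form: solve (A^T A) x = A^T u instead of forming the inverse.
A = np.column_stack([v1, v2])
coeffs = np.linalg.solve(A.T @ A, A.T @ u)
proj = A @ coeffs

print(naive)   # [4.5 2.5 0. ]
print(proj)    # [2. 3. 0.]
```

Solving the linear system rather than computing (AᵀA)⁻¹ explicitly is a common numerical choice; both express the same formula.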

The projection matrix

For projection onto a line spanned by v, the projection matrix is:

P = (v vᵀ) / (vᵀ v)

This is an outer product divided by a scalar. P has rank 1 — its image is exactly the line through v.

For projection onto the column space of a matrix A whose columns are linearly independent:

P = A (AᵀA)⁻¹ Aᵀ

Two key identities make P an orthogonal projection (idempotency alone gives a projection; symmetry makes it orthogonal):

Idempotent: P² = P. Once you project, projecting again does nothing.
Symmetric: Pᵀ = P. The projection is along directions perpendicular to the subspace, not slanted.

The complementary projection I − P sends u to its perpendicular component, the residual. The pair (P, I − P) splits any vector cleanly into a component inside the subspace and a component perpendicular to it.
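These properties are easy to check numerically. The following sketch assumes NumPy and uses an arbitrary 3×2 matrix with independent columns, chosen only for illustration.

```python
import numpy as np

# Illustrative matrix with linearly independent columns (assumed).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])

P = A @ np.linalg.inv(A.T @ A) @ A.T
I = np.eye(3)

u = np.array([1.0, 2.0, 3.0])
inside = P @ u          # component in col(A)
outside = (I - P) @ u   # residual, orthogonal to col(A)

is_idempotent = np.allclose(P @ P, P)
is_symmetric = np.allclose(P.T, P)
rank_from_trace = np.trace(P)                        # dimension of col(A)
splits_cleanly = np.allclose(inside + outside, u)
residual_is_orthogonal = np.allclose(A.T @ outside, 0.0)
```

The explicit inverse is used here only to mirror the formula; for larger problems a linear solve or a QR factorization is usually preferred.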

Where orthogonal projection shows up

Least-squares regression. Fitting a linear model y ≈ Ax means finding x̂ such that Ax̂ is the projection of y onto col(A). The normal equations AᵀAx̂ = Aᵀy are the projection condition in disguise. Every regression coefficient is a projection coordinate.
Fourier series. Expressing a function as a sum of sines and cosines is projection onto an orthogonal basis in an inner-product space. Each Fourier coefficient is a projection scalar.
Principal component analysis. PCA finds the subspace that maximizes the projected variance. Compressing data to a few dimensions is projecting onto that subspace.
Gram-Schmidt orthogonalization. The process subtracts off projections one at a time. Without projection, no Gram-Schmidt, and no QR decomposition.
Computer graphics. Shadow casting (object onto floor along the light direction) and reflection use related projection-like maps. When the light is directly overhead, the shadow on a flat ground plane is exactly an orthogonal projection.
Signal processing. Filtering as projection onto the subspace of low-frequency components. Noise removal often reduces to "project away the noise subspace".
Quantum measurement. Measurement collapses a quantum state by projecting it onto an eigenstate of the measured observable. The projection postulate is one of the standard postulates of quantum mechanics.
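The least-squares case in the list above can be sketched concretely. The data here are made up, and NumPy is assumed. The fitted values A x̂ are the projection of y onto col(A), so the residual is orthogonal to every column of A.

```python
import numpy as np

# Made-up data: fit y ~ b0 + b1 * t at four sample points.
t = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0, 4.0])
A = np.column_stack([np.ones_like(t), t])

# Normal equations: (A^T A) x_hat = A^T y.
x_hat = np.linalg.solve(A.T @ A, A.T @ y)
y_hat = A @ x_hat                 # projection of y onto col(A)
r = y - y_hat                     # residual

# Reference answer from NumPy's least-squares routine.
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

print(x_hat)      # [0.9 0.9]
print(A.T @ r)    # approximately [0, 0]
```

In practice np.linalg.lstsq is generally the safer call because it avoids forming AᵀA, which squares the condition number; the normal equations are shown to make the projection condition visible.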

Common mistakes

Forgetting the v·v denominator. The formula reduces to (u·v) v only when v is a unit vector. Skipping the divisor on a non-unit v gives an answer that is off by a factor of |v|²: too long when |v| > 1, too short when |v| < 1.
Using the simple sum on a non-orthogonal basis. Σ (u·vᵢ / vᵢ·vᵢ) vᵢ is correct only when the vᵢ are pairwise orthogonal. With a tilted basis, you must invert AᵀA.
Confusing projection with reflection. Projection sends u to its foot in W. Reflection through W sends u to 2·proj(u) − u, on the other side of W. In matrix terms the reflection is 2P − I, not P.
Treating the residual as small. The residual u − proj(u) is perpendicular to W but can be enormous. "Closest" in W still leaves the entire perpendicular distance unfilled.
Applying P twice and getting a different answer. If P² ≠ P, what you have is not a projection. Checking idempotency is a quick sanity test.
Inverting AᵀA when it is singular. If A's columns are linearly dependent, AᵀA is not invertible. Use the pseudoinverse (Moore–Penrose) instead, or orthogonalize first.
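The singular case in the last item can be sketched as follows, assuming NumPy. The matrix is a contrived example whose third column is the sum of the first two, so AᵀA has no inverse. The product of A with its Moore–Penrose pseudoinverse is still the orthogonal projector onto col(A).

```python
import numpy as np

# Contrived rank-deficient matrix: column 3 = column 1 + column 2.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

gram_rank = np.linalg.matrix_rank(A.T @ A)   # 2, so the 3x3 Gram matrix is singular

# A @ pinv(A) projects orthogonally onto col(A), here the xy-plane.
P = A @ np.linalg.pinv(A)

u = np.array([2.0, 3.0, 5.0])
proj = P @ u
```

For this matrix col(A) is the xy-plane, so the projection simply zeroes the third coordinate of u.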

Frequently asked questions

Why is the projection formula proj_v(u) = (u·v / v·v) v?

Write the projection as some scalar c times v. The defining property of an orthogonal projection is that u − cv is perpendicular to v, so (u − cv)·v = 0. Solving gives c = (u·v) / (v·v). Multiplying back by v gives the formula. The numerator captures alignment; dividing by v·v rescales for v's length.

What is the difference between projecting onto a line and onto a plane?

Projection onto a line uses a single basis vector v: proj_v(u) = (u·v / v·v) v. Projection onto a plane (or higher subspace) sums one such term per orthogonal basis vector. If {v₁, v₂} is an orthogonal basis for the plane, proj(u) = (u·v₁/v₁·v₁) v₁ + (u·v₂/v₂·v₂) v₂. With an orthonormal basis, the denominators all become 1 and the formula simplifies.

How does projection power least-squares regression?

Fitting a linear model means finding the closest point in the column space of A to the data vector b. That closest point is the orthogonal projection of b onto col(A). The least-squares solution x̂ satisfies the normal equations AᵀAx̂ = Aᵀb, which is just the projection condition rewritten. Every regression coefficient you have ever computed is a projection coordinate.

What is a projection matrix?

A square matrix P that satisfies P² = P (idempotent) and P = Pᵀ (symmetric). For projection onto the column space of A, P = A(AᵀA)⁻¹Aᵀ. Applying P to any vector projects it; applying P twice gives the same answer as once, because once you are inside the subspace you stay there. The trace of P equals the dimension of the subspace.

Is the projection unique?

Yes. For any finite-dimensional subspace W, every vector u in the ambient space decomposes uniquely as u = w + w⊥, where w lies in W and w⊥ lies in W's orthogonal complement. The projection w is the unique closest point in W to u — closer than any other element of W in Euclidean distance. This is the orthogonal-decomposition theorem, sometimes called the projection theorem.

Why must the basis be orthogonal in the projection formula?

If basis vectors are not orthogonal, the simple sum-of-projections formula double-counts directions. To project onto a subspace using a non-orthogonal basis, you have to invert AᵀA — a small linear system. Orthogonalizing the basis first (via Gram-Schmidt) avoids the inversion: each basis vector contributes independently, and projection is a single sum.
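The two routes can be compared directly. The sketch below assumes NumPy and uses made-up vectors: it orthogonalizes a tilted basis with a small Gram-Schmidt helper (a hypothetical function, written in the modified form that subtracts from the running remainder), then checks that the single-sum formula agrees with the AᵀA route.

```python
import numpy as np


def gram_schmidt(vectors):
    """Return an orthogonal (not normalized) basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = v.astype(float)                  # working copy of v
        for b in basis:
            w = w - (w @ b) / (b @ b) * b    # subtract the projection onto b
        if not np.allclose(w, 0.0):          # skip linearly dependent vectors
            basis.append(w)
    return basis


# Illustrative non-orthogonal basis of a plane in R^3 (assumed).
v1 = np.array([1.0, 1.0, 0.0])
v2 = np.array([1.0, 0.0, 1.0])
u = np.array([1.0, 2.0, 3.0])

# Route 1: orthogonalize, then sum one projection term per basis vector.
q = gram_schmidt([v1, v2])
proj_sum = sum((u @ b) / (b @ b) * b for b in q)

# Route 2: keep the tilted basis and solve with A^T A.
A = np.column_stack([v1, v2])
proj_matrix = A @ np.linalg.solve(A.T @ A, A.T @ u)

same_answer = np.allclose(proj_sum, proj_matrix)
```

Both routes give the same point in the plane; they differ only in where the work of handling the cross-terms is done.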