Linear Algebra

Gram-Schmidt Process

Turn any tilted basis into a perpendicular one, one projection at a time

The Gram-Schmidt process turns any linearly independent set of vectors into an orthonormal basis spanning the same subspace, by repeatedly projecting away each new vector's component along the directions already chosen. It is the constructive engine behind QR decomposition, least-squares regression, and the entire family of orthogonal polynomials. Named for Jørgen Gram (1883) and Erhard Schmidt (1907), the algorithm comes in three practical variants — classical, modified, and Householder — that differ in their numerical stability rather than their final answer.

Inputk linearly independent vectors
Outputk orthonormal vectors with the same span
Core operationSubtract projections, then normalize
ComputesQR decomposition of input matrix
Stable variantModified Gram-Schmidt or Householder
CostO(mk²) flops for m×k input

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How Gram-Schmidt works

Start with a set of linearly independent vectors {v₁, v₂, …, vₖ}. The goal is to produce {q₁, q₂, …, qₖ} where every pair is perpendicular and every vector has unit length, while preserving the same span at each stage (span{v₁, …, vᵢ} = span{q₁, …, qᵢ}).

The recipe is iterative. At each step, take the next input vector and remove all of its components along the already-chosen orthonormal directions:

Set u₁ = v₁ and q₁ = u₁ / |u₁|.
For each subsequent i = 2, 3, …, k:
- Compute the projection-stripped vector: uᵢ = vᵢ − Σ_{j<i} (vᵢ · qⱼ) qⱼ.
- Normalize: qᵢ = uᵢ / |uᵢ|.

The subtraction step is the heart of the algorithm. It says: take vᵢ, subtract whatever portion of it is already explained by q₁, …, qᵢ₋₁, and what remains must be orthogonal to all of them. That remainder, normalized, is qᵢ.

By construction, qᵢ is perpendicular to every qⱼ with j < i, because the projections onto those directions have been subtracted out. The dot product (qᵢ · qⱼ) is exactly the kind of quantity Gram-Schmidt was designed to make zero.

Worked example: orthonormalizing three vectors in ℝ³

Take v₁ = (1, 1, 0), v₂ = (1, 0, 1), v₃ = (0, 1, 1). These are linearly independent.

Step 1. u₁ = v₁ = (1, 1, 0). |u₁| = √2. So q₁ = (1/√2, 1/√2, 0).

Step 2. Compute v₂·q₁ = 1·(1/√2) + 0·(1/√2) + 1·0 = 1/√2.

u₂ = v₂ − (v₂·q₁) q₁
   = (1, 0, 1) − (1/√2)(1/√2, 1/√2, 0)
   = (1, 0, 1) − (1/2, 1/2, 0)
   = (1/2, −1/2, 1)

|u₂| = √(1/4 + 1/4 + 1) = √(3/2). Normalize: q₂ = (1/√6, −1/√6, 2/√6) after rationalizing.

Step 3. Compute v₃·q₁ = 0·(1/√2) + 1·(1/√2) + 1·0 = 1/√2.

v₃·q₂ = 0·(1/√6) + 1·(−1/√6) + 1·(2/√6) = 1/√6.

u₃ = v₃ − (v₃·q₁) q₁ − (v₃·q₂) q₂
   = (0, 1, 1) − (1/2, 1/2, 0) − (1/6, −1/6, 2/6)
   = (−1/2 − 1/6, 1/2 + 1/6, 1 − 1/3)
   = (−2/3, 2/3, 2/3)

|u₃| = √(4/9 + 4/9 + 4/9) = 2/√3. Normalize: q₃ = (−1/√3, 1/√3, 1/√3).

Verify orthogonality: q₁·q₂ = (1/√2)(1/√6) + (1/√2)(−1/√6) + 0 = 0. ✓

q₁·q₃ = (1/√2)(−1/√3) + (1/√2)(1/√3) + 0 = 0. ✓

q₂·q₃ = (1/√6)(−1/√3) + (−1/√6)(1/√3) + (2/√6)(1/√3) = (−1 − 1 + 2)/√18 = 0. ✓

And each |qᵢ| = 1 by construction. The set {q₁, q₂, q₃} is an orthonormal basis for ℝ³ spanning the same space as the original three.

Classical vs modified Gram-Schmidt vs Householder

	Classical Gram-Schmidt	Modified Gram-Schmidt	Householder reflections
How it computes uᵢ	Subtract all projections at once: uᵢ = vᵢ − Σⱼ (vᵢ·qⱼ) qⱼ	Subtract one projection at a time, updating the residual after each	Reflects vector across a hyperplane to zero out below-diagonal entries
Output	Orthonormal basis Q	Orthonormal basis Q (same Q in exact arithmetic)	Orthonormal Q via product of reflections
Numerical stability	Poor when vectors are nearly dependent	Better — error doesn't accumulate as badly	Best — backward stable to machine precision
Cost	~2mk² flops	~2mk² flops	~2mk² − (2/3)k³ flops
Easy to code	Yes — one nested sum	Yes — minor rearrangement	Trickier — needs reflection vectors
Used in production	Rarely (textbook only)	Sometimes for sparse data	LAPACK's geqrf default
Best for	Teaching the idea	Iterative methods needing intermediate q's	Robust dense QR factorization

The three variants compute the same Q in exact arithmetic but differ in floating-point. Classical Gram-Schmidt loses orthogonality fast on ill-conditioned input; Modified Gram-Schmidt loses it slowly; Householder is essentially exact. Production numerical libraries (LAPACK, NumPy's linalg.qr) default to Householder.

Connection to QR decomposition

Gram-Schmidt isn't just an orthogonalization trick — it is a constructive proof of the QR decomposition. Stack the input vectors as columns of a matrix A. Stack the orthonormal output vectors as columns of Q. The dot products vᵢ·qⱼ that Gram-Schmidt computes are exactly the entries of an upper-triangular matrix R such that:

A = Q R

where Q has orthonormal columns and R is upper triangular with positive diagonal. The factorization is unique (up to sign of diagonal) for full-column-rank A. Every linear-algebra package implements QR — most use Householder for stability, but the underlying decomposition was discovered through Gram-Schmidt.

QR is the workhorse for least-squares regression (Ax ≈ b becomes Rx = Qᵀb, easy to back-solve), eigenvalue iteration (the QR algorithm computes eigenvalues by repeated QR factorizations), and orthogonal-basis problems generally.

Where Gram-Schmidt shows up

QR decomposition. Direct construction. The Q and R matrices fall out of the Gram-Schmidt steps with no extra work.
Least-squares regression. Solving min |Ax − b|² uses QR: x̂ = R⁻¹Qᵀb. Faster and more stable than the normal equations.
Orthogonal polynomials. Apply Gram-Schmidt to {1, x, x², …} under different inner products and you generate Legendre, Hermite, Laguerre, or Chebyshev polynomials. Each family solves a different physical or numerical problem (Legendre for spherical harmonics, Hermite for the quantum harmonic oscillator, and so on).
Eigenvalue algorithms. The QR algorithm — basis of every eigenvalue solver since 1961 — repeatedly QR-factors a matrix until it converges to an upper-triangular form whose diagonal contains the eigenvalues.
Krylov subspace methods. Iterative solvers for huge linear systems (GMRES, Arnoldi, Lanczos) build an orthonormal basis of a growing subspace via Gram-Schmidt at each step.
Lattice basis reduction. The LLL algorithm, central to cryptanalysis and integer programming, uses a discrete cousin of Gram-Schmidt to find short, nearly-orthogonal lattice vectors.
Signal processing. Adaptive filters and beamforming arrays continuously orthogonalize received signals to suppress interference, a streaming Gram-Schmidt pass.

Common mistakes

Using classical Gram-Schmidt on ill-conditioned data. The output Q can be far from orthogonal — sometimes off by orders of magnitude — when input vectors are nearly parallel. Use modified Gram-Schmidt or Householder for numerical work.
Forgetting to normalize. Subtracting projections gives an orthogonal but not orthonormal set. Skipping the |uᵢ| division leaves vectors of arbitrary length.
Confusing orthogonal and orthonormal. Gram-Schmidt outputs orthonormal (perpendicular and unit length) by default. Some textbooks describe an orthogonal-only variant that skips the normalization. Check which one your reference uses.
Applying it to dependent vectors. If vᵢ is in span{v₁, …, vᵢ₋₁}, the projection-removal yields the zero vector and normalization fails. Detect zero residuals and skip.
Re-projecting against the original vectors. The formula uses qⱼ (already-normalized output), not vⱼ (raw input). Mixing these up gives wrong answers.
Storing only Q. The R matrix from QR is also valuable — it contains the projection coefficients you computed and is needed to reconstruct A or solve linear systems. Don't throw it away.

Frequently asked questions

What does Gram-Schmidt actually do?

It turns a set of linearly independent vectors {v₁, v₂, …, vₖ} into a set {q₁, q₂, …, qₖ} of orthonormal vectors that span the same subspace. The process is constructive: take v₁ and normalize it. For each subsequent vᵢ, subtract its projection onto every previously chosen direction, then normalize. The result is a basis where every pair is perpendicular and every vector has unit length.

Why is classical Gram-Schmidt numerically unstable?

Classical Gram-Schmidt computes all projections of vᵢ against the original v₁, …, vᵢ₋₁ at once, then subtracts them in one shot. Floating-point round-off in those projections accumulates and the resulting q's lose orthogonality, especially when the input vectors are nearly linearly dependent. Modified Gram-Schmidt subtracts projections one at a time using the most recent partial result, which damps error propagation; Householder reflections avoid the issue altogether.

What is the connection to QR decomposition?

Gram-Schmidt is the constructive proof of QR decomposition. Stacking the q's as columns of Q, and recording the projection coefficients in an upper-triangular R, gives A = QR, where Q is orthogonal and R is upper triangular. Every step of Gram-Schmidt corresponds to filling in one column of Q and one column of R.

Does Gram-Schmidt work in any inner-product space?

Yes. The process needs only a way to compute inner products and norms — it never assumes the vectors are in ℝⁿ. Applied to polynomials with the inner product ⟨f,g⟩ = ∫ f(x)g(x) dx on [−1, 1], starting from {1, x, x², …}, it produces the Legendre polynomials. Different weight functions yield Hermite, Laguerre, or Chebyshev polynomials. Every classical orthogonal polynomial family is just Gram-Schmidt with a different inner product.

What if the input vectors are linearly dependent?

When some vᵢ already lies in the span of the earlier ones, the projection-removal step yields the zero vector. You cannot normalize zero, so the algorithm signals dependence and skips that vector. The output is an orthonormal basis for the actual span, which has dimension less than the number of input vectors.

Who invented the Gram-Schmidt process?

Jørgen Pedersen Gram described the technique for least-squares fitting in 1883, and Erhard Schmidt formalized it for Hilbert spaces in 1907 — but Laplace had used essentially the same construction in his 1820 work on probability. The label Gram-Schmidt is twentieth-century convention; the underlying idea is older than either name.