Linear Algebra

Singular Value Decomposition

Every matrix is a rotation, a stretch along axes, then another rotation

The singular value decomposition writes any matrix A as A = U·Σ·V^T — an orthogonal rotation, a non-negative diagonal stretch, and another orthogonal rotation. It is the most informative factorization in linear algebra and underwrites image compression, least squares, PCA, and pseudoinverses.

FactorizationA = U · Σ · V^T
U, VOrthogonal — U^TU = I, V^TV = I
ΣDiagonal with σ₁ ≥ σ₂ ≥ ⋯ ≥ 0
Exists forEvery real or complex matrix, any shape
Cost (m×n, m ≥ n)≈ 4mn² + 8n³/3
Used inPCA, image compression, pseudoinverse, recommender systems, denoising

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The geometric meaning

Apply a matrix A to the unit sphere in n-space. The image is an ellipsoid in m-space. SVD names the pieces of that picture:

The columns of V are the input directions that get mapped to the principal axes of the output ellipsoid.
The singular values σ₁, σ₂, … on Σ's diagonal are the lengths of those axes.
The columns of U are the unit vectors that point along the output axes.

So multiplying by A is exactly: rotate the input via V^T, stretch each new coordinate by σ_i, rotate to the output frame via U. Three understandable steps, every time, for every matrix.

The formal statement

For any m×n real matrix A there exist orthogonal U (m×m), V (n×n), and a diagonal Σ (m×n) with non-negative diagonal entries such that:

A = U · Σ · Vᵀ

The σ_i are unique (sorted in decreasing order) and equal to the square roots of the eigenvalues of A^TA. The columns of V are the corresponding eigenvectors of A^TA; the columns of U are eigenvectors of A·A^T. The number of nonzero σ_i equals the rank of A.

Worked example — SVD of a 2×3 matrix

Compute the SVD of A = [[3, 1, 1], [−1, 3, 1]].

Step 1. Form A·A^T:

A · Aᵀ = [ 3·3+1·1+1·1   3·(−1)+1·3+1·1 ]   = [ 11   1 ]
         [    sym             1+9+1     ]     [  1  11 ]

Step 2. Eigenvalues of A·A^T: solve det(A·A^T − μI) = 0:

(11 − μ)² − 1 = 0  →  μ = 12 or μ = 10.

Singular values: σ₁ = √12 = 2√3, σ₂ = √10.

Step 3. Eigenvectors of A·A^T give U. For μ = 12: solve [[−1, 1], [1, −1]]·u = 0 to get u₁ = [1, 1]^T/√2. For μ = 10: u₂ = [1, −1]^T/√2. So

U = [ 1/√2   1/√2 ]
    [ 1/√2  −1/√2 ]

Step 4. The first two right singular vectors come from v_i = A^Tu_i/σ_i:

v₁ = Aᵀ·u₁ / σ₁ = (1/√2)·[3−1, 1+3, 1+1]ᵀ / (2√3)
   = [2, 4, 2]ᵀ / (2√6) = [1, 2, 1]ᵀ / √6.

v₂ = Aᵀ·u₂ / σ₂ = (1/√2)·[3+1, 1−3, 1−1]ᵀ / √10
   = [4, −2, 0]ᵀ / (2√5) = [2, −1, 0]ᵀ / √5.

Step 5. Because A is 2×3, V is 3×3 and we need a third right singular vector v₃ orthogonal to v₁ and v₂. Up to sign, v₃ = [1, 2, −5]^T/√30. Its singular value is zero — the vector lies in the null space of A.

Result.

U = [ 1/√2   1/√2 ]
    [ 1/√2  −1/√2 ]

Σ = [ 2√3   0    0 ]
    [  0   √10   0 ]

Vᵀ = [ 1/√6   2/√6   1/√6 ]
     [ 2/√5  −1/√5    0   ]
     [ 1/√30  2/√30 −5/√30]

Verify by multiplying U · Σ · V^T — you recover A = [[3, 1, 1], [−1, 3, 1]].

Low-rank approximation — the Eckart–Young theorem

Truncating the SVD after k terms gives the best possible rank-k approximation of A:

Aₖ = σ₁·u₁·v₁ᵀ + σ₂·u₂·v₂ᵀ + ⋯ + σₖ·uₖ·vₖᵀ

Among all rank-k matrices, A_k minimizes ‖A − A_k‖ in both the spectral norm (largest discarded singular value) and the Frobenius norm (square root of sum of discarded squared singular values).

That single fact is responsible for SVD's appearance everywhere data-compression matters: image compression by truncation, denoising by zeroing tiny singular values, latent-semantic indexing in search, and matrix factorization in collaborative-filtering recommenders.

SVD vs eigendecomposition vs polar decomposition

	SVD	Eigendecomposition	Polar decomposition
Form	A = U·Σ·V^T	A = P·D·P⁻¹	A = U·H
Required structure	None — any m×n matrix	Square, diagonalizable	Square
Diagonal/factor signs	σ_i ≥ 0	λ_i can be negative or complex	H is positive semidefinite, U is orthogonal
Bases	Two orthogonal bases (U, V)	One non-orthogonal basis (P)	One orthogonal U + one symmetric H
Reveals rank	Yes — count of nonzero σ_i	Indirectly — through zero eigenvalues	Indirectly — through H's null space
Numerical robustness	Excellent	Sensitive to defective matrices	Good — built from SVD in practice
Best low-rank approximation	Yes — Eckart–Young	Only for symmetric matrices	Not directly
Typical use	Compression, PCA, pseudoinverse	Diagonalize a fixed transform	Continuum mechanics, robotics (closest rotation)

Where SVD shows up

Image compression. Treat a grayscale image as a matrix and keep the top k singular components. A 512×512 image kept at k = 50 stores only 50·(512+512+1) = 51,250 numbers (vs 262,144) at very acceptable visual quality.
Pseudoinverse and least squares. A⁺ = V·Σ⁺·U^T. Min-norm least-squares solution x* = A⁺b works for any A — square, rectangular, rank-deficient — without forming normal equations.
Principal component analysis. The principal axes of a centered data matrix X are the right singular vectors of X; the variances along them are σ_i²/(n−1).
Recommender systems. Truncated SVD of a sparse user×item rating matrix factors users and items into a small shared latent space — the heart of model-based collaborative filtering.
Denoising and total least squares. Zero out singular values below a noise threshold to recover the underlying low-rank signal.
Numerical conditioning. κ₂(A) = σ_max/σ_min. SVD reveals ill-conditioning that other factorizations only hint at.
Procrustes problems. The closest orthogonal matrix to A is U·V^T from its SVD — used for shape alignment, point-cloud registration, and orthogonalizing rotation matrices in graphics.

Common mistakes

Treating singular values like signed eigenvalues. They are always non-negative. The sign-bearing structure of A lives inside U and V, not Σ.
Confusing the columns of V with the rows of V^T. The right singular vectors are the columns of V, equivalently the rows of V^T. Mixing them up flips signs and breaks reconstructions.
Using the eigendecomposition of A^TA to compute SVD. Forming A^TA squares the condition number and erases information about small singular values. Use bidiagonalization-based SVD routines, not "form-and-solve".
Asking for full U and V when you only need the top k. Truncated and randomized SVD compute just what you use, often 100× faster than the full decomposition.
Truncating below a fixed σ threshold for an unscaled matrix. Singular values inherit the scale of the matrix; pick the threshold relative to σ₁ (e.g. tol = ε·σ₁·max(m, n)) instead of a fixed number.
Forgetting to center data before PCA-via-SVD. SVD on uncentered data factors the data, not its variance. Subtract column means first.

Frequently asked questions

Does every matrix have an SVD?

Yes — every real or complex matrix, of any shape, of any rank, has a singular value decomposition. That universality is what makes SVD the Swiss Army knife of linear algebra. Eigendecomposition, by contrast, only exists when A is square and diagonalizable; SVD has no such restriction.

Why are singular values non-negative?

By construction. The singular values σ_i are the square roots of eigenvalues of A^T·A, and A^T·A is positive semidefinite, so its eigenvalues are non-negative real numbers. The conventional ordering puts them on Σ's diagonal in decreasing order: σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r ≥ 0.

What is the difference between SVD and eigendecomposition?

Eigendecomposition writes A = P·D·P^(-1) using a single basis P that diagonalizes A; the eigenvalues on D can be negative or complex, and P is generally not orthogonal. SVD uses two orthogonal bases (U and V) and a diagonal of non-negative singular values; it works for any matrix, including non-square and rank-deficient ones.

Why is SVD called the best low-rank approximation?

The Eckart–Young theorem says that truncating the SVD to its top k singular values and corresponding columns of U and V gives the best rank-k approximation of A in both the spectral and Frobenius norms. No other rank-k matrix gets closer to A. This is why SVD is the backbone of compression, denoising, and dimensionality reduction.

How does the pseudoinverse come from SVD?

The Moore–Penrose pseudoinverse is A⁺ = V·Σ⁺·U^T where Σ⁺ inverts each non-zero singular value and leaves the zeros (or near-zeros) alone. It produces the minimum-norm least-squares solution to A x = b for any A — square, rectangular, full-rank, or rank-deficient.

How expensive is SVD?

Full SVD on an m×n matrix costs about 4mn² + 8n³/3 floating-point operations using the standard Golub–Reinsch algorithm. That is more expensive than QR or LU but yields far more information. For tall thin matrices, the thin SVD costs roughly 2mn² + 11n³, and randomized SVD can compute the top-k components in O(mn log k) for very large matrices.