Linear Algebra
Singular Value Decomposition
Every matrix is a rotation, a stretch along axes, then another rotation
The singular value decomposition writes any matrix A as A = U·Σ·VT — an orthogonal rotation, a non-negative diagonal stretch, and another orthogonal rotation. It is the most informative factorization in linear algebra and underwrites image compression, least squares, PCA, and pseudoinverses.
- FactorizationA = U · Σ · VT
- U, VOrthogonal — UTU = I, VTV = I
- ΣDiagonal with σ₁ ≥ σ₂ ≥ ⋯ ≥ 0
- Exists forEvery real or complex matrix, any shape
- Cost (m×n, m ≥ n)≈ 4mn² + 8n³/3
- Used inPCA, image compression, pseudoinverse, recommender systems, denoising
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The geometric meaning
Apply a matrix A to the unit sphere in n-space. The image is an ellipsoid in m-space. SVD names the pieces of that picture:
- The columns of V are the input directions that get mapped to the principal axes of the output ellipsoid.
- The singular values σ1, σ2, … on Σ's diagonal are the lengths of those axes.
- The columns of U are the unit vectors that point along the output axes.
So multiplying by A is exactly: rotate the input via VT, stretch each new coordinate by σi, rotate to the output frame via U. Three understandable steps, every time, for every matrix.
The formal statement
For any m×n real matrix A there exist orthogonal U (m×m), V (n×n), and a diagonal Σ (m×n) with non-negative diagonal entries such that:
A = U · Σ · Vᵀ
The σi are unique (sorted in decreasing order) and equal to the square roots of the eigenvalues of ATA. The columns of V are the corresponding eigenvectors of ATA; the columns of U are eigenvectors of A·AT. The number of nonzero σi equals the rank of A.
Worked example — SVD of a 2×3 matrix
Compute the SVD of A = [[3, 1, 1], [−1, 3, 1]].
Step 1. Form A·AT:
A · Aᵀ = [ 3·3+1·1+1·1 3·(−1)+1·3+1·1 ] = [ 11 1 ]
[ sym 1+9+1 ] [ 1 11 ]
Step 2. Eigenvalues of A·AT: solve det(A·AT − μI) = 0:
(11 − μ)² − 1 = 0 → μ = 12 or μ = 10.
Singular values: σ1 = √12 = 2√3, σ2 = √10.
Step 3. Eigenvectors of A·AT give U. For μ = 12: solve [[−1, 1], [1, −1]]·u = 0 to get u₁ = [1, 1]T/√2. For μ = 10: u₂ = [1, −1]T/√2. So
U = [ 1/√2 1/√2 ]
[ 1/√2 −1/√2 ]
Step 4. The first two right singular vectors come from vi = ATui/σi:
v₁ = Aᵀ·u₁ / σ₁ = (1/√2)·[3−1, 1+3, 1+1]ᵀ / (2√3)
= [2, 4, 2]ᵀ / (2√6) = [1, 2, 1]ᵀ / √6.
v₂ = Aᵀ·u₂ / σ₂ = (1/√2)·[3+1, 1−3, 1−1]ᵀ / √10
= [4, −2, 0]ᵀ / (2√5) = [2, −1, 0]ᵀ / √5.
Step 5. Because A is 2×3, V is 3×3 and we need a third right singular vector v3 orthogonal to v1 and v2. Up to sign, v3 = [1, 2, −5]T/√30. Its singular value is zero — the vector lies in the null space of A.
Result.
U = [ 1/√2 1/√2 ]
[ 1/√2 −1/√2 ]
Σ = [ 2√3 0 0 ]
[ 0 √10 0 ]
Vᵀ = [ 1/√6 2/√6 1/√6 ]
[ 2/√5 −1/√5 0 ]
[ 1/√30 2/√30 −5/√30]
Verify by multiplying U · Σ · VT — you recover A = [[3, 1, 1], [−1, 3, 1]].
Low-rank approximation — the Eckart–Young theorem
Truncating the SVD after k terms gives the best possible rank-k approximation of A:
Aₖ = σ₁·u₁·v₁ᵀ + σ₂·u₂·v₂ᵀ + ⋯ + σₖ·uₖ·vₖᵀ
Among all rank-k matrices, Ak minimizes ‖A − Ak‖ in both the spectral norm (largest discarded singular value) and the Frobenius norm (square root of sum of discarded squared singular values).
That single fact is responsible for SVD's appearance everywhere data-compression matters: image compression by truncation, denoising by zeroing tiny singular values, latent-semantic indexing in search, and matrix factorization in collaborative-filtering recommenders.
SVD vs eigendecomposition vs polar decomposition
| SVD | Eigendecomposition | Polar decomposition | |
|---|---|---|---|
| Form | A = U·Σ·VT | A = P·D·P−1 | A = U·H |
| Required structure | None — any m×n matrix | Square, diagonalizable | Square |
| Diagonal/factor signs | σi ≥ 0 | λi can be negative or complex | H is positive semidefinite, U is orthogonal |
| Bases | Two orthogonal bases (U, V) | One non-orthogonal basis (P) | One orthogonal U + one symmetric H |
| Reveals rank | Yes — count of nonzero σi | Indirectly — through zero eigenvalues | Indirectly — through H's null space |
| Numerical robustness | Excellent | Sensitive to defective matrices | Good — built from SVD in practice |
| Best low-rank approximation | Yes — Eckart–Young | Only for symmetric matrices | Not directly |
| Typical use | Compression, PCA, pseudoinverse | Diagonalize a fixed transform | Continuum mechanics, robotics (closest rotation) |
Where SVD shows up
- Image compression. Treat a grayscale image as a matrix and keep the top k singular components. A 512×512 image kept at k = 50 stores only 50·(512+512+1) = 51,250 numbers (vs 262,144) at very acceptable visual quality.
- Pseudoinverse and least squares. A+ = V·Σ+·UT. Min-norm least-squares solution x* = A+b works for any A — square, rectangular, rank-deficient — without forming normal equations.
- Principal component analysis. The principal axes of a centered data matrix X are the right singular vectors of X; the variances along them are σi2/(n−1).
- Recommender systems. Truncated SVD of a sparse user×item rating matrix factors users and items into a small shared latent space — the heart of model-based collaborative filtering.
- Denoising and total least squares. Zero out singular values below a noise threshold to recover the underlying low-rank signal.
- Numerical conditioning. κ2(A) = σmax/σmin. SVD reveals ill-conditioning that other factorizations only hint at.
- Procrustes problems. The closest orthogonal matrix to A is U·VT from its SVD — used for shape alignment, point-cloud registration, and orthogonalizing rotation matrices in graphics.
Common mistakes
- Treating singular values like signed eigenvalues. They are always non-negative. The sign-bearing structure of A lives inside U and V, not Σ.
- Confusing the columns of V with the rows of VT. The right singular vectors are the columns of V, equivalently the rows of VT. Mixing them up flips signs and breaks reconstructions.
- Using the eigendecomposition of ATA to compute SVD. Forming ATA squares the condition number and erases information about small singular values. Use bidiagonalization-based SVD routines, not "form-and-solve".
- Asking for full U and V when you only need the top k. Truncated and randomized SVD compute just what you use, often 100× faster than the full decomposition.
- Truncating below a fixed σ threshold for an unscaled matrix. Singular values inherit the scale of the matrix; pick the threshold relative to σ1 (e.g. tol = ε·σ1·max(m, n)) instead of a fixed number.
- Forgetting to center data before PCA-via-SVD. SVD on uncentered data factors the data, not its variance. Subtract column means first.
Frequently asked questions
Does every matrix have an SVD?
Yes — every real or complex matrix, of any shape, of any rank, has a singular value decomposition. That universality is what makes SVD the Swiss Army knife of linear algebra. Eigendecomposition, by contrast, only exists when A is square and diagonalizable; SVD has no such restriction.
Why are singular values non-negative?
By construction. The singular values σ_i are the square roots of eigenvalues of A^T·A, and A^T·A is positive semidefinite, so its eigenvalues are non-negative real numbers. The conventional ordering puts them on Σ's diagonal in decreasing order: σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r ≥ 0.
What is the difference between SVD and eigendecomposition?
Eigendecomposition writes A = P·D·P^(-1) using a single basis P that diagonalizes A; the eigenvalues on D can be negative or complex, and P is generally not orthogonal. SVD uses two orthogonal bases (U and V) and a diagonal of non-negative singular values; it works for any matrix, including non-square and rank-deficient ones.
Why is SVD called the best low-rank approximation?
The Eckart–Young theorem says that truncating the SVD to its top k singular values and corresponding columns of U and V gives the best rank-k approximation of A in both the spectral and Frobenius norms. No other rank-k matrix gets closer to A. This is why SVD is the backbone of compression, denoising, and dimensionality reduction.
How does the pseudoinverse come from SVD?
The Moore–Penrose pseudoinverse is A⁺ = V·Σ⁺·U^T where Σ⁺ inverts each non-zero singular value and leaves the zeros (or near-zeros) alone. It produces the minimum-norm least-squares solution to A x = b for any A — square, rectangular, full-rank, or rank-deficient.
How expensive is SVD?
Full SVD on an m×n matrix costs about 4mn² + 8n³/3 floating-point operations using the standard Golub–Reinsch algorithm. That is more expensive than QR or LU but yields far more information. For tall thin matrices, the thin SVD costs roughly 2mn² + 11n³, and randomized SVD can compute the top-k components in O(mn log k) for very large matrices.