Linear Algebra
LU Decomposition
Factor a matrix once, solve many systems cheaply
LU decomposition writes a square matrix as A = L·U, the product of a lower-triangular L and an upper-triangular U. Once factored, every subsequent solve of Ax = b costs just O(n²) — making LU the workhorse for repeated linear systems.
- FactorizationP · A = L · U
- LUnit-lower-triangular (Doolittle convention)
- UUpper-triangular
- Factoring cost≈ n³/3
- Solve cost per b≈ n²
- Used inRepeated linear solves, determinants, matrix inverse, iterative refinement
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
What LU is doing
Gaussian elimination already turns A into U. The new idea in LU is to also remember the multipliers — the mik = aik / akk values you used to zero each entry. Stored in a unit-lower-triangular matrix L, those multipliers exactly reconstruct A:
A = L · U
where Lᵢₖ = mᵢₖ for i > k and Lᵢᵢ = 1.
Once you have L and U, you have a perfect record of how to apply A's transformation as two cheap triangular solves. Factor once, use forever.
Solving with L and U
To solve Ax = b, write Ax = LUx = b and split it into two triangular systems:
- Forward solve Ly = b. Because L is unit-lower-triangular, y₁ = b₁, then y₂ = b₂ − L₂₁·y₁, then y₃ = b₃ − L₃₁·y₁ − L₃₂·y₂, and so on top-down. Cost: ≈ n²/2.
- Back solve Ux = y. Read xn from the bottom row, substitute upward. Cost: ≈ n²/2.
Pictured as a flow:
b → [forward solve with L] → y → [back solve with U] → x
Each triangular solve touches every entry once, no more.
Worked example — PA = LU for a 3×3
Factor A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]] with partial pivoting.
Step 1. The largest |entry| in column 1 is 8 in row 3. Swap row 1 with row 3. Update P to record the swap.
[ 8 7 9 ]
[ 4 3 3 ]
[ 2 1 1 ]
Multipliers — m₂₁ = 4/8 = 1/2, m₃₁ = 2/8 = 1/4. Subtract:
Row 2 ← Row 2 − (1/2)·Row 1 → [ 0 −1/2 −3/2 ]
Row 3 ← Row 3 − (1/4)·Row 1 → [ 0 −3/4 −5/4 ]
State after column 1:
[ 8 7 9 ]
[ 0 −1/2 −3/2 ]
[ 0 −3/4 −5/4 ]
Step 2. Largest |entry| in remaining column 2 is 3/4 in row 3. Swap row 2 and row 3. Update P. Also swap the already-stored multipliers in the L column for column 1 — they belong with their original rows.
[ 8 7 9 ]
[ 0 −3/4 −5/4 ]
[ 0 −1/2 −3/2 ]
Multiplier m₃₂ = (−1/2)/(−3/4) = 2/3. Row 3 ← Row 3 − (2/3)·Row 2:
U = [ 8 7 9 ]
[ 0 −3/4 −5/4 ]
[ 0 0 −2/3 ]
Final L (with the swaps applied) and P:
L = [ 1 0 0 ]
[ 1/4 1 0 ]
[ 1/2 2/3 1 ]
P swaps row 1 ↔ row 3, then row 2 ↔ row 3 of A.
Verify by computing L·U — the product equals PA. To solve Ax = b for any b, permute b by P, forward-solve with L, back-solve with U.
LU vs LDLT vs Cholesky vs full pivot LU
| LU (with partial pivot) | LDLT | Cholesky (LLT) | LU with full pivoting | |
|---|---|---|---|---|
| Required structure of A | Square, nonsingular | Symmetric (any signature) | Symmetric positive definite | Square, nonsingular |
| Factor form | P·A = L·U | A = L·D·LT | A = L·LT | P·A·Q = L·U |
| Pivoting | Row swaps | Bunch–Kaufman block pivots | None needed | Row and column swaps |
| Factor cost | n³/3 | n³/6 | n³/6 | n³/3 + O(n³) compares |
| Memory | n² | n²/2 | n²/2 | n² |
| Square roots needed | No | No | Yes (one per row) | No |
| Stability | Backward-stable in practice | Stable with B–K pivots | Always stable | Provably stable |
| Typical use | General dense systems | Symmetric indefinite (KKT) | Covariance, normal equations, FEM | Worst-case-stable solvers |
Determinants, inverses, and reuse
Because P is a permutation, det(P) = ±1, and triangular matrices have determinant equal to the product of their diagonals:
det(A) = (sign of P) · ∏ᵢ Uᵢᵢ
For a matrix inverse, solve LU·X = P column-by-column against the identity matrix. That is n triangular solves of cost n² each, so O(n³) total — same order as the factorization itself but with a much smaller constant than redoing elimination n times.
For iterative refinement: solve Ax = b once with L and U, compute the residual r = b − Ax in higher precision, solve Ad = r the same way, update x ← x + d. Two or three iterations recover full precision even when the LU factors were computed in single precision — the trick behind modern mixed-precision solvers on GPUs.
Where LU shows up
- Repeated linear solves. Time-stepping a stiff ODE solves (M − ΔtJ)x = b every step with the same matrix; LU lets each step be O(n²) instead of O(n³).
- Newton's method on systems. Each iteration solves J(x)Δx = −F(x). Reusing an old LU factor for several steps is a standard quasi-Newton trick.
- Steady-state circuit analysis. SPICE-style simulators factor the modified-nodal-analysis matrix and reuse it across DC sweeps and small-signal AC analyses.
- Optimization KKT systems. Interior-point methods solve symmetric indefinite KKT systems each iteration; LDLT with Bunch–Kaufman is the bread-and-butter solver.
- Computing det(A). Direct cofactor expansion is O(n!); LU gives the determinant in O(n³).
- Sparse direct solvers. Libraries like UMFPACK, MUMPS, and SuperLU compute sparse LU with elaborate fill-in-reducing reorderings; the underlying math is still PA = LU.
Common mistakes
- Believing every matrix has an LU factorization. Without pivoting, even a simple matrix like [[0, 1], [1, 0]] has none. With partial pivoting, every nonsingular matrix has PA = LU.
- Forgetting to permute b. If you stored P as a row order, you must reorder b the same way before the forward solve. Many "wrong answer" bugs are missing permutations.
- Treating LU as unique without a convention. L and U are defined only up to a diagonal scaling. Pick Doolittle (unit L) or Crout (unit U) and stick with it; library outputs follow Doolittle by default.
- Ignoring symmetry. Running LU on a symmetric positive definite matrix wastes half the work and half the memory. Check for symmetry up front and call Cholesky when applicable.
- Trying to recover A from L and U after in-place factorization. Most libraries overwrite A with L (below the diagonal) and U (on and above). The original matrix is gone — keep a copy if you need it.
- Confusing the sign of det(A). det(A) = (sign of P) · ∏Uii. The sign of P is (−1)k where k is the number of row swaps; forgetting the sign flips your determinant.
Frequently asked questions
Why factor A as L·U instead of just running Gaussian elimination?
Because the factors are reusable. Factoring costs O(n³) once, but every subsequent right-hand side b only costs O(n²) (one forward solve, one back solve). If you need to solve Ax = b for ten different b vectors, LU is roughly ten times faster than redoing elimination from scratch.
What does the P stand for in PA = LU?
P is the permutation matrix that records the row swaps from partial pivoting. Without pivoting, many matrices fail to admit an LU factorization (a zero pivot kills the algorithm). With pivoting, every nonsingular matrix has a factorization PA = LU, where P is the recorded permutation. In code, P is usually stored as an integer array of indices, not as a matrix.
Is LU unique?
Without a normalization, no — you can rescale rows of L and the inverse rescaling on U. The Doolittle convention pins L to have unit diagonal (L_ii = 1); the Crout convention pins U to have unit diagonal. Either choice makes the factorization unique once the row order (the P) is fixed.
How do I solve Ax = b using L and U?
Two triangular solves. First apply the permutation: c = Pb. Then solve Ly = c by forward substitution (top to bottom). Then solve Ux = y by back substitution (bottom to top). Each triangular solve is O(n²). The two together are still cheaper than one elimination.
When should I use Cholesky instead of LU?
When A is symmetric positive definite. Cholesky factors A = LL^T using only the lower triangle, costs half as much as LU (n³/6 vs n³/3), uses half the memory, and is unconditionally stable without any pivoting. Covariance matrices, stiffness matrices, and Gram matrices all qualify.
What is LDL^T and why use it?
LDL^T factors a symmetric matrix as L·D·L^T where L is unit-lower-triangular and D is diagonal (or block-diagonal). Compared to Cholesky it avoids square roots, which is faster on some hardware and works for symmetric indefinite matrices where Cholesky would fail. With Bunch–Kaufman pivoting it is the standard symmetric indefinite solver.