Why factor A as L·U instead of just running Gaussian elimination?

Because the factors are reusable. Factoring costs O(n³) once, but every subsequent right-hand side b only costs O(n²) (one forward solve, one back solve). If you need to solve Ax = b for ten different b vectors, LU is roughly ten times faster than redoing elimination from scratch.

What does the P stand for in PA = LU?

P is the permutation matrix that records the row swaps from partial pivoting. Without pivoting, many matrices fail to admit an LU factorization (a zero pivot kills the algorithm). With pivoting, every nonsingular matrix has a factorization PA = LU, where P is the recorded permutation. In code, P is usually stored as an integer array of indices, not as a matrix.

Without a normalization, no — you can rescale rows of L and the inverse rescaling on U. The Doolittle convention pins L to have unit diagonal (L_ii = 1); the Crout convention pins U to have unit diagonal. Either choice makes the factorization unique once the row order (the P) is fixed.

How do I solve Ax = b using L and U?

Two triangular solves. First apply the permutation: c = Pb. Then solve Ly = c by forward substitution (top to bottom). Then solve Ux = y by back substitution (bottom to top). Each triangular solve is O(n²). The two together are still cheaper than one elimination.

When should I use Cholesky instead of LU?

When A is symmetric positive definite. Cholesky factors A = LL^T using only the lower triangle, costs half as much as LU (n³/6 vs n³/3), uses half the memory, and is unconditionally stable without any pivoting. Covariance matrices, stiffness matrices, and Gram matrices all qualify.

What is LDL^T and why use it?

LDL^T factors a symmetric matrix as L·D·L^T where L is unit-lower-triangular and D is diagonal (or block-diagonal). Compared to Cholesky it avoids square roots, which is faster on some hardware and works for symmetric indefinite matrices where Cholesky would fail. With Bunch–Kaufman pivoting it is the standard symmetric indefinite solver.

LU Decomposition — How It Works, Pivoting & Worked Example

What LU is doing

Gaussian elimination already turns A into U. The new idea in LU is to also remember the multipliers — the m_ik = a_ik / a_kk values you used to zero each entry. Stored in a unit-lower-triangular matrix L, those multipliers exactly reconstruct A:

A = L · U
where Lᵢₖ = mᵢₖ for i > k and Lᵢᵢ = 1.

Once you have L and U, you have a perfect record of how to apply A's transformation as two cheap triangular solves. Factor once, use forever.

Solving with L and U

To solve Ax = b, write Ax = LUx = b and split it into two triangular systems:

Forward solve Ly = b. Because L is unit-lower-triangular, y₁ = b₁, then y₂ = b₂ − L₂₁·y₁, then y₃ = b₃ − L₃₁·y₁ − L₃₂·y₂, and so on top-down. Cost: ≈ n²/2.
Back solve Ux = y. Read x_n from the bottom row, substitute upward. Cost: ≈ n²/2.

Pictured as a flow:

b → [forward solve with L] → y → [back solve with U] → x

Each triangular solve touches every entry once, no more.

Worked example — PA = LU for a 3×3

Factor A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]] with partial pivoting.

Step 1. The largest |entry| in column 1 is 8 in row 3. Swap row 1 with row 3. Update P to record the swap.

[ 8  7  9 ]
[ 4  3  3 ]
[ 2  1  1 ]

Multipliers — m₂₁ = 4/8 = 1/2, m₃₁ = 2/8 = 1/4. Subtract:

Row 2 ← Row 2 − (1/2)·Row 1   →  [ 0  −1/2  −3/2 ]
Row 3 ← Row 3 − (1/4)·Row 1   →  [ 0  −3/4  −5/4 ]

State after column 1:

[ 8     7     9   ]
[ 0  −1/2  −3/2  ]
[ 0  −3/4  −5/4  ]

Step 2. Largest |entry| in remaining column 2 is 3/4 in row 3. Swap row 2 and row 3. Update P. Also swap the already-stored multipliers in the L column for column 1 — they belong with their original rows.

[ 8     7     9   ]
[ 0  −3/4  −5/4  ]
[ 0  −1/2  −3/2  ]

Multiplier m₃₂ = (−1/2)/(−3/4) = 2/3. Row 3 ← Row 3 − (2/3)·Row 2:

U = [ 8     7     9   ]
    [ 0  −3/4  −5/4  ]
    [ 0     0  −2/3  ]

Final L (with the swaps applied) and P:

L = [   1     0   0 ]
    [ 1/4    1   0 ]
    [ 1/2  2/3   1 ]

P swaps row 1 ↔ row 3, then row 2 ↔ row 3 of A.

Verify by computing L·U — the product equals PA. To solve Ax = b for any b, permute b by P, forward-solve with L, back-solve with U.

LU vs LDL^T vs Cholesky vs full pivot LU

	LU (with partial pivot)	LDL^T	Cholesky (LL^T)	LU with full pivoting
Required structure of A	Square, nonsingular	Symmetric (any signature)	Symmetric positive definite	Square, nonsingular
Factor form	P·A = L·U	A = L·D·L^T	A = L·L^T	P·A·Q = L·U
Pivoting	Row swaps	Bunch–Kaufman block pivots	None needed	Row and column swaps
Factor cost	n³/3	n³/6	n³/6	n³/3 + O(n³) compares
Memory	n²	n²/2	n²/2	n²
Square roots needed	No	No	Yes (one per row)	No
Stability	Backward-stable in practice	Stable with B–K pivots	Always stable	Provably stable
Typical use	General dense systems	Symmetric indefinite (KKT)	Covariance, normal equations, FEM	Worst-case-stable solvers

Determinants, inverses, and reuse

Because P is a permutation, det(P) = ±1, and triangular matrices have determinant equal to the product of their diagonals:

det(A) = (sign of P) · ∏ᵢ Uᵢᵢ

For a matrix inverse, solve LU·X = P column-by-column against the identity matrix. That is n triangular solves of cost n² each, so O(n³) total — same order as the factorization itself but with a much smaller constant than redoing elimination n times.

For iterative refinement: solve Ax = b once with L and U, compute the residual r = b − Ax in higher precision, solve Ad = r the same way, update x ← x + d. Two or three iterations recover full precision even when the LU factors were computed in single precision — the trick behind modern mixed-precision solvers on GPUs.

Where LU shows up

Repeated linear solves. Time-stepping a stiff ODE solves (M − ΔtJ)x = b every step with the same matrix; LU lets each step be O(n²) instead of O(n³).
Newton's method on systems. Each iteration solves J(x)Δx = −F(x). Reusing an old LU factor for several steps is a standard quasi-Newton trick.
Steady-state circuit analysis. SPICE-style simulators factor the modified-nodal-analysis matrix and reuse it across DC sweeps and small-signal AC analyses.
Optimization KKT systems. Interior-point methods solve symmetric indefinite KKT systems each iteration; LDL^T with Bunch–Kaufman is the bread-and-butter solver.
Computing det(A). Direct cofactor expansion is O(n!); LU gives the determinant in O(n³).
Sparse direct solvers. Libraries like UMFPACK, MUMPS, and SuperLU compute sparse LU with elaborate fill-in-reducing reorderings; the underlying math is still PA = LU.

Common mistakes

Believing every matrix has an LU factorization. Without pivoting, even a simple matrix like [[0, 1], [1, 0]] has none. With partial pivoting, every nonsingular matrix has PA = LU.
Forgetting to permute b. If you stored P as a row order, you must reorder b the same way before the forward solve. Many "wrong answer" bugs are missing permutations.
Treating LU as unique without a convention. L and U are defined only up to a diagonal scaling. Pick Doolittle (unit L) or Crout (unit U) and stick with it; library outputs follow Doolittle by default.
Ignoring symmetry. Running LU on a symmetric positive definite matrix wastes half the work and half the memory. Check for symmetry up front and call Cholesky when applicable.
Trying to recover A from L and U after in-place factorization. Most libraries overwrite A with L (below the diagonal) and U (on and above). The original matrix is gone — keep a copy if you need it.
Confusing the sign of det(A). det(A) = (sign of P) · ∏U_ii. The sign of P is (−1)^k where k is the number of row swaps; forgetting the sign flips your determinant.

LU Decomposition

Watch the 60-second explainer

What LU is doing

Solving with L and U

Worked example — PA = LU for a 3×3

LU vs LDL^T vs Cholesky vs full pivot LU

Determinants, inverses, and reuse

Where LU shows up

Common mistakes

Frequently asked questions

Watch the 60-second explainer

What LU is doing

Solving with L and U

Worked example — PA = LU for a 3×3

LU vs LDLT vs Cholesky vs full pivot LU

Determinants, inverses, and reuse

Where LU shows up

Common mistakes

Frequently asked questions

Related concepts

LU vs LDL^T vs Cholesky vs full pivot LU