Linear Algebra

Gaussian Elimination

Sweep variables out of a linear system, one column at a time

Gaussian elimination reduces a system of linear equations to upper-triangular form by sweeping out variables column by column. With partial pivoting it solves Ax = b in O(n³) operations and is the algorithmic spine of nearly all dense numerical linear algebra.

  • SolvesA · x = b for square or rectangular A
  • Forward sweep cost≈ n³/3 multiply-adds
  • Back-substitution cost≈ n²/2 multiply-adds
  • Output formUpper-triangular U with row swaps recorded
  • StabilityBackward-stable with partial pivoting
  • Used inLinear systems, matrix inverse, determinant, rank, null space

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The three legal row operations

Gaussian elimination only ever does three things to the augmented matrix [A | b]:

  1. Swap two rows. The system has the same solution either way — equation order is meaningless.
  2. Multiply a row by a nonzero scalar. Scaling 2x + 4y = 10 to x + 2y = 5 doesn't change which (x, y) satisfies it.
  3. Add a multiple of one row to another. If row i and row j are both true, then row i + c·row j is also true.

Every step of elimination is a sequence of operation 3, occasionally interleaved with operation 1 for stability. The point is to use these moves to drive every entry below the diagonal to zero.

Forward sweep — getting to upper triangular

Process columns left to right. For column k:

  1. Choose a pivot — the entry on the diagonal in column k. (With partial pivoting, swap in the row whose entry in column k has the largest absolute value.)
  2. For every row i below the pivot, compute the multiplier mik = aik / akk.
  3. Subtract mik times row k from row i. This zeroes aik and updates the rest of row i.
  4. Apply the same multiplier to bi: bi ← bi − mik·bk.

After processing column k, every entry below the diagonal in that column is zero. After processing all n columns the augmented matrix has the shape:

[ u₁₁ u₁₂ u₁₃ ... u₁ₙ | c₁ ]
[  0  u₂₂ u₂₃ ... u₂ₙ | c₂ ]
[  0   0  u₃₃ ... u₃ₙ | c₃ ]
[  ...                  ... ]
[  0   0   0  ... uₙₙ | cₙ ]

Back-substitution — reading the answer

The bottom row is now a one-variable equation: unn·xn = cn, so xn = cn/unn. Substitute that into row n−1, which now has only one unknown left, and solve for xn−1. Walk up the rows one at a time. The general formula:

xₖ = ( cₖ − Σⱼ₌ₖ₊₁ⁿ uₖⱼ · xⱼ ) / uₖₖ

Worked example — a 3×3 system

Solve:

2x +  y −  z =  8
−3x −  y + 2z = −11
−2x +  y + 2z =  −3

Augmented matrix:

[  2   1  −1 |  8 ]
[ −3  −1   2 | −11]
[ −2   1   2 | −3 ]

Step 1 — eliminate column 1. Pivot is 2 in row 1. Multipliers:

  • m₂₁ = −3/2. Row 2 ← Row 2 − (−3/2)·Row 1 = Row 2 + 1.5·Row 1.
  • m₃₁ = −2/2 = −1. Row 3 ← Row 3 + 1·Row 1.
[  2   1   −1   |  8 ]
[  0   1/2  1/2 |  1 ]
[  0   2    1   |  5 ]

Step 2 — eliminate column 2. Pivot is 1/2 in row 2. Multiplier m₃₂ = 2 / (1/2) = 4. Row 3 ← Row 3 − 4·Row 2.

[  2   1   −1  |  8 ]
[  0   1/2  1/2|  1 ]
[  0   0   −1  |  1 ]

Step 3 — back-substitute.

  • From row 3: −1·z = 1, so z = −1.
  • From row 2: (1/2)y + (1/2)(−1) = 1 → (1/2)y = 3/2 → y = 3.
  • From row 1: 2x + 3 − (−1) = 8 → 2x = 4 → x = 2.

Solution: (x, y, z) = (2, 3, −1). Check: 2(2) + 3 − (−1) = 4 + 3 + 1 = 8. ✓

Pivoting — partial, complete, and rook

The single most important practical detail is pivoting. Naive elimination divides by whatever sits on the diagonal, which can be tiny or zero. Three strategies:

StrategyWhat you swapCost per stepStability
No pivotingNothing0Fails outright if a pivot is 0; numerically unstable in general
Partial pivotingSwap rows so the largest |entry| in column k is on the diagonalO(n) compares per stepBackward-stable in practice; almost always used
Complete pivotingSearch the whole remaining submatrix and swap both rows and columnsO((n−k)²) comparesProvably stable; rarely worth the constant factor
Rook pivotingSearch a row and column alternately for a local maxO(n) on averageCompromise — better than partial, cheaper than complete

LAPACK's dgesv, NumPy's linalg.solve, and MATLAB's backslash all use partial pivoting. You should too.

Gaussian elimination vs Gauss–Jordan vs LU

Gaussian + back-subGauss–JordanLU factorization
Final shape of AUpper triangular UIdentity IL (lower) and U (upper) stored separately
Forward-sweep cost≈ n³/3≈ n³/2≈ n³/3 (same as Gaussian)
Back-substitution cost≈ n²/20 (read off directly)≈ n² per right-hand side
Best for one Ax = bYes — minimum workNo — wasted arithmeticNo — same cost as Gaussian without reuse
Best for many b's, same AHave to redo each timeHave to redo each timeYes — factor once, solve cheaply per b
Useful for inversionNo (need extra work)Yes — apply to [A | I]Yes — solve LU·X = I column by column
Output reusableJust the answer xJust the answer xL, U, P — reusable for solve, det, refinement

Elimination as a sequence of matrix multiplications

Each elimination step is left-multiplication by an elementary matrix. Subtracting m·row k from row i corresponds to multiplying by I minus a single m·eiekT. After all n−1 columns, you have

Lₙ₋₁ ⋯ L₂ L₁ · A = U

and inverting the Lk's shows that A = (L1−1 ⋯ Ln−1−1) · U = L · U where L collects all multipliers below its diagonal. That's the LU decomposition: a record of the elimination steps.

Where Gaussian elimination shows up

  • Solving Ax = b — the canonical use, from circuit equations to finite-element stiffness systems.
  • Computing the determinant — det(A) = ±(product of diagonal entries of U), where the sign comes from the parity of row swaps.
  • Computing a matrix inverse — apply elimination to [A | I]; when the left side becomes I, the right side is A⁻¹.
  • Finding rank, null space, column space — the row-echelon form exposes pivot columns (basis of column space) and free columns (null-space directions).
  • Polynomial and spline interpolation — Vandermonde and tridiagonal systems are still solved by elimination, often with structured-matrix shortcuts that bring the cost down to O(n²) or O(n).
  • Linear programming pivots — the simplex method is essentially Gaussian elimination with a cost-vector test driving which pivot to take next.

Common mistakes

  • Skipping pivoting. Hand-rolled Gaussian elimination without row swaps is one of the classic numerical traps. A pivot of 10⁻¹⁵ produces multipliers around 10¹⁵ and destroys 15 digits of precision in a single step.
  • Forgetting to update the right-hand side. Every operation on row k of A must also be applied to bk. Working on the augmented matrix [A | b] enforces this automatically.
  • Treating row swaps as an afterthought. The row-swap parity flips the sign of the determinant, and the swap order matters when you want LU = PA later. Track the permutation as you go.
  • Mistaking a zero pivot for "the algorithm broke". A zero pivot means the column has a structural problem — either swap to fix it or accept that the matrix is singular. The algorithm is fine; the data is exposing rank deficiency.
  • Using Gauss–Jordan when Gaussian + back-sub would do. Gauss–Jordan is taught first because it produces the identity directly, but it does roughly 50% more arithmetic. For Ax = b, stop at upper triangular and back-substitute.
  • Inverting A to solve Ax = b. Computing A⁻¹b is slower and less accurate than direct elimination. The numerical-analysis maxim "never invert a matrix unless you really need A⁻¹" applies here.

Frequently asked questions

Why is Gaussian elimination O(n³)?

Each of the n elimination steps zeroes one column below the pivot, touching the remaining (n−k)×(n−k+1) submatrix. Summing k(n−k)² for k = 1..n gives roughly n³/3 multiply-adds for the forward sweep, plus another n²/2 for back-substitution. The cubic term dominates, so doubling n multiplies cost by eight.

What is partial pivoting and why do I need it?

Before zeroing column k, you swap rows so the entry with the largest absolute value sits on the diagonal. Without this swap, a tiny pivot (like 10⁻¹⁵) gets divided into the rest of the row, magnifying every floating-point error and wrecking the answer. Partial pivoting is cheap (one row swap per column) and standard in every production solver.

What is the difference between Gaussian elimination and Gauss–Jordan?

Gaussian elimination produces an upper-triangular matrix and then back-substitutes; Gauss–Jordan keeps eliminating until the matrix is the identity, so the right-hand side directly reads off the solution. Gauss–Jordan does about 50% more arithmetic, so for solving Ax = b you should prefer plain Gaussian + back-substitution. Gauss–Jordan is convenient when teaching matrix inversion.

What happens if a pivot is exactly zero?

You cannot divide by zero, so the algorithm fails as written. If any other row has a nonzero entry in that column, swap rows and continue. If every remaining row is zero in that column, the matrix is singular: the system has either no solution or infinitely many, and elimination has revealed the rank deficiency.

Does Gaussian elimination work for non-square matrices?

Yes. The same row operations reduce any m×n matrix to row-echelon form, which exposes the rank, the column space, and the null space. For overdetermined least-squares systems you typically use QR or SVD instead, but the underlying machinery is the same set of row operations.

Why not just compute A⁻¹ and multiply by b?

Computing A⁻¹ takes about three Gaussian eliminations worth of work, then multiplying A⁻¹b costs another n² operations and has worse roundoff than direct elimination. Numerical analysts have a slogan: never invert a matrix to solve a linear system. Use elimination (or LU) directly.