Matrix Inverse
The matrix that undoes A — when one exists
The inverse of a square matrix A is the unique matrix A⁻¹ satisfying A·A⁻¹ = A⁻¹·A = I. It exists if and only if det A ≠ 0 (equivalently, A's columns are linearly independent). It can be computed via the cofactor adjugate formula A⁻¹ = (1/det A)·adj(A), via Gauss–Jordan elimination on [A | I], or via LU decomposition. The inverse undoes the linear transformation A and is the matrix analogue of dividing by A.
- Defining property: A·A⁻¹ = A⁻¹·A = I
- Exists iff: A is square and det A ≠ 0
- 2×2 formula: (1/(ad − bc)) · [[d, −b], [−c, a]]
- General formula: A⁻¹ = (1/det A) · adj(A)
- Numerical cost: O(n³) via LU or Gauss–Jordan
- Order rule: (AB)⁻¹ = B⁻¹A⁻¹
What an inverse means
A matrix A acts on vectors by left multiplication: x ↦ A·x. The inverse A⁻¹ is the matrix that reverses that action. If A rotates the plane by 30°, A⁻¹ rotates it back by −30°. If A doubles the x-component and halves the y-component, A⁻¹ halves the x-component and doubles the y-component.
The defining equation is A·A⁻¹ = A⁻¹·A = I, where I is the identity matrix (1s on the diagonal, 0s elsewhere). For this equation to make sense, A must be square. For an inverse to exist, A must be nonsingular — det A ≠ 0.
The geometric reason singular matrices have no inverse: if det A = 0, the transformation A collapses some direction to zero, so multiple input vectors map to the same output. There is no way to recover the input uniquely; no inverse function exists.
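The two cases above can be sketched with plain Python lists as matrices. The helper `apply` and the specific matrices are illustrative choices, not taken from any library: one invertible scaling that can be undone, and one singular matrix that sends two different inputs to the same output.

```python
from fractions import Fraction

def apply(M, v):
    """Multiply matrix M (a list of rows) by the column vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Invertible case: A doubles x and halves y; A_inv halves x and doubles y.
A = [[Fraction(2), Fraction(0)], [Fraction(0), Fraction(1, 2)]]
A_inv = [[Fraction(1, 2), Fraction(0)], [Fraction(0), Fraction(2)]]
v = [Fraction(3), Fraction(4)]
recovered = apply(A_inv, apply(A, v))   # equals v again

# Singular case: det S = 1*4 - 2*2 = 0, so S collapses a direction.
S = [[1, 2], [2, 4]]
out1 = apply(S, [2, 0])   # [2, 4]
out2 = apply(S, [0, 1])   # [2, 4], the same output from a different input
```

Because `out1` and `out2` coincide, no function can map `[2, 4]` back to a unique input, which is exactly why `S` has no inverse.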
Worked example — 2×2 inverse
Take A = [[2, 1], [3, 4]]. Compute det A = 2·4 − 1·3 = 5.
The 2×2 inverse formula is A⁻¹ = (1/det A) · [[d, −b], [−c, a]] (swap diagonal, negate off-diagonal, divide by det):
A⁻¹ = (1/5) · [[4, −1], [−3, 2]] = [[4/5, −1/5], [−3/5, 2/5]].
Verify: A·A⁻¹ = [[2, 1], [3, 4]] · [[4/5, −1/5], [−3/5, 2/5]] = [[8/5 − 3/5, −2/5 + 2/5], [12/5 − 12/5, −3/5 + 8/5]] = [[1, 0], [0, 1]]. ✓
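The same calculation can be reproduced in code. This is a minimal sketch that uses exact rational arithmetic from Python's `fractions` module; the names `inverse_2x2` and `matmul` are chosen here for illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Invert [[a, b], [c, d]] exactly; raise if the determinant is zero."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (det = 0)")
    k = Fraction(1) / det
    # Swap the diagonal, negate the off-diagonal, scale by 1/det.
    return [[d * k, -b * k], [-c * k, a * k]]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[2, 1], [3, 4]]
A_inv = inverse_2x2(A)       # [[4/5, -1/5], [-3/5, 2/5]]
product = matmul(A, A_inv)   # [[1, 0], [0, 1]]
```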
Worked example — 3×3 inverse via cofactor adjugate
Take A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]].
Step 1 — det A. Expand along the first row:
det A = 1·(1·0 − 4·6) − 2·(0·0 − 4·5) + 3·(0·6 − 1·5) = 1·(−24) − 2·(−20) + 3·(−5) = −24 + 40 − 15 = 1.
Step 2 — cofactor matrix C. Cᵢⱼ = (−1)ⁱ⁺ʲ·Mᵢⱼ, where Mᵢⱼ is the minor (determinant of the 2×2 submatrix from deleting row i, column j).
C = [[−24, 20, −5], [18, −15, 4], [5, −4, 1]]
Step 3 — adjugate. adj(A) = Cᵀ (transpose of the cofactor matrix):
adj(A) = [[−24, 18, 5], [20, −15, −4], [−5, 4, 1]]
Step 4 — divide by det. det A = 1, so
A⁻¹ = adj(A) = [[−24, 18, 5], [20, −15, −4], [−5, 4, 1]].
Quick sanity check: row 1 of A times column 1 of A⁻¹ = 1·(−24) + 2·20 + 3·(−5) = −24 + 40 − 15 = 1. ✓
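The four steps above can be sketched for any small square matrix. This is an illustrative sketch with exact `fractions.Fraction` arithmetic; the function names `minor`, `det`, and `inverse_by_adjugate` are chosen here and do not come from a library. The determinant uses naive Laplace expansion, so it is only sensible for small n.

```python
from fractions import Fraction

def minor(M, i, j):
    """Submatrix of M with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by Laplace expansion along the first row."""
    if not M:
        return 1  # determinant of the empty matrix, used as the base case
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def inverse_by_adjugate(M):
    n = len(M)
    d = det(M)
    if d == 0:
        raise ValueError("matrix is singular (det = 0)")
    # Cofactor matrix: signed determinants of the minors.
    cof = [[(-1) ** (i + j) * det(minor(M, i, j)) for j in range(n)] for i in range(n)]
    # adj(M) is the transpose of the cofactor matrix; divide every entry by det.
    return [[Fraction(cof[j][i]) / d for j in range(n)] for i in range(n)]

A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
A_inv = inverse_by_adjugate(A)   # [[-24, 18, 5], [20, -15, -4], [-5, 4, 1]]
```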
Methods of finding A⁻¹
| | Cofactor adjugate | Gauss–Jordan elimination | LU decomposition |
|---|---|---|---|
| Idea | A⁻¹ = (1/det A) · adj(A) | Row-reduce [A \| I] until A becomes I; the right block becomes A⁻¹ | Factor A = LU; solve LU·X = I column by column |
| Best for | Symbolic 2×2 and 3×3 by hand | 4×4 and larger by hand | Numerical computation, especially for many right-hand sides |
| Asymptotic cost | O(n!) with naive recursive expansion; polynomial (about O(n⁵)) if each minor is evaluated by elimination | O(n³) | O(n³) factorisation + O(n²) per solve |
| Numerical stability | Poor — division by det amplifies errors | Decent with partial pivoting | Excellent with partial pivoting (PA = LU) |
| Closed-form output | Yes — explicit formula in entries | No — algorithmic | Yes (factors), but not a single closed form |
| Reusable for many right-hand sides? | Yes (compute A⁻¹ once) | Yes | Yes — most efficient: factor once, solve many |
| Detects singularity | det = 0 obviously | Pivot becomes zero | U has zero on diagonal |
Practical advice: by hand on small symbolic matrices, use cofactors. Numerically, almost never compute A⁻¹ explicitly — solve the system Ax = b directly via LU, which is roughly twice as fast and far more stable.
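The "factor once, solve many" pattern can be sketched as follows. This is a teaching sketch in pure Python floats, assuming a Doolittle factorisation with partial pivoting (P·A = L·U); the names `lu_factor` and `lu_solve` are chosen here for illustration, and production code would call a LAPACK-backed library routine instead.

```python
def lu_factor(A):
    """Return (LU, perm): L and U packed into one matrix, plus the row permutation."""
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    perm = list(range(n))
    for k in range(n):
        # Partial pivoting: largest entry in column k at or below row k.
        p = max(range(k, n), key=lambda r: abs(LU[r][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            LU[k], LU[p] = LU[p], LU[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                 # multiplier, stored in L's slot
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm

def lu_solve(LU, perm, b):
    """Solve A·x = b using the factors returned by lu_factor."""
    n = len(LU)
    y = [float(b[perm[i]]) for i in range(n)]    # apply the row permutation to b
    for i in range(n):                           # forward substitution (unit-diagonal L)
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):                 # back substitution with U
        y[i] = (y[i] - sum(LU[i][j] * y[j] for j in range(i + 1, n))) / LU[i][i]
    return y

A = [[1, 2, 3], [0, 1, 4], [5, 6, 0]]
LU, perm = lu_factor(A)                # O(n^3), done once
x1 = lu_solve(LU, perm, [1, 0, 0])     # O(n^2) per right-hand side
x2 = lu_solve(LU, perm, [6, 5, 11])
```

With the right-hand side `[1, 0, 0]` the solution is the first column of A⁻¹, up to rounding, so solving against each column of I would rebuild the full inverse if it were ever needed.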
Algebraic properties
- (A⁻¹)⁻¹ = A. Inverting twice is the identity operation.
- (AB)⁻¹ = B⁻¹A⁻¹. Reverse order — undo the most recent transformation first.
- (Aᵀ)⁻¹ = (A⁻¹)ᵀ. Transpose and inverse commute.
- (cA)⁻¹ = (1/c)·A⁻¹ for scalar c ≠ 0.
- det(A⁻¹) = 1 / det A. Inverse scales volumes by the reciprocal factor.
- Eigenvalues of A⁻¹ are reciprocals of eigenvalues of A. Same eigenvectors.
- A is orthogonal ⇔ A⁻¹ = Aᵀ. Hugely useful — orthogonal matrices invert by transposition.
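Several of these identities can be spot-checked on concrete 2×2 matrices. The sketch below uses exact fractions so the comparisons are not affected by rounding; the matrices `A` and `B` and the helper names are arbitrary illustrative choices.

```python
from fractions import Fraction

def inv2(M):
    """Exact inverse of a 2×2 matrix with nonzero determinant."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[2, 1], [3, 4]]
B = [[1, 2], [0, 1]]

inv_of_product = inv2(matmul(A, B))
right_order = matmul(inv2(B), inv2(A))   # B⁻¹A⁻¹, matches inv_of_product
wrong_order = matmul(inv2(A), inv2(B))   # A⁻¹B⁻¹, differs because A and B do not commute

transpose_commutes = inv2(transpose(A)) == transpose(inv2(A))
det_reciprocal = det2(inv2(A)) == Fraction(1, det2(A))
double_inverse = inv2(inv2(A)) == A
```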
Classical applications
- Solving linear systems. Ax = b has the formal solution x = A⁻¹b. In practice you do not compute A⁻¹; you solve directly.
- Linear regression — normal equations. The least-squares estimator is β̂ = (XᵀX)⁻¹·Xᵀy. Every regression textbook displays the inverse, but real software computes it via QR or SVD for stability.
- Change of basis. If P is the change-of-basis matrix from one coordinate system to another, the same linear map represented in the new basis is P⁻¹·A·P. Diagonalisation A = P·D·P⁻¹ is the canonical example.
- Computer graphics. Camera and model matrices are inverted to map between world, view, and screen coordinates. Orthogonal rotation matrices are inverted just by transposing.
- Cryptography (Hill cipher). Encrypts blocks of letters as vectors multiplied by an invertible matrix; decryption uses the inverse modulo 26.
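- Markov chain analysis. For an absorbing chain, the fundamental matrix N = (I − Q)⁻¹, where Q is the transient-to-transient block of the transition matrix, gives the expected number of visits to each transient state and from it the expected time to absorption.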
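As one concrete instance from the list above, here is a hedged sketch of a 2×2 Hill cipher. It assumes uppercase A to Z text of even length, letters numbered A = 0 through Z = 25, and each pair treated as a column vector; the key matrix and function names are illustrative. The modular inverse uses `pow(det, -1, 26)`, which requires Python 3.8 or later.

```python
def inverse_mod26_2x2(K):
    """Inverse of a 2×2 integer matrix modulo 26, via the adjugate formula."""
    (a, b), (c, d) = K
    det = (a * d - b * c) % 26
    det_inv = pow(det, -1, 26)   # raises ValueError if det shares a factor with 26
    return [[(d * det_inv) % 26, (-b * det_inv) % 26],
            [(-c * det_inv) % 26, (a * det_inv) % 26]]

def hill_apply(K, text):
    """Multiply each letter pair (as a column vector) by K modulo 26."""
    nums = [ord(ch) - ord("A") for ch in text]
    out = []
    for i in range(0, len(nums), 2):
        x, y = nums[i], nums[i + 1]
        out.append((K[0][0] * x + K[0][1] * y) % 26)
        out.append((K[1][0] * x + K[1][1] * y) % 26)
    return "".join(chr(n + ord("A")) for n in out)

key = [[3, 3], [2, 5]]                # det = 9, and 9·3 ≡ 1 (mod 26)
key_inv = inverse_mod26_2x2(key)      # [[15, 17], [20, 9]]
cipher = hill_apply(key, "HELP")      # "HIAT"
plain = hill_apply(key_inv, cipher)   # "HELP"
```

The key is usable only when its determinant is coprime to 26, which is the modular counterpart of the condition det A ≠ 0.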
Common mistakes
- Writing (AB)⁻¹ = A⁻¹B⁻¹. Wrong order. The correct identity is (AB)⁻¹ = B⁻¹A⁻¹. Getting the order wrong silently invalidates any derivation that builds on it.
- Trying to invert a non-square matrix. Only square matrices have classical inverses. For rectangular matrices, use the Moore–Penrose pseudoinverse A⁺.
- Computing A⁻¹ explicitly when you only need A⁻¹b. Slower and less stable than solving Ax = b directly.
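- Forgetting to check det ≠ 0. If det A = 0, no inverse exists. Numerical software does not always refuse: for an exactly singular input some routines raise an error (NumPy's numpy.linalg.inv raises LinAlgError) while others only warn (MATLAB's inv), and for a nearly singular input most return a result dominated by rounding error.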
- Using the cofactor formula for large matrices. O(n!) growth makes it impractical beyond about 5×5. Switch to Gauss–Jordan or LU.
- Confusing A⁻¹ with 1/A. 1/A is not a defined operation on matrices — there is no entry-wise division that makes algebraic sense. Always write A⁻¹.
- Mixing right and left inverses. For square matrices, left inverse = right inverse = the inverse. For rectangular matrices, one-sided inverses can exist without the other.
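The last point can be made concrete with a tiny example. In this sketch the 3×2 matrix `A` and the 2×3 matrix `L` are illustrative choices: `L` is a left inverse of `A`, but it is not a right inverse.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

A = [[1, 0],
     [0, 1],
     [0, 0]]            # 3×2 with linearly independent columns
L = [[1, 0, 0],
     [0, 1, 0]]         # 2×3 candidate one-sided inverse

left = matmul(L, A)     # [[1, 0], [0, 1]], the 2×2 identity
right = matmul(A, L)    # [[1, 0, 0], [0, 1, 0], [0, 0, 0]], not the 3×3 identity
```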
Frequently asked questions
Why does A⁻¹ exist iff det A ≠ 0?
det A measures how A scales volumes. If det A = 0, A collapses some direction to zero: multiple inputs map to the same output, so A is not one-to-one and cannot be undone. Algebraically, the cofactor formula A⁻¹ = (1/det A)·adj(A) divides by det A, which is undefined when det A = 0. For an n×n matrix, the condition det A ≠ 0 is equivalent to each of the following: the columns are linearly independent, the rank is n, the kernel contains only the zero vector, and the equation Ax = b has a unique solution for every b.
What is the formula for the inverse of a 2×2 matrix?
For A = [[a, b], [c, d]] with det A = ad − bc ≠ 0, the inverse is A⁻¹ = (1 / (ad − bc)) · [[d, −b], [−c, a]]. Swap a and d on the diagonal, negate b and c off the diagonal, divide by det. This is the cofactor adjugate formula specialised to 2×2.
Is (AB)⁻¹ = A⁻¹B⁻¹?
No. The correct identity is (AB)⁻¹ = B⁻¹A⁻¹. You unwind composed transformations in reverse order — like taking off shoes and socks. Verify by multiplying: AB·B⁻¹A⁻¹ = A·(BB⁻¹)·A⁻¹ = A·I·A⁻¹ = AA⁻¹ = I. The wrong order gives AB·A⁻¹B⁻¹, which only equals I when A and B commute.
Which method should I use to compute A⁻¹ in practice?
For 2×2 matrices, the cofactor formula is fastest by hand. For 3×3, cofactor still works but Gauss–Jordan on [A | I] starts to be competitive. For n ≥ 4 by hand, Gauss–Jordan wins. Numerically, a computer almost never explicitly inverts — it computes an LU or QR decomposition once and uses forward/back substitution to solve Ax = b. Cofactor expansion is O(n!), Gauss–Jordan and LU are O(n³).
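Gauss–Jordan elimination on [A | I] can be sketched as follows. This version uses exact `fractions.Fraction` arithmetic to mirror a hand calculation, so any nonzero pivot is acceptable; a floating-point version would instead pick the largest available pivot. The function name `inverse_gauss_jordan` is chosen here for illustration.

```python
from fractions import Fraction

def inverse_gauss_jordan(A):
    n = len(A)
    # Build the augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [x / p for x in M[col]]             # scale so the pivot is 1
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]   # clear the column
    # The left block is now I; the right block is the inverse.
    return [row[n:] for row in M]

A_inv = inverse_gauss_jordan([[1, 2, 3], [0, 1, 4], [5, 6, 0]])
```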
What is the difference between an inverse and a pseudoinverse?
The classical inverse only exists for square nonsingular matrices. The Moore–Penrose pseudoinverse A⁺ generalises it to any rectangular or singular matrix and reduces to A⁻¹ when A is invertible. The pseudoinverse gives the least-squares solution to Ax = b: x = A⁺b minimises ‖Ax − b‖ and, among all minimisers, has the smallest norm ‖x‖. It is the workhorse of linear regression and is computed via the singular value decomposition.
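For a tall matrix with linearly independent columns, the pseudoinverse has the closed form A⁺ = (AᵀA)⁻¹Aᵀ. The sketch below uses that special case, with two columns and exact fractions, to fit a line through three points. It is only an illustration of the formula: the data and helper names are made up for the example, and the SVD route mentioned above is what handles the general (rank-deficient) case.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def inv2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("columns are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

def pinv_two_columns(A):
    """A⁺ = (AᵀA)⁻¹Aᵀ for an m×2 matrix with independent columns."""
    At = transpose(A)
    return matmul(inv2(matmul(At, A)), At)

# Fit y ≈ b0 + b1·t through (0, 1), (1, 2), (2, 4).
X = [[1, 0], [1, 1], [1, 2]]
y = [[1], [2], [4]]
X_pinv = pinv_two_columns(X)
beta = matmul(X_pinv, y)        # [[5/6], [3/2]]: intercept 5/6, slope 3/2
```

Here `X_pinv` acts as a left inverse of `X` (their product is the 2×2 identity), while `X` times `X_pinv` is a projection rather than the 3×3 identity.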
Why do numerical libraries warn against explicit inversion?
Computing A⁻¹ explicitly and then multiplying A⁻¹·b loses precision and is roughly twice as expensive as solving Ax = b directly with LU decomposition. For nearly singular A, the explicit inverse can have huge entries that swamp b in finite precision. In NumPy the direct call is numpy.linalg.solve(A, b) rather than inv(A) @ b, and MATLAB's documentation for inv recommends A\b over inv(A)*b for this reason.
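The "huge entries" effect is easy to see on a nearly singular 2×2 matrix. The sketch below applies the 2×2 adjugate formula in ordinary floating point; the perturbation `eps = 1e-10` is an arbitrary illustrative value, and the code shows only the size of the entries, not what any particular library does internally.

```python
eps = 1e-10
A = [[1.0, 1.0],
     [1.0, 1.0 + eps]]   # rows are almost identical, so A is nearly singular

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # about 1e-10, tiny but nonzero
A_inv = [[A[1][1] / det, -A[0][1] / det],
         [-A[1][0] / det, A[0][0] / det]]

# Entries of A are about 1, yet entries of the inverse are about 1e10.
largest = max(abs(x) for row in A_inv for x in row)
```

Multiplying such an inverse by b magnifies any rounding error already present in b or in the inverse itself by roughly the same factor.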