Linear Algebra
Cayley-Hamilton Theorem
If p_A(λ) = det(λI − A), then p_A(A) = 0 — the matrix annihilates its own characteristic polynomial
The Cayley-Hamilton theorem states: if A is an n×n matrix over a commutative ring and p_A(λ) = det(λI − A) is its characteristic polynomial, then substituting A for λ gives the zero matrix: p_A(A) = 0. Stated by Cayley (1858) for 2×2 and 3×3 cases without complete proof; generalized by Hamilton for quaternions; proven in full generality by Frobenius (1878). The theorem implies the minimal polynomial divides the characteristic polynomial. Used to compute matrix powers efficiently (Aⁿ in terms of A^(n−1), …, A⁰), invert matrices in O(n³) without Gaussian elimination, and prove the existence of Jordan normal form.
- Statementp_A(A) = 0
- StatedCayley 1858
- GeneralizedHamilton (quaternions)
- First general proofFrobenius 1878
- Impliesminimal poly | char poly
- Applicationmatrix exponentiation
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
Why Cayley-Hamilton matters
- Matrix exponential. e^(At) = Σ (At)ᵏ/k! reduces to a polynomial in A of degree ≤ n−1 — the infinite series collapses by Cayley-Hamilton, so e^(At) is a finite linear combination of I, A, A², …, A^(n−1) with scalar coefficients depending on t.
- Jordan form proof. Cayley-Hamilton plus the primary decomposition theorem gives the existence of the Jordan canonical form over algebraically closed fields — every matrix is similar to a block-diagonal of Jordan blocks.
- ODE solutions. Linear systems x' = Ax have solution x(t) = e^(At)x(0); Cayley-Hamilton gives a finite-formula representation, avoiding diagonalization for non-diagonalizable A.
- Control theory. Reachability and observability tests use the Cayley-Hamilton expansion of A — the controllability matrix [B | AB | A²B | … | A^(n−1)B] captures all reachable states because higher Aᵏ are linear combinations of these.
- Matrix inverse. If A is invertible, Cayley-Hamilton lets you write A⁻¹ = −(1/c_0)(A^(n−1) + c_(n−1) A^(n−2) + … + c_1 I) where the cᵢ are characteristic polynomial coefficients — gives an explicit formula without Gaussian elimination.
- Polynomial identities. Algebraic structure: the ring of polynomials in A is finite-dimensional as a vector space, isomorphic to F[x]/(m_A(x)) where m_A is the minimal polynomial.
- Foundational role. Cayley-Hamilton is one of the rare results stated for matrices that immediately gives a structural decomposition — it's the bridge between the algebraic (characteristic polynomial) and geometric (eigenspaces) views.
Common misconceptions
- Trivial proof by substitution. "Set λ = A in det(λI − A) = 0, get det(0) = 0." This is wrong. The expression det(λI − A) is a polynomial in scalar λ; you cannot substitute a matrix for a scalar inside the determinant — the entries λI − A become matrix-valued, and det is undefined on such things.
- Only over fields. The theorem holds over any commutative ring, including ℤ, ℤ/nℤ, polynomial rings, and rings of continuous functions. The proof uses only the adjugate identity, which works in any commutative ring.
- Minimal = characteristic. The minimal polynomial divides the characteristic polynomial but they need not equal. For diagonal matrix diag(2, 2, 3): characteristic = (λ−2)²(λ−3), minimal = (λ−2)(λ−3). Equality holds iff A has a cyclic vector — the minimal polynomial degree equals n.
- Eigenvalues are roots of minimal. Half-true. Both polynomials have the same roots (the eigenvalues), but with potentially different multiplicities. Geometrically: minimal multiplicity = largest Jordan block size for that eigenvalue; characteristic multiplicity = total dimension of generalized eigenspace.
- Cayley proved it. Cayley stated and verified it for 2×2 and 3×3 cases (1858), but no general proof. Hamilton handled quaternions (a noncommutative case requiring care). Frobenius (1878) gave the first complete proof for arbitrary n.
- Theorem requires diagonalizability. No — Cayley-Hamilton works for every matrix, including non-diagonalizable ones. In fact, it's most useful precisely when diagonalization fails: it's how you handle Jordan blocks and generalized eigenvectors.
Worked example
Take A = [[2, 1], [0, 3]]. The characteristic polynomial is p_A(λ) = (λ−2)(λ−3) = λ² − 5λ + 6. Cayley-Hamilton claims p_A(A) = A² − 5A + 6I = 0.
Compute: A² = [[4, 5], [0, 9]]. Then 5A = [[10, 5], [0, 15]] and 6I = [[6, 0], [0, 6]]. So A² − 5A + 6I = [[4 − 10 + 6, 5 − 5 + 0], [0, 9 − 15 + 6]] = [[0, 0], [0, 0]]. Confirmed.
Now compute A⁵. Cayley-Hamilton gives A² = 5A − 6I, so A³ = A·A² = 5A² − 6A = 5(5A−6I) − 6A = 19A − 30I. Continuing: A⁴ = 19A² − 30A = 19(5A−6I) − 30A = 65A − 114I. A⁵ = 65A² − 114A = 65(5A−6I) − 114A = 211A − 390I = [[211·2 − 390, 211], [0, 211·3 − 390]] = [[32, 211], [0, 243]]. No 5×5 matrix multiplications required.
Frequently asked questions
What is the characteristic polynomial?
For an n×n matrix A, the characteristic polynomial is p_A(λ) = det(λI − A), a monic polynomial of degree n in the variable λ. Its roots are the eigenvalues of A — values λ for which Av = λv has a nonzero solution. For a 2×2 matrix [[a,b],[c,d]] the polynomial is λ² − (a+d)λ + (ad−bc) = λ² − tr(A)λ + det(A). The constant term is (−1)ⁿ det(A) and the coefficient of λ^(n−1) is −tr(A). Cayley-Hamilton says A itself satisfies this polynomial.
Why does p_A(A) = 0 not follow trivially from setting λ = A in det(λI − A) = 0?
This is the most common false proof. The expression det(λI − A) is a polynomial in the scalar λ; substituting the matrix A for the scalar λ inside the determinant doesn't make sense — det requires a matrix entry, and λI − A with λ replaced by A is a matrix whose entries are themselves matrices. The trick det(AI − A) = det(0) = 0 is meaningless. A real proof uses the adjugate (classical adjoint) matrix and the identity (λI − A)·adj(λI − A) = det(λI − A)·I, viewed as polynomial identities in λ with matrix coefficients.
How does it imply the minimal polynomial divides the characteristic polynomial?
The minimal polynomial m_A(λ) of A is the monic polynomial of least degree with m_A(A) = 0. By Cayley-Hamilton, p_A is one such polynomial, so m_A has degree at most n. Polynomial division gives p_A = q·m_A + r with deg(r) < deg(m_A). Evaluating at A: p_A(A) − q(A)·m_A(A) = r(A), so r(A) = 0. By minimality of m_A, r must be zero. Hence m_A divides p_A. The roots of m_A are exactly the eigenvalues, but with multiplicities ≤ those in p_A.
How is the theorem used to compute matrix powers?
Cayley-Hamilton gives Aⁿ = c_(n−1) A^(n−1) + … + c_1 A + c_0 I as a linear combination of lower powers — coefficients read off the characteristic polynomial. For higher powers, multiply both sides by A: A^(n+1) is again expressible as a combination of A^(n−1), …, I. This means every Aᵏ for k ≥ n lies in the n-dimensional span {I, A, A², …, A^(n−1)}. Computing A^1000 reduces to finding 1000-step recurrences on n coefficients — far cheaper than n×n matrix multiplications repeated 1000 times.
What's the relationship to the Jordan canonical form?
Cayley-Hamilton is one ingredient in the proof that every matrix over an algebraically closed field is similar to a Jordan canonical form. The characteristic polynomial factors as ∏(λ − λᵢ)^(mᵢ) where mᵢ is the algebraic multiplicity. For each eigenvalue λᵢ, the kernel of (A − λᵢI)^(mᵢ) is the generalized eigenspace, and these decompose ℂⁿ. Picking generalized eigenvector chains gives the Jordan basis. Conversely, JCF lets you read off both polynomials: characteristic polynomial = ∏ (λ−λᵢ)^(mᵢ), minimal polynomial = ∏ (λ−λᵢ)^(largest block size at λᵢ).
Why does it work over any commutative ring?
The proof uses only: determinants exist, the adjugate identity (λI − A)·adj(λI − A) = det(λI − A)·I holds, and polynomial coefficients commute. All three hold over any commutative ring R — even ones without inverses or zero divisors. So the theorem applies to matrices over ℤ, ℤ/nℤ, polynomial rings R[x], rings of continuous functions, and so on. This generality makes it a foundational result, not just a fact about real or complex matrices.