Question 1

What is the characteristic polynomial?

Accepted Answer

For an n×n matrix A, the characteristic polynomial is p_A(λ) = det(λI − A), a monic polynomial of degree n in the variable λ. Its roots are the eigenvalues of A: the values λ for which Av = λv has a nonzero solution v. For a 2×2 matrix [[a,b],[c,d]] the polynomial is λ² − (a+d)λ + (ad−bc) = λ² − tr(A)λ + det(A). In general the constant term is (−1)ⁿ det(A) and the coefficient of λ^(n−1) is −tr(A). Cayley-Hamilton says that A itself satisfies this polynomial: p_A(A) = 0, the zero matrix.
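The 2×2 formula can be checked directly. The following is a minimal sketch in plain Python; the helper names (`matmul`, `char_poly_2x2`, `eval_char_poly_2x2`) and the sample matrix are illustrative choices, not part of the original answer.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly_2x2(A):
    """Return (trace, det), so that p_A(lam) = lam^2 - trace*lam + det."""
    (a, b), (c, d) = A
    return a + d, a * d - b * c

def eval_char_poly_2x2(A):
    """Evaluate p_A at the matrix A itself: A^2 - tr(A)*A + det(A)*I."""
    tr, det = char_poly_2x2(A)
    A2 = matmul(A, A)
    return [[A2[i][j] - tr * A[i][j] + (det if i == j else 0)
             for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]              # hypothetical sample matrix
trace, det = char_poly_2x2(A)     # trace = 5, det = -2
residual = eval_char_poly_2x2(A)  # the 2x2 zero matrix
```

Here p_A(λ) = λ² − 5λ − 2, and substituting A term by term (with the constant term multiplied by I) gives the zero matrix.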

Question 2

Why does p_A(A) = 0 not follow trivially from setting λ = A in det(λI − A) = 0?

Accepted Answer

This is the most common false proof. The expression det(λI − A) is a polynomial in the scalar λ. Substituting the matrix A for λ inside the determinant does not make sense: det takes a matrix with scalar entries, whereas λI − A with λ replaced by A would be a matrix whose entries are themselves matrices. The trick det(AI − A) = det(0) = 0 also produces the scalar 0, while p_A(A) is an n×n matrix, so it cannot be computing the same thing. A real proof uses the adjugate (classical adjoint) matrix and the identity (λI − A)·adj(λI − A) = det(λI − A)·I, viewed as an identity between polynomials in λ with matrix coefficients.
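For the 2×2 case the adjugate argument can be written out in full. The sketch below takes A = [[a,b],[c,d]]; the symbol B_0 is introduced here only for illustration and names the constant matrix coefficient of adj(λI − A).

```latex
\begin{aligned}
\lambda I - A &= \begin{pmatrix} \lambda - a & -b \\ -c & \lambda - d \end{pmatrix}, \\
\operatorname{adj}(\lambda I - A) &= \begin{pmatrix} \lambda - d & b \\ c & \lambda - a \end{pmatrix}
  = \lambda I + B_0, \qquad B_0 = A - \operatorname{tr}(A)\, I, \\
(\lambda I - A)(\lambda I + B_0) &= \lambda^2 I + \lambda\,(B_0 - A) - A B_0
  = \bigl(\lambda^2 - \operatorname{tr}(A)\,\lambda + \det(A)\bigr)\, I, \\
\text{constant terms:}\quad -A B_0 &= \det(A)\, I
  \;\Longrightarrow\; A^2 - \operatorname{tr}(A)\, A + \det(A)\, I = 0 .
\end{aligned}
```

No matrix is ever placed inside a determinant here; the conclusion comes from comparing the matrix coefficients of equal powers of λ on both sides.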

Question 3

How does it imply the minimal polynomial divides the characteristic polynomial?

Accepted Answer

The minimal polynomial m_A(λ) of A is the monic polynomial of least degree with m_A(A) = 0. By Cayley-Hamilton, p_A is a monic polynomial of degree n with p_A(A) = 0, so m_A has degree at most n. Polynomial division gives p_A = q·m_A + r with r = 0 or deg(r) < deg(m_A). Evaluating at A: r(A) = p_A(A) − q(A)·m_A(A) = 0. If r were nonzero, dividing it by its leading coefficient would give a monic polynomial of smaller degree than m_A that annihilates A, contradicting minimality, so r = 0. Hence m_A divides p_A. The roots of m_A are exactly the eigenvalues of A, each with multiplicity at most its multiplicity in p_A.
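A small case where the division is strict may help. The sketch below uses the hypothetical diagonal matrix diag(1, 1, 2), whose characteristic polynomial is (λ − 1)²(λ − 2). It checks that the lower-degree product (A − I)(A − 2I) already vanishes while neither factor does, so the minimal polynomial is (λ − 1)(λ − 2). Helper names are illustrative.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(A, lam):
    """Return A - lam*I."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def is_zero(M):
    """True if every entry of M is zero."""
    return all(x == 0 for row in M for x in row)

A = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 2]]

# Neither linear factor annihilates A on its own ...
factor1_is_zero = is_zero(shift(A, 1))   # False
factor2_is_zero = is_zero(shift(A, 2))   # False

# ... but their product does, so m_A = (lam - 1)(lam - 2), of degree 2 < 3.
product_is_zero = is_zero(matmul(shift(A, 1), shift(A, 2)))  # True
```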

Question 4

How is the theorem used to compute matrix powers?

Accepted Answer

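Write the characteristic polynomial as p_A(λ) = λⁿ + a_(n−1) λ^(n−1) + … + a_1 λ + a_0. Cayley-Hamilton then gives Aⁿ = c_(n−1) A^(n−1) + … + c_1 A + c_0 I with c_i = −a_i, so Aⁿ is a linear combination of lower powers whose coefficients are read off the characteristic polynomial. For higher powers, multiply both sides by A and substitute the same relation for the Aⁿ term that appears: A^(n+1) is again a combination of A^(n−1), …, I. By induction every Aᵏ with k ≥ n lies in the span of {I, A, A², …, A^(n−1)}, which has dimension at most n. Computing A^1000 therefore reduces to running a 1000-step recurrence on n scalar coefficients, at O(n) work per step, after which A^1000 is assembled as one linear combination of I, A, …, A^(n−1). When n is small compared with the exponent, that is far cheaper than performing 1000 successive n×n matrix multiplications.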
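The recurrence can be sketched as follows. This is an illustrative implementation, not a tuned one: the function names are hypothetical, the list `c` holds the coefficients c_0, …, c_(n−1) of the relation Aⁿ = c_0 I + … + c_(n−1) A^(n−1), and the sample matrix is the Fibonacci matrix [[1,1],[1,0]], for which A² = A + I.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def power_coeffs(c, k):
    """Return r with A^k = sum(r[j] * A^j for j < n), given A^n = sum(c[j] * A^j)."""
    n = len(c)
    r = [1] + [0] * (n - 1)          # A^0 = I
    for _ in range(k):
        top = r[-1]                  # coefficient that would land on A^n
        # Multiply by A: shift coefficients up, then replace A^n using c.
        r = [top * c[0]] + [r[j - 1] + top * c[j] for j in range(1, n)]
    return r

def power_via_cayley_hamilton(A, c, k):
    """Assemble A^k from the n coefficients and the powers I, A, ..., A^(n-1)."""
    n = len(A)
    r = power_coeffs(c, k)
    result = [[0] * n for _ in range(n)]
    P = identity(n)                  # holds A^j
    for j in range(n):
        for i in range(n):
            for l in range(n):
                result[i][l] += r[j] * P[i][l]
        P = matmul(P, A)
    return result

def naive_power(A, k):
    """Reference implementation: k successive matrix multiplications."""
    P = identity(len(A))
    for _ in range(k):
        P = matmul(P, A)
    return P

A = [[1, 1], [1, 0]]
c = [1, 1]                           # p_A = lam^2 - lam - 1, so A^2 = 1*I + 1*A
fast = power_via_cayley_hamilton(A, c, 10)
slow = naive_power(A, 10)
```

For this matrix the coefficient vector after k steps is [F_(k−1), F_k], so `power_coeffs(c, 10)` is `[34, 55]` and both methods give [[89, 55], [55, 34]].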

Question 5

What's the relationship to the Jordan canonical form?

Accepted Answer

Cayley-Hamilton is one ingredient in the proof that every square matrix over an algebraically closed field is similar to a matrix in Jordan canonical form (JCF). The characteristic polynomial factors as ∏(λ − λᵢ)^(mᵢ), where the λᵢ are the distinct eigenvalues and mᵢ is the algebraic multiplicity of λᵢ. For each eigenvalue λᵢ, the kernel of (A − λᵢI)^(mᵢ) is the generalized eigenspace, and over ℂ the space ℂⁿ is the direct sum of these generalized eigenspaces. Picking chains of generalized eigenvectors in each one gives the Jordan basis. Conversely, the JCF lets you read off both polynomials: characteristic polynomial = ∏ (λ−λᵢ)^(mᵢ), minimal polynomial = ∏ (λ−λᵢ)^(size of the largest Jordan block at λᵢ).
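The last rule can be illustrated on a matrix that is already in Jordan form. In this sketch the hypothetical matrix J has a single eigenvalue 2 with one block of size 2 and one of size 1, so the characteristic polynomial is (λ − 2)³ and the minimal polynomial should be (λ − 2)². The helper `nilpotency_index` (an illustrative name) finds the smallest power of J − 2I that vanishes.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def nilpotency_index(N):
    """Smallest k >= 1 with N^k = 0, or None if no k <= n works."""
    n = len(N)
    P = N
    for k in range(1, n + 1):
        if all(x == 0 for row in P for x in row):
            return k
        P = matmul(P, N)
    return None

# Jordan form with blocks J_2(2) and J_1(2).
J = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 2]]

# Subtract the eigenvalue 2 from the diagonal to get the nilpotent part.
N = [[J[i][j] - (2 if i == j else 0) for j in range(3)] for i in range(3)]

largest_block = nilpotency_index(N)  # 2, the size of the largest block
```

The algebraic multiplicity is 3, but (J − 2I)² is already zero, matching the exponent 2 in the minimal polynomial.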

Question 6

Why does it work over any commutative ring?

Accepted Answer

The proof uses only three facts: determinants are defined, the adjugate identity (λI − A)·adj(λI − A) = det(λI − A)·I holds, and scalar coefficients commute with one another. All three hold over any commutative ring R, even one with zero divisors or with elements that have no multiplicative inverse. So the theorem applies to matrices over ℤ, ℤ/nℤ, polynomial rings R[x], rings of continuous functions, and so on. This generality makes it a foundational result, not just a fact about real or complex matrices.
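As a concrete check over a ring with zero divisors, the sketch below works in ℤ/6ℤ. The matrix and helper name are illustrative; the determinant comes out as 4, which is a zero divisor mod 6 (4·3 ≡ 0), so A has no inverse there, yet p_A(A) still vanishes.

```python
MOD = 6  # Z/6Z has zero divisors: 2*3 = 0 and 4*3 = 0 (mod 6)

def matmul_mod(X, Y, m):
    """Multiply two square matrices with entries reduced modulo m."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % m for j in range(n)]
            for i in range(n)]

A = [[2, 3],
     [4, 5]]

tr = (A[0][0] + A[1][1]) % MOD                         # 7 mod 6 = 1
det = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) % MOD    # -2 mod 6 = 4

A2 = matmul_mod(A, A, MOD)

# p_A(A) = A^2 - tr*A + det*I, computed entirely modulo 6.
residual = [[(A2[i][j] - tr * A[i][j] + (det if i == j else 0)) % MOD
             for j in range(2)]
            for i in range(2)]
```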

Cayley-Hamilton Theorem

Why Cayley-Hamilton matters

Common misconceptions

Worked example

Frequently asked questions

Related concepts