Random Matrix Theory

Random Matrix

Symmetric matrices with random entries — eigenvalues fill a universal semicircle

A random matrix has random entries. The eigenvalue density of a symmetric N×N matrix converges to Wigner's semicircle ρ(λ) = (1/π)√(4σ²N − λ²) — universal in the entry distribution.

  • Semicircle lawρ(λ) = (1/(2πσ²N))√(4σ²N − λ²)
  • Support[−2σ√N, 2σ√N]
  • Three ensemblesGOE (real), GUE (Hermitian), GSE (quaternion)
  • UniversalityLimit depends on σ² only — not on entry distribution
  • Largest eigenvalue2σ√N + σN^(−1/6) · Tracy-Widom
  • FoundersWigner 1955, Dyson 1962, Mehta 1960s

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

What is a random matrix?

A random matrix is a matrix whose entries are random variables. The simplest example: pick an N×N matrix M, fill each entry independently from a fixed distribution (say, standard normal), and look at it. You now have one specimen of an ensemble — a probability distribution over matrices.

What random matrix theory studies is not any single matrix but the statistics of functionals of M as N grows: the eigenvalues, the determinant, the largest singular value, the spacings between consecutive eigenvalues, the condition number. The headline result is that these quantities have well-defined limits that are often surprisingly insensitive to the choice of entry distribution. That insensitivity — universality — is what makes the theory predictive.

To get real eigenvalues we usually symmetrize: take an arbitrary N×N matrix A with i.i.d. entries, and form M = (A + Aᵀ)/√2. The result is real symmetric, so its eigenvalues are real. This is the Gaussian Orthogonal Ensemble (GOE) when the underlying entries are Gaussian; with Bernoulli ±1 entries it is the Wigner ensemble. Symmetric matrices arise naturally in physics (Hamiltonians), statistics (covariance matrices), and graph theory (adjacency matrices of random graphs).

Wigner's semicircle law

Fix σ > 0. Let M be an N×N symmetric matrix with independent entries (for i ≤ j) of mean 0 and variance σ². Let λ₁, λ₂, …, λ_N be its eigenvalues. The empirical spectral measure:

μ_N = (1/N) Σ δ_{λᵢ}

puts a unit mass at each eigenvalue divided by N. As N → ∞, μ_N converges weakly to the semicircle distribution:

ρ(λ) = (1/(2πσ²N)) √(4σ²N − λ²)    on [−2σ√N, 2σ√N]

The density is a half-ellipse — flat near the origin, falling smoothly to zero at the spectral edges ±2σ√N. The shape is independent of the choice of entry distribution. Whether your entries are Gaussian, ±1 Bernoulli, uniform on [−√3, √3], or Cauchy-truncated, you get the same semicircle (provided the entries have mean 0 and variance σ²).

The proof — Wigner's original — uses the method of moments. Compute the 2k-th moment of μ_N as N → ∞:

∫ λ^(2k) dμ_N(λ) = (1/N) tr(M^(2k))
                  = (1/N) Σ_{i_1, ..., i_(2k)} E[M_{i_1 i_2} M_{i_2 i_3} ... M_{i_(2k) i_1}]

As N → ∞ the only surviving terms correspond to non-crossing pair partitions of {1, …, 2k}, of which there are C_k (the k-th Catalan number). The result: ∫ λ^(2k) dρ(λ) = σ^(2k) N^k C_k — the moments of the semicircle. Odd moments vanish by symmetry. Carleman's theorem then identifies the limit measure uniquely.

Moment computation — worked example

For σ = 1 and N = 1000, the second moment is (1/N) Σ λᵢ² = (1/N) tr(M²) = (1/N) Σ Mᵢⱼ² ≈ N · Var(Mᵢⱼ) = N — so the typical eigenvalue squared is order N, the typical eigenvalue order √N. Concretely with σ = 1, the spectrum spans roughly [−2√1000, 2√1000] = [−63.2, +63.2].

Rescaling λ' = λ/√N puts the spectrum on [−2σ, 2σ] for any N, and the empirical density of λ' converges to the standard semicircle (1/(2πσ²))√(4σ² − λ²). This is the usual statement: after the natural N^(1/2) rescaling, the spectrum is bounded.

Random matrix ensembles compared

EnsembleSymmetryEntry typeSpectrum limitEdge fluctuationPhysics motivation
GOESymmetric (Mᵀ = M)Real GaussianSemicircleTracy-Widom F₁Time-reversal-invariant
GUEHermitian (M* = M)Complex GaussianSemicircleTracy-Widom F₂No time-reversal
GSEQuaternionic self-dualQuaternionic GaussianSemicircleTracy-Widom F₄Spin-orbit coupled
Wigner ensembleSymmetric ±1Bernoulli ±1Semicircle (universal)Tracy-Widom F₁Same as GOE
Wishart / LOES = XXᵀ/n, X ~ NReal GaussianMarchenko-PasturTracy-Widom F₁Sample covariance
GinibreNo symmetry constraintComplex GaussianUniform disk in CEdge eigenvalue Gumbel-likeOpen quantum systems

The semicircle is the universal bulk limit for symmetric matrices with finite-variance entries. Marchenko-Pastur is the analog for sample covariance matrices. Ginibre matrices have no symmetry, complex eigenvalues, and fill a uniform disk — the circular law.

The largest eigenvalue and Tracy-Widom

The bulk density tells you where eigenvalues are typically located. The largest eigenvalue is a separate, harder question. For a GOE matrix with σ = 1:

λ_max = 2√N + N^(−1/6) · ξ_TW

where ξ_TW follows the Tracy-Widom F₁ distribution. The distribution has mean ≈ −1.21 and variance ≈ 1.61, is asymmetric, and has heavy left tail (large deviations below the typical value are far more likely than above). Tracy-Widom describes the rescaled fluctuations of:

  • The largest eigenvalue of a random matrix.
  • The length of the longest increasing subsequence of a uniform random permutation.
  • The height of randomly-grown interfaces (KPZ universality).
  • The free energy of certain directed polymers.
  • The largest principal component direction in PCA on noisy data.

This is universality of a deeper kind: utterly unrelated probabilistic models converge to the same Tracy-Widom limit.

Where random matrices appear

  • Nuclear physics (Wigner, 1955). Heavy nuclei have Hamiltonians too complex to compute. Their resonance level spacings statistically match GOE — the original motivation for random matrix theory.
  • Quantum chaos (BGS conjecture). Quantum systems with chaotic classical dynamics have eigenvalue statistics indistinguishable from a Gaussian ensemble — the Bohigas-Giannoni-Schmit conjecture.
  • Riemann zeta zeros (Montgomery, 1972). The spacing between consecutive non-trivial zeros of ζ(s) matches GUE pair correlation. Confirmed numerically by Odlyzko to many decimal places.
  • Wireless communications. The capacity of a MIMO channel with random fading depends on the spectrum of HH* where H is the channel matrix. Random matrix theory gives sharp capacity formulas.
  • Principal component analysis. When you compute the eigenvalues of a sample covariance matrix, the bulk follows Marchenko-Pastur and outliers (true signal directions) stick out beyond the edge. This is the basis for choosing how many principal components to keep.
  • Neural network theory. The Hessian of a trained network has bulk spectrum predicted by Marchenko-Pastur plus outliers. Initialization schemes (orthogonal init, dynamical isometry) engineer the spectrum to avoid vanishing or exploding gradients.
  • Number theory (L-functions). The Katz-Sarnak philosophy classifies families of L-functions by which random matrix ensemble governs their low-lying zeros.
  • Graph theory. Adjacency matrices of large random graphs have semicircle bulk plus an outlier at degree √(np). Used to analyze spectral clustering and community detection.

Python — sampling and verifying the semicircle

import numpy as np
import matplotlib.pyplot as plt

def wigner_matrix(N, sigma=1.0, dist='gaussian'):
    """Symmetric N×N random matrix from a Wigner ensemble."""
    if dist == 'gaussian':
        A = np.random.randn(N, N) * sigma
    elif dist == 'bernoulli':
        A = np.random.choice([-1.0, +1.0], size=(N, N)) * sigma
    elif dist == 'uniform':
        # Mean 0, variance σ²: U[-√3 σ, +√3 σ]
        A = (np.random.rand(N, N) - 0.5) * 2 * np.sqrt(3) * sigma
    M = (A + A.T) / np.sqrt(2)
    return M

def semicircle_density(lam, sigma, N):
    """Wigner semicircle PDF on [-2σ√N, 2σ√N]."""
    R = 2 * sigma * np.sqrt(N)
    rho = np.where(np.abs(lam) < R,
                   (1.0 / (2 * np.pi * sigma**2 * N)) * np.sqrt(R**2 - lam**2),
                   0.0)
    return rho

# Verify universality: same semicircle for three distributions
N = 1500
sigma = 1.0
for dist in ['gaussian', 'bernoulli', 'uniform']:
    M = wigner_matrix(N, sigma, dist)
    eigs = np.linalg.eigvalsh(M)  # symmetric → real eigenvalues
    plt.hist(eigs, bins=80, density=True, alpha=0.4, label=dist)

lam_grid = np.linspace(-2 * sigma * np.sqrt(N), 2 * sigma * np.sqrt(N), 400)
plt.plot(lam_grid, semicircle_density(lam_grid, sigma, N),
         'k-', lw=2, label='semicircle')
plt.legend()
plt.title(f'Wigner semicircle, N={N}')
plt.show()

All three histograms — Gaussian, Bernoulli, uniform — overlay the same semicircle. That is universality made visible in one plot.

Proof sketch — moments and Catalan numbers

The semicircle law follows from showing that the moments of the empirical spectral measure converge to the semicircle moments. For a symmetric matrix M with i.i.d. zero-mean variance-σ² entries:

m_(2k) := E[(1/N) tr(M^(2k))]
       = (1/N) Σ E[M_{i_1 i_2} M_{i_2 i_3} ... M_{i_(2k) i_1}]

The expectation is non-zero only when each Mᵢⱼ appears an even number of times. The dominant contribution in N comes from cyclic sequences whose pairing structure is a non-crossing matching — and the count of non-crossing matchings of 2k objects is the Catalan number C_k = (1/(k+1)) C(2k, k). The resulting moment formula m_(2k) → σ^(2k) N^k C_k matches:

∫_(−2σ√N)^(2σ√N) λ^(2k) ρ(λ) dλ = σ^(2k) N^k C_k

Odd moments vanish. Since the semicircle has compact support and all moments determine the distribution (Carleman), this proves convergence.

Common pitfalls

  • Wrong rescaling. The eigenvalues of a Wigner matrix span [−2σ√N, +2σ√N], not [−2σ, +2σ]. The semicircle radius grows like √N. Forgetting this leaves you wondering why "everything blows up."
  • Asymmetric matrices. If M is i.i.d. without symmetrization, its eigenvalues are complex and follow the circular law (uniform disk of radius σ√N). The semicircle applies only to symmetric/Hermitian ensembles.
  • Sparse matrices. Random graph adjacency matrices have entries Bernoulli(p). If p is small (sparse regime), the semicircle breaks down and Erdős-Rényi-specific spectral laws kick in.
  • Heavy tails. If entries have infinite variance (Cauchy, Lévy stable), the semicircle fails and stable laws emerge instead. The α-Wigner law of Cizeau-Bouchaud handles heavy-tailed entries.
  • Mistaking bulk for edge. The semicircle describes typical eigenvalues. The largest eigenvalue lives at the edge and has separate statistics (Tracy-Widom). Confusing them gives wrong tail predictions.
  • Forgetting outliers. Real data covariance matrices often have a few outlier eigenvalues representing genuine signal, sitting outside the Marchenko-Pastur bulk. PCA component selection rests on detecting them above the spectral edge.

History

Eugene Wigner introduced random matrices in 1955 to model neutron resonances in heavy nuclei — the energy levels themselves were too complicated to compute, but their statistics, he argued, should reflect only the system's symmetry class. Freeman Dyson developed the three-fold classification (orthogonal, unitary, symplectic) in 1962. Madan Lal Mehta wrote the first systematic treatise (1967, expanded 2004). Vladimir Marchenko and Leonid Pastur derived the covariance-matrix analog in 1967. Hugh Montgomery's 1972 conjecture linked random matrices to the Riemann zeros; Andrew Odlyzko verified it numerically for billions of zeros. Craig Tracy and Harold Widom derived the edge eigenvalue distribution in the 1990s, and the Tracy-Widom distribution has since proliferated through KPZ universality and statistical physics. The 2000s brought rigorous proofs of universality (Erdős-Schlein-Yau, Tao-Vu) showing that local eigenvalue statistics are independent of the entry distribution.

Frequently asked questions

What is the Wigner semicircle law?

Take an N×N symmetric matrix M with independent entries Mᵢⱼ (for i ≤ j) drawn from any distribution with mean 0 and variance σ². Compute its N real eigenvalues. The histogram of those eigenvalues, rescaled to fit in a fixed interval, converges as N → ∞ to the semicircle density ρ(λ) = (1/(2πσ²N))√(4σ²N − λ²) supported on [−2σ√N, 2σ√N]. The density looks like the top half of an ellipse — flat near zero, falling smoothly to zero at the edges. Wigner proved this in 1955 to model the energy levels of heavy nuclei. The result holds for Gaussian entries (the Gaussian Orthogonal Ensemble), Bernoulli entries (±1 sign matrices), uniform entries — any distribution with the right two moments.

What does "universality" mean in random matrix theory?

Universality is the phenomenon that the limiting spectral distribution of a random matrix depends only on a few coarse statistics of the entries (typically the mean and variance), not on the full distribution. Identical semicircles emerge whether you draw entries from a Gaussian, a Bernoulli ±1, a uniform [−√3, √3], or any other zero-mean unit-variance distribution. The universality goes further: local statistics — gap distribution between consecutive eigenvalues, behavior at the spectral edge — also converge to universal limits (the sine kernel in the bulk, the Tracy-Widom distribution at the edge). This is the random-matrix analog of the central limit theorem.

How is random matrix theory used in physics?

Wigner introduced random matrices to model the Hamiltonians of heavy nuclei — too complicated to compute exactly, but their statistical behavior should reflect the symmetries of the system. The Gaussian Orthogonal Ensemble (real symmetric, time-reversal-invariant), Gaussian Unitary Ensemble (Hermitian, no time-reversal), and Gaussian Symplectic Ensemble (quaternionic, spin-orbit-coupled) classify nuclear spectra. The Bohigas-Giannoni-Schmit conjecture posits that quantum systems whose classical limit is chaotic have eigenvalue statistics identical to a Gaussian ensemble — verified experimentally in nuclei, billiards, and disordered conductors.

What is the Marchenko-Pastur law and how does it relate?

Marchenko-Pastur (1967) is the analog of Wigner's law for sample covariance matrices. Take a p×n data matrix X with i.i.d. zero-mean unit-variance entries, form the sample covariance S = (1/n)XXᵀ. As p, n → ∞ with ratio p/n → c ∈ (0, 1], the eigenvalues of S have empirical density ρ(λ) = (1/(2πcλ))√((λ₊ − λ)(λ − λ₋)) on [λ₋, λ₊] with λ± = (1 ± √c)². When c is small (n ≫ p) the bulk concentrates near 1 — sample covariance approximates true covariance. When c approaches 1 the spectrum spreads wildly, with a hard edge at zero.

What's the connection to the Riemann zeta function?

In 1972 Hugh Montgomery conjectured that the spacings between consecutive non-trivial zeros of the Riemann zeta function (when rescaled to unit average) follow the same statistical law as the eigenvalue spacings in the Gaussian Unitary Ensemble. Freeman Dyson recognized this on first hearing, having computed the GUE pair correlation function himself. Andrew Odlyzko later computed 10¹⁰ zeros numerically and found agreement to many decimal places. This Montgomery-Odlyzko law remains a conjecture, but it has shaped modern thinking about the zeros of L-functions.

How does random matrix theory apply to machine learning?

In high dimensions, the spectrum of trained neural network weight matrices, of empirical loss Hessians, and of activation covariance matrices closely matches predictions from random matrix theory. Marchenko-Pastur explains the bulk of weight spectra during early training; outlier eigenvalues at the spectral edge correspond to learned features. The spectral edge sets the largest eigenvalue, which controls the local condition number and the largest learning rate that won't diverge. Random matrix theory also models the dynamics of stochastic gradient descent.

What is the Tracy-Widom distribution?

The Tracy-Widom distribution describes the rescaled position of the largest eigenvalue of a Gaussian random matrix. For an N×N GOE matrix, the largest eigenvalue λ_max ≈ 2σ√N + σN^(−1/6) · ξ where ξ follows the Tracy-Widom F₁ distribution. The Tracy-Widom distribution is asymmetric, heavy-left-tailed, with mean ≈ −1.21 and variance ≈ 1.61. It governs the fluctuations of extreme eigenvalues, the longest increasing subsequence of a random permutation, the height of randomly-grown surfaces in the KPZ universality class, and the largest principal component in PCA on noisy data.