Cosmology

Halo Mass Function

A power law of tiny halos, an exponential cliff of giant ones — and counting clusters on that cliff weighs the universe

The halo mass function n(M, z) counts dark-matter halos per unit mass, with a steep exponential cutoff at high mass — the basis of cluster cosmology.

High-mass behaviorexponential cutoff ∝ exp(−δc²/2σ²)
Low-mass slopedn/dM ∝ M⁻¹·⁹ (power law)
First analytic formPress-Schechter (1974)
Modern fitsSheth-Tormen (1999), Tinker (2008)
Collapse thresholdδc ≈ 1.686 (spherical)
Cosmology levercluster counts constrain σ8 & Ωm

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

A census of dark matter halos

Imagine taking a representative cube of the universe — a hundred megaparsecs on a side, comoving, so it grows with the cosmic expansion — and sorting every gravitationally bound clump of dark matter inside it by mass. You would find an enormous number of small halos, the kind that host single dwarf galaxies at 10⁹ or 10¹⁰ solar masses. You would find fewer halos at the mass of the Milky Way, around 10¹² solar masses. And you would find a precious handful of the giants — galaxy clusters at 10¹⁴ to 10¹⁵ solar masses, the largest gravitationally collapsed objects in existence. The function that quantifies that census, the number density of halos per unit mass at each cosmic epoch, is the halo mass function, written n(M, z) or dn/dM.

Its shape is the signature of how cosmic structure grows. Toward low mass it rises as a gentle power law, roughly dn/dM ∝ M⁻¹·⁹, so smaller halos always vastly outnumber bigger ones. But toward high mass it does not merely steepen — it collapses, falling off with a steep exponential cutoff. That cutoff is the single most important feature of the mass function. It is why galaxy clusters are rare, why the most massive cluster in any given survey volume is a near-unique object, and why counting those rare objects turns out to be one of the sharpest tools in observational cosmology. The mass function is, quite literally, the foundation of structure counts.

How it works: rare peaks crossing a threshold

The physics behind the mass function is the statistics of a Gaussian random field. In the early universe, the density field is nearly Gaussian, with small fractional fluctuations δ = (ρ − ρ̄)/ρ̄. Gravity amplifies overdense regions. A patch of the universe collapses into a virialized halo once its linearly extrapolated overdensity reaches a critical value, the collapse threshold

δc ≈ 1.686   (spherical collapse, weakly cosmology-dependent)

To talk about halos of a particular mass M, you smooth the density field with a filter enclosing a mass M. The fluctuations on that scale have a root-mean-square amplitude σ(M), the mass variance, which is a decreasing function of M: small regions are lumpy, big regions are smooth. Defining the dimensionless peak height

ν(M, z) = δc / σ(M, z)

a halo of mass M at redshift z is a region whose density fluctuation crossed δc. For massive halos σ(M) is tiny, so ν is large — you need a fluctuation many σ above the mean. The probability of such a rare excursion in a Gaussian falls off as exp(−ν²/2) = exp(−δc²/2σ²). That is the origin of the exponential cutoff: the most massive halos are the rarest peaks, and rare peaks are exponentially suppressed.

The mean matter density ρ̄m converts "fraction of mass in collapsed regions" into "number of halos per unit volume." Putting the pieces together gives the generic form

dn/dM = (ρ̄m / M) f(σ) |dlnσ⁻¹/dM|

where f(σ) is the dimensionless multiplicity function — the fraction of mass collapsing into halos per logarithmic interval of σ. All the cosmology and all the collapse physics live in f(σ). Different theoretical models and simulation fits are simply different choices of f(σ).

The Press-Schechter form

The first analytic derivation came from William Press and Paul Schechter in 1974, two decades before anyone could simulate it properly. They assumed spherical collapse and counted Gaussian regions above the threshold, arriving at

dn/dM = √(2/π) · (ρ̄m / M) · (δc / σ²) · |dσ/dM| · exp(−δc² / 2σ²)

The exponential factor is the rare-peak suppression; the prefactor counts the regions. There was one notorious wrinkle. The naive calculation accounted for only half the mass — it missed material that ends up inside larger collapsed regions, the so-called cloud-in-cloud problem — so Press and Schechter inserted a factor of two by hand. That fudge was put on rigorous footing in 1991 by the excursion-set (extended Press-Schechter) formalism of Bond, Cole, Efstathiou and Kaiser, who modeled δ as a random walk in smoothing scale and counted first up-crossings of the threshold. The factor of two emerged naturally.

Press-Schechter is remarkable for getting the qualitative shape right from pure theory: the power law, the knee, the exponential tail. But quantitatively it is off. Compared to N-body simulations it overpredicts halos near and below the characteristic mass and underpredicts the most massive halos — exactly where cosmologists most want accuracy.

Sheth-Tormen and the move to ellipsoidal collapse

Ravi Sheth and Giuseppe Tormen (1999) traced the discrepancy to the spherical-collapse assumption. Real proto-halos are not spheres; they collapse along their shortest axis first under the local tidal field, an ellipsoidal collapse in which the effective threshold depends on the peak height itself. Encoding this, they wrote the multiplicity function as

ν f(ν) = A √(2a/π) · [ 1 + (a ν²)^(−p) ] · ν · exp(−a ν² / 2)
         with  ν = δc/σ,  a ≈ 0.707,  p ≈ 0.3,  A ≈ 0.322

Setting a = 1, p = 0, A = ½ recovers Press-Schechter exactly, so Sheth-Tormen is a generalization, not a replacement. The parameter a < 1 makes the exponential cutoff less severe, boosting massive halos; the (aν²)⁻ᵖ term raises the low-ν (high-mass) abundance and lowers the count near the knee. The net effect matches simulations to the 10–20% level, and Sheth-Tormen was the workhorse fit for two decades.

The current precision standard is the Tinker et al. (2008) form, calibrated directly against large dark-matter-only N-body simulations and — importantly — parameterized by the spherical-overdensity threshold Δ, so the same fit can be evaluated for M200m, M500c, and so on. Tinker reaches roughly the few-percent level, which is what modern cluster surveys demand. The lineage Press-Schechter → excursion set → Sheth-Tormen → Tinker is the story of the mass function maturing from analytic insight into a calibrated precision tool.

Worked example: how rare is a 10¹⁵ M☉ cluster?

Take a present-day (z = 0) ΛCDM universe with σ8 ≈ 0.81 and Ωm ≈ 0.31. The mass variance σ(M) on cluster scales is approximately:

σ(M = 10¹⁴ M☉) ≈ 0.62
σ(M = 5×10¹⁴ M☉) ≈ 0.45
σ(M = 10¹⁵ M☉) ≈ 0.38

Form the peak height ν = δc/σ with δc = 1.686 and evaluate the dominant exponential suppression exp(−ν²/2):

M = 10¹⁴:  ν = 1.686/0.62 = 2.72,  exp(−ν²/2) = exp(−3.70) = 2.5×10⁻²
M = 5×10¹⁴: ν = 1.686/0.45 = 3.75,  exp(−ν²/2) = exp(−7.03) = 8.9×10⁻⁴
M = 10¹⁵:  ν = 1.686/0.38 = 4.44,  exp(−ν²/2) = exp(−9.86) = 5.2×10⁻⁵

From 10¹⁴ to 10¹⁵ solar masses — one decade in mass — the exponential factor drops by nearly 500×. The number density of clusters plummets correspondingly: roughly 10⁻⁵ per (Mpc/h)³ above 10¹⁴ M☉, but only of order 10⁻⁸ per (Mpc/h)³ above 10¹⁵ M☉. In a survey volume of one cubic gigaparsec there are millions of group-scale halos but only a few hundred 10¹⁵-solar-mass monsters. The biggest clusters are genuinely near-unique objects, perched on the exponential cliff.

Now the cosmology. Suppose σ8 were 10% higher, 0.89 instead of 0.81. Then σ(10¹⁵) rises by ~10% to ≈ 0.42, ν drops to 4.01, and exp(−ν²/2) becomes exp(−8.04) = 3.2×10⁻⁴ — a factor of ~6 more of the most massive clusters from a mere 10% change in σ8. This brutal sensitivity is precisely why cluster counts pin down σ8 so tightly.

Why cluster counts constrain σ8 and Ωm

The exponential cutoff is where the cosmological information lives. Its location and steepness are controlled almost entirely by two parameters:

σ8 — the fluctuation amplitude. By convention σ8 is the rms linear density fluctuation in spheres of radius 8 h⁻¹ Mpc, which enclose roughly a cluster's worth of mass. Because σ(M) at cluster scales scales directly with σ8, the parameter sits inside the exponent. As the worked example shows, a 10% change in σ8 moves the count of the rarest clusters by a factor of several. The cutoff is an exquisitely sensitive amplitude meter.
Ωm — the matter density. Ωm enters twice: through the mean matter density ρ̄m in the prefactor (more matter → more halos at fixed σ), and through the linear growth factor D(z) that sets how σ(M, z) and the characteristic mass M*(z) evolve. The redshift evolution of the cutoff is therefore a clock measuring the growth of structure, which depends on Ωm and on dark energy.

Counting clusters in bins of mass and redshift breaks the σ8–Ωm degeneracy: the amplitude at z = 0 fixes one combination, the rate of disappearance of massive clusters toward high z fixes another. Cluster surveys conventionally report the well-constrained combination S8 = σ8 √(Ωm/0.3). Major efforts — ROSAT and eROSITA in X-rays, the Planck, SPT and ACT Sunyaev-Zeldovich catalogs, and weak-lensing-calibrated optical samples from DES, KiDS and the Rubin Observatory's LSST — all use the same machinery: predict dn/dM(z) from a Tinker-class mass function, fold in a mass-observable relation, and fit σ8, Ωm and (increasingly) the dark-energy equation of state and tests of general relativity.

Regimes and variants

Form / regime	Year	Key idea	Accuracy vs N-body
Press-Schechter	1974	Spherical collapse, Gaussian peaks, ad-hoc ×2	~50% (shape only)
Extended PS / excursion set	1991	Random-walk first-crossing; fixes cloud-in-cloud	~30–50%
Sheth-Tormen	1999	Ellipsoidal collapse, moving barrier	~10–20%
Jenkins et al.	2001	Empirical fit to f(σ), FOF halos	~10–15%
Warren et al.	2006	Finite-volume / sampling corrections	~5–10%
Tinker et al.	2008	SO-mass, Δ-parameterized, simulation-calibrated	~5% (few % in core range)
Despali / emulator forms	2016+	Universality tests; ML emulators incl. baryons	~1–3%

Two regimes deserve a separate word. At the very low-mass end the mass function is steep and abundant but observationally hard to probe directly with dark matter alone; it connects to the missing-satellites and reionization questions. At the very high-mass, high-redshift end, the exponential cutoff is so steep that the existence of even one unexpectedly massive cluster early in cosmic time is a potential tension with ΛCDM — the kind of result that periodically makes headlines and then is usually resolved by improved mass estimates.

Common pitfalls and misconceptions

Forgetting that "halo mass" is not unique. The mass function depends on the halo definition — M200c, M200m, M500c (spherical overdensity), or a friends-of-friends linking length. The same universe gives different dn/dM for different definitions. A prediction and a measurement must use the same definition, or the inferred σ8 and Ωm will be biased. Tinker's Δ-parameterization exists precisely to manage this.
Confusing the exponential with the power law. The mass function is not a single power law. It is a power law at low mass and an exponential cutoff at high mass, with the transition at the nonlinear mass M*(z). Treating it as a pure power law badly overpredicts clusters.
Ignoring baryonic effects. The classic mass functions are calibrated on dark-matter-only simulations. Gas cooling, AGN feedback and the mass in baryons shift halo masses at the ~1–3% level near cluster scales — non-negligible for percent-level cosmology, and the reason modern emulators include hydrodynamic corrections.
Treating "universality" as exact. The hope that f(σ) is a single function independent of cosmology and redshift is approximately true but breaks at the few-percent level. Demanding sub-percent accuracy requires cosmology-dependent, redshift-dependent calibration.
Mistaking the mass-observable relation for a detail. You never observe halo mass directly; you observe X-ray luminosity, SZ decrement or lensing shear and infer mass. On the steep exponential tail, even a small scatter or bias in the mass-observable relation (e.g. Eddington bias scattering abundant low-mass halos up) translates into large errors in the inferred abundance. Mass calibration is now the dominant systematic in cluster cosmology.

Observational status and applications

Cluster-count cosmology. The flagship application. SZ catalogs (Planck, SPT, ACT) and X-ray catalogs (eROSITA's first all-sky survey delivering 10⁴–10⁵ clusters) fit the mass function to measure σ8, Ωm and S8. Results have historically sat slightly low in S8 relative to the Planck CMB primary value — part of the broader "S8 tension" debate, now thought to be driven largely by mass-calibration systematics rather than new physics.
Dark energy and growth. Because the high-mass abundance evolves so strongly with redshift, cluster counts measure the growth rate of structure and constrain the dark-energy equation of state w, complementing supernovae and baryon acoustic oscillations.
Tests of gravity and neutrino mass. Massive neutrinos suppress small-scale power and thus σ(M), lowering the high-mass abundance; the mass function is one of the cleaner cosmological probes of the summed neutrino mass Σmν. Modified-gravity models predict altered collapse thresholds and growth, directly imprinted on the cutoff.
Galaxy-halo connection. Halo occupation distribution and abundance-matching models stack the galaxy luminosity function onto the halo mass function to link galaxies with their host halos — a backbone of galaxy-formation and large-scale-structure modeling.
Surveys ahead. Rubin/LSST, Euclid and the Roman Space Telescope will deliver weak-lensing-calibrated cluster samples of unprecedented size and redshift reach, pushing the mass function as a cosmological probe toward sub-percent demands.

Quantitative analysis: building dn/dM from the variance

To turn the schematic form into a usable prediction you need the mass variance σ(M), which comes from the linear matter power spectrum P(k) filtered on the relevant scale. With a real-space top-hat window W of comoving radius R (where M = (4/3)π ρ̄m R³),

σ²(R) = (1 / 2π²) ∫₀^∞ P(k) Ŵ²(kR) k² dk,    Ŵ(x) = 3(sin x − x cos x)/x³

By definition σ8 = σ(R = 8 h⁻¹ Mpc), so σ8 sets the overall normalization of P(k) and hence of σ(M) on every scale. The redshift dependence is carried entirely by the linear growth factor: σ(M, z) = σ(M, 0) · D(z)/D(0), with D(z) decreasing toward high z. The full Sheth-Tormen number density is then

dn/dlnM = (ρ̄m / M) · ν f(ν) · |dlnσ⁻¹/dlnM|
ν f(ν) = A √(2a/π) [1 + (aν²)^(−p)] ν exp(−aν²/2),  ν = δc/σ(M,z)

The chain of dependence is worth tracing. Cosmology fixes P(k); P(k) and σ8 fix σ(M); σ(M) and δc fix ν; ν drives the exponential exp(−aν²/2) that is the cutoff; and ρ̄m ∝ Ωm sets the prefactor normalization. Every cosmological parameter you want to measure leaves a fingerprint on the abundance of massive halos, and the exponential makes those fingerprints loud. That is the whole reason a count of clusters — a few hundred rare objects — can compete with measurements built from millions of galaxies: the rare tail is where small changes in the universe produce large, observable changes in the numbers.

Frequently asked questions

What is the halo mass function?

The halo mass function n(M, z), often written dn/dM or dn/dlnM, is the comoving number density of gravitationally bound dark-matter halos per unit mass interval, at a given mass M and redshift z. Comoving means counted in a volume that expands with the universe, so the density is not diluted by cosmic expansion. The shape is a power law rising toward low mass and a steep exponential decline at the high-mass end — in a representative cubic megaparsec there are many dwarf halos but only a tiny fraction of a cluster.

Why is there an exponential cutoff at high mass?

Because halos form from rare high peaks in a nearly Gaussian density field. A region collapses into a halo of mass M once its linearly-extrapolated overdensity reaches δc ≈ 1.686. The typical fluctuation amplitude on scale M is σ(M), which decreases as M increases, so forming a very massive halo requires a fluctuation many σ above the mean. In a Gaussian the probability of that falls as exp(−δc²/2σ²(M)), producing the steep cutoff. The most massive clusters sit far out on this exponential tail, which is why they are rare.

What is the Press-Schechter form?

Press & Schechter (1974) derived dn/dM = √(2/π) (ρ̄m/M)(δc/σ²)|dσ/dM| exp(−δc²/2σ²). The exponential is the Gaussian rare-peak suppression; the prefactor counts regions. The derivation assumed spherical collapse with δc = 1.686 and required an ad-hoc factor of two for the cloud-in-cloud problem, later justified by the excursion-set formalism of Bond, Cole, Efstathiou & Kaiser (1991). It gets the shape right but underpredicts massive halos and overpredicts low-mass ones versus simulations.

How does the Sheth-Tormen form improve on Press-Schechter?

Sheth & Tormen (1999) replaced spherical collapse with ellipsoidal collapse, in which the collapse threshold depends on a region's tidal environment and shape. This boosts massive-halo abundance and lowers it near the knee. The multiplicity function is f(ν) = A√(2a/π)[1 + (aν²)⁻ᵖ] ν exp(−aν²/2) with ν = δc/σ and a ≈ 0.707, p ≈ 0.3, A ≈ 0.322. It matches N-body simulations to 10–20%. The Tinker et al. (2008) form, calibrated on large simulations and tied to a spherical-overdensity definition, reaches the few-percent level.

How does the halo mass function constrain σ8 and Ωm?

The exponential cutoff depends almost entirely on σ8, the amplitude of linear fluctuations on 8 h⁻¹ Mpc scales, because σ(M) at cluster scales scales with σ8. A 10% change in σ8 can change the number of the most massive clusters by a factor of several. Ωm enters through ρ̄m in the prefactor and through the growth rate, which sets how the cutoff mass M*(z) evolves. Counting clusters in mass and redshift bins breaks the σ8–Ωm degeneracy; surveys report the combination S8 = σ8 √(Ωm/0.3).

What halo mass definition does the mass function assume?

There is no single universal "halo mass," and the mass function depends on the choice. Spherical-overdensity masses (M200c, M200m, M500c) use the mass within a radius where the mean enclosed density is a set multiple of the critical or background density; friends-of-friends masses link nearby particles. M500c suits X-ray and SZ clusters; M200m is common in theory. The Tinker form is parameterized by the overdensity Δ. A prediction and a measurement must use the same definition or the inferred σ8 and Ωm will be biased.

How does the mass function evolve with redshift?

In hierarchical CDM, structure builds over time. Earlier, fluctuations have grown less, so σ(M, z) = σ(M, 0) × D(z) is smaller and the nonlinear mass M*(z) is lower. Massive halos are therefore exponentially rarer at high redshift: the comoving density of 10¹⁵-solar-mass clusters drops by orders of magnitude from z = 0 to z = 1, while 10¹²-solar-mass galaxy halos are common by z = 2. This strong, mass-dependent evolution turns a multi-redshift cluster survey into a measurement of the growth rate of structure and a test of dark energy.