Photometric Redshift

Estimate a galaxy's distance from its brightness in a handful of colored filters — coarse compared to a spectrum, but fast enough to redshift a billion galaxies at once

A photometric redshift estimates a galaxy's distance from its brightness in a handful of broadband filters instead of a full spectrum. The 4000-angstrom break shifts to redder filters with distance, so colors alone encode redshift — fast enough to map billions of galaxies with σ ≈ 0.02–0.05 (1+z) precision.

  • Method: Broadband colors → redshift
  • Key feature: 4000 Å break, Lyman break
  • Typical precision: σ ≈ 0.02–0.05 (1+z)
  • Outlier rate: ~1–10 % (Δz > 0.15)
  • Scale: ~10¹⁰ galaxies (LSST)

The idea: colors are a coarse spectrum

A spectroscopic redshift is conceptually simple: spread a galaxy's light into hundreds or thousands of wavelength bins, find a known emission or absorption line — the calcium H and K doublet, the [O II] 3727 Å line, hydrogen-alpha — and measure how far it has shifted from its rest wavelength. The redshift is z = (λ_obs − λ_rest) / λ_rest, and it is good to four or five significant figures. The catch is photons: to resolve a line you have to collect enough light to fill many narrow bins, and that takes minutes of telescope time per object even for a bright galaxy.
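As a minimal sketch of that formula, the snippet below computes a redshift from one identified line. The rest wavelengths are rounded values, and the observed wavelength is a made-up number chosen for illustration.

```python
# Rounded rest wavelengths in angstroms for the lines named above.
REST_WAVELENGTHS = {
    "[O II]": 3727.0,
    "Ca II K": 3934.0,
    "Ca II H": 3968.0,
    "H-alpha": 6563.0,
}


def line_redshift(lambda_obs, line):
    """z = (lambda_obs - lambda_rest) / lambda_rest for an identified line."""
    lambda_rest = REST_WAVELENGTHS[line]
    return (lambda_obs - lambda_rest) / lambda_rest


# Hypothetical measurement: H-alpha observed at 9844.5 angstroms.
z_spec = line_redshift(9844.5, "H-alpha")
print(f"z = {z_spec:.4f}")
```

The hard part in practice is identifying which line has been detected; the arithmetic itself is a single division.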

A photometric redshift gives up almost all of that wavelength resolution in exchange for throughput. Instead of thousands of bins you measure the galaxy's brightness through perhaps five or six broad filters (the ugrizy set spanning the near-ultraviolet to the near-infrared), and each deep exposure captures every galaxy in the field at once. Each filter integrates the flux over a band several hundred to more than a thousand angstroms wide. You no longer see individual lines, but you do see the gross shape of the spectrum, encoded in the ratios of flux between filters. Those ratios are the galaxy's colors, and the colors alone carry enough information to pin the redshift to a few percent.
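To make "colors are flux ratios" concrete, here is a small sketch that turns band fluxes into colors in magnitudes. The flux values are invented for illustration, and the helper name `color` is not taken from any survey pipeline.

```python
import math


def color(flux_blue, flux_red):
    """Color (m_blue - m_red) in magnitudes; larger means redder."""
    return -2.5 * math.log10(flux_blue / flux_red)


# Made-up fluxes in arbitrary units, ordered blue to red.
fluxes = {"u": 1.0, "g": 2.0, "r": 8.0, "i": 10.0, "z": 11.0}
bands = list(fluxes)

# Colors between adjacent bands: u-g, g-r, r-i, i-z.
colors = {
    f"{b1}-{b2}": color(fluxes[b1], fluxes[b2])
    for b1, b2 in zip(bands, bands[1:])
}
reddest = max(colors, key=colors.get)
print(reddest, round(colors[reddest], 2))
```

In this toy example the largest color is g-r, which is where the flux jumps most sharply between neighboring filters.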

The reason is that real galaxy spectra are not featureless. They have a small number of broad, powerful features, chiefly the 4000-angstrom break and the Lyman break, that survive being blurred into broadband colors. As a galaxy recedes, those features slide to longer wavelengths in lockstep with (1+z), sweeping past the edges of the filters and changing the colors in a predictable way. Read the colors, invert the relation, and you have the redshift, apart from the occasional ambiguity between features discussed later. This is the trade at the heart of every modern imaging survey: precision sacrificed for the ability to redshift every galaxy in the sky.

The 4000-angstrom break: a redshift clock

The workhorse feature for galaxies at z < 1.5 is the 4000-Å break, a step-down in flux just blueward of rest-frame 4000 Å. It arises because the atmospheres of cool stars are crowded with ionized-metal absorption lines — most importantly the Ca II H and K lines at 3968 and 3934 Å — that blanket the spectrum below 4000 Å, while the Balmer break of hotter stars adds to the discontinuity. The strength of the break, often measured by the D4000 index (the ratio of flux just redward to just blueward of the break), tracks stellar age: an old, red, quiescent elliptical has a strong break (D4000 ≈ 2), while a young, blue, star-forming galaxy has a weak one (D4000 ≈ 1.2).
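The D4000 ratio can be sketched as follows. The window limits used here (3850–3950 Å and 4000–4100 Å, the commonly used narrow definition) are an assumption added for illustration, as is the toy step-function spectrum; formal definitions also specify the flux-density units, which this sketch ignores.

```python
def d4000(wavelengths, flux, blue=(3850.0, 3950.0), red=(4000.0, 4100.0)):
    """Ratio of mean flux just redward to just blueward of 4000 angstroms."""

    def mean_in(window):
        lo, hi = window
        values = [f for w, f in zip(wavelengths, flux) if lo <= w <= hi]
        if not values:
            raise ValueError("spectrum does not cover the window")
        return sum(values) / len(values)

    return mean_in(red) / mean_in(blue)


# Toy rest-frame spectrum: flat, with a factor-of-two step at 4000 angstroms.
wavelengths = [3800.0 + 10.0 * k for k in range(41)]       # 3800 ... 4200
flux = [2.0 if w >= 4000.0 else 1.0 for w in wavelengths]
print(d4000(wavelengths, flux))
```

A step of a factor of two reproduces the D4000 ≈ 2 quoted for an old elliptical; a weaker step would mimic a star-forming galaxy.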

Because the break is strong enough to register even in broadband photometry, its redshifted position acts as a clock. At z = 0 the break sits at 4000 Å, near the boundary between the u and g filters. By z = 0.5 it has moved to 6000 Å, past the g/r boundary near 5500 Å and into the r band. By z = 1 it reaches 8000 Å, in the red half of the i band and close to the i/z boundary (band positions here are approximate, for an LSST-like ugrizy set):

λ_observed = 4000 Å × (1 + z)

z = 0.0   →  4000 Å   (u/g edge)
z = 0.5   →  6000 Å   (within r)
z = 1.0   →  8000 Å   (within i, near the i/z edge)
z = 1.5   → 10000 Å   (within y)

So the photo-z problem reduces, in its simplest form, to "between which two filters does the galaxy's color suddenly change?" The galaxy is bright redward of the break and faint blueward of it; the filter pair that brackets the flux drop tells you where 4000 Å has landed, and dividing by 4000 gives (1+z). Real estimators do this more carefully — fitting the whole color vector, not just one break — but this is the physical core.
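The simplest form of that question can be sketched in a few lines: find the adjacent filter pair with the largest flux jump and divide the boundary wavelength by 4000 Å. The boundary wavelengths below are rough, assumed values for a ugriz-like filter set, and the fluxes are invented.

```python
BREAK_REST = 4000.0  # angstroms

# Assumed approximate boundaries between adjacent bands, in angstroms.
BOUNDARIES = {
    ("u", "g"): 4000.0,
    ("g", "r"): 5500.0,
    ("r", "i"): 6900.0,
    ("i", "z"): 8200.0,
}
BAND_ORDER = ["u", "g", "r", "i", "z"]


def bracket_break(fluxes):
    """Return the band pair with the biggest red/blue flux ratio and the
    redshift implied by placing the 4000 A break at their boundary."""
    pairs = list(zip(BAND_ORDER, BAND_ORDER[1:]))
    pair = max(pairs, key=lambda p: fluxes[p[1]] / fluxes[p[0]])
    return pair, BOUNDARIES[pair] / BREAK_REST - 1.0


# Made-up fluxes with a clear jump between g and r.
pair, z_est = bracket_break({"u": 0.9, "g": 1.0, "r": 2.0, "i": 2.2, "z": 2.3})
print(pair, round(z_est, 3))
```

This deliberately crude estimator quantizes the redshift to one value per filter boundary, which is why real codes fit the full set of fluxes instead.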

The Lyman break: photo-z for the early universe

Beyond z ≈ 3, the 4000-Å break has redshifted into the infrared where ground-based imaging is hard, and a different feature takes over: the Lyman break at rest-frame 912 Å. Neutral hydrogen — both within the galaxy and in the intergalactic medium along the line of sight — absorbs essentially all photons shortward of the Lyman limit. The result is a galaxy that is invisible in any filter sampling below the redshifted 912 Å and abruptly visible above it.

This is the basis of the Lyman-break technique (the "dropout" method) pioneered by Charles Steidel in the 1990s: a galaxy at z ≈ 3 "drops out" of the u-band but appears in g and redder. At z ≈ 4 it drops out of g; at z ≈ 6 out of i. JWST has pushed dropout selection past z = 10, identifying galaxies whose Lyman break sits beyond 1 micron. The Lyman break gives much cleaner photo-z's than the 4000-Å break because the flux contrast across it is total — but it also introduces the most dangerous degeneracy, discussed below.
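A dropout selection can be sketched as a signal-to-noise test: the source must be undetected in one band and everything bluer, yet clearly detected in the next redder band. The thresholds (S/N below 2 for a non-detection, at least 5 for a detection) and the function name are assumptions for illustration, not the criteria of any particular survey.

```python
def dropout_band(fluxes, errors, band_order=("u", "g", "r", "i", "z", "y"),
                 detect_snr=5.0, nondetect_snr=2.0):
    """Return the band the source drops out of, or None if there is no
    clean dropout. The source must be undetected in that band and all
    bluer bands, and detected in the next redder band."""
    snr = {b: fluxes[b] / errors[b] for b in band_order}
    for k in range(len(band_order) - 1):
        bluer_dark = all(snr[b] < nondetect_snr for b in band_order[:k + 1])
        if bluer_dark and snr[band_order[k + 1]] >= detect_snr:
            return band_order[k]
    return None


errors = {b: 1.0 for b in "ugrizy"}
# Invisible in u and g, bright from r onward: a g-band dropout.
high_z = {"u": 0.1, "g": 0.2, "r": 6.0, "i": 7.0, "z": 7.0, "y": 7.0}
# Detected in every band: no dropout.
low_z = {b: 6.0 for b in "ugrizy"}
print(dropout_band(high_z, errors), dropout_band(low_z, errors))
```

Real selections add color cuts redward of the break to reject red low-redshift interlopers, which is the degeneracy the text returns to below.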

Two ways to invert colors into redshift

Given a galaxy's measured colors, there are two established strategies to turn them into a redshift.

Template fitting. Take a library of galaxy spectral energy distributions (SEDs) — observed or synthetic spectra spanning ellipticals, spirals, starbursts. Redshift each template across a fine grid of z, convolve it with the survey's filter curves to predict the fluxes, and find the (template, redshift) combination that best reproduces the observed fluxes by minimizing a χ²:

χ²(z, T) = Σ_filters  [ F_obs,i − a · F_template,i(z, T) ]² / σ_i²

Marginalizing over template and the normalization a yields a full posterior P(z) — not just a single number but a probability distribution, which is what cosmology actually needs. Codes like BPZ (Bayesian Photometric Redshift), EAZY, and LePhare implement this. The advantage is that it requires no training data and extrapolates naturally to the faintest galaxies; the weakness is that it is only as good as the templates and the photometric calibration.
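The template-fitting loop can be illustrated with a toy version. Everything here is a simplifying assumption rather than how BPZ, EAZY, or LePhare work internally: the templates are flat spectra with a 4000 Å step, the filters are top-hats with rough edges, priors are flat, and the normalization a is set to its best-fit value instead of being marginalized.

```python
import math

# Assumed top-hat band edges in angstroms (rough ugriz-like values).
BANDS = {"u": (3200.0, 4000.0), "g": (4000.0, 5500.0), "r": (5500.0, 6900.0),
         "i": (6900.0, 8200.0), "z": (8200.0, 9200.0)}
# Toy templates, labeled by break strength (the D4000 values quoted earlier).
TEMPLATES = {"old": 2.0, "young": 1.2}


def sed(rest_wavelength, break_strength):
    """Toy rest-frame spectrum: zero below 912 A, a step at 4000 A."""
    if rest_wavelength < 912.0:
        return 0.0
    return 1.0 if rest_wavelength >= 4000.0 else 1.0 / break_strength


def model_fluxes(z, break_strength, n_samples=50):
    """Band-averaged fluxes of a template redshifted to z."""
    fluxes = []
    for lo, hi in BANDS.values():
        step = (hi - lo) / n_samples
        samples = [sed((lo + (k + 0.5) * step) / (1.0 + z), break_strength)
                   for k in range(n_samples)]
        fluxes.append(sum(samples) / n_samples)
    return fluxes


def chi2(obs, err, model):
    """Chi-square with the normalization a at its analytic best-fit value."""
    a = (sum(o * m / e**2 for o, m, e in zip(obs, model, err))
         / sum(m * m / e**2 for m, e in zip(model, err)))
    return sum(((o - a * m) / e) ** 2 for o, m, e in zip(obs, model, err))


def posterior(obs, err, z_grid):
    """P(z) on a grid: exp(-chi2/2) summed over templates, then normalized."""
    weights = [sum(math.exp(-0.5 * chi2(obs, err, model_fluxes(z, d)))
                   for d in TEMPLATES.values()) for z in z_grid]
    total = sum(weights)
    return [w / total for w in weights]


z_grid = [k * 0.01 for k in range(121)]              # 0.00 ... 1.20
truth = [3.0 * f for f in model_fluxes(0.5, 2.0)]     # noiseless mock galaxy
p_z = posterior(truth, [0.02] * len(truth), z_grid)
z_best = z_grid[p_z.index(max(p_z))]
print(f"peak of P(z) at z = {z_best:.2f}")
```

The mock galaxy is generated from the "old" template at z = 0.5 with no noise, so the posterior peaks at that grid point; adding noise or removing bands broadens P(z).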

Machine learning. Alternatively, take a subset of galaxies that already have spectroscopic redshifts, treat their colors as inputs and their spectroscopic z as the label, and train a regressor — a random forest, a neural network, a Gaussian process, or a self-organizing map. The trained model then predicts photo-z for the far larger photometric sample. Where the training set is dense this is more accurate than template fitting and needs no SED library. Its fatal flaw is extrapolation: it cannot reliably predict redshifts for galaxies fainter, redder, or more distant than anything in its training set — precisely the regime where the biggest surveys live and where spectroscopic training data are sparsest.
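The extrapolation problem is easy to reproduce with even the simplest learner. The sketch below uses a k-nearest-neighbor regressor as a stand-in for the methods named above, trained on a made-up linear color–redshift relation that stops at z = 0.8; none of the numbers come from a real survey.

```python
def knn_photoz(query_colors, training, k=3):
    """Average the spectroscopic z of the k training galaxies whose colors
    are closest (Euclidean distance) to the query colors."""
    by_distance = sorted(
        (sum((q - c) ** 2 for q, c in zip(query_colors, colors)), z)
        for colors, z in training
    )
    nearest = by_distance[:k]
    return sum(z for _, z in nearest) / len(nearest)


def toy_color(z):
    """Invented one-color relation, used only to build the example."""
    return 0.5 + 2.0 * z


# Training set with spectroscopic redshifts from 0.0 to 0.8 only.
training = [((toy_color(k * 0.05),), k * 0.05) for k in range(17)]

z_inside = knn_photoz((toy_color(0.4),), training)    # within training range
z_outside = knn_photoz((toy_color(1.5),), training)   # far beyond it
print(round(z_inside, 3), round(z_outside, 3))
```

The query at true z = 1.5 comes back at about 0.75, because the model can only return values it has seen: it fails silently, with no hint that the input was outside the training distribution.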

Spectroscopic vs photometric redshift

Property | Spectroscopic redshift | Photometric redshift
What is measured | Position of resolved lines | Fluxes in ~5–9 broad filters
Precision (σ_z) | 0.0001–0.001 | 0.02–0.05 × (1+z)
Distance error at z = 1 | ≲ 1 Mpc | ~100–300 Mpc
Outlier rate | < 0.1 % | ~1–10 % (Δz > 0.15)
Time per object | Minutes of slit/fiber time | Free: one image redshifts all
Throughput (modern) | ~10⁷ redshifts (DESI, years) | ~10⁹–10¹⁰ (LSST, Euclid)
Faint-end reach | Limited by S/N per line | Reaches the imaging depth limit
Output | Single sharp value | Full posterior P(z)

The two are complementary, not competing. Spectroscopy provides the ground truth that calibrates and trains photometric estimators; photometry provides the sheer numbers that statistical cosmology demands. A weak-lensing survey, for example, does not care that any single galaxy's redshift is uncertain by ±0.05 — it cares that the mean redshift of a tomographic bin of ten million galaxies is known to ±0.002, which a well-calibrated photo-z pipeline can deliver.
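A quick calculation shows why per-galaxy scatter is not the limiting factor. Assuming independent errors (an idealization), the statistical error on a bin's mean redshift falls as 1/√N:

```python
import math


def error_on_mean(sigma_per_galaxy, n_galaxies):
    """Statistical error on the mean of n independent measurements."""
    return sigma_per_galaxy / math.sqrt(n_galaxies)


# Numbers from the text: +/-0.05 per galaxy, ten million galaxies per bin.
stat_error = error_on_mean(0.05, 10_000_000)
print(f"{stat_error:.1e}")
```

The result is of order 10⁻⁵, far below the ±0.002 target, which suggests that the requirement is really about systematic bias in the photo-z calibration rather than random noise.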

Real surveys and their numbers

Photometric redshifts are not a niche technique; they are the backbone of survey cosmology.

Survey | Filters | Galaxies with photo-z | Typical σ/(1+z)
SDSS | u g r i z | ~2 × 10⁸ | ~0.03
DES (Dark Energy Survey) | g r i z Y | ~3 × 10⁸ | ~0.03
KiDS | u g r i (+ VIKING NIR) | ~10⁸ | ~0.02–0.04
COSMOS (30+ bands) | UV → mid-IR | ~2 × 10⁶ | ~0.007
Rubin LSST (planned) | u g r i z y | ~2 × 10¹⁰ | ~0.02–0.05
Euclid (planned) | VIS + Y J H | > 10⁹ (lensing) | < 0.05

The COSMOS field is the instructive extreme: with more than thirty photometric bands — effectively a very low-resolution spectrum — its photo-z scatter drops to σ ≈ 0.007 (1+z), approaching the precision of a spectrum. This illustrates the general rule that photo-z accuracy is set by how finely you sample the spectrum. Five broad optical filters give σ ≈ 0.03–0.05; adding the u-band and near-infrared JHK bands roughly halves the scatter and slashes the outlier rate, because they bracket both the 4000-Å break and the Lyman break from both sides.

The degeneracy that wrecks photo-z

The single most important pitfall is the break degeneracy. A broadband filter set sees a "step" in the spectrum between two filters, but it cannot, by itself, tell which step it is. A low-redshift galaxy at z ≈ 0.2 with a 4000-Å break can produce nearly identical optical colors to a high-redshift galaxy at z ≈ 3 whose Lyman break has redshifted into the same blue-optical filters (at z ≈ 3 the 912 Å limit lands near 3650 Å, and Lyman-alpha forest absorption depresses the flux up to rest-frame 1216 Å, observed near 4860 Å). With only optical data, the χ² fit finds two nearly equal minima, giving a bimodal P(z), and choosing the wrong one produces a catastrophic outlier with Δz ≈ 2–3.
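The ambiguity can be written down directly: one observed step wavelength maps to a different redshift for each rest-frame feature it might be. The sketch below includes rest-frame 1216 Å alongside 912 Å as an assumption added here, on the grounds that Lyman-alpha forest absorption moves the effective edge of the Lyman break redward at high redshift.

```python
# Rest wavelengths (angstroms) of features that can produce a broadband step.
REST_EDGES = {
    "4000 A break": 4000.0,
    "Lyman-alpha edge (1216 A)": 1216.0,
    "Lyman limit (912 A)": 912.0,
}


def candidate_redshifts(lambda_obs):
    """Redshift implied by each interpretation of a step at lambda_obs."""
    return {name: lambda_obs / rest - 1.0 for name, rest in REST_EDGES.items()}


# A step observed at 4800 angstroms, between the u and r bands.
for name, z in candidate_redshifts(4800.0).items():
    print(f"{name}: z = {z:.2f}")
```

For a step at 4800 Å the low-redshift reading is z = 0.2, while the Lyman interpretations land between roughly z = 3 and z = 4.3, so picking the wrong branch shifts the redshift by several units.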

The cure is wavelength leverage. Adding a u-band detection rules out the Lyman-break interpretation for a low-z galaxy (which should still be visible in u), and near-infrared photometry samples the spectrum redward of the 4000-Å break for genuinely high-z galaxies. This is exactly why Euclid pairs its optical imaging with YJH infrared bands and why LSST's u-band, though shallow, is disproportionately valuable for photo-z. A few percent of objects will still scatter catastrophically — from blends, photometric noise, or unmodeled emission lines boosting a single band — and a major part of any photo-z pipeline is estimating and propagating that outlier fraction into the cosmological error budget.

Worked example: bracketing a redshift from two filters

Suppose you image an elliptical galaxy and find it is faint in g (centered ~4750 Å) and abruptly bright in r (centered ~6250 Å). The flux jumps between these two bands, so the 4000-Å break must lie between them — somewhere around the g/r boundary near 5500 Å. Then:

1 + z = λ_observed / λ_rest = 5500 Å / 4000 Å = 1.375
z ≈ 0.38

To check whether this is sensible, convert to a distance. With H₀ = 70 km/s/Mpc, a redshift of 0.38 corresponds to a recession velocity of roughly cz ≈ 1.1 × 10⁵ km/s in the low-z Hubble approximation, and dividing by H₀ gives a distance of about 1,600 Mpc, or roughly 5 billion light-years (a full ΛCDM calculation gives a slightly smaller comoving distance of ≈1.5 Gpc).

Now propagate the uncertainty. The break is only localized to within the ~1500-Å gap between the g and r filter centers, i.e. ±750 Å about the midpoint, and 750 Å / 4000 Å ≈ 0.19. These two bands alone therefore constrain the redshift only to roughly z = 0.38 ± 0.19, a distance uncertainty of order a gigaparsec. Adding the i and z bands to confirm the galaxy stays bright redward, and the u band to confirm it stays faint blueward, tightens the estimate and rules out the Lyman-break alias at z ≈ 3. This is the entire photo-z workflow in miniature: locate the break, divide by the rest wavelength, and use every available band to suppress the degeneracies.
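The arithmetic of this example can be checked with a short script. The flat ΛCDM parameters (Ω_m = 0.3, Ω_Λ = 0.7) are an assumption, since the text specifies only H₀ = 70 km/s/Mpc, and the integral uses a simple midpoint rule rather than a cosmology library.

```python
import math

C_KM_S = 299_792.458  # speed of light in km/s


def break_redshift(lambda_obs, lambda_rest=4000.0):
    """Redshift from the observed position of the 4000 A break."""
    return lambda_obs / lambda_rest - 1.0


def hubble_distance_mpc(z, h0=70.0):
    """Low-z approximation: d = c z / H0."""
    return C_KM_S * z / h0


def comoving_distance_mpc(z, h0=70.0, omega_m=0.3, n_steps=1000):
    """Flat LambdaCDM comoving distance by midpoint integration of 1/E(z)."""
    omega_l = 1.0 - omega_m
    dz = z / n_steps
    integral = 0.0
    for k in range(n_steps):
        z_mid = (k + 0.5) * dz
        integral += dz / math.sqrt(omega_m * (1.0 + z_mid) ** 3 + omega_l)
    return C_KM_S / h0 * integral


z = break_redshift(5500.0)            # break at the g/r boundary
sigma_z = 750.0 / 4000.0              # +/-750 A localization of the break
d_hubble = hubble_distance_mpc(z)
d_lcdm = comoving_distance_mpc(z)
print(f"z = {z:.3f} +/- {sigma_z:.2f}")
print(f"Hubble-law distance ~ {d_hubble:.0f} Mpc, LCDM comoving ~ {d_lcdm:.0f} Mpc")
```

The Hubble-law value comes out near 1.6 Gpc and the ΛCDM value somewhat lower, in line with the figures quoted in the worked example.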

Where photometric redshifts power discovery

  • Weak-lensing cosmology. Surveys like DES, KiDS, and HSC measure the subtle distortion of tens to hundreds of millions of galaxy shapes by foreground dark matter. The signal depends on the distances of source and lens, so each galaxy must be placed in a redshift bin, and only photo-z can supply distances for that many galaxies. Photo-z calibration is now one of the leading systematics in lensing-derived values of the S₈ parameter.
  • Baryon acoustic oscillations. The ~150 Mpc standard ruler imprinted in galaxy clustering can be measured photometrically by binning galaxies in photo-z shells, trading some radial resolution for vastly more sky coverage.
  • Galaxy cluster finding. Algorithms like redMaPPer identify clusters as overdensities of red-sequence galaxies sharing a common photo-z, then use the tight color–redshift relation of those red galaxies to estimate cluster distances to σ ≈ 0.01.
  • The high-redshift frontier. Lyman-break dropout selection from deep imaging built the first samples of z > 6 galaxies; JWST photo-z's now flag candidates beyond z = 10 for spectroscopic confirmation, having reshaped expectations about how early galaxies assembled.
  • Transient triage. When a gravitational-wave event or a kilonova alert arrives, photo-z catalogs of galaxies inside the localization volume let observers prioritize which host galaxies to follow up in real time.

Common misconceptions and edge cases

  • "A photo-z is just a worse spectroscopic redshift." Better to think of it as a probability distribution P(z), often broad or bimodal. Cosmological analyses use the full P(z), not a single point estimate; collapsing it to one number throws away the information that prevents biased results.
  • "More filters always help equally." What matters is wavelength coverage, not filter count. Adding a sixth optical band near existing ones helps little; adding a single near-infrared or u band that brackets a break helps enormously by breaking the Lyman/4000-Å degeneracy.
  • "Machine-learning photo-z is strictly better than templates." Only inside the training distribution. Faint, distant survey galaxies often have no spectroscopic analogues, so ML methods silently extrapolate and bias the result, while template methods degrade more gracefully.
  • "Photo-z gives a velocity, so it gives a peculiar velocity too." No. A photo-z scatter of σ_z ≈ 0.02–0.05 corresponds to c·σ_z ≈ 6,000–15,000 km/s, which dwarfs peculiar velocities of a few hundred km/s. Photometric redshifts therefore carry essentially no usable peculiar-velocity information; that requires spectroscopy.
  • "Emission lines don't matter for broadband photometry." A strong line such as Hα or [O III] can fall inside one filter and boost its flux by tens of percent, shifting a color and skewing the fit. Surveys with medium bands or careful templates must model this; ignoring it produces a characteristic ridge of outliers.
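The size of the emission-line effect in the last point can be estimated with a back-of-envelope formula: for a top-hat filter over a flat continuum, a line raises the band-averaged flux by its observed equivalent width divided by the filter width. The equivalent width, redshift, and filter width below are illustrative assumptions.

```python
import math


def line_boost(ew_rest, z, filter_width):
    """Fractional flux increase in a band from one emission line.

    ew_rest and filter_width are in angstroms; the observed equivalent
    width is the rest-frame value stretched by (1 + z)."""
    return ew_rest * (1.0 + z) / filter_width


# Hypothetical case: H-alpha with a 200 A rest equivalent width at z = 0.2,
# falling in a band assumed to be 1300 A wide.
boost = line_boost(ew_rest=200.0, z=0.2, filter_width=1300.0)
delta_mag = 2.5 * math.log10(1.0 + boost)
print(f"flux boost = {boost:.0%}, color shift = {delta_mag:.2f} mag")
```

A boost approaching 20 % in a single band is comparable to the color change produced by a modest shift of the break, which is why unmodeled lines can pull the fit to the wrong redshift.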

Frequently asked questions

How can color alone tell you a galaxy's distance?

A galaxy's spectrum is not smooth — it has strong features, most importantly the 4000-angstrom break, a sudden drop in flux blueward of 4000 Å caused by the pile-up of metal absorption lines in stellar atmospheres. As the galaxy recedes, every wavelength is stretched by a factor (1+z), so that break slides to longer (redder) wavelengths. A set of broadband filters samples the spectrum coarsely, and the changing position of the break changes the ratios of fluxes between filters — the galaxy's colors. Measuring those colors and asking "at what redshift does a galaxy template reproduce them?" yields a photometric redshift. No individual spectral line is resolved; the redshift is inferred from the overall shape.

How accurate is a photometric redshift compared with a spectroscopic one?

A spectroscopic redshift from resolved emission or absorption lines is good to σ_z ≈ 0.0001–0.001, essentially exact for cosmology. A photometric redshift is far coarser: typical scatter is σ ≈ 0.02–0.05 × (1+z), meaning a galaxy at true z = 1 has an uncertainty of order ±0.04–0.10 in redshift, equivalent to hundreds of megaparsecs in distance. There is also a "catastrophic outlier" tail — a few percent of galaxies whose photo-z is wrong by Δz > 0.15 because a Lyman break has been mistaken for a 4000-Å break, or vice versa. The trade is precision for throughput.
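Both numbers quoted here are usually measured on a validation sample with known spectroscopic redshifts. The sketch below uses the normalized median absolute deviation (NMAD) as a robust scatter estimate, which is a common choice but an assumption here, and counts outliers with the Δz > 0.15 threshold used in this article (many analyses scale that threshold by 1+z). The sample is invented.

```python
from statistics import median


def photoz_metrics(z_phot, z_spec, outlier_cut=0.15):
    """Return (robust scatter of dz/(1+z), fraction with |dz| > outlier_cut)."""
    dz = [zp - zs for zp, zs in zip(z_phot, z_spec)]
    scaled = [d / (1.0 + zs) for d, zs in zip(dz, z_spec)]
    center = median(scaled)
    # 1.4826 converts a median absolute deviation to a Gaussian-equivalent sigma.
    nmad = 1.4826 * median([abs(s - center) for s in scaled])
    outlier_fraction = sum(abs(d) > outlier_cut for d in dz) / len(dz)
    return nmad, outlier_fraction


# Made-up validation sample; the last galaxy is a catastrophic outlier.
z_spec = [0.2, 0.5, 0.8, 1.0, 0.3]
z_phot = [0.22, 0.47, 0.84, 1.06, 3.1]
sigma_nmad, f_out = photoz_metrics(z_phot, z_spec)
print(f"sigma_NMAD = {sigma_nmad:.3f}, outlier fraction = {f_out:.0%}")
```

Using a median-based scatter keeps the single catastrophic outlier from inflating the quoted σ, so the two failure modes are reported separately.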

What is the difference between template fitting and machine-learning photo-z?

Template fitting (BPZ, EAZY, LePhare) compares a galaxy's observed colors to a library of model galaxy spectra redshifted across a grid, returning the redshift (and a full probability distribution P(z)) that best reproduces the fluxes. It needs no training data and extrapolates to faint galaxies, but it depends on having templates that match real galaxies and on accurate filter calibration. Machine learning (random forests, neural networks, Gaussian processes) instead learns the color-to-redshift mapping directly from a spectroscopic training set. It is typically more accurate where training data are dense, but it fails to extrapolate beyond the training distribution — exactly where the faintest, most distant survey galaxies live.

Why do surveys need photometric redshifts at all if spectra are more accurate?

Throughput. A spectrograph must collect enough photons to resolve lines, so even a massively multiplexed instrument like DESI measures a few thousand spectra per pointing and a few tens of millions of redshifts over years. A photometric survey images millions of galaxies per night and assigns a photo-z to every one. The Vera C. Rubin Observatory's LSST will catalogue roughly 20 billion galaxies; Euclid targets over a billion for weak lensing. There is no instrument on Earth that could take that many spectra. For statistical cosmology — weak-lensing tomography, baryon acoustic oscillations, cluster counts — a coarse redshift for every galaxy beats an exact redshift for a tiny subset.

What is the 4000-angstrom break and why does it matter for photo-z?

The 4000-Å break (often quantified as the D4000 index) is a discontinuity in a galaxy's spectrum at rest-frame 4000 Å, produced by the accumulation of ionized-metal absorption lines (especially Ca II H and K) in the atmospheres of cool stars, plus the Balmer break of hot stars. Old, red, quiescent galaxies have a strong break; young, blue, star-forming galaxies have a weak one. It is the single most informative feature for photo-z of galaxies at z < 1.5 because it is broad enough to be detected by broadband filters and its redshifted position is a clean clock: locate the break between two adjacent filters and you have bracketed the redshift.

What causes catastrophic photo-z outliers?

The most common failure is color–redshift degeneracy: a low-redshift galaxy with a 4000-Å break can mimic the colors of a high-redshift galaxy with a Lyman break (912 Å), because both produce a step in flux between two filters. With only optical filters, the fit cannot tell which step it is seeing, and the redshift can be wrong by Δz ≈ 2–3. Adding near-infrared or u-band photometry breaks the degeneracy by sampling the spectrum on both sides of the ambiguous feature. Photometric noise, blending of overlapping galaxies, and template mismatch contribute the rest of the outlier tail.