Phonetics
Formants
The acoustic resonances that distinguish vowels — F1, F2, F3 and the shape of the vocal tract
Formants are concentrations of acoustic energy at particular frequencies in the speech signal, produced by resonances of the vocal tract. The first three formants (F1, F2, F3) carry most of the information needed to identify vowels. F1 correlates inversely with tongue height: high vowels [i, u] have low F1, and low vowels such as [a] have high F1. F2 correlates with tongue frontness: front vowels such as [i] have high F2, and back vowels such as [u] have low F2. F3 is sensitive to lip rounding and rhoticity. Gunnar Fant's 1960 Acoustic Theory of Speech Production formalized the source-filter model, in which the vocal folds produce a buzz and the vocal tract filters it, reinforcing energy at the formant frequencies. On spectrograms, formants appear as dark horizontal bands.
- Source-filter theory: Gunnar Fant (1960), Acoustic Theory of Speech Production
- F1: Inversely correlated with vowel height (low F1 = high vowel)
- F2: Correlated with vowel frontness (high F2 = front vowel)
- F3: Lip rounding, rhoticity (English /r/)
- Spectrogram visibility: Dark horizontal bands at formant frequencies
- Typical adult male: F1 250-800 Hz; F2 600-2300 Hz
Why formants matter
- Vowel identification. F1-F2 patterns are the primary perceptual cues distinguishing vowels.
- Speech synthesis. Formant synthesizers (Klatt 1980) generate intelligible speech by directly modeling formants.
- Speech recognition. ASR features (MFCCs) summarize spectral envelope including formants.
- Forensic phonetics. Speaker identification uses formant patterns as biometric features.
- Sociolinguistics. Vowel shifts (Labov, Northern Cities Shift) are tracked by formant changes over time.
- Clinical phonetics. Speech disorders (dysarthria, hearing-impaired speech) show distinctive formant patterns.
- Language documentation. Acoustic vowel descriptions for under-documented languages depend on formant measurement.
Common misconceptions
- Formants are pitches. Pitch is fundamental frequency (F0); formants are resonances independent of F0.
- Vowels have universal formant values. Frequencies vary by speaker; phonemic identity is relative position in F1-F2 space.
- Only vowels have formants. Sonorant consonants (nasals, liquids) have formants too; obstruents have spectral noise instead.
- F1, F2 are the only relevant ones. F3 distinguishes [r] from [l]; F4 and above contribute to voice quality.
- Spectrograms are pictures of speech. They are mathematical transforms; reading them requires training.
- Higher formants mean higher tongue. Only F1 relates to height, and the relation is inverse: a higher F1 means a lower tongue. F2 reflects backness, so a higher F2 means a more front vowel, not a higher one.
Frequently asked questions
How does the source-filter model work?
Speech production has two components. The source is usually voiced vocal-fold vibration, a buzz with harmonics at integer multiples of the fundamental frequency (F0). The filter is the vocal tract (pharynx, mouth, lips), which shapes that source through resonance. The filter's resonant frequencies are the formants. Fant (1960) modeled this mathematically. The model predicts vowel formant frequencies from vocal-tract length and shape, and is foundational to acoustic phonetics.
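The link between tract length and formant frequency can be illustrated with the simplest possible filter: a uniform tube closed at the glottis and open at the lips, which resonates at odd multiples of a quarter wavelength. The sketch below is only that idealization, not Fant's full model. The function name, the speed of sound (35,000 cm/s), and the 17.5 cm length (picked to give round numbers) are assumptions made for illustration.

```python
def tube_formants(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength rule: F_n = (2n - 1) * c / (4 * L).
    The speed of sound is an assumed round value for warm, humid air.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, n_formants + 1)
    ]


# Assumed 17.5 cm tract, chosen so the resonances come out as round numbers.
print([round(f) for f in tube_formants(17.5)])  # [500, 1500, 2500]
```

These evenly spaced resonances approximate a neutral, schwa-like vowel. Real vowels depart from this pattern because the tongue and lips make the tube non-uniform.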
Why does F1 track vowel height?
F1 reflects the size of the back cavity (pharynx) of the vocal tract. Lowering the tongue (as in [a]) constricts the pharynx and enlarges the front cavity, raising F1 to ~700-900 Hz. Raising the tongue (as in [i] or [u]) widens the pharynx and lowers F1 to ~250-350 Hz. The inverse correlation with height is consistent across speakers and languages, after normalization.
What does F2 tell us?
F2 reflects the location of the tongue's main constriction along the front-back axis. Front vowels [i, e] push the tongue forward, shortening the front cavity and raising F2 toward 2200-2500 Hz. Back vowels [u, o] retract the tongue, lengthening the front cavity and dropping F2 to 700-1000 Hz. F1 vs. F2 plots are the standard way phoneticians display vowel spaces.
How do F1-F2 plots represent vowels?
Phoneticians plot F2 on the horizontal axis (decreasing left to right) and F1 on the vertical axis (increasing top to bottom), so both axes run opposite to the usual graph orientation. The result roughly mirrors the IPA vowel chart: a quadrilateral with [i] top-left, [u] top-right, [a] bottom-center. Each speaker's plot differs in absolute frequencies but shows the same topology after normalization. This visualization revolutionized acoustic phonetics in the 1950s.
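A small sketch can make the chart layout concrete. It maps a pair of formant values to a position on a unit square so that [i] lands top-left, [u] top-right, and [a] near the bottom. The formant values are illustrative numbers picked from within the ranges quoted in this article, not measurements, and the function name and axis limits are assumptions.

```python
# Illustrative (F1, F2) values in Hz; assumptions, not measured data.
VOWELS = {"i": (280, 2250), "u": (310, 870), "a": (750, 1200)}


def chart_position(f1, f2, f1_range=(200, 900), f2_range=(500, 2600)):
    """Map (F1, F2) to (x, y) in [0, 1], with x growing rightward and y growing downward.

    F2 is reversed so that high F2 (front vowels) lands on the left.
    F1 is kept in order so that low F1 (high vowels) lands at the top,
    because y is measured from the top of the chart.
    """
    x = (f2_range[1] - f2) / (f2_range[1] - f2_range[0])
    y = (f1 - f1_range[0]) / (f1_range[1] - f1_range[0])
    return x, y


for vowel, (f1, f2) in VOWELS.items():
    x, y = chart_position(f1, f2)
    print(f"[{vowel}] x={x:.2f} y={y:.2f}")
```

A plotting library would achieve the same layout by inverting both axes. The explicit arithmetic here just shows what that inversion does.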
How are formants measured?
On a spectrogram, which shows energy as a function of frequency and time, formants appear as dark horizontal bands. Linear Predictive Coding (LPC) algorithms estimate formant frequencies automatically. Praat (Boersma and Weenink) is the standard free phonetics tool. Manual correction is often needed, because automatic tracking fails near silences, fricatives, or rapid transitions.
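To show the idea behind LPC-based measurement, the following is a minimal pure-Python sketch, not Praat's implementation. It synthesizes a single pulse passed through two resonators at 500 Hz and 1500 Hz (assumed values), fits an all-pole model with the autocorrelation method and Levinson-Durbin recursion, and reads formant candidates off the peaks of the resulting spectral envelope. All function names, the sample rate, and the bandwidths are assumptions chosen for the illustration.

```python
import cmath
import math


def resonator_coeffs(freq_hz, bandwidth_hz, fs):
    """Feedback coefficients for y[n] = x[n] + b1*y[n-1] + b2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2 * math.pi * freq_hz / fs
    return 2 * r * math.cos(theta), -r * r


def synthesize(formants, fs, n_samples):
    """Pass a single impulse through a cascade of two-pole resonators."""
    signal = [1.0] + [0.0] * (n_samples - 1)
    for freq, bw in formants:
        b1, b2 = resonator_coeffs(freq, bw, fs)
        out, y1, y2 = [], 0.0, 0.0
        for x in signal:
            y = x + b1 * y1 + b2 * y2
            out.append(y)
            y2, y1 = y1, y
        signal = out
    return signal


def lpc(signal, order):
    """Autocorrelation-method LPC; returns [1, a1, ..., ap] of A(z)."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Levinson-Durbin step: reflection coefficient, then update.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1 - k * k
    return a


def envelope_peaks(a, fs, step_hz=5):
    """Frequencies (Hz) where the LPC envelope 1/|A| has a local maximum."""
    freqs = list(range(0, fs // 2 + 1, step_hz))
    mags = []
    for f in freqs:
        w = 2 * math.pi * f / fs
        A = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        mags.append(1.0 / abs(A))
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if mags[i - 1] < mags[i] > mags[i + 1]]


FS = 8000  # assumed sample rate in Hz
frame = synthesize([(500, 80), (1500, 100)], FS, 512)
coeffs = lpc(frame, order=4)  # two poles per expected formant
print(envelope_peaks(coeffs, FS))
```

On this clean synthetic frame the two peaks fall close to the resonator frequencies. Real recordings typically need windowing, pre-emphasis, and a higher model order, and they are where the tracking failures described above arise.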
Why are children's formants higher?
Vocal tract length determines formant frequencies: the shorter the tract, the higher the formants. Adult male vocal tracts are ~17 cm long, adult female tracts ~15 cm, and children's 10-13 cm. Children's formants are roughly 30-50% higher than adult males'. This poses a problem for speech recognition: the same vowel has different absolute frequencies across speakers. Vocal tract length normalization handles this in ASR.
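The simplest form of this normalization is a linear rescaling of the frequency axis. The sketch below assumes formants scale inversely with tract length, which is the uniform-tube idealization and only approximately true of real speakers. The function name, the 12 cm child tract, and the child formant values are hypothetical, chosen for illustration.

```python
def warp_to_reference(formants_hz, speaker_tract_cm, reference_tract_cm=17.0):
    """Rescale formants as if they came from a tract of the reference length.

    Assumes formants scale inversely with tract length, so a shorter tract
    gets a warp factor below 1 and its formants are pulled downward.
    """
    alpha = speaker_tract_cm / reference_tract_cm
    return [f * alpha for f in formants_hz]


# Hypothetical child [i] formants (Hz) and tract length; not measured data.
child_i = [390.0, 3250.0]
print([round(f) for f in warp_to_reference(child_i, speaker_tract_cm=12.0)])
```

In ASR systems the warp factor is usually estimated from the audio itself rather than from a measured tract length, and the warp is often piecewise rather than purely linear.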
What about consonant formants?
Consonants influence formants in adjacent vowels through transitions. Locus theory (Delattre, Liberman, Cooper 1955) holds that each place of articulation has a characteristic F2 starting point. Bilabial /b/ pulls F2 toward ~700 Hz; alveolar /d/ toward ~1800 Hz; velar /g/ toward ~3000 Hz, although the velar locus varies with the neighboring vowel. These transitions are a major perceptual cue to consonant place: the vowels carry the consonant's signature.