Phonology

Intonation

The pitch melody that turns "you're going" into a statement, a question, or a challenge

Intonation is the linguistically meaningful variation in pitch across an utterance. The same five words — "you're going to the store" — can be a flat statement, a rising yes/no question, an incredulous exclamation, or a contrastively focused correction, depending entirely on the pitch contour. English signals yes/no questions with a final rising boundary tone (H%), declaratives with a fall (L%), and continuation with a slight rise. Janet Pierrehumbert's 1980 autosegmental-metrical framework — formalized as ToBI annotation by Beckman, Hirschberg, and Shattuck-Hufnagel in the 1990s — decomposes any English contour into pitch accents on stressed syllables and boundary tones at phrase edges. Cross-linguistically, intonation contrasts with lexical tone (Mandarin) and pitch accent (Japanese), but all three systems use the same fundamental machinery: vocal-fold vibration rate as a meaning-bearing signal.

  • Definition: Post-lexical pitch contour conveying utterance-level meaning
  • English question rise: ~30-50% F0 increase on final stressed syllable
  • Standard annotation: ToBI, with pitch accents (H*, L*) and boundary tones (H%, L%)
  • Foundational source: Pierrehumbert 1980; Beckman and Pierrehumbert 1986
  • Declination rate: ~1-3 Hz/s downward drift across utterances
  • Cross-linguistic types: Intonation (English), tone (Mandarin), pitch-accent (Japanese, Swedish)

How intonation works

Pitch in speech is the listener's percept of fundamental frequency (F0) — how fast the vocal folds vibrate. A typical adult male voice averages 100-130 Hz; female 180-220 Hz; child 250-300 Hz. Intonation is the systematic variation around this baseline, contributing meaning above and beyond the segmental string.
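
Because baselines differ so much between speakers, F0 movements are usually compared as ratios or semitones, not raw Hz. The following is a minimal sketch of that conversion, assuming the standard scale of 12 semitones per doubling of frequency; the function names and the example baselines are illustrative, not taken from any particular toolkit.

```python
import math

def semitones(f_start_hz: float, f_end_hz: float) -> float:
    """Size of a pitch movement in semitones (12 per doubling of F0)."""
    if f_start_hz <= 0 or f_end_hz <= 0:
        raise ValueError("F0 values must be positive")
    return 12.0 * math.log2(f_end_hz / f_start_hz)

def percent_change(f_start_hz: float, f_end_hz: float) -> float:
    """Relative F0 change as a percentage of the starting value."""
    return 100.0 * (f_end_hz - f_start_hz) / f_start_hz

# The same 50% rise from two different baselines (illustrative values).
for start in (120.0, 200.0):
    end = start * 1.5
    print(f"{start:.0f} -> {end:.0f} Hz: "
          f"{percent_change(start, end):.0f}% = {semitones(start, end):.2f} st")
```

A 50% rise comes out at about 7 semitones whatever the starting frequency, which is what makes ratio-based measures comparable across male, female, and child voices.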

The autosegmental-metrical (AM) model decomposes any pitch contour into two kinds of events:

  1. Pitch accents on stressed syllables. Tagged H* (high accent), L* (low accent), L+H* (rising accent peaked on the stressed syllable), L*+H (rising accent peaking after), H+!H* (downstepped from a previous high). These are the salient melodic peaks and valleys.
  2. Boundary tones at the edges of prosodic phrases. Tagged H% or L% at full intonational phrase boundaries, and H- or L- (phrase accents) at intermediate phrase boundaries. These edge tones determine whether a phrase sounds finished (low) or continuing (high).

The simplest English declarative is H* L-L%: a single high accent on the stressed syllable, followed by a fall to a low boundary. "I'm leaving." has the F0 peak on "lea-" then falls. A yes/no question is L* H-H%: a low accent then a final rise. "Are you leaving?" stays low through "leaving" then rises.
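
Tune labels such as H* L-L% can be split mechanically into the event types described above. The sketch below is one hypothetical way to do that with a regular expression; the `Tune` class, `parse_tune`, and `final_movement` are names invented for this illustration, and the pattern covers only the labels mentioned in this article, not the full ToBI inventory.

```python
import re
from dataclasses import dataclass, field

# One alternative per event type; the group names match the Tune fields below.
TOKEN = re.compile(
    r"(?P<boundary>[HL]%)"                            # H%, L%
    r"|(?P<phrase>!?[HL]-)"                           # H-, L-
    r"|(?P<accent>(?:[HL]\+)?!?[HL]\*(?:\+!?[HL])?)"  # H*, L+H*, L*+H, H+!H*
)

@dataclass
class Tune:
    accent: list[str] = field(default_factory=list)    # pitch accents
    phrase: list[str] = field(default_factory=list)    # phrase accents
    boundary: list[str] = field(default_factory=list)  # boundary tones

def parse_tune(label: str) -> Tune:
    """Split a tune label into pitch accents, phrase accents, boundary tones."""
    text = label.replace(" ", "")
    tune, pos = Tune(), 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unrecognized tone label at {text[pos:]!r}")
        getattr(tune, match.lastgroup).append(match.group())
        pos = match.end()
    return tune

def final_movement(tune: Tune) -> str:
    """'rise' if the last boundary tone is H%, otherwise 'fall'."""
    return "rise" if tune.boundary and tune.boundary[-1] == "H%" else "fall"

for label in ("H* L-L%", "L* H-H%"):
    print(label, "->", parse_tune(label), final_movement(parse_tune(label)))
```

Keeping the three event types in separate lists mirrors the AM distinction between accents anchored to stressed syllables and tones anchored to phrase edges.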

Between the accents and boundary tones, F0 interpolates linearly — there are no further targets. This is the autosegmental claim: the tonal tier carries discrete events (H, L) anchored to the segmental tier, and the surface contour is a smooth interpolation. The model handles long sentences with many accents by chaining events along the timeline.
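
The interpolation claim can be sketched in a few lines: give each tonal event a time and an F0 target, then connect the targets with straight lines. The target times and Hz values below are invented for illustration (a rough H* L-L% shape for "I'm leaving."), and `interpolate_f0` is a hypothetical helper, not part of any ToBI tool.

```python
from bisect import bisect_right

def interpolate_f0(targets, times):
    """Linearly interpolate F0 between (time_s, f0_hz) tonal targets.

    Outside the first and last target the contour is held flat.
    """
    targets = sorted(targets)
    xs = [t for t, _ in targets]
    contour = []
    for t in times:
        if t <= xs[0]:
            contour.append(targets[0][1])
        elif t >= xs[-1]:
            contour.append(targets[-1][1])
        else:
            i = bisect_right(xs, t)  # xs[i - 1] <= t < xs[i]
            (t0, f0), (t1, f1) = targets[i - 1], targets[i]
            contour.append(f0 + (f1 - f0) * (t - t0) / (t1 - t0))
    return contour

# Assumed targets: utterance start, H* peak on "lea-", L- phrase accent, L% boundary.
targets = [(0.00, 110.0), (0.25, 150.0), (0.60, 90.0), (0.80, 85.0)]
times = [i * 0.1 for i in range(9)]
for t, f in zip(times, interpolate_f0(targets, times)):
    print(f"{t:.1f} s  {f:6.1f} Hz")
```

Only four numbers per utterance are specified here; everything else in the printed contour is filled in by interpolation, which is the sense in which the tonal tier is sparse.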

Why intonation matters

  • Sentence type. Same words, different intonation — different speech act. Statement vs question vs exclamation.
  • Focus. "I gave the BOOK to Mary" (not the pen) vs "I gave the book to MARY" (not John).
  • Discourse structure. Intonation marks topic-comment, given-new, turn-taking, hesitation.
  • Speech technology. Realistic TTS requires intonational modeling; ASR uses prosody for disfluency detection.
  • L2 learning. Adult learners' segments may nativize while intonation remains foreign, a major source of perceived "accent."
  • Clinical assessment. Aprosodia (right-hemisphere damage) and Parkinson's flatten intonation, impairing communication.
  • Sociolinguistics. Intonational patterns mark dialect, register, gender, age — uptalk being the most-discussed example.

Intonation vs tone vs pitch accent

| Feature | Intonation language (English) | Tone language (Mandarin) | Pitch-accent language (Japanese) |
| --- | --- | --- | --- |
| Pitch carries lexical meaning | No | Yes (every syllable) | Partial (one accent per word) |
| Minimal pairs by pitch | None segmental | mā / má / mǎ / mà (4 tones) | háshi (chopsticks) vs hashí (bridge) |
| Number of pitch contrasts | None at word level; 2 boundary tones (H%, L%) | 4 tones + neutral in Mandarin; 6 in Cantonese | 2 (accent location, presence vs absence) |
| Sentence-level pitch use | Primary signal | Superimposed on lexical tones | Superimposed on accent contours |
| Question marking | Final rise (H%) | Question particle + slight pitch raise | Question particle (-ka) + slight rise |
| Frequency in world languages | ~40% (WALS) | ~40% (mostly tonal in Africa, E. Asia, Mesoamerica) | ~10% (rare; Japanese, Swedish, Norwegian, Lithuanian) |
| Acquisition order | Intonation precedes segments | Tones acquired by age 2-3 | Accent location acquired late (5-6) |
| Famous example contrast | "You're going?" vs "You're going." | mā 'mother' vs mǎ 'horse' | háshi 'chopsticks' vs hashí 'bridge' |

Pierrehumbert and Beckman (1988) argued that all three systems are realizations of the same machinery: H and L tones associated to syllables. The systems differ in which syllables can host tones (every syllable for tone languages, one per word for pitch-accent, only stressed syllables for intonation languages) and what semantic role those tones play (lexical, post-lexical, or both).

Cross-linguistic examples

English (intonation language). The yes/no question "Did you eat?" rises ~50% in F0 on "eat." The declarative "You ate." falls. WH-questions ("What did you eat?") fall — the WH-word handles the interrogative force, so intonation doesn't need to. Contrastive focus uses an L+H* (steep rise) on the focused word.

Mandarin (tone language with intonation). Each syllable has one of four lexical tones: high level (1), rising (2), low dipping (3), high falling (4). Yes/no questions are formed with the particle "ma" or with A-not-A construction, plus a slight overall pitch raise. The tones are preserved — speakers maintain the contour shape while shifting the pitch range upward. This is "intonation atop tone."
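
The idea of keeping a tone's shape while moving the whole pitch range can be sketched with the conventional Chao tone numerals (55, 35, 214, 51 for tones 1 to 4). Everything else below is an assumption made for illustration: the speaker's 100-200 Hz range, the 2-semitone question raise, and the even spacing of the five levels in log-F0.

```python
# Chao tone numerals (1 = lowest, 5 = highest) for the four Mandarin tones.
CHAO = {1: (5, 5), 2: (3, 5), 3: (2, 1, 4), 4: (5, 1)}

def level_to_hz(level, floor_hz, ceiling_hz):
    """Map a Chao level 1-5 onto a speaker's range, evenly spaced in log-F0."""
    return floor_hz * (ceiling_hz / floor_hz) ** ((level - 1) / 4)

def realize(tone, floor_hz, ceiling_hz):
    """F0 targets in Hz for one lexical tone within the given range."""
    return [level_to_hz(level, floor_hz, ceiling_hz) for level in CHAO[tone]]

def shift_range(floor_hz, ceiling_hz, semitones):
    """Move the whole range up (or down) by a number of semitones."""
    factor = 2 ** (semitones / 12)
    return floor_hz * factor, ceiling_hz * factor

statement_range = (100.0, 200.0)                   # assumed speaker range
question_range = shift_range(*statement_range, 2)  # assumed 2-semitone raise

for tone in (1, 2, 3, 4):
    stmt = [round(f) for f in realize(tone, *statement_range)]
    ques = [round(f) for f in realize(tone, *question_range)]
    print(f"tone {tone}: statement {stmt} Hz, question {ques} Hz")
```

Every target is higher in the question range, but the ratio between successive targets within a tone is unchanged, so the four contours remain distinguishable from one another.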

Japanese (pitch-accent). Each word has one accented syllable (or none). The accented syllable is high; the next is low. The minimal set háshi ("chopsticks", accent on first syllable) vs hashí ("bridge", accent on second) vs unaccented hashi ("edge") demonstrates the system. Questions are marked with the sentence-final particle "ka", usually accompanied by a slight final rise.

Swedish and Norwegian (pitch-accent). A two-way contrast: Accent 1 (single peak) vs Accent 2 (double peak). Stockholm Swedish anden with Accent 1 means "the duck"; with Accent 2, "the spirit." Norwegian bønder "farmers" (Accent 1) vs bønner "beans" (Accent 2).

Hungarian (intonation, but inverted from English). Yes/no questions in Hungarian are signaled by a rise-fall contour: pitch peaks on the penultimate syllable, then drops sharply on the final one. An English-style final rise sounds wrong in Hungarian, and Hungarian speakers learning English often struggle with the English question pattern.

Worked examples — ToBI annotation

Declarative. "Marsha bought a beautiful BLUE car."

H*                           L+H* L-L%
Mar-sha bought a beau-ti-ful BLUE car.

The first H* marks the topic accent on "Marsha"; the L+H* (steep contrastive rise) on "BLUE" picks out blue from possible alternatives; the L-L% boundary signals utterance completion. Listeners infer this corrects an assumption that the car is some other color.

Yes/no question. "Did Marsha buy a blue car?"

    L*                 H-H%
Did Mar-sha buy a blue car?

After the L* low accent on "Marsha," pitch stays low through the rest of the utterance; the H-H% sequence then rises sharply on "car." The lack of internal peaks (compared with the declarative's H* peak) is itself a cue: listeners parse the flat-then-rising contour as interrogative.

Continuation rise. "When Marsha left, the meeting ended."

      H*       H-      H*           L-L%
When Mar-sha left, the meet-ing en-ded.

The intermediate H- after "left" tells the listener the sentence isn't finished — keep parsing. The final L-L% closes the utterance.

Incredulity. "You ate the WHOLE cake?"

            L*+H  L-H%
You ate the WHOLE cake?

The L*+H accent, a rise that peaks after the stressed syllable, combines with an L-H% (low-then-rise) boundary to convey disbelief. Orthography has no good way to represent this contour; it is purely prosodic. Pierrehumbert and Hirschberg (1990) proposed a compositional theory of intonational meaning in which each pitch accent, phrase accent, and boundary tone contributes its own component to the meaning of the tune.

Variants and edge cases

  • Uptalk / HRT. High Rising Terminal — declaratives ending with a rising contour. Australian, New Zealand, Valley Girl American. Functionally turn-holding, not interrogative.
  • Calling contour (vocative chant). "JOOOH-nnnyyy!" is the stylized high-then-mid contour for calling someone over a distance. Widely attested; languages differ in exact intervals.
  • Listing intonation. "We have apples, oranges, pears, and BANANAS." Each non-final item has a slight rise; the final item falls. The pattern is iconic: open-ended rises until final closure.
  • Echo questions. "She said WHAT?" — single steep rise on the WH-word, marking incredulity rather than information-seeking.
  • Tag questions with rise vs fall. "It's raining, isn't it?" with a final rise asks for confirmation; with a final fall, it presents the claim as settled and merely invites agreement. The same syntax yields two different speech acts via intonation.
  • Sarcasm. Intonation contour stretched and exaggerated, with breathiness — a style choice not captured by ToBI's discrete categories.

Common pitfalls

  • Treating intonation as purely gradient or paralinguistic. The AM/ToBI analysis treats the core contrasts as categorical and grammaticalized: distinct contours signal distinct meanings.
  • Assuming all languages use rising tunes for questions. Hungarian and Russian yes/no questions, which end in a fall, falsify the universal claim.
  • Conflating pitch with stress. Stress involves duration, intensity, and vowel quality too; pitch accent is one acoustic correlate, not the whole picture.
  • Ignoring declination. Comparing absolute F0 values across an utterance without correcting for the natural downward drift gives misleading "low" tones at the end of long sentences.
  • Treating ToBI as describing the acoustic signal. ToBI labels phonological categories; the acoustic implementation varies by speaker, rate, and emotion.
  • Generalizing English uptalk findings. HRT has different distributions and meanings in Australian and Valley Girl English; collapsing them obscures the dialect-specific functions.

Frequently asked questions

Why does English raise pitch on yes/no questions?

English signals yes/no questions with a final rising boundary tone (H%): "You're going?" rises on "going" and ends high. Falling tones (L%) signal completed declaratives. The rise/fall distinction is grammaticalized: the same words with different tunes mean different things. Cross-linguistically, around 70% of languages use rising tunes for yes/no questions (Bolinger 1978), but the correlation is not universal. Hungarian and Russian yes/no questions end in a fall; the interrogative cue is a pitch peak earlier in the contour (on the penultimate syllable in Hungarian), not a final rise. Ethological accounts (the "frequency code," Ohala 1984) link high or rising pitch to smallness, submission, and deference, and by extension to questioning, but the universality is contested.

What is ToBI annotation?

ToBI (Tones and Break Indices) is a transcription system for English intonation, formalized by Mary Beckman, Julia Hirschberg, and Stefanie Shattuck-Hufnagel in the early 1990s. It builds on Janet Pierrehumbert's 1980 autosegmental analysis. ToBI labels pitch accents (H*, L*, L+H*, L*+H, H+!H*) at stressed syllables, phrase accents (H-, L-) at intermediate phrase boundaries, and boundary tones (H%, L%) at intonational phrase boundaries. Break indices (0-4) mark the strength of prosodic boundaries. ToBI is the de facto standard for English intonation labeling in linguistics and speech technology — used in corpora like Switchboard and Boston University Radio News.

What is declination?

Declination is the gradual lowering of pitch across an utterance, even when no high-toned events occur. In English, the average F0 of a long sentence drops about 1-3 Hz per second from start to finish. The phenomenon is partly physiological (subglottal pressure decreases as the lungs deplete) and partly grammaticalized: listeners normalize for declination when interpreting tones. Liberman and Pierrehumbert (1984) modeled declination as a downward-sloping baseline against which pitch accents are measured. In tone languages, lexical tones remain distinct from one another relative to the declining baseline. Reset (a return to high pitch) marks new prosodic phrases.
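
One simple way to correct for this drift before comparing F0 values is to fit a straight line to the contour and subtract its slope. The sketch below does that with ordinary least squares. It is a crude linear stand-in chosen for illustration, not the Liberman and Pierrehumbert model, and fitting through every sample (not only the low points) is an assumption that will overestimate the slope when accents cluster early in the utterance.

```python
def fit_line(times, f0):
    """Ordinary least-squares slope (Hz/s) and intercept (Hz) of F0 over time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_f = sum(f0) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times, f0))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t

def remove_declination(times, f0):
    """Return the fitted slope and F0 values with the linear drift removed.

    Each value is re-expressed as if it had occurred at time zero.
    """
    slope, _ = fit_line(times, f0)
    return slope, [f - slope * t for t, f in zip(times, f0)]

# Synthetic contour: a flat 120 Hz voice drifting down at 2 Hz/s (assumed values).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
f0 = [120.0 - 2.0 * t for t in times]

slope, corrected = remove_declination(times, f0)
print(f"estimated declination: {slope:.1f} Hz/s")
print("corrected F0:", [round(f, 1) for f in corrected])
```

After correction, the utterance-final values are no longer lower than the initial ones, so a "low" reading at the end of a long sentence can be attributed to a tonal event with more confidence.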

How does intonation differ from tone?

Tone is lexical: it distinguishes words. Mandarin /ma/ with high tone means "mother," with rising "hemp," with falling-rising "horse," with falling "scold." Intonation is post-lexical: it marks utterance type, focus, and discourse structure. English "cat" is /kæt/ regardless of pitch contour; intonation indicates whether "cat" is a statement, question, or contrastively stressed. The two systems can coexist: Mandarin has both lexical tone and superimposed intonational pitch range expansion for questions. Japanese is intermediate (pitch-accent): it lexically specifies at most one accent per word but otherwise resembles intonation languages.

What is uptalk or HRT?

High Rising Terminal (HRT), colloquially "uptalk," is a rising contour at the end of declarative sentences — making statements sound like questions. Documented in Australian English, New Zealand English, Valley Girl American English, and increasingly in mainstream American and British speech. Functionally, HRT does not signal questioning; instead, it marks turn-holding, soliciting confirmation, or signaling the speaker is not finished. Cynthia McLemore's 1991 dissertation argued HRT is a discourse-level meta-comment, not an interrogative. Critics interpret HRT as conveying uncertainty or low status; sociolinguists argue this is a gendered and ageist misreading of a normal prosodic strategy.

How do children acquire intonation?

Infants are sensitive to pitch from birth. Mehler et al. (1988) showed newborns discriminate languages by prosody alone, before any segmental knowledge. Babbling around 6-9 months reflects native-language intonation contours — French babies produce rising contours, English babies falling. By 18 months, children use rising tunes for questions even before they have grammatical interrogatives. Pitch range expands gradually through age 4-5. Exceptional cases — autism spectrum, Williams syndrome — show altered intonational patterns, suggesting prosody recruits neural systems partly distinct from segmental phonology. Adult L2 learners often retain L1 intonation patterns even when segmental phonology nativizes.

Can intonation be lost in aphasia?

Yes — selectively. Right-hemisphere damage often causes "aprosodia" — flat or inappropriate intonation despite intact segmental speech. Left-hemisphere damage in classical Broca's or Wernicke's regions impairs grammar and word retrieval but typically spares emotional intonation. The dissociation suggests intonation has multiple components: linguistic (controlled by left perisylvian) and emotional/affective (controlled by right hemisphere). Clinical aprosodia disrupts question marking, focus, and emotional inflection — speakers describe it as "sounding monotone" to family. Therapy using imitation of pitch contours can partially restore prosody, especially for the emotional component.