Prosody

The rhythm, stress, and timing that turn a string of sounds into the music of language

Prosody is the suprasegmental layer of speech: what survives when you abstract away from the individual consonants and vowels. It encompasses rhythm (how syllables are spaced in time), stress (which syllables get prominence), intonation (the pitch melody), and the boundaries that group syllables into feet, words, and phrases. Languages are traditionally classified into rhythm types: stress-timed English roughly equalizes intervals between stressed syllables; syllable-timed French gives each syllable nearly equal duration; mora-timed Japanese paces speech by the mora, a unit smaller than the syllable. The prosodic hierarchy (mora ⊂ syllable ⊂ foot ⊂ prosodic word ⊂ phonological phrase ⊂ intonational phrase) was formalized by Selkirk, Nespor and Vogel, and Hayes in the 1980s and remains the backbone of suprasegmental analysis. Prosody is what makes a sentence parseable, what disambiguates "old men and women", and what infants use to bootstrap their first language before they know a single word.

Definition: Suprasegmental organization (rhythm, stress, intonation, boundaries)
Three rhythm types: Stress-timed (English), syllable-timed (French), mora-timed (Japanese)
Prosodic hierarchy: Mora ⊂ syllable ⊂ foot ⊂ word ⊂ phrase ⊂ intonational phrase
Acoustic correlates: F0, duration, intensity, vowel quality
Foundational sources: Selkirk 1980, 1986; Nespor and Vogel 1986; Hayes 1989
Acquisition: Newborns discriminate languages by prosody alone

What prosody is

Take any spoken utterance. Strip away the consonants and vowels — the segments — and what's left is prosody. The pitch rises and falls. Some syllables are louder, longer, higher than others. Pauses break the stream into chunks. The chunks group into larger chunks. Even with the segmental content removed, listeners can identify the language, the speech act, the speaker's emotional state, and often the syntactic structure of the sentence.

Prosody operates over four interrelated dimensions:

Pitch (F0). Pitch accents on stressed syllables, boundary tones at phrase edges, declination across an utterance. The intonation system.
Duration. Stressed syllables are typically 30-50% longer than unstressed. Final syllables of phrases lengthen ("phrase-final lengthening"). The rhythm system.
Intensity. Louder syllables tend to be perceived as more prominent, though intensity is the weakest cue to stress in English (Fry 1955).
Vowel quality. Stressed vowels are full; unstressed vowels in English often reduce to schwa [ə]. Vowel reduction is itself a prosodic correlate.

These four dimensions don't act independently — stress in English is signaled jointly by all four, with pitch and duration carrying the most weight. Different languages weight the cues differently: Polish stress is mainly duration-cued; Russian uses vowel reduction more heavily than English; Mandarin sentence stress relies on duration since pitch is reserved for lexical tone.

Why prosody matters

Sentence parsing. Prosodic boundaries align with syntactic boundaries; listeners use them to disambiguate.
Lexical contrast. English PERmit (noun, stress on first syllable) vs perMIT (verb, stress on second).
Discourse function. Focus, given-new, turn-taking, hesitation — all prosodically marked.
Acquisition. Infants segment continuous speech using prosodic cues before they have any vocabulary.
L2 acquisition. The largest residual barrier for advanced learners — segments nativize, prosody often doesn't.
Speech technology. TTS naturalness depends on prosodic modeling; ASR uses prosody for segmentation.
Clinical applications. Aprosodia (right-hemisphere stroke), Parkinson's flat speech, autism spectrum atypicality.

Three rhythm types compared

Feature	Stress-timed (English)	Syllable-timed (French)	Mora-timed (Japanese)
Isochronous unit	Interstress interval (foot)	Syllable	Mora
Vowel reduction	Yes — unstressed → schwa	Minimal	None — vowels remain full
Syllable complexity	High — CCCVCC permitted	Moderate — typically CV(C)	Low — almost exclusively CV(N)
Stressed/unstressed contrast	Strong	Weak (final-syllable lengthening only)	Pitch-accent only; no stress reduction
Example utterance	"the BLACK cat SAT" (3 stresses, ~equal time between)	"le chat noir était assis" (each syllable similar length)	"kuro-i ne-ko-ga su-wa-ru" (each mora ~120 ms)
Other languages of this type	German, Dutch, Russian, Arabic	Spanish, Italian, Yoruba, Hindi	Hawaiian, Tamil, Slovak
Acoustic correlate of rhythm	%V (proportion of utterance duration that is vocalic) ~40%, ΔC (consonantal variability) high	%V ~44%, ΔC moderate	%V ~53%, ΔC low
L1 → L2 transfer effect	English→French: over-reducing unstressed French vowels	French→English: failing to reduce English unstressed vowels	Japanese→English: mora-timed English (very stilted)

Strict isochrony has not been confirmed acoustically (Roach 1982; Dauer 1983) — measured interstress intervals in English vary by 30-50%. The classification persists because rhythmic differences are perceptually robust, even if not literally isochronous. Ramus, Nespor, and Mehler (1999) introduced quantitative metrics — %V (proportion of time spent in vowels) and ΔC (variability of consonantal interval durations) — that successfully cluster languages into the traditional three classes.
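
The two metrics described above are simple to compute once an utterance has been segmented into vocalic and consonantal intervals. The following is a minimal sketch, not the original authors' procedure: the function name `rhythm_metrics`, the `("V"/"C", seconds)` input format, and both example segmentations are hypothetical, and the use of the population standard deviation for ΔC is an assumption.

```python
from statistics import pstdev


def rhythm_metrics(intervals):
    """Return (%V, deltaC) for a list of (label, duration_in_seconds) pairs.

    label is "V" for a vocalic interval and "C" for a consonantal interval.
    """
    vowels = [d for label, d in intervals if label == "V"]
    consonants = [d for label, d in intervals if label == "C"]
    total = sum(vowels) + sum(consonants)
    percent_v = 100.0 * sum(vowels) / total   # share of time spent in vowels
    delta_c = pstdev(consonants)              # assumption: population std. dev.
    return percent_v, delta_c


# Hypothetical hand-made segmentations, NOT measured data.
cluster_heavy = [("C", 0.18), ("V", 0.06), ("C", 0.05),
                 ("V", 0.12), ("C", 0.22), ("V", 0.05)]
cv_like = [("C", 0.07), ("V", 0.09), ("C", 0.08),
           ("V", 0.09), ("C", 0.07), ("V", 0.10)]

for name, seq in [("cluster-heavy", cluster_heavy), ("CV-like", cv_like)]:
    pv, dc = rhythm_metrics(seq)
    print(f"{name}: %V = {pv:.1f}, deltaC = {dc * 1000:.1f} ms")
```

In this toy input the cluster-heavy sequence comes out with a lower %V and a higher ΔC than the CV-like one, which is the direction of the stress-timed versus mora-timed contrast in the table. Real values depend on the corpus, the speech rate, and the segmentation criteria.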

Cross-linguistic examples

English (stress-timed). "The BLACK cat SAT on the MAT." Three stresses, with the unstressed syllables "the," "on the" reduced to schwa-laden quick mumbles. Total duration is roughly proportional to the number of stressed syllables, not the total syllable count. Removing function words barely changes utterance duration; removing a stressed content word makes it much shorter.

French (syllable-timed). "Le chat noir était assis sur le tapis." Eleven syllables, each roughly 150-180 ms, no schwa reduction (in careful speech). The final syllable lengthens (~250 ms), but the rest are uniform. French speakers learning English under-reduce — pronouncing "the" as full [ðɛ] instead of [ðə], producing the characteristic French-accented English rhythm.

Japanese (mora-timed). Tokyo is to-u-kyo-u: four morae, each ~120 ms. Each long vowel in "Tokyo" contributes two morae, not one. The geminate consonant in itta ("said") contributes a mora of its own, giving i-t-ta, three morae. Haiku poetry counts morae, not syllables: the famous 5-7-5 structure means 5 morae, 7 morae, 5 morae. Furu ike ya ("ancient pond") is fu-ru-i-ke-ya, exactly 5 morae.

Mandarin (syllable-timed-ish, with tone). Mandarin syllables are roughly equal in duration, but the four lexical tones are superimposed on each. Tone 3 (low dipping) is acoustically longer than tones 1, 2, 4. Sentence-level prosody compresses each syllable's tone shape but preserves identity. Foreign learners hear "monotone" Mandarin precisely because they fail to track the lexical tones distinctly from sentence-level pitch range.

Spanish (syllable-timed). Like French, no significant vowel reduction. Spanish stress falls on the antepenultimate, penultimate, or final syllable (predictably from word shape and orthography). Bilingual Spanish-English children learn to switch rhythmic mode when code-switching, often producing Spanish in syllable-timed mode and English in stress-timed mode within the same utterance.

Worked examples — prosodic structure

Hierarchy of "Mary kissed John's brother yesterday."

Utterance:           [Mary kissed John's brother yesterday.]
Intonational phrase: [Mary kissed John's brother yesterday.]
Phonological phrase: (Mary)(kissed John's brother)(yesterday)
Prosodic word:       Mary  kissed John's  brother  yesterday
Foot:                (Ma.ry)(kissed)(John's)(bro.ther)(yes.ter.day)
Syllable:            Ma  ry  kissed  John's  bro  ther  yes  ter  day
Mora:                μ   μ   μμ      μμ      μ    μ     μμ   μ    μμ

Each level is built from units of the level below. The mora tier assumes one common analysis in which a short vowel counts as one mora and a coda consonant or diphthong adds a second. Phonological rules apply at specific levels: English aspiration of voiceless stops applies foot-initially ([pʰ] in "Peter" but unaspirated in "supper" because the [p] is foot-medial). Final lengthening applies at the intonational phrase boundary: "yesterday" lengthens by 30-50% over its non-final pronunciation.
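
The layering in the hierarchy above can be represented as a tree and checked mechanically. The sketch below is illustrative only: the `Node` class, the helper constructors, and the `strictly_layered` check are hypothetical names, "kissed" is treated as a single syllable (as on the syllable tier), the mora tier is left out to avoid committing to one weight analysis, and requiring every child to sit exactly one level below its parent is an assumption that many analyses relax.

```python
from dataclasses import dataclass, field

# Levels from smallest to largest, following the hierarchy in the text.
LEVELS = ["mora", "syllable", "foot", "prosodic word",
          "phonological phrase", "intonational phrase", "utterance"]


@dataclass
class Node:
    level: str
    label: str = ""
    children: list = field(default_factory=list)


def strictly_layered(node):
    """True if every child sits exactly one level below its parent."""
    rank = LEVELS.index(node.level)
    return all(LEVELS.index(c.level) == rank - 1 and strictly_layered(c)
               for c in node.children)


def count(node, level):
    """Count the nodes of a given level in the subtree rooted at node."""
    own = 1 if node.level == level else 0
    return own + sum(count(c, level) for c in node.children)


# Hypothetical helper constructors for building the example tree.
def foot(*syllables):
    return Node("foot", children=[Node("syllable", s) for s in syllables])


def word(*feet):
    return Node("prosodic word", children=list(feet))


def phrase(*words):
    return Node("phonological phrase", children=list(words))


tree = Node("utterance", children=[
    Node("intonational phrase", children=[
        phrase(word(foot("Ma", "ry"))),
        phrase(word(foot("kissed")), word(foot("John's")),
               word(foot("bro", "ther"))),
        phrase(word(foot("yes", "ter", "day"))),
    ])
])

print(strictly_layered(tree))
print({level: count(tree, level) for level in LEVELS[1:]})
```

A rule such as foot-initial aspiration could then be stated as a condition on the first syllable of each `foot` node rather than on word position.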

Prosodic disambiguation of "I saw the man with the telescope."

Reading 1 (man has telescope):
  [I saw [the man with the telescope]]
  Boundary after "saw"; "with the telescope" attached to "man"

Reading 2 (I used telescope):
  [I saw [the man] [with the telescope]]
  Boundary after "man"; "with the telescope" attached to "saw"

Listeners reliably distinguish these readings by prosodic cues alone (Lehiste 1973). Reading 1 has continuous prosody after "the man"; reading 2 has phrase-final lengthening on "man" and a pitch reset on "with."

Mora count in haiku. "Furu ike ya / kawazu tobikomu / mizu no oto" (Bashō). 5-7-5: fu-ru-i-ke-ya (5), ka-wa-zu-to-bi-ko-mu (7), mi-zu-no-o-to (5). English translations rarely preserve the count: "An old pond / a frog jumps in / the sound of water" has 3-4-5 syllables. Forcing 5-7-5 syllables in English tends to produce stilted lines, because an English syllable is not equivalent to a Japanese mora and English rhythm is stress-timed, not mora-timed.
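
Mora counting of this kind can be approximated directly from romanized text. The function below is a simplified sketch under stated assumptions, not a full romaji parser: `count_morae` is a hypothetical name, the input is assumed to be Hepburn-style romanization, each written vowel counts as one mora, a macron vowel counts as two, a doubled consonant (or "t" before "ch") counts as a geminate mora, and "n" counts as a moraic nasal unless a vowel or "y" follows it.

```python
VOWELS = set("aiueo")
# Precomposed macron vowels (a, i, u, e, o with macron); each is two morae.
LONG_VOWELS = set("\u0101\u012b\u016b\u0113\u014d")


def count_morae(romaji):
    """Count morae in a romanized Japanese string (simplified sketch)."""
    # Keep letters, apostrophes, and spaces; hyphens and punctuation are dropped.
    chars = [ch for ch in romaji.lower() if ch.isalpha() or ch in "' "]
    morae = 0
    for i, ch in enumerate(chars):
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if ch in VOWELS:
            morae += 1          # each written vowel is one mora
        elif ch in LONG_VOWELS:
            morae += 2          # a long vowel is two morae
        elif ch == "n":
            if nxt not in VOWELS and nxt not in LONG_VOWELS and nxt != "y":
                morae += 1      # moraic nasal: before a consonant or at a word end
        elif ch.isalpha() and (ch == nxt or (ch == "t" and nxt == "c")):
            morae += 1          # first half of a geminate, as in itta
    return morae


haiku = ["furu ike ya", "kawazu tobikomu", "mizu no oto"]
print([count_morae(line) for line in haiku])   # [5, 7, 5]
print(count_morae("to-u-kyo-u"), count_morae("itta"))   # 4 3
```

The sketch ignores cases a real system would need, such as katakana long-vowel marks, decomposed Unicode macrons, and romanization schemes other than the one assumed here.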

Variants and special cases

Mixed-rhythm languages. Catalan and European Portuguese have stress-timed-like vowel reduction but syllable-timed-like rhythmic structure — they sit between the categories. Brazilian Portuguese is more syllable-timed, European more stress-timed.
Tone-timed languages. Some scholars argue Mandarin is "tone-timed" — durational equality among tone-bearing units rather than syllables or morae. The classification is contested.
Whistled and signed prosody. Silbo Gomero (the whistled Spanish of La Gomera) preserves prosodic boundaries through pitch and timing alone. Sign languages have prosody implemented through facial expression, head movement, and signing speed (Sandler 1999).
Singing and speech. Singing imposes external rhythm on speech, but languages constrain singable rhythms — English songs accommodate stress-timing better than syllable-timing, while French chansons fit syllable-timed templates.
Pathological rhythm. Foreign Accent Syndrome can shift a speaker's rhythm class — a stroke patient may sound syllable-timed despite native English background.
Code-switching prosody. Bilinguals shift rhythm mid-utterance, sometimes within a single sentence. The shift is often perceptually salient; listeners use it as a code-switch cue.

Common pitfalls

Treating rhythm types as strict isochrony. Acoustic studies refuted literal equal-time intervals decades ago; rhythm is a tendency, not a metronome.
Confusing prosody with intonation. Intonation is one component (pitch); prosody is the broader category including rhythm, stress, duration, and timing.
Assuming prosody is "just emotion." Linguistic prosody is grammaticalized — it disambiguates structure, marks lexical contrasts, and obeys phonological rules.
Ignoring the prosodic hierarchy when stating phonological rules. Many rules apply at specific prosodic levels; stating them at the word level alone misses generalizations.
Conflating L1 transfer in segments with L1 transfer in prosody. The two are partially independent — speakers can have nativelike segments and foreign prosody, or vice versa.
Treating mora and syllable as interchangeable. Tokyo has two syllables but four morae; haiku counts morae, not syllables.

Frequently asked questions

What is the prosodic hierarchy?

The prosodic hierarchy is the layered structure phonologists use to describe suprasegmental organization. From smallest to largest: mora < syllable < foot < prosodic word < phonological phrase < intonational phrase < utterance. Each level dominates the levels below — a phonological phrase contains one or more prosodic words, each of which contains one or more feet, and so on. Selkirk (1980, 1986), Nespor and Vogel (1986), and Hayes (1989) formalized this hierarchy as the framework for prosodic phonology. Phonological rules can target specific levels (English aspiration applies at the foot level; French liaison at the phonological-phrase level). The hierarchy mediates between syntax and phonology.

What is stress timing vs syllable timing vs mora timing?

Pike (1945) and Abercrombie (1967) classified languages as stress-timed or syllable-timed; mora timing was added later as a third type. Stress-timed languages (English, German, Dutch, Russian, Arabic) target isochronous intervals between stressed syllables, compressing or expanding the unstressed syllables in between. Syllable-timed languages (French, Spanish, Italian, Yoruba) give roughly equal duration to each syllable. Mora-timed languages (Japanese, Hawaiian, Tamil) give roughly equal duration to each mora: Tokyo is to-u-kyo-u, four morae of similar length. Strict isochrony has not been confirmed in acoustic measurements (Roach 1982; Dauer 1983). Modern accounts treat rhythm as a tendency, with stress timing reflecting vowel reduction and consonant cluster richness rather than literal equal timing.

How does prosody disambiguate sentences?

Prosodic boundaries align with major syntactic boundaries, helping listeners parse ambiguous strings. "Old men and women" has two readings: ((old men) and women) vs (old (men and women)). The first inserts a stronger boundary after "men"; the second after "old". Lehiste (1973) showed listeners reliably interpret which reading is intended from prosody alone. Cooper and Paccia-Cooper (1980) demonstrated similar effects across many ambiguity types. Prosody also distinguishes restrictive from non-restrictive relatives ("the boy who cried" vs "the boy, who cried"), and integrates with information structure to mark focus. The neural basis of prosodic parsing involves the right superior temporal gyrus.

Do all languages have prosody?

Yes — prosody is a linguistic universal in the sense that all languages have suprasegmental organization. What varies is the inventory of prosodic categories, the placement of boundaries, and the acoustic correlates. English uses fundamental frequency, duration, and intensity for stress; French primarily uses duration and final-syllable lengthening; Mandarin combines lexical tone with sentence intonation. Sign languages have prosody too — facial expression, head movement, and body shifts mark prosodic boundaries (Sandler 1999). Even whistled languages (Silbo Gomero) maintain prosodic distinctions. The universal applies despite enormous variation in implementation.

How do infants use prosody?

Prosody is the first phonological dimension infants attend to. Mehler et al. (1988) showed that newborns discriminate French from Russian using prosodic information alone. By 6 months, infants prefer their native language's rhythm. Prosodic bootstrapping (Morgan and Demuth 1996) hypothesizes that prosodic boundaries help infants segment speech and parse syntactic structure before they have any vocabulary. Caregivers in many cultures use "motherese" (exaggerated prosody with higher pitch range, slower tempo, and clearer boundaries), which aids language acquisition. Children on the autism spectrum show altered prosodic processing.

Why is L2 prosody harder than L2 segments?

Adult L2 learners often acquire native-like segments while retaining L1 prosody, producing the impression of a "foreign accent" even with phonemically perfect speech. The asymmetry has several causes. Prosody is encoded earlier in life (by newborns, before any segmental knowledge) and may be more deeply consolidated. Prosodic targets are gradient and context-dependent, and so harder to learn from explicit instruction. L2 prosodic transfer is well-documented: Spanish-English speakers tend to syllable-time their English; French speakers stretch final syllables; Japanese speakers with nativelike segments often retain mora-timed rhythm. Deliberate prosodic training has shown some success in late-acquired L2 prosody.

How is prosody affected by neurological disorders?

Several disorders disrupt prosody differentially. Right-hemisphere stroke causes aprosodia — flat speech with reduced pitch range and disrupted timing. Parkinson's disease produces "monoloud, monopitch" speech with reduced prosodic contrast. Foreign Accent Syndrome — a rare consequence of stroke or trauma — alters prosody in ways listeners interpret as a non-native accent, despite the speaker having no L2 history. Autism spectrum often involves prosodic atypicality, both in production (unusual stress patterns) and perception (difficulty interpreting affective prosody). Williams syndrome shows the opposite — exaggerated, hypersocial prosody. The dissociations suggest prosody recruits neural networks partly distinct from segmental phonology.