Phonetics

Coarticulation

Why no speech sound is ever produced in isolation — gestural overlap and anticipation

Coarticulation is the overlapping articulation of adjacent speech sounds: the articulators (lips, tongue, velum, larynx) anticipate or carry over the configuration of neighboring segments. The /k/ in "key" is articulated farther forward on the palate than the /k/ in "caw" because the tongue is already preparing for the front vowel /i/ rather than the back vowel /ɔ/. Öhman (1966) documented the phenomenon systematically in spectrographic measurements of vowel-consonant-vowel utterances. Coarticulation is distinguished from phonological assimilation by being gradient and physical rather than categorical and phonemic. It is also one reason speech recognition is hard: every phoneme is shaped by its context.

  • Pioneer: Sven Öhman (1966), vowel-to-vowel coarticulation in Swedish
  • Direction: Anticipatory (right-to-left) and carryover (left-to-right); both are found across languages
  • Domain: Lips, tongue body, tongue tip, velum, larynx
  • Time scale: 50-200 ms; gestures overlap continuously
  • Classic example: /k/ in "key" [k̟] vs. "caw" [k̠], a fronted vs. a retracted velar
  • Theoretical frame: Articulatory Phonology (Browman and Goldstein 1986)

Why coarticulation matters

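  • Speech recognition. HMM-based ASR systems typically use context-dependent triphone or biphone models because context-independent phoneme models do not capture how each sound varies with its neighbors.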
  • Speech synthesis. Concatenative TTS must blend diphones; neural TTS learns coarticulation implicitly from data.
  • Phonetic transcription. Narrow IPA must capture context effects; broad transcription cannot.
  • Sound change. Coarticulatory tendencies become phonologized assimilations historically.
  • Clinical phonetics. Apraxia and dysarthria disrupt coarticulatory timing distinctively.
  • Forensic linguistics. Speaker-specific coarticulation patterns aid voice identification.
  • Sign language. Movement-hold-movement coarticulation in ASL parallels spoken-language gestures.

Common misconceptions

  • Phonemes are uniform abstract beads on a string. Each surface phone is shaped by its neighbors; the bead model is a useful fiction.
  • Coarticulation is sloppiness. It is a fundamental property of all motor behavior, not optional or careless.
  • Slow speech eliminates it. Even hyperarticulated speech has coarticulation, just reduced.
  • Only consonants affect vowels. Vowel-to-vowel coarticulation across consonants (Öhman 1966) is robust.
  • Coarticulation is the same as assimilation. Coarticulation is gradient and physical; assimilation is categorical and phonemic.
  • Children coarticulate in an adult-like way from the start. Adult-like coordination develops slowly, with anticipatory coarticulation maturing into adolescence.

Frequently asked questions

How does coarticulation differ from assimilation?

Coarticulation is gradient and continuous: the gestures for neighboring sounds blend, so the tongue's position at any moment reflects several targets at once. Assimilation is categorical: one phoneme is replaced by another. English "input" pronounced [ɪmpʊt] is assimilation (n → m). The slight retraction of the [n] in "Anna" under the influence of the following /a/ is coarticulation. Phonologization occurs when learners reanalyze a coarticulatory pattern as a categorical assimilation.
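The gradient-versus-categorical distinction can be made concrete with a small sketch. The function names, the set of bilabials, and the 0-to-1 tongue-backness scale below are illustrative assumptions, not a standard phonetic model: assimilation is modeled as a symbol rewrite, coarticulation as a partial shift of a numeric target.

```python
from typing import List

# Illustrative set of bilabial consonants that trigger the change.
BILABIALS = {"p", "b", "m"}

def assimilate_nasal_place(phones: List[str]) -> List[str]:
    """Categorical change: /n/ is replaced outright by /m/ before a bilabial."""
    out = list(phones)
    for i in range(len(out) - 1):
        if out[i] == "n" and out[i + 1] in BILABIALS:
            out[i] = "m"
    return out

def coarticulate(target: float, neighbor: float, influence: float) -> float:
    """Gradient change: move a numeric target part of the way toward a neighbor.

    `influence` runs from 0.0 (no coarticulation) to 1.0 (full shift).
    """
    if not 0.0 <= influence <= 1.0:
        raise ValueError("influence must be between 0.0 and 1.0")
    return target + influence * (neighbor - target)

# Assimilation: the symbol itself changes.
print(assimilate_nasal_place(["ɪ", "n", "p", "ʊ", "t"]))
# ['ɪ', 'm', 'p', 'ʊ', 't']

# Coarticulation on a hypothetical backness scale (0.0 = front, 1.0 = back):
# the same segment lands at any point in between, depending on influence.
for influence in (0.0, 0.25, 0.5):
    print(influence, coarticulate(0.0, 1.0, influence))
```

The first function can only return one symbol or another, while the second can return any intermediate value, which is the sense in which coarticulation is gradient.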

What is anticipatory coarticulation?

Articulators move toward an upcoming target before completing the current segment. Lip rounding for /u/ in "stew" begins during /s/, so the [s] sounds slightly labialized. In French "tu", lip rounding can begin even earlier. This requires planning: the speaker's motor system has already staged future segments. Children acquire anticipatory coarticulation gradually; some accounts place its stabilization around age 8, while adult-like coordination may continue to mature into adolescence.

What does Articulatory Phonology say?

Browman and Goldstein (1986, 1992) propose that the basic units of speech are gestures, coordinated movements of articulators, rather than abstract phonemes. Gestures are laid out in a gestural score, a representation of which gestures are active at each point in time, and they overlap within it. Coarticulation is therefore built into the formalism rather than added as a process. The framework makes predictions that can be tested against X-ray microbeam and EMA (electromagnetic articulography) data.
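One way to picture overlapping gestures is as a list of time intervals, one per gesture. The sketch below is a deliberate simplification and not Browman and Goldstein's formalism: the `Gesture` class, its fields, and all timings for "stew" are hypothetical values chosen only to show how overlap falls out of the representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gesture:
    articulator: str   # e.g. "lips", "tongue tip"
    goal: str          # informal description of the constriction goal
    start_ms: float    # activation onset
    end_ms: float      # activation offset

def overlap_ms(a: Gesture, b: Gesture) -> float:
    """Return how long (in ms) both gestures are active at the same time."""
    return max(0.0, min(a.end_ms, b.end_ms) - max(a.start_ms, b.start_ms))

# Hypothetical timings for "stew"; the numbers are invented for illustration.
s_constriction = Gesture("tongue tip", "alveolar fricative", 0.0, 120.0)
t_closure = Gesture("tongue tip", "alveolar closure", 120.0, 200.0)
lip_rounding = Gesture("lips", "rounding", 60.0, 400.0)

# Lip rounding starts while the /s/ constriction is still active.
print(overlap_ms(s_constriction, lip_rounding))  # 60.0
# Two successive tongue-tip gestures in this toy score do not overlap.
print(overlap_ms(s_constriction, t_closure))     # 0.0
```

In this toy score no separate coarticulation rule is needed: the labialized [s] is simply the stretch where the two intervals coincide.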

How does coarticulation affect listeners?

Listeners use coarticulatory cues to identify and predict neighboring segments. Mann and Repp (1980) showed that listeners' categorization of a fricative as /s/ or /ʃ/ shifts depending on the following vowel. The McGurk effect shows that listeners also integrate visual articulatory information with the acoustic signal. Listeners do not simply discard coarticulation; they exploit it. This is why isolated phonemes sound unnatural and synthesized speech without coarticulation sounds robotic.

Why is /n/ in "tenth" different from /n/ in "ten"?

In "tenth" the /n/ is dental [n̪] because the following /θ/ is dental — the tongue tip is already at the teeth. In "ten" the /n/ is alveolar [n]. This place coarticulation happens automatically; speakers do not consciously plan it. It is a clear case of anticipatory tongue tip placement adjusting to upcoming context.

Do all languages coarticulate equally?

No. Languages differ in coarticulatory degree. Beddor's work shows English nasalizes vowels strongly before nasal consonants, while French and Polish nasalize less because they have phonemic nasal vowels (the contrast must be preserved). Manuel (1990) showed Bantu languages with five-vowel systems have more coarticulation than seven- or nine-vowel systems where vowels must stay distinct.

How does coarticulation help speech recognition?

It provides redundant cues. The /b/ in "bee" leaves traces in the formant transitions of the /i/. ASR systems trained on context-dependent triphones (each phoneme conditioned on its neighbors) substantially outperform context-independent models. The phenomenon that makes recognition hard, context dependency, is also what gives the recognizer extra signal to work with.
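The conditioning of each phoneme on its neighbors can be sketched as a simple relabeling step. The `left-center+right` label format follows a convention common in HMM toolkits, while the function name, the `sil` boundary symbol, and the ARPAbet-style phone symbols are assumptions made for this example.

```python
from typing import List

def to_triphones(phones: List[str], boundary: str = "sil") -> List[str]:
    """Relabel each phone as left-center+right, padding edges with `boundary`."""
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

key = to_triphones(["k", "iy"])   # "key"
caw = to_triphones(["k", "ao"])   # "caw"

print(key)  # ['sil-k+iy', 'k-iy+sil']
print(caw)  # ['sil-k+ao', 'k-ao+sil']
```

The two /k/ tokens receive different labels (`sil-k+iy` vs. `sil-k+ao`), so a recognizer can train a separate acoustic model for the fronted and the retracted variant instead of averaging over both.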