Question 1

How is VOT measured?

Accepted Answer

VOT is measured from a waveform or wide-band spectrogram. The release of the stop is identified as a transient burst, a brief spike of energy as the closure opens. The onset of voicing is identified as the first appearance of regular vocal-fold vibration, visible as periodic pulses in the waveform or as regularly spaced vertical striations (with a low-frequency voice bar) in the spectrogram. VOT is the time of voicing onset minus the time of release, usually reported in milliseconds. Positive VOT means voicing follows release; negative VOT (prevoicing) means voicing precedes release. Praat (Boersma and Weenink, 1992 onward) is the standard tool, and semi-automated VOT measurement scripts are in widespread use.
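Once the two landmarks have been marked by hand (for example in a Praat TextGrid), the VOT computation itself is a subtraction. The following is a minimal Python sketch of that step only; the function names and the example times are hypothetical and are not taken from any existing Praat script.

```python
def vot_ms(burst_time_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds: voicing onset time minus burst release time."""
    return (voicing_onset_s - burst_time_s) * 1000.0


def describe(vot: float) -> str:
    """Label the sign of a VOT value given in milliseconds."""
    if vot < 0:
        return "prevoiced (voicing lead)"
    if vot == 0:
        return "simultaneous"
    return "voicing lag"


# Hypothetical annotations: burst at 0.512 s, voicing onset at 0.574 s.
example = vot_ms(0.512, 0.574)
print(round(example, 1), describe(example))
```

Passing a voicing onset that is earlier than the burst gives a negative value, which matches the prevoicing case described above.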

Question 2

What languages have what VOT systems?

Accepted Answer

Lisker and Abramson's 1964 cross-language study identified three VOT regions: voicing lead (negative VOT), short lag (near zero to small positive), and long lag (large positive, aspirated). Two-way voiced/voiceless systems (Spanish, French, Russian, Hungarian) contrast prevoiced [b] (voicing lead) with voiceless unaspirated [p] (short lag). Two-way aspirating systems (English, German) contrast short-lag [b] with long-lag [pʰ]. Three-way systems such as Thai use all three regions. Korean also has a three-way contrast (lenis, tense, aspirated), but all three series are voiceless word-initially, so VOT alone does not fully separate them. Hindi has a four-way contrast that adds breathy-voiced stops, which likewise cannot be distinguished by VOT alone. Some languages also have implosives or ejectives, which use different airstream mechanisms and fall outside the VOT axis. Even so, VOT is one of the cleanest cross-linguistic acoustic dimensions.
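The three regions can be illustrated with a small classifier. This is a sketch only: the 30 ms cutoff between short lag and long lag is an assumed, illustrative threshold rather than a value from Lisker and Abramson's data, and real boundaries vary by language and place of articulation. The token values are made up.

```python
# Assumed, illustrative cutoff in ms between short lag and long lag.
SHORT_LAG_MAX_MS = 30.0


def vot_category(vot_ms: float) -> str:
    """Assign a VOT value (ms) to one of the three regions."""
    if vot_ms < 0:
        return "voicing lead"
    if vot_ms <= SHORT_LAG_MAX_MS:
        return "short lag"
    return "long lag"


def categories_used(vots_ms):
    """Return the set of VOT regions that a collection of tokens occupies."""
    return {vot_category(v) for v in vots_ms}


# Hypothetical stop tokens (ms) from a two-way aspirating system.
print(sorted(categories_used([8.0, 12.0, 65.0, 72.0])))
```

A system that contrasts prevoiced with short-lag stops would occupy a different pair of regions, and a Thai-like system would occupy all three.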

Question 3

What is categorical perception of VOT?

Accepted Answer

Categorical perception was first demonstrated at Haskins Laboratories by Liberman, Harris, Hoffman, and Griffith (1957), using a synthetic place-of-articulation continuum. Later Haskins work, notably by Abramson and Lisker (1970), applied the same method to synthetic stops with continuously varying VOT and asked listeners to identify them. English-speaking listeners showed a sharp boundary around +25 ms for labial stops: stimuli below it were heard as [b] and stimuli above it as [p], with little gradient response. Discrimination was likewise enhanced for pairs straddling the boundary and poor for pairs within a category. The phenomenon, categorical perception, was initially considered uniquely linguistic but was later reported in other domains (color, faces). Infant studies, beginning with Eimas, Siqueland, Jusczyk, and Vigorito (1971), showed that very young infants discriminate VOT contrasts categorically, and later work reported that infants from non-English backgrounds also discriminate at roughly English-like boundaries before language-specific tuning narrows perception.
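The link between a sharp identification boundary and a discrimination peak can be sketched numerically. The code below assumes a logistic identification function centered on the +25 ms boundary mentioned above (the slope is an arbitrary illustrative value) and uses the label-based prediction associated with the early Haskins work, in which predicted ABX accuracy is 0.5 + 0.5(p1 - p2)^2 for two stimuli with identification probabilities p1 and p2. It is a toy model, not a fit to any published data.

```python
import math

BOUNDARY_MS = 25.0  # category boundary from the text
SLOPE = 0.5         # assumed steepness per ms, purely illustrative


def p_voiceless(vot_ms: float) -> float:
    """Probability of labeling a stimulus as [p] under a logistic model."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))


def predicted_abx(vot_a: float, vot_b: float) -> float:
    """Predicted ABX proportion correct if listeners rely only on labels."""
    diff = p_voiceless(vot_a) - p_voiceless(vot_b)
    return 0.5 + 0.5 * diff ** 2


# Two pairs with the same 20 ms physical difference.
within = predicted_abx(-5.0, 15.0)   # both on the [b] side
across = predicted_abx(15.0, 35.0)   # straddles the boundary
print(round(within, 3), round(across, 3))
```

The within-category pair comes out near chance and the cross-boundary pair near ceiling, even though the physical step is identical, which is the signature pattern described above.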

Question 4

How do bilinguals handle different VOT systems?

Accepted Answer

Bilinguals show partial separation of VOT distributions across their two languages, but rarely fully nativelike values in both. James Flege's work, among others, shows that L2 learners often produce intermediate VOTs that blend native and target patterns. For example, a Spanish-English bilingual may produce English [p] with shorter VOT than monolingual English speakers do. Code-switching can trigger VOT shifts mid-utterance. Forensic phonetics uses VOT as one feature in speaker comparison: a speaker's VOT distribution is a reasonably stable signature within a language but varies across registers.

Question 5

How does VOT change developmentally?

Accepted Answer

Children's early productions show smaller VOT distinctions than adult productions. English-learning toddlers may produce both [b] and [p] with short-lag VOT, then gradually extend the [p] distribution to long-lag values by about age four. Motor control over the relative timing of laryngeal and oral gestures takes years to mature. Longitudinal studies such as Macken and Barton (1980) have documented this gradual VOT separation. By kindergarten, most children produce nativelike VOT distributions, though some delays persist into elementary school.
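One simple way to quantify how far the two distributions have separated is a standardized mean difference. The sketch below is an assumption about how one might do this, not a method taken from the developmental literature, and all VOT values are invented for illustration.

```python
import math
from statistics import mean, stdev


def separation(voiced_ms, voiceless_ms):
    """Difference in mean VOT divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((stdev(voiced_ms) ** 2 + stdev(voiceless_ms) ** 2) / 2)
    return (mean(voiceless_ms) - mean(voiced_ms)) / pooled_sd


# Invented tokens (ms): an early stage where [b] and [p] are both short lag...
early_b, early_p = [5, 12, 8, 15, 10], [10, 18, 14, 22, 16]
# ...and a later stage where [p] has moved into the long-lag range.
later_b, later_p = [6, 10, 12, 8, 14], [55, 70, 62, 80, 65]

print(round(separation(early_b, early_p), 2))
print(round(separation(later_b, later_p), 2))
```

A larger value indicates less overlap between the [b] and [p] distributions; tracking it across recording sessions would show the gradual separation described above.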

Question 6

What are the articulatory mechanisms underlying VOT?

Accepted Answer

VOT depends on the relative timing of two events: release of the supralaryngeal closure (lips for labials, tongue tip or blade for coronals, tongue body for velars) and initiation of vocal-fold vibration. Vocal-fold vibration requires (a) adduction (the folds are brought together with appropriate tension) and (b) sufficient airflow across them to sustain oscillation, aided by the Bernoulli effect. During closure, intraoral pressure rises and reduces the pressure drop across the glottis, so voiced stops require active expansion of the supraglottal cavity (larynx lowering, pharynx widening) to keep air flowing. Voiceless aspirated stops require the glottis to be abducted at release, which delays the onset of voicing. Languages select one of these laryngeal timing patterns as the default for each stop phoneme.

Question 7

How is VOT used in clinical and forensic phonetics?

Accepted Answer

VOT is a sensitive marker of motor control and is altered in speech disorders. Apraxia of speech often shows abnormally variable VOT. Parkinson's disease patients have been reported to show shortened VOT for voiceless stops (reduced aspiration). Stuttering involves VOT disruptions during disfluency. Speech-language pathologists measure VOT to track therapy progress. Forensic phoneticians use VOT distributions as one feature in speaker comparison: VOT overlaps too much across populations to be a unique identifier, but it contributes to multivariate profiles. Bilingual VOT signatures can sometimes indicate a speaker's L1.
