Syntax
Constituency
Why "the old dog" hangs together as a unit — phrase structure and hierarchical syntax
Constituency is the property by which words group into hierarchical units (phrases) that behave syntactically as wholes. In "The old dog barked at the cat", "the old dog" is a noun phrase (NP) constituent: it can be replaced by "it", clefted as a unit ("Was it the old dog that barked?"), and questioned ("What barked?"). Wundt and Bloomfield laid the groundwork: Bloomfield's "immediate constituent analysis" (1933) formalized the idea, and Chomsky's Syntactic Structures (1957) embedded it in generative phrase-structure grammar. Constituency is tested by substitution, movement, coordination, ellipsis, and clefting. Its main competitor as a theory of syntactic structure is dependency grammar (Tesnière 1959).
- Formalized by: Bloomfield (1933), Immediate Constituent Analysis
- Generative form: Chomsky (1957), Phrase Structure Grammar in Syntactic Structures
- Standard tests: Substitution, movement, coordination, ellipsis, clefting
- Notation: Tree diagrams, labeled brackets [NP [Det the][N dog]]
- Universal claim: All languages use hierarchical constituent structure (controversial)
- Main competitor: Dependency Grammar (Tesnière 1959)
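The labeled-bracket notation listed above is just a text encoding of a tree, so it can be read mechanically. The following is a minimal sketch, not a standard library API: the names Node, parse_brackets, and leaves are hypothetical, and the parser assumes well-formed input with one word per lexical bracket.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One constituent: a category label plus child nodes or a single word."""
    label: str
    children: list = field(default_factory=list)
    word: Optional[str] = None


def tokenize(text):
    # Pad brackets with spaces so they split off as separate tokens.
    return text.replace("[", " [ ").replace("]", " ] ").split()


def parse_brackets(text):
    """Parse labeled brackets such as '[NP [Det the][N dog]]' into a Node tree."""
    tokens = tokenize(text)
    pos = 0

    def parse_node():
        nonlocal pos
        if tokens[pos] != "[":
            raise ValueError(f"expected '[' at token {pos}")
        pos += 1
        node = Node(label=tokens[pos])  # first token after '[' is the category
        pos += 1
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                node.children.append(parse_node())  # nested constituent
            else:
                node.word = tokens[pos]             # terminal word
                pos += 1
        pos += 1  # consume the closing ']'
        return node

    tree = parse_node()
    if pos != len(tokens):
        raise ValueError("trailing tokens after the closing bracket")
    return tree


def leaves(node):
    """Return the words a node dominates, in left-to-right order."""
    if node.word is not None:
        return [node.word]
    return [w for child in node.children for w in leaves(child)]


tree = parse_brackets("[NP [Det the][N dog]]")
print(tree.label, leaves(tree))
```

Each Node corresponds to one pair of brackets, which is the sense in which a bracketing and a tree diagram carry the same information.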
Why constituency matters
- Syntactic theory. Phrase-structure grammars assume hierarchical constituent organization.
- Parsing. Probabilistic context-free grammars and treebank parsers rely on constituent annotations.
- Semantic composition. Compositional semantics builds meaning along the syntactic tree (Heim and Kratzer).
- Ambiguity resolution. Bracketing ambiguity in PP attachment is a classic NLP problem.
- Translation. Tree-based MT uses constituent alignment between languages.
- Language acquisition. Children's errors reveal emerging constituent structure (Crain).
- Treebanks. Annotated corpora such as the Penn Treebank, together with query tools such as Stanford Tregex, underpin computational linguistics work.
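To make the parsing and treebank points concrete, here is a minimal sketch of how constituent annotations can feed a probabilistic context-free grammar: rule probabilities are estimated by relative frequency over annotated trees. The two-sentence treebank is hand-made toy data, and the tuple encoding and the name rules are assumptions of this sketch, not any treebank's actual format.

```python
from collections import Counter

# Toy, hand-made treebank (hypothetical data). A tree is (label, child, ...);
# a lexical node is (label, "word").
TREEBANK = [
    ("S",
     ("NP", ("Det", "the"), ("N", "dog")),
     ("VP", ("V", "barked"))),
    ("S",
     ("NP", ("Det", "the"), ("N", "cat")),
     ("VP", ("V", "saw"), ("NP", ("Det", "the"), ("N", "dog")))),
]


def rules(tree):
    """Yield (lhs, rhs) for every phrasal node; lexical nodes are skipped."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return  # lexical node such as ("N", "dog")
    yield label, tuple(child[0] for child in children)
    for child in children:
        yield from rules(child)


rule_counts = Counter(r for tree in TREEBANK for r in rules(tree))

lhs_counts = Counter()
for (lhs, _rhs), n in rule_counts.items():
    lhs_counts[lhs] += n

# Relative-frequency estimate: P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)
probs = {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

for (lhs, rhs), p in sorted(probs.items()):
    print(f"{lhs} -> {' '.join(rhs)}  {p:.2f}")
```

In this toy sample the VP node is rewritten two different ways, so its probability mass is split between them, while S and NP each have a single expansion.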
Common misconceptions
- Words are linearly strung together. Hierarchy is real and testable; flat lists fail constituency tests.
- Tree structure equals word order. The same word order can have different bracketings; ambiguity is structural.
- Every adjacent pair is a constituent. "Dog barked" in "the dog barked" is not a constituent; "the dog" is.
- Constituency is universal and uncontroversial. Dependency grammar offers a different account; non-configurational languages challenge strict PSG.
- Constituents are defined by meaning. They are defined by syntactic behavior; meaning aligns approximately, not perfectly.
- Trees are just diagrams. They encode formal claims about hierarchy, scope, and movement that have empirical consequences.
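The point about adjacent pairs can be checked mechanically: a string is a constituent of a given tree only if some node dominates exactly that string. The sketch below assumes a simple analysis of "the dog barked" and a nested-tuple encoding; the function name constituents is hypothetical.

```python
# Assumed analysis of "the dog barked". A tree is (label, child, ...);
# a lexical node is (label, "word").
TREE = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "barked")))


def constituents(tree):
    """Return the set of word strings dominated by some node of the tree."""
    found = set()

    def walk(node):
        label, *rest = node
        if len(rest) == 1 and isinstance(rest[0], str):
            words = [rest[0]]                       # lexical node
        else:
            words = [w for child in rest for w in walk(child)]
        found.add(" ".join(words))
        return words

    walk(tree)
    return found


spans = constituents(TREE)
print("the dog" in spans)      # True
print("dog barked" in spans)   # False
```

Because the result is a set of strings, repeated words at different positions would collapse into one entry; a fuller version would record start and end positions instead.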
Frequently asked questions
What is the substitution test?
A constituent can be replaced by a single proform (pronoun, do-so, there). "The old dog with floppy ears" is a constituent because we can replace it with "it" — "It barked." If a string cannot be replaced by a proform, it likely is not a constituent. The test must be applied with care; some constituents have no available proform, and some non-constituents accidentally allow substitution.
How does the movement test work?
If a string can be fronted, clefted, or topicalized as a unit, it is a constituent. In "John went home on Tuesday", the string "on Tuesday" can be fronted ("On Tuesday, John went home") because it is a PP constituent. A non-constituent string such as "home on" cannot be fronted: "*Home on, John went Tuesday". Wh-questions also test constituency: "Whom did John see?" extracts the object NP.
What is the coordination test?
Only like constituents can be coordinated by "and" or "or". In "John ate apples and oranges", both conjuncts are NPs. In "John walks slowly and carefully", both are AdvPs. If two strings can be coordinated, they are likely constituents of the same type. Coordination is one of the strongest constituency tests, though apparent coordination of non-constituents, as in right node raising ("John bought and Mary read the book"), and across-the-board extraction (Ross 1967) complicate matters.
How does Chomsky's PSG formalize this?
Phrase-structure rules rewrite categories: S → NP VP, NP → Det N, VP → V NP. Applied recursively, they generate trees, and each non-terminal node represents a constituent. The phrase-structure component of Syntactic Structures (1957) was essentially context-free and was supplemented by transformations; Aspects (1965), Government and Binding (1981), and Minimalism (1993) refined the formalism but kept hierarchical constituent structure central.
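As an illustration of how rewrite rules generate trees, the sketch below expands S using the three rules quoted above. The lexical rules (Det, N, V) and the function name expand are assumptions added for the example, and the code only terminates because this toy grammar has no recursive rule.

```python
from itertools import product

# Phrasal rules from the text, plus a tiny lexicon that is an assumption
# of this sketch.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}


def expand(symbol):
    """Return every labeled-bracket tree derivable from `symbol`."""
    if symbol not in RULES:  # a terminal word rewrites to itself
        return [symbol]
    results = []
    for rhs in RULES[symbol]:
        # Expand each right-hand-side symbol, then combine every choice.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(f"[{symbol} {' '.join(parts)}]")
    return results


trees = expand("S")
print(len(trees))
print(trees[0])
```

Every bracket pair in the output is a non-terminal node, so each generated string spells out the constituent structure the rules assign. Adding a recursive rule such as NP → NP PP would make the set of trees infinite, so a real generator would need a depth bound.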
Are all languages constituent-based?
Mainstream generative theory says yes. But "non-configurational" languages (Warlpiri, Latin, and other free-word-order languages) challenge fixed constituent structure. Hale (1983) proposed flat structures; Pullum and Borsley argued for universal constituency. Hauser, Chomsky, and Fitch (2002) attribute the recursion that supports constituency to FLN, the faculty of language in the narrow sense.
What about constituency in head-final languages?
Japanese and Korean place heads at the end of phrases. In "watashi ga sushi o tabeta" (literally "I sushi ate"), the VP is "sushi o tabeta", with the object preceding the verb. Constituency tests apply, just mirrored. Head direction is the parameter; constituency itself is the claimed universal. Kayne's Antisymmetry program goes further and argues that all underlying structure is head-initial, with movement deriving surface variation.
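The idea that head direction varies while constituency stays constant can be sketched by linearizing one unordered structure in two ways. This is a deliberate simplification: phrases are (label, head, complement) triples, specifiers and case particles are ignored, head direction is assumed uniform across categories, and the name linearize is hypothetical.

```python
def linearize(phrase, head_final):
    """Spell out a (label, head, complement) phrase in the chosen head order.

    A plain string is treated as an unanalyzed leaf.
    """
    if isinstance(phrase, str):
        return phrase
    label, head, comp = phrase
    parts = [linearize(head, head_final)]
    if comp is not None:
        comp_str = linearize(comp, head_final)
        # Head-final: complement first. Head-initial: head first.
        parts = [comp_str] + parts if head_final else parts + [comp_str]
    return f"[{label} {' '.join(parts)}]"


# One hierarchical structure: a verb taking a PP complement.
VP = ("VP", "V", ("PP", "P", "NP"))

print(linearize(VP, head_final=False))  # [VP V [PP P NP]]
print(linearize(VP, head_final=True))   # [VP [PP NP P] V]
```

Both outputs contain the same brackets with the same labels; only the order inside each bracket differs, which is the sense in which the tests apply "mirrored".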
How do trees handle ambiguity?
Multiple bracketings yield multiple meanings. "I saw the man with the telescope" has two trees: the PP attaches to the NP (the man has the telescope) or to the VP (I used the telescope to see him). Each reading receives its own structural analysis, with its own constituent structure. PP-attachment ambiguity of this kind is a classic problem in NLP parsing.
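The two readings can be counted with a small chart parser. The sketch below uses a CKY-style chart that stores, for each span and category, the number of distinct trees. The binary grammar and lexicon are assumptions chosen for this one sentence, and count_parses is a hypothetical name.

```python
from collections import defaultdict

# Toy binary grammar (an assumption of this sketch): (parent, left, right).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": "NP", "saw": "V", "the": "Det",
    "man": "N", "with": "P", "telescope": "N",
}


def count_parses(words, root="S"):
    """Count the distinct trees rooted in `root` that cover all the words."""
    n = len(words)
    # chart[(i, j)][cat] = number of trees of category cat over words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    chart[(i, j)][parent] += (
                        chart[(i, k)][left] * chart[(k, j)][right]
                    )
    return chart[(0, n)][root]


print(count_parses("I saw the man with the telescope".split()))  # 2
print(count_parses("I saw the man".split()))                     # 1
```

The two analyses arise because the grammar allows a PP to combine with either an NP or a VP; dropping one of those two rules leaves a single tree.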