Syntax

Constituency

Why "the old dog" hangs together as a unit — phrase structure and hierarchical syntax

Constituency is the property by which words group into hierarchical units, called phrases, that behave syntactically as wholes. In "The old dog barked at the cat", "the old dog" is a noun phrase (NP) constituent: it can be replaced by "it", clefted as a unit ("Was it the old dog that barked?"), and questioned ("What barked?"). Wilhelm Wundt laid the groundwork; Leonard Bloomfield formalized it as "immediate constituent analysis" (1933); Chomsky's Syntactic Structures (1957) embedded it in generative phrase-structure grammar. Constituency is tested by substitution, movement, coordination, and ellipsis. Its main rival as a theory of syntactic structure is dependency grammar (Tesnière).

Formalized by: Bloomfield (1933), Immediate Constituent Analysis
Generative form: Chomsky (1957), Phrase Structure Grammar in Syntactic Structures
Standard tests: Substitution, movement, coordination, ellipsis, clefting
Notation: Tree diagrams, labeled brackets [NP [Det the][N dog]]
Universal claim: All languages use hierarchical constituent structure (controversial)
Main competitor: Dependency Grammar (Tesnière 1959)
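
Labeled brackets and tree diagrams encode the same information: every bracketed node is a constituent. The Python sketch below reads the bracket notation shown above and lists each node's label together with the words it dominates. The function names `parse_brackets` and `constituents` and the bracketing chosen for the example sentence are illustrative assumptions, not a standard tool (treebanks such as the Penn Treebank use round parentheses instead of square brackets).

```python
from typing import List, Tuple

# A tree is (label, children); each child is either a subtree or a word string.
Tree = Tuple[str, list]

def parse_brackets(text: str) -> Tree:
    """Parse labeled-bracket notation such as '[NP [Det the][N dog]]'."""
    # Pad brackets with spaces so that split() yields one token per symbol.
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        assert tokens[pos] == "[", f"expected '[' at token {pos}"
        label = tokens[pos + 1]
        pos += 2
        children: list = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())   # nested constituent
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume the closing ']'
        return (label, children)

    tree = node()
    assert pos == len(tokens), "trailing tokens after the root node"
    return tree

def constituents(tree: Tree) -> List[Tuple[str, str]]:
    """Return (label, words) for every node, in post-order: one entry per constituent."""
    result: List[Tuple[str, str]] = []

    def walk(t: Tree) -> List[str]:
        label, children = t
        words: List[str] = []
        for child in children:
            words.extend([child] if isinstance(child, str) else walk(child))
        result.append((label, " ".join(words)))
        return words

    walk(tree)
    return result

tree = parse_brackets(
    "[S [NP [Det the][Adj old][N dog]] [VP [V barked] [PP [P at] [NP [Det the][N cat]]]]]"
)
for label, words in constituents(tree):
    print(label, "=", words)
```

Every (label, words) pair in the output is a unit that the tree claims is a constituent. A string such as "old dog barked" never appears, because no single node dominates exactly those words.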

Why constituency matters

Syntactic theory. Phrase-structure grammars assume hierarchical constituent organization.
Parsing. Probabilistic context-free grammars and treebank parsers rely on constituent annotations.
Semantic composition. Compositional semantics builds meaning along the syntactic tree (Heim and Kratzer).
Ambiguity resolution. Bracketing ambiguity in PP attachment is a classic NLP problem.
Translation. Tree-based MT uses constituent alignment between languages.
Language acquisition. Children's errors reveal emerging constituent structure (Crain).
Treebanks. Annotated treebanks such as the Penn Treebank, and query tools built on them such as Stanford Tregex, underpin much computational linguistics work.

Common misconceptions

Words are linearly strung together. Hierarchy is real and testable; flat lists fail constituency tests.
Tree structure equals word order. The same word order can have different bracketings; ambiguity is structural.
Every adjacent pair is a constituent. "Dog barked" in "the dog barked" is not a constituent; the constituents are the subject NP "the dog" and the VP "barked".
Constituency is universal and uncontroversial. Dependency grammar offers a different account; non-configurational languages challenge strict PSG.
Constituents are defined by meaning. They are defined by syntactic behavior; meaning aligns approximately, not perfectly.
Trees are just diagrams. They encode formal claims about hierarchy, scope, and movement that have empirical consequences.

Frequently asked questions

What is the substitution test?

A constituent can be replaced by a single proform (pronoun, do-so, there). "The old dog with floppy ears" is a constituent because we can replace it with "it" — "It barked." If a string cannot be replaced by a proform, it likely is not a constituent. The test must be applied with care; some constituents have no available proform, and some non-constituents accidentally allow substitution.
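
Once a tree is given, the substitution test can be modeled mechanically: a string may be replaced by a proform only if some node spans exactly that string. This Python sketch assumes a hand-built tree for "the old dog with floppy ears barked" and a hypothetical `PROFORMS` table mapping categories to proforms. It checks substitutions against the tree, whereas the real test relies on speakers' grammaticality judgments.

```python
from typing import List, Optional, Tuple

# A tree is (label, children); a child is either a subtree or a word string.
TREE = ("S", [
    ("NP", [("Det", ["the"]), ("Adj", ["old"]), ("N", ["dog"]),
            ("PP", [("P", ["with"]),
                    ("NP", [("Adj", ["floppy"]), ("N", ["ears"])])])]),
    ("VP", [("V", ["barked"])]),
])
WORDS = "the old dog with floppy ears barked".split()

# Hypothetical, simplified category-to-proform table for illustration only.
PROFORMS = {"NP": "it", "VP": "did so"}

def labeled_spans(tree, start: int = 0) -> Tuple[List[Tuple[str, int, int]], int]:
    """Return every (label, start, end) word span in the tree, plus the end index."""
    label, children = tree
    spans: List[Tuple[str, int, int]] = []
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a word advances the position by one
        else:
            child_spans, pos = labeled_spans(child, pos)
            spans.extend(child_spans)
    spans.append((label, start, pos))
    return spans, pos

def substitute(words: List[str], tree, start: int, end: int) -> Optional[List[str]]:
    """Replace words[start:end] by a proform if a proform-bearing node spans exactly them."""
    spans, _ = labeled_spans(tree)
    for label, s, e in spans:
        if (s, e) == (start, end) and label in PROFORMS:
            return words[:start] + [PROFORMS[label]] + words[end:]
    return None  # no suitable constituent spans these words
```

Here `substitute(WORDS, TREE, 0, 6)` returns `["it", "barked"]`, while `substitute(WORDS, TREE, 2, 4)` ("dog with") returns `None`. Replacing the VP span gives "the old dog with floppy ears did so".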

How does the movement test work?

If a string can be fronted, clefted, or topicalized as a unit, it is a constituent. "On Tuesday, John went home": "on Tuesday" can move because it is a PP constituent. A string that is not a constituent cannot be fronted: "*Home on, John went Tuesday" is ungrammatical because "home on" does not form a phrase. Wh-questions also test constituency: "Whom did John see?" extracts the object NP.

What is the coordination test?

Only like constituents can be coordinated by "and" or "or". "John ate apples and oranges": both conjuncts are NPs. "John walks slowly and carefully": both are AdvPs. If two strings can be coordinated, they are likely constituents of the same type. Coordination is widely used but less reliable than it looks: non-constituent coordination ("John gave Mary a book and Sue a pen") joins strings that fail other tests, and coordinate structures have special properties of their own, such as the across-the-board extraction patterns described by Ross (1967).

How does Chomsky's PSG formalize this?

Phrase-structure rules rewrite categories: S → NP VP, NP → Det N, VP → V NP. Applied recursively, they generate trees, and each non-terminal node represents a constituent. In Syntactic Structures (1957) the phrase-structure rules were context-free, with transformations applying to their output; Aspects (1965), Government and Binding (1981), and Minimalism (1993) refined the formalism but kept hierarchical constituent structure central.
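
The three rules quoted above can be run directly. This Python sketch expands a category by applying the rules recursively until only words remain, and returns each derivation as a labeled-bracket tree. The lexicon (`the`, `dog`, `cat`, `saw`) is a hypothetical addition needed to reach actual words, and the grammar is deliberately tiny.

```python
from itertools import product
from typing import Dict, List

# Phrase-structure rules from the text.
RULES: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
}
# Hypothetical lexicon: preterminal categories rewrite to words.
LEXICON: Dict[str, List[str]] = {
    "Det": ["the"],
    "N": ["dog", "cat"],
    "V": ["saw"],
}

def expand(category: str) -> List[str]:
    """Return every labeled-bracket tree rooted in `category`."""
    if category in LEXICON:
        return [f"[{category} {word}]" for word in LEXICON[category]]
    trees = []
    for rhs in RULES[category]:
        # Expand each daughter recursively, then combine every choice of daughters.
        for daughters in product(*(expand(c) for c in rhs)):
            trees.append(f"[{category} {' '.join(daughters)}]")
    return trees

for tree in expand("S"):
    print(tree)
```

`expand("S")` yields four trees, for example `[S [NP [Det the] [N dog]] [VP [V saw] [NP [Det the] [N cat]]]]`. Each bracketed node corresponds to one non-terminal in the derivation, so each is a constituent. Adding a self-recursive rule such as NP → NP PP would make the set of trees infinite, so a generator of this kind would then need a depth limit.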

Are all languages constituent-based?

Mainstream generative theory says yes. But "non-configurational" languages (Warlpiri, Latin, and other free-word-order languages) challenge fixed constituent structure. Hale (1983) proposed flat structures for such languages; Pullum and Borsley argued for universal constituency. Hauser, Chomsky, and Fitch (2002) attribute the recursion that supports constituency to FLN, the faculty of language in the narrow sense.

What about constituency in head-final languages?

Japanese and Korean place heads at the end of phrases. In "watashi ga sushi o tabeta" (I-NOM sushi-ACC ate, 'I ate sushi'), the VP is "sushi o tabeta", with the object preceding the verb. Constituency tests apply, just mirrored. On this view, head direction is the parameter and constituency itself is universal. Kayne's Antisymmetry program goes further, arguing that all underlying structure is head-initial, with movement deriving surface variation.
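
The idea that head direction varies while constituency stays fixed can be illustrated with a toy linearizer: the same head-complement structure is spelled out head-first or head-last depending on a single flag. In this Python sketch, the `Phrase` class, the `head_initial` flag, and the treatment of the particle "o" as the head of a case phrase (KP) are simplifying assumptions for illustration; the subject and all specifiers are ignored.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass
class Phrase:
    label: str                                        # category, e.g. "VP"
    head: str                                         # the head word
    complement: Optional[Union["Phrase", str]] = None  # a sub-phrase or a bare word

def linearize(node: Union[Phrase, str], head_initial: bool) -> str:
    """Spell out a head-complement tree in head-first or head-last order."""
    if isinstance(node, str):
        return node
    if node.complement is None:
        return node.head
    comp = linearize(node.complement, head_initial)
    # Head and complement always stay adjacent, so the constituent is preserved.
    return f"{node.head} {comp}" if head_initial else f"{comp} {node.head}"

# English VP: verb plus object.
vp_en = Phrase("VP", "ate", "sushi")
# Japanese VP, assuming the accusative particle "o" heads a case phrase.
vp_ja = Phrase("VP", "tabeta", Phrase("KP", "o", "sushi"))

print(linearize(vp_en, head_initial=True))
print(linearize(vp_ja, head_initial=False))
```

`linearize(vp_ja, head_initial=False)` gives "sushi o tabeta" and `linearize(vp_en, head_initial=True)` gives "ate sushi". Flipping the flag on either tree mirrors the order but never separates a head from its complement.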

How do trees handle ambiguity?

Multiple bracketings yield multiple meanings. "I saw the man with the telescope" has two trees: PP attaches to NP (the man holding a telescope) or to VP (I used the telescope). Bracketing ambiguity is handled by giving the sentence multiple structural analyses, each with its own constituent structure. PP-attachment ambiguity is a classic problem in NLP parsing.
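
Counting the analyses of an ambiguous sentence makes the two bracketings concrete. This Python sketch uses the CKY chart-parsing algorithm (a standard technique, not one named in this article) over a hypothetical toy grammar in Chomsky normal form that includes both attachment rules, VP → VP PP and NP → NP PP. Each chart cell counts the distinct subtrees of each category over a span of words.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical toy grammar in Chomsky normal form: binary rules plus a lexicon.
BINARY: List[Tuple[str, str, str]] = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase (instrument reading)
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase (the man has the telescope)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON: Dict[str, List[str]] = {
    "I": ["NP"], "saw": ["V"], "the": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(words: List[str], root: str = "S") -> int:
    """Count distinct trees of category `root` over the whole sentence."""
    n = len(words)
    # chart[i][j][A] = number of distinct A-trees spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two daughters
                for parent, left, right in BINARY:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(root, 0)

print(count_parses("I saw the man with the telescope".split()))
```

The call returns 2, one tree per attachment site, while "I saw the man" has a single parse. Practical parsers use much larger grammars and attach probabilities to rules, as in probabilistic context-free grammars, to choose among such analyses rather than merely count them.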
