Syntax

Constituency

Why "the old dog" hangs together as a unit — phrase structure and hierarchical syntax

Constituency is the property by which words group into hierarchical units (phrases) that behave syntactically as wholes. In "The old dog barked at the cat", "the old dog" is a noun phrase (NP) constituent: it can be replaced by "it", moved as a unit ("Was it the old dog that barked?"), and questioned ("What barked?"). Wundt laid the groundwork; Leonard Bloomfield's "immediate constituent analysis" (1933) formalized it; Chomsky's Syntactic Structures (1957) embedded it in generative phrase-structure grammar. Constituency is tested by substitution, movement, coordination, and ellipsis. It competes with dependency grammar (Tesnière), the main alternative theory of syntactic structure.

  • Formalized by: Bloomfield (1933), Immediate Constituent Analysis
  • Generative form: Chomsky (1957), Phrase Structure Grammar in Syntactic Structures
  • Standard tests: Substitution, movement, coordination, ellipsis, clefting
  • Notation: Tree diagrams, labeled brackets [NP [Det the][N dog]]
  • Universal claim: All languages use hierarchical constituent structure (controversial)
  • Main competitor: Dependency Grammar (Tesnière 1959)
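
The labeled-bracket notation listed above is simply a linear encoding of a tree. As a minimal sketch, and not any standard library's parser, the following Python reads such a string into nested (label, children) tuples; the function names parse_brackets and leaves are hypothetical, chosen for illustration.

```python
def parse_brackets(text):
    """Parse a labeled-bracket string into nested (label, children) tuples."""
    # Pad brackets with spaces so that split() yields clean tokens.
    tokens = text.replace("[", " [ ").replace("]", " ] ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "[", "expected '['"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ']'
        return (label, children)

    return node()


def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


tree = parse_brackets("[NP [Det the][N dog]]")
print(tree)          # ('NP', [('Det', ['the']), ('N', ['dog'])])
print(leaves(tree))  # ['the', 'dog']
```

Each bracket pair becomes one node, so the nesting of the brackets is the hierarchy of the tree.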

Why constituency matters

  • Syntactic theory. Phrase-structure grammars assume hierarchical constituent organization.
  • Parsing. Probabilistic context-free grammars and treebank parsers rely on constituent annotations.
  • Semantic composition. Compositional semantics builds meaning along the syntactic tree (Heim and Kratzer).
  • Ambiguity resolution. Bracketing ambiguity in PP attachment is a classic NLP problem.
  • Translation. Tree-based MT uses constituent alignment between languages.
  • Language acquisition. Children's errors reveal emerging constituent structure (Crain).
  • Treebanks. Annotated corpora such as the Penn Treebank, together with tools for querying them such as Stanford Tregex, underpin computational linguistics work.

Common misconceptions

  • Words are linearly strung together. Hierarchy is real and testable; flat lists fail constituency tests.
  • Tree structure equals word order. The same word order can have different bracketings; ambiguity is structural.
  • Every adjacent pair is a constituent. In "the dog barked", "dog barked" is not a constituent; of the adjacent word pairs, only "the dog" is.
  • Constituency is universal and uncontroversial. Dependency grammar offers a different account; non-configurational languages challenge strict PSG.
  • Constituents are defined by meaning. They are defined by syntactic behavior; meaning aligns approximately, not perfectly.
  • Trees are just diagrams. They encode formal claims about hierarchy, scope, and movement that have empirical consequences.
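
The point about adjacent pairs can be checked mechanically: a word string is a constituent of a given tree only if some node dominates exactly those words. The sketch below assumes one particular analysis of "the dog barked" (the tuple encoding and the function names are illustrative, not a standard API) and lists every constituent the tree defines.

```python
def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


def constituents(tree):
    """Collect (label, word string) for every node in the tree."""
    if isinstance(tree, str):
        return []
    label, children = tree
    found = [(label, " ".join(leaves(tree)))]
    for child in children:
        found.extend(constituents(child))
    return found


# Assumed analysis: [S [NP [Det the] [N dog]] [VP [V barked]]]
tree = ("S", [("NP", [("Det", ["the"]), ("N", ["dog"])]),
              ("VP", [("V", ["barked"])])])

strings = {words for _, words in constituents(tree)}
print("the dog" in strings)     # True
print("dog barked" in strings)  # False
```

Under this tree no single node spans "dog barked", which is exactly what the constituency tests predict.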

Frequently asked questions

What is the substitution test?

A constituent can be replaced by a single proform (pronoun, do-so, there). "The old dog with floppy ears" is a constituent because we can replace it with "it" — "It barked." If a string cannot be replaced by a proform, it likely is not a constituent. The test must be applied with care; some constituents have no available proform, and some non-constituents accidentally allow substitution.
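
On a tree, substitution amounts to swapping an entire subtree for a proform. The following is a rough sketch under an assumed analysis of the example sentence; substitute is a hypothetical helper, and it replaces only the first matching node in top-down, left-to-right order.

```python
def leaves(tree):
    """Return the words under a node, left to right."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


def substitute(tree, label, proform):
    """Replace the first node (pre-order) labeled `label` with `proform`."""
    done = False

    def walk(node):
        nonlocal done
        if isinstance(node, str):
            return node
        node_label, children = node
        if node_label == label and not done:
            done = True
            return (node_label, [proform])  # whole subtree becomes one word
        return (node_label, [walk(child) for child in children])

    return walk(tree)


# Assumed analysis of "the old dog with floppy ears barked".
tree = ("S", [
    ("NP", [("Det", ["the"]), ("Adj", ["old"]), ("N", ["dog"]),
            ("PP", [("P", ["with"]),
                    ("NP", [("Adj", ["floppy"]), ("N", ["ears"])])])]),
    ("VP", [("V", ["barked"])]),
])

print(" ".join(leaves(substitute(tree, "NP", "it"))))  # it barked
```

The code only models the mechanics; whether the resulting sentence is acceptable remains a judgment for speakers, which is why the test needs care.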

How does the movement test work?

If a string can be fronted, clefted, or topicalized as a unit, it is a constituent. In "John went home on Tuesday", "on Tuesday" is a PP constituent, so it can be fronted: "On Tuesday, John went home". A string that is not a constituent, such as "home on", cannot: "*Home on, John went Tuesday". Wh-questions also test constituency: "Whom did John see?" fronts the object NP as a wh-phrase.

What is the coordination test?

Only like constituents can be coordinated by "and" or "or". In "John ate apples and oranges", both conjuncts are NPs. In "John walks slowly and carefully", both are AdvPs. If two strings can be coordinated, they are likely constituents of the same type. Coordination is one of the strongest constituency tests, though apparent coordination of non-constituents ("John bought, and Mary read, the book") and across-the-board extraction (Ross 1967) complicate matters.

How does Chomsky's PSG formalize this?

Phrase-structure rules rewrite categories: S → NP VP, NP → Det N, VP → V NP. Applied recursively, they generate trees in which each non-terminal node represents a constituent. The 1957 Syntactic Structures version was context-free; Aspects (1965), Government and Binding (1981), and Minimalism (1993) refined the formalism but kept hierarchical constituent structure central.
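
To see the rewriting at work, here is a minimal sketch that expands the three rules above into labeled bracketings. The lexical rules (Det, N, V) are assumptions added only so that the derivation bottoms out in words, and expand is a hypothetical helper name.

```python
from itertools import product

RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    # Lexical rules below are illustrative assumptions, not from the text.
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}


def expand(symbol):
    """Yield every labeled bracketing derivable from `symbol`."""
    if symbol not in RULES:  # a terminal, i.e. a word
        yield symbol
        return
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield "[" + symbol + " " + " ".join(parts) + "]"


trees = list(expand("S"))
print(len(trees))  # 4
print(trees[0])
# [S [NP [Det the] [N dog]] [VP [V chased] [NP [Det the] [N dog]]]]
```

This toy grammar is finite. A recursive rule such as NP → NP PP would make the set of trees unbounded, so an exhaustive enumerator like this one would need a depth limit.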

Are all languages constituent-based?

Mainstream generative theory says yes. But "non-configurational" languages (Warlpiri, Latin, and other free-word-order languages) challenge fixed constituent structure. Hale (1983) proposed flat structures for such languages; Pullum and Borsley argued for universal constituency. Hauser, Chomsky, and Fitch (2002) attribute the recursion that supports constituency to the faculty of language in the narrow sense (FLN).

What about constituency in head-final languages?

Japanese and Korean place heads at the end of phrases. In Japanese "watashi ga sushi o tabeta" (literally "I sushi ate"), the VP is "sushi o tabeta", with the object preceding the verb. Constituency tests still apply, with the order of head and complement mirrored. Head direction is the parameter; constituency itself is the proposed universal. Kayne's Antisymmetry program goes further and argues that all underlying structure is head-initial (specifier, head, complement), with movement deriving surface variation.
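
The idea that hierarchy stays constant while head direction varies can be sketched as a linearization function. The example below is deliberately schematic: it uses English glosses, treats every phrase as a (head, complement) pair, and ignores subjects and case particles. The function name linearize and the data encoding are illustrative assumptions.

```python
def linearize(phrase, head_initial):
    """Flatten a (head, complement) structure into a word list.

    `phrase` is either a word (str) or a (head, complement) pair.
    `head_initial` is the head-direction parameter.
    """
    if isinstance(phrase, str):
        return [phrase]
    head, complement = phrase
    comp_words = linearize(complement, head_initial)
    return [head] + comp_words if head_initial else comp_words + [head]


# One hierarchy: a VP headed by "went" whose complement is a PP headed by "to".
vp = ("went", ("to", "Tokyo"))

print(" ".join(linearize(vp, head_initial=True)))   # went to Tokyo
print(" ".join(linearize(vp, head_initial=False)))  # Tokyo to went
```

The head-final output matches the gloss order of Japanese "Tokyo ni itta". In both settings "to Tokyo" (or "Tokyo to") forms a unit inside the VP, so the constituents are identical and only their internal order differs.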

How do trees handle ambiguity?

Multiple bracketings yield multiple meanings. "I saw the man with the telescope" has two trees: PP attaches to NP (the man holding a telescope) or to VP (I used the telescope). Bracketing ambiguity is handled by giving the sentence multiple structural analyses, each with its own constituent structure. PP-attachment ambiguity is a classic problem in NLP parsing.
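
The two analyses can be recovered automatically with a small chart parser. The sketch below uses a CKY-style search over a toy binary grammar and lexicon, both of which are assumptions made up for this one sentence; parses is a hypothetical function name.

```python
from functools import lru_cache

# Toy binary rules (parent, left child, right child); an illustrative grammar.
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {"I": "NP", "saw": "V", "the": "Det",
           "man": "N", "with": "P", "telescope": "N"}


def parses(words):
    """Return every bracketing of `words` whose root is S."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def span(i, j):
        """All (label, bracketing) pairs covering words[i:j]."""
        if j - i == 1:
            label = LEXICON[words[i]]
            return ((label, f"[{label} {words[i]}]"),)
        results = []
        for k in range(i + 1, j):  # every split point
            for left_label, left_tree in span(i, k):
                for right_label, right_tree in span(k, j):
                    for parent, left, right in BINARY:
                        if (left, right) == (left_label, right_label):
                            results.append(
                                (parent, f"[{parent} {left_tree} {right_tree}]"))
        return tuple(results)

    return [tree for label, tree in span(0, len(words)) if label == "S"]


trees = parses("I saw the man with the telescope".split())
print(len(trees))  # 2
for tree in trees:
    print(tree)
```

One bracketing contains [NP [NP ...] [PP ...]] (the man has the telescope) and the other contains [VP [VP ...] [PP ...]] (the seeing was done with the telescope). A real parser would also need a way to rank the alternatives, which this sketch does not attempt.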