
Poverty of the Stimulus Argument

Children acquire grammatical knowledge that goes beyond what input alone could support

The Poverty of the Stimulus argument, presented by Noam Chomsky in his 1971 Bertrand Russell Memorial Lectures at Cambridge (published as Problems of Knowledge and Freedom) and elaborated across subsequent work, claims that children acquire grammatical knowledge richer than what input plus general-purpose induction could yield. The classic case is auxiliary inversion in English yes-no questions: from "The man who is tall is happy" children produce "Is the man who is tall happy?", never the linear-rule error "Is the man who tall is happy?", even though, by Chomsky's argument, the input does not contain decisive examples to choose between the structure-dependent and linear rules. If grammatical knowledge outstrips input, something must come from inside the learner.

  • Originator: Noam Chomsky, Bertrand Russell Memorial Lectures (Cambridge, 1971)
  • Canonical case: Auxiliary inversion in yes-no questions
  • Acquisition data: Crain & Nakayama (Language, 1987)
  • Popularizer: Pinker, The Language Instinct (1994)
  • Empirical critic: Pullum & Scholz (Linguistic Review, 2002)
  • Bayesian reply: Perfors, Tenenbaum, Regier (2011)


How the argument works

The Poverty of the Stimulus argument has a fixed shape. It takes some piece of grammatical knowledge K that adult speakers reliably possess, examines the input a child plausibly receives, argues the input under-determines K (multiple incompatible hypotheses are consistent with what the child hears), and concludes K must be supplied by something other than induction over input — for Chomsky, by Universal Grammar.

The skeleton:

  1. Identify a generalization adults make. Adults form yes-no questions by fronting the matrix auxiliary, even when other auxiliaries are present in embedded clauses.
  2. Identify a simpler alternative the data also fit. A linear rule "front the first auxiliary" fits all simple input ("The man is tall" → "Is the man tall?").
  3. Show input under-determines the choice. Most child input is simple sentences where the two rules agree. Sentences with embedded auxiliaries that decisively distinguish the rules are claimed to be rare or absent.
  4. Show children make the right choice anyway. Crain and Nakayama (1987) elicited yes-no questions containing embedded auxiliaries from children; none produced the linear error.
  5. Conclude knowledge is partly innate. Either the structure-dependent rule itself is innate (strong UG), or the bias toward hierarchical generalizations is.

The argument is not unique to auxiliary inversion. Parallel cases exist for binding, anaphora, scope, control, and ellipsis, each of which constitutes its own empirical case. The collective claim, that knowledge outstrips input across many domains, is the Plato's-problem framing Chomsky has emphasized since Knowledge of Language (1986).

Worked example: auxiliary inversion

Consider how a learner might form a yes-no question from a complex declarative. The declarative "The man who is tall is happy" contains two auxiliaries: the embedded "is" inside the relative clause and the matrix "is". Two candidate rules predict different outputs:

  • Linear rule. Front the first auxiliary in the sentence. Output: "Is the man who tall is happy?" — ungrammatical.
  • Structure-dependent rule. Front the matrix-clause auxiliary, ignoring auxiliaries in embedded clauses. Output: "Is the man who is tall happy?" — grammatical.

The linear rule is computationally simpler — it operates on word position. The structure-dependent rule requires hierarchical parsing. A purely distributional learner with no bias for hierarchy might prefer the linear rule on grounds of parsimony.
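The contrast between the two rules can be made concrete with a small sketch. It is a toy illustration, not a model from any of the cited studies: the bracket notation for the embedded clause, the auxiliary list, and the function names are all assumptions introduced here. The brackets hand the structure-dependent rule its hierarchy for free, which is exactly the information a learner would have to recover by parsing.

```python
# Toy illustration only: the bracket notation and function names are
# assumptions made for this sketch, not part of any cited study.
AUXILIARIES = {"is", "are", "was", "were", "can", "will"}


def parse(bracketed):
    """Split a sentence into (word, depth) pairs.

    '[' and ']' are hand-placed markers around an embedded clause,
    so depth 0 means the word belongs to the matrix clause.
    """
    depth = 0
    pairs = []
    for token in bracketed.split():
        if token == "[":
            depth += 1
        elif token == "]":
            depth -= 1
        else:
            pairs.append((token, depth))
    return pairs


def front(words, index):
    """Move the word at `index` to the front and format as a question."""
    moved = [words[index]] + words[:index] + words[index + 1:]
    return " ".join(moved).capitalize() + "?"


def linear_rule(bracketed):
    """Front the first auxiliary by word position, ignoring structure."""
    words = [w for w, _ in parse(bracketed)]
    index = next(i for i, w in enumerate(words) if w.lower() in AUXILIARIES)
    return front(words, index)


def structure_dependent_rule(bracketed):
    """Front the first auxiliary that sits in the matrix clause (depth 0)."""
    pairs = parse(bracketed)
    words = [w for w, _ in pairs]
    index = next(i for i, (w, d) in enumerate(pairs)
                 if d == 0 and w.lower() in AUXILIARIES)
    return front(words, index)


simple = "The man is tall"
complex_ = "The man [ who is tall ] is happy"

print(linear_rule(simple))                 # Is the man tall?
print(structure_dependent_rule(simple))    # Is the man tall?
print(linear_rule(complex_))               # Is the man who tall is happy?
print(structure_dependent_rule(complex_))  # Is the man who is tall happy?
```

On the simple declarative the two rules return the same question, so that sentence gives a learner no reason to prefer either rule. Only the complex declarative separates them.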

Crain and Nakayama's 1987 elicited-production study tested this directly. They showed children (aged 3 to 5) pictures and prompted them to put yes-no questions about the pictures to a Jabba the Hutt figure. Across 12 items requiring complex declaratives with embedded auxiliaries, children's questions were not always well formed, but none of their errors was the linear-rule error, even at the youngest ages.

Chomsky's claim: children must respect hierarchy from the start because something inside them prefers structure-dependent generalizations. Whether that something is specifically grammatical (UG) or a more general bias toward hierarchical inference is the live debate.

Other classical poverty-of-stimulus cases

  • Anaphor binding. Children allow "John saw himself" with himself = John, but disallow "John said Mary saw himself" with himself = John (Condition A: anaphors need a local antecedent). Negative input — that the second is bad — is rarely provided; children avoid it from the start.
  • Strong crossover. "Who does he think likes Mary?" cannot be interpreted with he = who. The constraint is subtle and cross-linguistically robust.
  • That-trace effects. "Who do you think left?" is fine; "*Who do you think that left?" is bad in standard English. Children rarely receive this contrast in input but acquire the constraint.
  • Constraints on wh-movement (island effects). Island constraints in the tradition of Ross (1967), including the Complex NP, wh-island, and adjunct island constraints, are observed cross-linguistically and acquired with little overt instruction.
  • Subset principle cases. Where one grammar is a strict subset of another, learners default to the smaller grammar (Berwick 1985; Manzini and Wexler 1987) — a pre-empirical bias not derivable from input alone.
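The subset principle in the last item can be sketched as a selection procedure. This is a minimal illustration under strong assumptions: grammars are represented as finite sets of sentences, and the two grammar names and the two-grammar hypothesis space are invented for the example, reusing the binding sentences from the list above. It is not the formulation in Berwick (1985) or Manzini and Wexler (1987).

```python
def subset_learner(grammars, observed):
    """Return the name of a smallest language consistent with the input.

    `grammars` maps a hypothetical grammar name to the set of sentences
    it generates; `observed` is the set of sentences heard so far.
    """
    consistent = {name: lang for name, lang in grammars.items()
                  if observed <= lang}
    for name, lang in consistent.items():
        # Keep a grammar only if no other consistent language is a
        # proper subset of it.
        if not any(other < lang for other in consistent.values()):
            return name
    return None


# Invented two-grammar space: one grammar allows only local binding,
# the other also allows the long-distance reading.
grammars = {
    "long_distance": {"John saw himself", "John said Mary saw himself"},
    "local_only": {"John saw himself"},
}

print(subset_learner(grammars, {"John saw himself"}))            # local_only
print(subset_learner(grammars, {"John said Mary saw himself"}))  # long_distance
```

The learner moves to the larger grammar only when positive evidence forces it, so it never needs to be told that a sentence is bad. The preference for the smaller language is built into the procedure rather than read off the input.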

Poverty of the stimulus vs alternative accounts of acquisition

| | Poverty of Stimulus (UG) | Statistical learning | Bayesian inference | Connectionism | Usage-based | Construction Grammar |
|---|---|---|---|---|---|---|
| Originators | Chomsky 1971, 1980, 1986 | Saffran, Aslin, Newport 1996 | Tenenbaum, Regier 2000s | Rumelhart, McClelland 1986 | Tomasello 2003 | Goldberg 1995 |
| Innate component | Specifically linguistic | Domain-general statistics | Hypothesis-space prior | General learning | Domain-general cognition | None specifically grammatical |
| Input role | Triggering experience | Distributional input | Bayesian update | Training data | Item-based learning | Frequency-driven |
| Auxiliary inversion | Structure-dependence innate | Trigrams insufficient | Hierarchical hypothesis wins | RNN learns from corpus | Constructions over time | Item-based |
| Child error data | Predicts no linear errors | Predicts errors with sparse input | Strong prior, no errors | Sometimes errors | Errors expected | Errors expected |
| Empirical strength | Strong on corpus rarity | Strong on segmentation | Strong on hierarchy | Mixed on syntax | Strong on early stages | Strong on argument structure |
| Major texts | Aspects (1965), Knowledge of Language (1986) | Saffran et al. Science 1996 | Perfors et al. Cognition 2011 | Elman 1990, 1991 | Tomasello 2003 | Goldberg 1995, 2006 |

Empirical counterarguments

Geoffrey Pullum and Barbara Scholz launched the most cited empirical critique in The Linguistic Review (2002). Their target: the premise that input contains no examples to distinguish the structure-dependent rule from the linear one. They surveyed the Wall Street Journal corpus and the CHILDES child-directed-speech corpus. They found yes-no questions with embedded auxiliaries do occur — not in every utterance, but at rates consistent with normal acquisition timeframes. The premise of input poverty was, in their view, overstated.

Florencia Reali and Morten Christiansen (Cognitive Science, 2005) trained simple statistical learners (n-gram models and simple recurrent networks) on child-directed speech from CHILDES and showed that the models preferred structure-dependent yes-no questions over their linear-rule counterparts, replicating the auxiliary-inversion judgment without any innate grammar module. They argued that distributional cues such as bigram frequencies and word-class statistics suffice.
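The idea that bigram frequencies alone can separate the two question forms can be illustrated with a toy scorer. Everything here is an assumption made for the sketch: the five-sentence corpus is invented rather than drawn from CHILDES, and add-one smoothing is a stand-in, not the estimation method of the published study. The corpus deliberately contains no complex yes-no question.

```python
import math
from collections import Counter

# Invented toy corpus, standing in for child-directed speech. It contains
# no complex yes-no question, only simple sentences and one relative clause.
CORPUS = [
    "the man is happy",
    "is the man happy",
    "the boy who is tall smiled",
    "the dog is tall",
    "is the dog hungry",
]


def bigrams(sentence):
    """Return adjacent word pairs, with sentence-boundary markers."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return list(zip(tokens, tokens[1:]))


pair_counts = Counter()
context_counts = Counter()
vocabulary = set()
for sentence in CORPUS:
    for left, right in bigrams(sentence):
        pair_counts[(left, right)] += 1
        context_counts[left] += 1
        vocabulary.update((left, right))


def log_prob(sentence):
    """Add-one-smoothed bigram log-probability of a sentence."""
    v = len(vocabulary)
    return sum(
        math.log((pair_counts[(left, right)] + 1) / (context_counts[left] + v))
        for left, right in bigrams(sentence)
    )


grammatical = "is the man who is tall happy"
ungrammatical = "is the man who tall is happy"
print(log_prob(grammatical) > log_prob(ungrammatical))  # True
```

The grammatical question scores higher because the corpus attests "who is" and "is tall" while "who tall" and "tall is" never occur. That is indirect evidence from simple sentences, which is the kind of cue the distributional account appeals to. Whether such cues scale to realistic input is the empirical question the toy cannot answer.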

Amy Perfors, Joshua Tenenbaum, and Terry Regier (Cognition, 2011) took a Bayesian approach. Given a hypothesis space containing both regular-grammar (linear) and context-free (hierarchical) candidates, and given realistic CHILDES input, the Bayesian posterior strongly favors the hierarchical grammar. The structure-dependent generalization wins on grounds of rational inference: there is no need to posit an innate grammar-specific bias, only a hypothesis space that includes hierarchy.
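The logic of a posterior favoring one grammar can be sketched with a two-hypothesis calculation. This is not the model of Perfors et al., which scores whole grammars against a corpus. It only shows how a prior and a per-sentence likelihood trade off as data accumulate, and every number in it is invented for illustration.

```python
import math


def posterior_hierarchical(n, prior=0.1, p_hier=0.02, p_linear=0.01):
    """Posterior probability of the hierarchical hypothesis after n sentences.

    All numbers are invented for illustration: `prior` is the prior on the
    hierarchical grammar, and `p_hier` / `p_linear` are the average
    probabilities each grammar assigns to one observed sentence.
    """
    log_odds = (math.log(prior) - math.log(1 - prior)
                + n * (math.log(p_hier) - math.log(p_linear)))
    return 1 / (1 + math.exp(-log_odds))


for n in (0, 5, 10, 20):
    print(n, round(posterior_hierarchical(n), 4))
```

With these made-up values the hierarchical hypothesis starts at its prior of 0.1 and overtakes the linear one after a handful of sentences, because each sentence it fits better multiplies the odds in its favor. The sketch also makes the defenders' point visible: both hypotheses have to be in the space, with some prior, before any data arrive.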

Defenders reply on three fronts. Lasnik and Lidz (in The Oxford Handbook of Universal Grammar, 2017) argue that the corpus-rate finding does not address whether children rely on those rare examples. The neural-network results, on this view, model some surface patterns but not the abstract structure-dependent constraint operating across all derivations. And the Bayesian models presuppose hypothesis spaces a learner must already possess; that prior, the defense holds, is itself the Universal Grammar.

Variants of the argument

  • Original (Chomsky 1971). Auxiliary inversion as canonical case. Argument largely informal.
  • Logical form (Chomsky 1986, Knowledge of Language). Plato's problem framing. Knowledge outstrips experience generally, not just in syntax. Innateness defended philosophically and empirically.
  • Acquisition-grounded (Crain & Nakayama 1987 onward). Empirical case from elicited-production studies in syntactic islands, anaphora, ellipsis. Stephen Crain and Rosalind Thornton's Investigations in Universal Grammar (1998) is the methods reference.
  • Bayesian (Perfors et al. 2011). Acknowledges input richness; argues UG is the prior over hypothesis spaces, not specifically linguistic content. Compatible with Chomsky's later "third-factor" remarks (2005, Linguistic Inquiry).
  • Connectionist rebuttals (Lewis & Elman 2001, Reali & Christiansen 2005). Domain-general learners trained on realistic input acquire many syntactic patterns, suggesting a weaker innate component than Chomsky proposed.
  • Usage-based (Tomasello 2003). Children build grammar item by item from frequent constructions. Generalization is gradual and statistically driven; no syntax-specific UG needed.

Common pitfalls in evaluating the argument

  • Treating "poverty" as zero-input. The claim is not that decisive examples never occur but that they are too sparse for reliable induction without bias. Counting tokens does not settle the issue — what matters is whether the available evidence picks the right hypothesis from a realistic hypothesis space.
  • Conflating UG with specific grammatical content. Chomsky's later writings (especially the 2005 "three factors" paper in Linguistic Inquiry) treat much variation as third-factor (computational, biological, general-cognitive). UG narrows over time; the argument's force does not require rich innate syntax.
  • Mistaking elegant alternatives for refutations. A neural network reproducing a pattern shows the pattern is learnable from input given that architecture. It does not show humans use that mechanism.
  • Equating "innate bias toward hierarchy" with "innate grammar". Many critics (Bayesians, Tomasello) accept the former and reject the latter. The argument is most defensible when read as the weaker bias claim.
  • Skipping the linguistic specifics. Each poverty case rests on careful linguistic analysis. Generic discussion of "innate language" is not the argument; auxiliary inversion, anaphor binding, island constraints are.
  • Reading negative evidence claims as universal. Brown and Hanlon (1970) and Marcus (1993) showed explicit correction is rare; they did not show indirect feedback is absent. Chouinard and Clark (2003) found subtle reformulation feedback in parent-child interaction. The negative-evidence question is empirically open.

Legacy and current status

The Poverty of the Stimulus argument shaped fifty years of debate over innateness, the structure of language acquisition, and the boundary between linguistic and general cognition. It motivated Universal Grammar, then Principles and Parameters, then Minimalism. Outside generative linguistics, it provoked sustained replies from connectionists, statistical-learning researchers, and usage-based theorists.

Pinker's The Language Instinct (1994) brought the argument to a wide audience and remains the most-read defense. Recent work has shifted from binary innatist/empiricist debate toward fine-grained empirical questions: which specific patterns are under-determined, which can be learned from realistic input given which biases, and where biological constraint enters. The argument has lost rhetorical purity but gained empirical depth. Both Chomskyans and their critics now treat acquisition as a problem of learner architecture meeting realistic input, with the locus of innate constraint as the remaining issue.

Frequently asked questions

What is the auxiliary-inversion argument exactly?

Chomsky's classic case (presented in his 1971 Russell Lectures and repeated in many later writings) considers how children form yes-no questions. Given the declarative "The man who is tall is happy", the question is "Is the man who is tall happy?": the matrix-clause auxiliary fronts, not the embedded one. A linear rule ("front the first auxiliary") would yield the ungrammatical "Is the man who tall is happy?" Children never produce that error, which suggests they apply the structure-dependent rule from the start. Yet the input, by the argument, rarely contains complex yes-no questions of the kind that would unambiguously pick the structure-dependent rule over the linear one. The conclusion is that the structure-dependent rule must be innately favored.

What is the strongest empirical reply?

Pullum and Scholz (Linguistic Review, 2002) examined the Wall Street Journal and CHILDES corpora and found that yes-no questions with an auxiliary inside an embedded clause are not vanishingly rare: they appear in input children plausibly hear. Reali and Christiansen (Cognitive Science, 2005) showed that simple recurrent neural networks trained on child-directed speech favor structure-respecting yes-no questions without explicit hierarchy. Lewis and Elman (Proceedings of the Boston University Conference on Language Development, 2001) made a similar point. The empirical premise that input lacks evidence for the structure-dependent rule is contested.

What did Crain and Nakayama show experimentally?

Stephen Crain and Mineharu Nakayama (Language, 1987) elicited yes-no questions from children aged 3 to 5. They showed children pictures and asked them to put questions about the pictures to a puppet. Crucially, the test items required moving an auxiliary across an embedded clause. The children made other kinds of errors, but none produced the linear-rule error; in that sense all respected structure dependence. The finding remains a benchmark in the debate. Critics argue children may have learned the structure-dependent rule from indirect input cues, but the elicited-production result itself is not in dispute.

Does the argument require Universal Grammar?

Chomsky's interpretation: yes — innate principles, not domain-general learning, drive structure dependence. Critics offer alternative learning mechanisms. Statistical learners (Saffran, Aslin, Newport, 1996, on word segmentation) can extract distributional regularities. Bayesian learners (Perfors, Tenenbaum, Regier, 2011) given hierarchical hypothesis spaces converge on structure-dependent rules from realistic input. The conclusion that grammatical knowledge outstrips input is widely accepted; the inference to specifically linguistic UG is contested.

How is poverty of the stimulus related to Plato's problem?

Chomsky frames the argument as a contemporary version of Plato's problem: how do humans know so much given so little experience? In Plato's Meno, Socrates questions an untutored slave boy to show that he already knows geometry; Chomsky's analogue is that children know grammar before they have had sufficient input. The philosophical move is the same: knowledge outstrips experience, so something must be innate. Whether the innate component is grammatical or domain-general is the empirical question.

What about negative evidence?

Brown and Hanlon (1970) found parents do not systematically correct grammatical errors: they correct truth, not form. Marcus (1993), reviewing later feedback studies, argued that negative feedback is too rare and unreliable to drive acquisition. The poverty argument leans on this: even if input contains structure-dependent forms, the absence of correction means children cannot learn what is ungrammatical from explicit instruction. They must avoid the linear-rule error from internal constraint, not from feedback. Later work (Chouinard and Clark, 2003) finds that subtle reformulation feedback exists, complicating but not refuting the original claim.

Is the argument one argument or many?

Many. Auxiliary inversion is canonical but not unique. Other cases: anaphor binding (children avoid "John saw himself" with himself coreferring with someone else), constraints on wh-movement (island violations), strong crossover, parasitic gaps, and the distribution of empty categories. Each constitutes a separate poverty-of-stimulus case with its own empirical questions. Lasnik and Lidz (2017, in The Oxford Handbook of Universal Grammar) survey the inventory. Some cases are robust; others (auxiliary inversion among them) are contested. The collective force depends on how many survive scrutiny.