Anaphora and Cataphora
Backward and forward reference — how a pronoun finds its owner
Anaphora is a referring expression whose interpretation depends on something earlier in the discourse; cataphora reverses the direction, depending on something later. Together they let speakers thread a single referent through long stretches of text without renaming it, and they bridge syntax (binding) with discourse (cohesion).
- Term origin: Greek ana- "back" + kata- "down/forward"
- Direction (anaphora): Pronoun → earlier antecedent
- Direction (cataphora): Pronoun → later postcedent
- Cover term: Endophora (vs exophora = outside text)
- Languages with zero anaphora: Japanese, Mandarin, Korean, Spanish, Italian
- Computational task: Coreference resolution (~80% F1 on OntoNotes)
How anaphora and cataphora work
Every connected text reuses the same characters dozens of times. If we said the full noun phrase each time — Marie Curie, Marie Curie, Marie Curie — discourse would collapse under repetition. Instead, languages provide pro-forms (she, her, herself) that piggyback on a previous mention. The relation between the pro-form and that mention is anaphora.
The pro-form is the anaphor. The expression it depends on is the antecedent if it appears earlier, the postcedent if it appears later. The interpretive instruction is the same in both cases: find the appropriate full description and substitute it. What differs is the direction of search.
Linguists divide the territory into two layers. The narrow, syntactic layer is governed by Binding Theory: reflexives like herself must be bound inside a tight local domain; pronouns like her must be free in that same domain; full names must be free everywhere. The broader, discourse layer covers reference across clauses and sentences, where cohesion devices, topic continuity and salience take over.
A simple example of anaphora: Mary walked into the room. She was tired. Here she is the anaphor and Mary is the antecedent. The reverse — cataphora — needs more setup: When she walked into the room, Mary was tired. The pronoun she appears first, but the listener holds it as a placeholder until the noun phrase Mary resolves it.
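Both cases use the same instruction: find a matching full description. Only the direction of search differs. The minimal sketch below makes that explicit. It rests on a toy assumption: each mention carries only a kind (name or pronoun) and a gender feature. The `Mention` class and the `resolve` function are hypothetical illustrations, not a real resolver. The sketch searches backward first because anaphora is the far more common direction, and only then searches forward.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    text: str
    kind: str    # "name" or "pronoun"
    gender: str  # "f", "m" or "n" (a deliberately tiny feature set)


def resolve(mentions: list[Mention], i: int) -> tuple[Optional[int], str]:
    """Find a referent for the pronoun at index i.

    Searches backward first (anaphora); if no gender-matching name is found,
    searches forward (cataphora). Returns (index of referent, direction).
    """
    target = mentions[i].gender
    for j in range(i - 1, -1, -1):  # backward: nearest earlier name first
        if mentions[j].kind == "name" and mentions[j].gender == target:
            return j, "anaphora"
    for j in range(i + 1, len(mentions)):  # forward: nearest later name first
        if mentions[j].kind == "name" and mentions[j].gender == target:
            return j, "cataphora"
    return None, "unresolved"


# "Mary walked into the room. She was tired."
ana = [Mention("Mary", "name", "f"), Mention("She", "pronoun", "f")]
# "When she walked into the room, Mary was tired."
cata = [Mention("she", "pronoun", "f"), Mention("Mary", "name", "f")]

print(resolve(ana, 1))   # (0, 'anaphora')
print(resolve(cata, 0))  # (1, 'cataphora')
```

Real listeners and systems weigh many more cues than gender and position. The point here is only that the cataphoric case forces a search past the pronoun.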
Why this matters
- Discourse cohesion. Anaphora is one of the most frequent grammatical cohesion devices in connected text.
- Syntax-pragmatics interface. Binding theory and discourse coherence meet here.
- Cross-linguistic typology. Pro-drop and zero anaphora distinguish whole language families.
- Reading comprehension. Pronoun resolution is a major source of misreading in low-literacy adults.
- Machine translation. Translating from a zero-anaphora language to English forces the system to invent pronouns.
- Coreference resolution. A standard NLP task and a common preprocessing step in information-extraction and question-answering pipelines.
- Stylistics. Cataphoric openings shape suspense in fiction and journalism leads.
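Coreference systems are scored by comparing predicted mention clusters against gold clusters. F1 figures reported for OntoNotes are usually the CoNLL-2012 score, which averages three metrics: MUC, B³ and CEAF-e. The sketch below shows only B³ (B-cubed). It assumes the gold and predicted clusterings cover exactly the same mentions, and it uses mention strings as stand-ins for mention IDs. The function name `b_cubed` is illustrative, not an official scorer API.

```python
def b_cubed(key: list[set[str]], response: list[set[str]]) -> tuple[float, float, float]:
    """B-cubed precision, recall and F1 for two clusterings of the same mentions."""
    key_of = {m: frozenset(c) for c in key for m in c}        # mention -> gold cluster
    resp_of = {m: frozenset(c) for c in response for m in c}  # mention -> predicted cluster
    if key_of.keys() != resp_of.keys():
        raise ValueError("this sketch assumes key and response contain the same mentions")
    n = len(key_of)
    # Per mention: how much its predicted cluster overlaps its gold cluster.
    precision = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in key_of) / n
    recall = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in key_of) / n
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{"Mary", "she", "her"}, {"John"}]
predicted = [{"Mary", "she"}, {"her", "John"}]  # one pronoun attached to the wrong entity
p, r, f = b_cubed(gold, predicted)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.75 0.667 0.706
```

A single misattached pronoun costs both precision (John's cluster is polluted) and recall (Mary's cluster is split). This is why pronoun resolution errors weigh heavily in these scores.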
Anaphora vs cataphora
| | Anaphora | Cataphora |
|---|---|---|
| Direction of dependence | Backward — to earlier text | Forward — to later text |
| Frequency in spoken English | Roughly 95 percent of pronouns | Roughly 1–2 percent |
| Typical environment | Anywhere across sentences | Subordinate or fronted clause |
| Discourse function | Maintain a topic without repetition | Build suspense, foreground description |
| Processing cost | Low — antecedent already in working memory | Higher — listener holds an unresolved variable |
| Canonical example | Mary arrived. She was tired. | When she arrived, Mary was tired. |
| Restrictions | Must agree in person, number, gender | Same agreement, plus Binding Condition C (the pronoun must not c-command the name) |
| Cross-sentence allowed? | Routinely | Rare and stylistically marked |
Cross-linguistic variation
English is a pronoun-prominent language: every finite clause has an overt subject, and in the third person we have to choose between he, she, it, they. The grammar enforces overt anaphora. Other languages drop the pronoun whenever the referent is recoverable.
Japanese is a zero-anaphora language. The continuation of a topic across clauses is the unmarked option, and an overt pronoun would actually feel emphatic or odd. Compare:
English: Tanaka came. He drank coffee.
Japanese: 田中さんは来た。コーヒーを飲んだ。
(Tanaka-san came. ∅ coffee drank.)
The second clause has no subject and no pronoun; Japanese listeners default to the most recent topic-marked referent. Surveys of conversational Japanese find that 40–60 percent of all argument positions are filled by zero anaphora, depending on register.
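The topic-continuity default described above can be sketched as a tiny procedure. It assumes each clause is pre-annotated with its overt subject (or `None` for ∅) and with whether that subject is topic-marked with は (wa). The `Clause` class and `fill_zero_subjects` are hypothetical. Real resolution also draws on verb meaning, honorific marking and world knowledge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    surface: str
    subject: Optional[str]       # None marks a zero (∅) subject
    topic_marked: bool = False   # True if the overt subject carries は (wa)


def fill_zero_subjects(clauses: list[Clause]) -> list[Optional[str]]:
    """Resolve each ∅ subject to the most recent topic-marked referent."""
    topic: Optional[str] = None
    resolved: list[Optional[str]] = []
    for clause in clauses:
        if clause.subject is None:
            resolved.append(topic)  # stays None if no topic has been set yet
        else:
            resolved.append(clause.subject)
            if clause.topic_marked:
                topic = clause.subject
    return resolved


discourse = [
    Clause("田中さんは来た。", "田中さん", topic_marked=True),  # Tanaka-san came.
    Clause("コーヒーを飲んだ。", None),                          # ∅ coffee drank.
    Clause("家に帰った。", None),                                # ∅ home went.
]
print(fill_zero_subjects(discourse))  # ['田中さん', '田中さん', '田中さん']
```

The topic persists across the chain until a new topic-marked subject replaces it. That persistence is what makes the overt pronoun unnecessary.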
Mandarin Chinese shows similar behaviour, with topic chains spanning many clauses: 张三买了一本书,∅ 读了三天,∅ 然后送给了朋友 — Zhang San bought a book, ∅ read it for three days, ∅ then gave it to a friend. Subjects and objects can both be dropped once they are tracked.
Romance pro-drop languages (Spanish, Italian, Portuguese) drop the pronoun but keep verbal agreement. Spanish vino "(he/she/it) came" is licensed because the verb morphology already specifies the person and number. The dropped pronoun still counts as anaphoric; it just lacks a phonological exponent.
Languages with grammatical gender, like German and French, let agreement resolve ambiguities that English leaves open. German der Tisch "the table" is masculine. Referring back with er ("he/it") rather than sie ("she/it") therefore rules out competing feminine and neuter referents, even inanimate ones. English it fits any inanimate antecedent, so English speakers must rely on world knowledge and recency.
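Gender agreement acts as a filter on the candidate antecedents, as this minimal sketch shows. It assumes a toy three-noun lexicon and only third-person singular nominative pronouns. It ignores that sie is also the plural form. The dictionaries and `agreeing_antecedents` are hypothetical illustrations.

```python
# Toy lexicon (illustrative, not exhaustive): noun -> grammatical gender.
NOUN_GENDER = {"Tisch": "m", "Lampe": "f", "Buch": "n"}  # der Tisch, die Lampe, das Buch
# Third-person singular nominative pronouns -> the gender they agree with.
# (sie is also the plural form; this sketch ignores number.)
PRONOUN_GENDER = {"er": "m", "sie": "f", "es": "n"}


def agreeing_antecedents(pronoun: str, earlier_nouns: list[str]) -> list[str]:
    """Keep only earlier nouns whose grammatical gender matches the pronoun."""
    wanted = PRONOUN_GENDER[pronoun]
    return [noun for noun in earlier_nouns if NOUN_GENDER[noun] == wanted]


context = ["Tisch", "Lampe", "Buch"]
print(agreeing_antecedents("er", context))   # ['Tisch']
print(agreeing_antecedents("sie", context))  # ['Lampe']
```

With English it, the same filter would keep all three nouns. The choice would then fall to recency and world knowledge.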
Languages with switch-reference, such as many Australian and Papuan languages, mark on the verb itself whether the subject of the next clause is the same as the current one. The morphology does the resolution work that pronoun choice does in English.
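The bookkeeping that switch-reference morphology supports can be sketched as follows. This is a simplification. It assumes each clause's marker refers only to the next clause, with SS = same subject and DS = different subject. Actual systems vary in what the marker is relative to. The clause chain is an English gloss, not data from a specific language, and `track_subjects` is hypothetical.

```python
from typing import Optional


def track_subjects(clauses: list[tuple[Optional[str], Optional[str]]]) -> list[str]:
    """Assign a subject to each clause in a switch-reference chain.

    Each clause is (overt_subject or None, marker). The marker on clause k
    says whether clause k+1 has the same subject ("SS") or a different one
    ("DS"); the final clause has marker None. DS only says "someone else",
    so a silent subject after DS is returned as "?".
    """
    subjects: list[str] = []
    prev_subject: Optional[str] = None
    prev_marker: Optional[str] = None
    for overt, marker in clauses:
        if overt is not None:
            subject = overt
        elif prev_marker == "SS":
            subject = prev_subject  # SS: same subject as the previous clause
        else:
            subject = "?"           # DS or chain start: not recoverable from morphology
        subjects.append(subject)
        prev_subject, prev_marker = subject, marker
    return subjects


# Gloss: "The man came-SS, ∅ sat down-DS, the woman spoke-SS, ∅ left."
chain = [("man", "SS"), (None, "DS"), ("woman", "SS"), (None, None)]
print(track_subjects(chain))  # ['man', 'man', 'woman', 'woman']
```

The SS marker alone licenses the silent subject of the second and fourth clauses, with no pronoun needed.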
Worked examples
Cataphoric subordinate clause. When she arrived at the embassy, Mary was already an hour late. The fronted when-clause contains she; the main clause delivers Mary. Reverse the order and the cataphora vanishes (Mary was already an hour late when she arrived is plain anaphora).
Cataphoric introducer. Here's what I think: Bill is wrong. The pro-form what looks forward to the entire content clause that follows. Article leads use this constantly: This is the picture: a billion users, no profit.
Reflexive (Binding Condition A). Mary cut herself. The reflexive must find an antecedent inside its local clause; herself can only mean Mary, never some other person mentioned earlier.
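Pronoun (Binding Condition B). Mary thinks she is winning. She can refer to Mary or to someone else previously introduced. This is because Mary lies outside the embedded clause that forms the pronoun's local domain. Condition B rules out only a local antecedent. In Mary admires her, her cannot mean Mary; the reflexive herself is required for that reading.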
Condition C blocking. She kissed Mary cannot mean that Mary kissed herself. A pronoun cannot corefer with a name it c-commands, so she can only pick out someone else. This is a syntactic, not a discourse, restriction.
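The c-command test behind Condition C can be checked mechanically on a phrase-structure tree. The sketch below uses the textbook definition: a c-commands b if neither dominates the other and the first branching node above a dominates b. It applies that test to deliberately simplified trees for She kissed Mary and When she arrived, Mary was tired. The `Node` class, the tree shapes and the function names are illustrative assumptions, not a particular syntactic framework's analysis.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare nodes by identity, not by contents
class Node:
    label: str
    children: list[Node] = field(default_factory=list)
    parent: Optional[Node] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self

    def dominates(self, other: Node) -> bool:
        """True if other sits somewhere below self in the tree."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(a: Node, b: Node) -> bool:
    """a c-commands b iff neither dominates the other and the first
    branching node above a also dominates b."""
    if a is b or a.dominates(b) or b.dominates(a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent  # skip non-branching nodes
    return node is not None and node.dominates(b)


def condition_c_allows(pronoun: Node, name: Node) -> bool:
    """Condition C: coreference is blocked if the pronoun c-commands the name."""
    return not c_commands(pronoun, name)


# [S [NP she] [VP kissed [NP Mary]]]
she1, mary1 = Node("NP:she"), Node("NP:Mary")
root1 = Node("S", [she1, Node("VP", [Node("V:kissed"), mary1])])

# [S [CP when [S [NP she] arrived]] [S [NP Mary] [VP was tired]]]
she2, mary2 = Node("NP:she"), Node("NP:Mary")
root2 = Node("S", [
    Node("CP", [Node("C:when"), Node("S", [she2, Node("VP:arrived")])]),
    Node("S", [mary2, Node("VP:was tired")]),
])

print(condition_c_allows(she1, mary1))  # False: she c-commands Mary
print(condition_c_allows(she2, mary2))  # True: she is buried inside the when-clause
```

The fronted-clause case passes because the first branching node above she is the embedded clause, which does not contain Mary. This is the structural reason cataphora survives there.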
Bridging anaphora. I bought a car. The steering wheel was loose. The steering wheel is anaphoric in spirit, since the listener infers that it belongs to the car, but there is no explicit antecedent. Clark (1975) called this a bridging inference; the relation is also known as associative anaphora.
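Donkey anaphora. Every farmer who owns a donkey beats it. The pronoun it depends on the indefinite a donkey, which sits inside a relative clause and does not c-command it. Yet the sentence is usually read as saying that every farmer beats every donkey he owns. Classical predicate logic cannot bind it compositionally from that position. This problem motivated Discourse Representation Theory (Kamp 1981) and dynamic semantics.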
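The naive translation is ∀x[(farmer(x) ∧ ∃y(donkey(y) ∧ owns(x, y))) → beats(x, y)]. It leaves the final y outside the scope of ∃. The reading DRT delivers instead gives the indefinite universal force: ∀x∀y[(farmer(x) ∧ donkey(y) ∧ owns(x, y)) → beats(x, y)]. The sketch below evaluates that "strong" reading in a small invented model. For contrast, it also evaluates the weaker "beats at least one donkey he owns" reading discussed in later literature. The model's names and relations are made up purely for illustration.

```python
# Invented toy model (illustrative only).
FARMERS = {"Pedro", "Juan"}
DONKEYS = {"Platero", "Chiquita", "Rucio"}
OWNS = {("Pedro", "Platero"), ("Pedro", "Chiquita"), ("Juan", "Rucio")}
BEATS = {("Pedro", "Platero"), ("Juan", "Rucio")}


def strong_reading() -> bool:
    # ∀x ∀y [(farmer(x) ∧ donkey(y) ∧ owns(x, y)) → beats(x, y)]
    return all((x, y) in BEATS
               for x in FARMERS for y in DONKEYS if (x, y) in OWNS)


def weak_reading() -> bool:
    # ∀x [(farmer(x) ∧ ∃y (donkey(y) ∧ owns(x, y)))
    #      → ∃y (donkey(y) ∧ owns(x, y) ∧ beats(x, y))]
    return all(any((x, y) in OWNS and (x, y) in BEATS for y in DONKEYS)
               for x in FARMERS
               if any((x, y) in OWNS for y in DONKEYS))


print(strong_reading())  # False: Pedro owns Chiquita but does not beat her
print(weak_reading())    # True: each owner beats at least one of his donkeys
```

The model was chosen so that the two readings come apart. On the strong reading, the indefinite a donkey behaves like every donkey, which no single quantifier in the naive formula captures.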
Related variants
- Endophora vs exophora. Endophora is internal: anaphora and cataphora. Exophora points outside the text — pointing at a coffee cup and saying that.
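- One-anaphora. I want a red one. The pro-form one stands in for a nominal (here something like car, recovered from context) rather than a whole noun phrase: the determiner a and the adjective red stay overt.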
- Verb-phrase ellipsis. Mary left, and John did too. Did recovers the verb phrase from earlier — anaphora at the predicate level.
- Sloppy and strict identity. Mary loves her mother, and Sue does too. Strict: Sue loves Mary's mother. Sloppy: Sue loves her own mother. Both readings are licit.
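- Logophors. Special pronouns that refer to the person whose speech, thoughts or perspective is being reported. They are common in West African languages like Ewe; English can only approximate the effect, as in Bill said that he himself was wrong.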
- Resumptive pronouns. The man that I saw him yesterday — ungrammatical in standard English, normal in Hebrew and Arabic relative clauses.
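The strict/sloppy contrast listed above can be pictured as two different properties copied from the first clause and reapplied to Sue by does too. In the strict property, her is fixed to Mary. In the sloppy one, her is a variable bound by the subject. This is a minimal sketch of that standard analysis with string outputs. The `MOTHER_OF` table and function names are hypothetical.

```python
# Toy model: who is whose mother (illustrative names only).
MOTHER_OF = {"Mary": "Mary's mother", "Sue": "Sue's mother"}


def loves(x: str, y: str) -> str:
    return f"{x} loves {y}"


def strict_vp(x: str) -> str:
    # "her" fixed to Mary: λx. loves(x, mother(Mary))
    return loves(x, MOTHER_OF["Mary"])


def sloppy_vp(x: str) -> str:
    # "her" bound by the subject: λx. loves(x, mother(x))
    return loves(x, MOTHER_OF[x])


# "Mary loves her mother, and Sue does too": reapply each property to Sue.
for reading, vp in [("strict", strict_vp), ("sloppy", sloppy_vp)]:
    print(f"{reading}: {vp('Mary')}, and {vp('Sue')}")
# strict: Mary loves Mary's mother, and Sue loves Mary's mother
# sloppy: Mary loves Mary's mother, and Sue loves Sue's mother
```

Both properties give the same result for Mary. They diverge only when the elided predicate is applied to a new subject, which is why the ambiguity shows up under ellipsis.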
Common pitfalls
- Conflating anaphora with coreference. Coreference is identity of reference; anaphora is a specific kind of dependence. Two full noun phrases can corefer without either being anaphoric.
- Assuming cataphora is symmetric to anaphora. It is highly restricted: typically a pronoun in a subordinate clause looking into the main clause.
- Treating zero anaphora as ellipsis. Ellipsis deletes recoverable material; zero anaphora is the unmarked anaphoric form in the language. Different mechanism.
- Mistaking exophora for poor writing. Look at this! in a tweet relies on the surrounding image — a normal pragmatic move, not a referential failure.
- Forgetting bridging. Many texts are coherent only because readers infer associative anaphora. Stripping these out makes the text feel disjointed.
- Translating zero subjects literally. A Japanese-to-English MT system that emits no subject pronouns produces ungrammatical English; one that always inserts he introduces gender bias.
Frequently asked questions
What's the difference between anaphora and cataphora?
Direction. Anaphora points back: the antecedent appears before the pronoun (Mary arrived. She was tired). Cataphora points forward: the postcedent appears after (When she arrived, Mary was tired). Both are forms of endophora — reference inside the text. Exophora, by contrast, points outside the text to the situational context (Look at that).
Why does English use cataphora at all?
Cataphora creates suspense and topic-fronting effects. Opening a paragraph with Although he denied it for years, the senator finally admitted... foregrounds the description before naming the actor. It is also licensed by syntax: a pronoun in a fronted subordinate clause can refer forward to a name in the main clause. Forward reference from a pronoun that c-commands the name is blocked, however: She kissed Mary cannot mean that Mary kissed herself.
What is zero anaphora?
Reference to an established referent with no overt pronoun at all. Japanese, Mandarin, Korean and Spanish license it routinely. Japanese: 田中さんは来た。コーヒーを飲んだ。 (literally Tanaka came. Drank coffee.) The second clause has no subject; the listener supplies Tanaka from the prior sentence. Depending on register, roughly half of the argument positions in spoken Japanese discourse are zero. English requires an overt pronoun in matching contexts.
How is anaphora different from coreference?
Coreference is the broader semantic relation: two expressions pick out the same real-world entity. Anaphora is the specific dependence of one expression on another for its interpretation. Barack Obama gave a speech. The 44th president was eloquent is coreferential but not strictly anaphoric — the 44th president stands on its own. He was eloquent is anaphoric: the pronoun has no referent without the antecedent.
What is bound-variable anaphora?
It is the case where a pronoun is bound by a quantifier rather than referring to a fixed entity. In Every student thinks she will pass, she varies with each student, like a logical variable. This is a syntactic relation governed by Binding Theory (the quantifier must c-command the pronoun), not a discourse relation. Sentences with a referential antecedent, like Mary thinks she will pass, allow either the bound or the coreferential reading. The difference surfaces under ellipsis as the sloppy versus strict contrast.
How do listeners resolve ambiguous pronouns in real time?
Multiple cues compete: recency (last mentioned wins by default), salience (subjects beat objects), parallelism (same grammatical role across clauses), gender and number agreement, and world knowledge. Centering Theory (Grosz, Joshi & Weinstein 1995) models this as a transition between discourse states. Computational coreference resolvers combine these features with neural contextual embeddings, reaching an F1 of roughly 80 on the OntoNotes (CoNLL-2012) benchmark.
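Centering's transitions are computed from two quantities per utterance. Cf is the ranked list of entities an utterance mentions. Cb is the highest-ranked entity of the previous utterance's Cf that the current utterance realizes. Grosz, Joshi & Weinstein distinguish Continue, Retain and Shift. The sketch below uses the common refinement that splits Shift into Smooth-Shift and Rough-Shift. It assumes Cf lists are already ranked subject > object > other, with pronouns already resolved. It adds a "NoCb" label for utterances that share no entity with the previous one. The function names are hypothetical.

```python
from typing import Optional, Sequence


def backward_center(prev_cf: Sequence[str], cur_cf: Sequence[str]) -> Optional[str]:
    """Cb: highest-ranked entity of the previous Cf that the current utterance realizes."""
    for entity in prev_cf:
        if entity in cur_cf:
            return entity
    return None


def transition(prev_cb: Optional[str], prev_cf: Sequence[str], cur_cf: Sequence[str]) -> str:
    """Classify the move into the current utterance."""
    cb = backward_center(prev_cf, cur_cf)
    if cb is None:
        return "NoCb"
    cp = cur_cf[0]  # preferred center: top of the current Cf ranking
    same_center = prev_cb is None or cb == prev_cb
    if same_center:
        return "Continue" if cb == cp else "Retain"
    return "Smooth-Shift" if cb == cp else "Rough-Shift"


utterances = [
    ["John", "store"],     # John went to the store.
    ["John", "sandwich"],  # He bought a sandwich.
    ["Bill", "John"],      # Bill greeted him.
    ["Bill"],              # He was in a hurry. (he = Bill)
]
prev_cb: Optional[str] = None
labels = []
for prev_cf, cur_cf in zip(utterances, utterances[1:]):
    labels.append(transition(prev_cb, prev_cf, cur_cf))
    prev_cb = backward_center(prev_cf, cur_cf)
print(labels)  # ['Continue', 'Retain', 'Smooth-Shift']
```

Centering predicts that texts preferring Continue over Retain, and Retain over shifts, read as more coherent. That preference is one way a resolver can choose between candidate antecedents.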
Can cataphora occur across sentences?
Rarely, and only for stylistic effect. The standard restriction is that cataphora is confined to a single sentence with the pronoun in a subordinate or fronted clause. Cross-sentence cataphora exists in literary openings (He had not slept. The coffee was cold. Marlowe stood at the window) but readers experience it as a deliberate withholding, not a default option.