Cohesion vs Coherence
Surface ties on the page versus sense in the head
Cohesion is the visible web of pronouns, conjunctions, and lexical chains that ties a text together at the surface. Coherence is the underlying sense the reader constructs by integrating those signals with world knowledge. Cohesion lives in the text; coherence lives in the mind reading it.
- Cohesion is: Surface, in the text
- Coherence is: Conceptual, in the mind
- Cohesion theory: Halliday & Hasan 1976
- Coherence theory: van Dijk 1980; Sanders et al. 1992
- Decoupling: Either can exist without the other
The distinction
It is tempting to treat cohesion and coherence as synonyms — both have to do with a text "hanging together." Linguistically they are sharply different. M.A.K. Halliday and Ruqaiya Hasan's Cohesion in English (1976) reserves cohesion for the network of surface devices that tie sentences to each other: pronouns, conjunctions, repeated content words, ellipsis, and substitution. Coherence, in contrast, is the property of being meaningful as a unified discourse — a property that depends on the reader's mental model, world knowledge, and inferential capacity.
The cleanest demonstration of the difference is that you can have either without the other:
- Cohesive but incoherent. "The car is red. It runs on solar physics. Therefore, the dictionary swims." There are pronouns (it) and conjunctions (therefore), so cohesion is high. There is no coherent topic, so coherence is zero.
- Coherent but cohesion-light. "Sarah missed the bus. The interview was cancelled." No pronoun, no conjunction; cohesion is minimal. Yet readers immediately bridge the gap: missing the bus made her late, so the interview was cancelled. Coherence is high.
The two notions are connected — heavy cohesion usually accompanies coherent text, since writers signpost their reasoning — but the connection is empirical, not definitional.
Halliday & Hasan's five cohesive ties
Halliday and Hasan classify English cohesion into five types:
- Reference. One element points to another for its interpretation. Personal (he, she, it), demonstrative (this, that, these), comparative (another, similar, the same). "John lost his keys. He was furious."
- Substitution. One element replaces another to avoid repetition. "Do you want the red shirt or the blue one?" (one substitutes for shirt).
- Ellipsis. An element is omitted but recoverable from prior text. "Did you finish?" — "I tried to ⌀."
- Conjunction. An explicit logical or temporal link between clauses. and, but, however, therefore, then, meanwhile, furthermore.
- Lexical cohesion. Repetition or semantic relatedness of content words across the text. Reiteration (car… car… vehicle) and collocation (doctor… patient… surgery).
Their book lists hundreds of subtypes and works through each on real text. The result is an inventory of the surface mechanisms English uses to weave clauses into discourse. Crucially, all five mechanisms leave traces in the text itself: a careful re-read finds them, and the closed-class ones (pronouns, conjunctions) can even be matched with a regex.
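As a rough illustration of the regex point, the sketch below matches two of the five tie types, reference and conjunction, against short word lists. The word lists and the `find_ties` helper are assumptions made for this example, not Halliday and Hasan's inventory. A match only flags a candidate tie: "that" may be a demonstrative or a complementiser, and ellipsis or lexical cohesion would need more than pattern matching.

```python
import re

# Illustrative word lists only (an assumption of this sketch, far from complete).
REFERENCE = r"\b(?:he|she|it|they|him|her|his|its|their|this|that|these|those)\b"
CONJUNCTION = r"\b(?:and|but|however|therefore|then|meanwhile|furthermore)\b"


def find_ties(text):
    """Return (category, word, character offset) for each candidate tie."""
    ties = []
    for category, pattern in (("reference", REFERENCE), ("conjunction", CONJUNCTION)):
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            ties.append((category, match.group(0).lower(), match.start()))
    # Sort by position so the ties read in text order.
    return sorted(ties, key=lambda tie: tie[2])


sample = "John lost his keys. He was furious."
ties = find_ties(sample)
for tie in ties:
    print(tie)
```

On the sample sentence pair this flags "his" and "He" as reference candidates. Deciding what each one points to (the antecedent) is a separate resolution step that the sketch does not attempt.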
Coherence: the models
Where cohesion is mostly inventory, coherence is mostly inference. Several frameworks try to formalise it.
- Macrostructure (van Dijk 1980). A text's coherence is captured by its global semantic structure — the gist a reader could give if asked to summarise. Van Dijk specifies macro-rules (deletion of irrelevant detail; generalisation of similar propositions; construction of a proposition that subsumes a sequence) by which readers derive the macrostructure from the explicit microstructure.
- Coherence relations (Hobbs 1985; Mann & Thompson 1988). Discourse is held together by relations like cause, contrast, elaboration, exemplification, evidence, which connect spans of text. Rhetorical Structure Theory (RST) is the most developed formalism in this family; an RST tree links adjacent spans through such relations, typically pairing a central "nucleus" span with a supporting "satellite" span, and applies this recursively until the whole text is covered.
- Centering theory (Grosz, Joshi & Weinstein 1995). Local coherence is modelled by tracking the entity in focus. Each utterance has a ranked list of "forward-looking centers" (the entities it mentions, ordered by salience, which are available to the next utterance) and at most one "backward-looking center" (the most salient entity of the prior utterance that is mentioned again). Sequences that maintain the same center are most coherent; abrupt shifts are jarring.
- Mental-model theory (Garnham 1987; Zwaan & Radvansky 1998). Coherence emerges as the reader updates a situation model with information about who, when, where, why. A text is coherent when each new sentence integrates cleanly into the model.
- Cognitive approaches to coherence relations (Sanders, Spooren & Noordman 1992). Coherence relations decompose into primitive cognitive parameters: basic operation (causal vs additive), source of coherence (semantic vs pragmatic), order, and polarity. The taxonomy generates 12 to 16 relations from these features.
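Of the frameworks above, centering is the easiest to sketch in code. The example below is a minimal sketch under several assumptions: the entities of each utterance are annotated by hand, the forward-looking centers are ranked with the grammatical subject first, the third and fourth utterances are invented for illustration, and the `NO-CB` label is a hypothetical name for the case where no entity carries over. The transition names (continue, retain, shift) follow the centering literature.

```python
def backward_center(prev_cf, current_cf):
    """Highest-ranked entity of the previous utterance that is mentioned again."""
    for entity in prev_cf:  # prev_cf is ordered by salience, most salient first
        if entity in current_cf:
            return entity
    return None


def classify_transitions(utterances):
    """utterances: list of forward-looking center lists, one per utterance."""
    labels = []
    prev_cb = None
    for prev_cf, cf in zip(utterances, utterances[1:]):
        cb = backward_center(prev_cf, cf)
        preferred = cf[0] if cf else None  # most salient entity of this utterance
        if cb is None:
            label = "NO-CB"       # nothing carried over from the prior utterance
        elif prev_cb is not None and cb != prev_cb:
            label = "SHIFT"       # the center of attention has changed
        elif cb == preferred:
            label = "CONTINUE"    # same center, and still the most salient entity
        else:
            label = "RETAIN"      # same center, but another entity now outranks it
        labels.append(label)
        prev_cb = cb
    return labels


# Hand-annotated entities for four utterances (assumed annotation):
discourse = [
    ["John", "keys"],          # John lost his keys.
    ["John"],                  # He was furious.
    ["keys", "John", "coat"],  # The keys were in his coat.
    ["keys", "hole"],          # They had slipped through a hole.
]
transitions = classify_transitions(discourse)
print(transitions)  # ['CONTINUE', 'RETAIN', 'SHIFT']
```

The retain step is what makes the later shift feel smooth: the keys are promoted to the most salient position one utterance before they take over as the center.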
Comparing the two notions
| Property | Cohesion | Coherence |
|---|---|---|
| Locus | In the text — surface forms | In the reader — mental model |
| Detection | Identifiable by inspection of forms | Diagnosed by reader judgement / behavioural data |
| Theory | Halliday & Hasan 1976 | van Dijk 1980; Mann & Thompson 1988; Sanders et al. 1992 |
| Devices / instruments | Reference, substitution, ellipsis, conjunction, lexical cohesion | Coherence relations, macrostructure, situation models |
| Reader-dependent? | No (the ties are there or they aren't) | Yes — relies on world knowledge, schemas, inference |
| Failure mode | Dangling pronouns, broken connectives, missing antecedents | Topic drift, contradictions, missing causal links |
| Improvable by | Editing surface signposts | Reorganising argument, adding bridging information, matching reader background |
A worked example
Consider three short paragraphs.
Paragraph A (high cohesion, high coherence):
Sarah missed the bus this morning. As a result, she arrived at the office an hour late. By then, her interview candidate had already gone home.
Cohesive devices: she and her (reference), as a result (conjunction, causal), by then (conjunction, temporal, with then pointing back to the time of her arrival). Coherence: each clause integrates cleanly into the situation model (missed bus → late arrival → missed interview).
Paragraph B (low cohesion, high coherence):
Sarah missed the bus this morning. The interview was cancelled.
No pronouns, no conjunctions linking the sentences. Cohesion minimal. Yet readers infer the causal bridge: missing bus → late → no time for interview → cancelled. Coherence is high; the reader supplies it.
Paragraph C (high cohesion, low coherence):
Sarah missed the bus this morning. As a result, the colour blue contains seven syllables. By then, the dictionary had decided.
The same connectives as A. But the propositions can't be integrated into a unified situation model. Cohesion is high, coherence is zero. The text is — in the technical sense — non-discourse.
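The contrast between the three paragraphs can be checked mechanically. The sketch below counts a handful of connectives and pronouns in each one; the word lists and the `count_ties` helper are assumptions chosen to fit this example, not a general cohesion metric.

```python
import re

# Tiny lists chosen for this example only (assumption).
CONNECTIVES = ("as a result", "by then", "therefore", "however")
PRONOUNS = ("she", "her", "he", "his", "it", "they")


def count_ties(text):
    """Count whole-word matches of the listed connectives and pronouns."""
    lowered = text.lower()
    connectives = sum(
        len(re.findall(rf"\b{re.escape(c)}\b", lowered)) for c in CONNECTIVES
    )
    pronouns = sum(len(re.findall(rf"\b{p}\b", lowered)) for p in PRONOUNS)
    return {"connectives": connectives, "pronouns": pronouns}


paragraphs = {
    "A": "Sarah missed the bus this morning. As a result, she arrived at the "
         "office an hour late. By then, her interview candidate had already gone home.",
    "B": "Sarah missed the bus this morning. The interview was cancelled.",
    "C": "Sarah missed the bus this morning. As a result, the colour blue contains "
         "seven syllables. By then, the dictionary had decided.",
}

counts = {name: count_ties(text) for name, text in paragraphs.items()}
for name, result in counts.items():
    print(name, result)
```

Paragraphs A and C come out with the same connective count and B with none, so a surface count ranks the nonsense paragraph above the perfectly readable one. That is the decoupling in miniature: the counter sees the ties, not whether the propositions they join fit together.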
Cross-linguistic notes
- Pro-drop languages. Spanish, Italian, Japanese, and Mandarin can omit subject pronouns whose reference is clear from context — a form of cohesion through ellipsis that English uses far more sparingly. Tracking the discourse referent then leans more on verb morphology and topic continuity than on overt pronouns.
- Topic-prominent languages. Mandarin and Japanese organise discourse around a topic that may persist across many sentences without re-mention. English requires more frequent overt re-reference.
- Connectives. The inventory of conjunctions varies: French distinguishes donc (so), alors (so/then), ainsi (thus); German layers also, deshalb, daher, somit. The basic coherence relations are the same; the lexical signposts differ.
- Switch-reference. Many indigenous languages of the Americas (e.g. Choctaw, Quechua) grammaticalise whether the next clause shares a subject with the current one — a cohesion device built into verb morphology rather than separate connectives.
- Discourse-configurational languages. Hungarian, Czech, and Russian use word order to mark topic and focus, which serves both information structure and discourse-level cohesion.
Why the distinction matters
- Writing pedagogy. Students often hear "improve the flow"; teasing apart cohesion and coherence gives concrete handles. Adding connectives is a cohesion fix; reorganising the argument is a coherence fix.
- NLP and summarisation. Coherence-aware approaches (RST trees, situation models) have been applied to both extractive and abstractive summarisation, where tracking surface cohesion alone tends to miss a text's overall structure. Modern LLMs appear to learn coherence relations implicitly as part of training.
- Reading comprehension. Reader-text fit explains why an expert finds a paper coherent that confuses a novice — same cohesion, different background knowledge.
- Translation. A translation can preserve cohesion (translate every however) and lose coherence (because the reader's cultural schemas are different).
- Dementia and aphasia diagnostics. Spontaneous-speech cohesion is one of the earliest disrupted skills in primary progressive aphasia; coherence breaks down later. Quantitative measures (Wright et al. 2014) use this for assessment.
Common pitfalls
- Treating "cohesion" and "coherence" as synonyms. They are technical terms with distinct definitions in linguistics. Using them interchangeably throws away their diagnostic power.
- Believing more connectives = better writing. Cohesion alone cannot rescue an incoherent argument. Worse, over-cohesive prose ("Firstly… secondly… moreover… in conclusion…") often signals weak underlying structure.
- Assuming coherence is "in the text". It is jointly constructed by the text and the reader's mind. The same prose can be coherent for one reader and not for another.
- Confusing coherence with elegance or rhetoric. Coherence is about whether a text can be interpreted as unified discourse, whether one treats that as a yes/no or a graded judgement; elegance is a matter of style.
- Ignoring lexical cohesion. Pronouns and connectives are the showy ties; lexical chains (repeated and semantically related content words) are the most pervasive.
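To make the last pitfall concrete, the sketch below finds the simplest kind of lexical chain: content words repeated across sentences. The stopword list, the sample text, and the `repetition_chains` helper are assumptions for illustration. Exact repetition is only the easy case; synonyms (car… vehicle) and collocations (doctor… patient) would need a thesaurus or some other measure of semantic relatedness.

```python
import re
from collections import defaultdict

# Minimal stopword list for the example (assumption, not exhaustive).
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "at", "and", "is", "was"}


def repetition_chains(text):
    """Map each repeated content word to the indices of the sentences containing it."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    positions = defaultdict(list)
    for index, sentence in enumerate(sentences):
        # set() so a word used twice in one sentence is recorded once
        for word in set(re.findall(r"[a-z]+", sentence.lower())):
            if word not in STOPWORDS:
                positions[word].append(index)
    # Keep only words that recur in more than one sentence.
    return {word: idxs for word, idxs in positions.items() if len(idxs) > 1}


sample = (
    "The car stalled at the junction. "
    "The driver restarted the car. "
    "A second car honked behind the driver."
)
chains = repetition_chains(sample)
print(chains)
```

In the sample there is not a single pronoun or connective, yet "car" runs through all three sentences and "driver" through the last two, which is why lexical chains often carry more of a text's cohesion than the showier ties.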
Frequently asked questions
What is the difference between cohesion and coherence?
Cohesion is the surface network of explicit linguistic links — pronouns, conjunctions, lexical repetition — that tie sentences together. Coherence is the underlying conceptual sense a reader constructs as they integrate the text with their world knowledge. Cohesion is in the text; coherence is in the mind.
Can a text be cohesive but incoherent?
Yes. A string of sentences can have heavy cohesive ties yet make no sense ("The car is red. It runs on solar physics. Therefore, the dictionary swims."). Conversely, a text can be perfectly coherent with very few cohesive markers — short poems and lab reports rely on the reader to bridge.
What are Halliday and Hasan's five types of cohesion?
In Cohesion in English (1976), they identify reference, substitution, ellipsis, conjunction, and lexical cohesion. Together these five types of tie account for nearly all the surface devices that make English text hang together.
What is van Dijk's macrostructure?
Teun van Dijk's term for the global semantic structure of a text: the gist or theme a reader extracts after applying macro-rules (deletion, generalisation, construction) to the explicit propositions. Macrostructure is one formal account of coherence.
Why does coherence depend on the reader?
Coherence requires the reader to bring world knowledge, schemas, and inferences to the text. The same paragraph can be coherent for an expert and incoherent for a novice because they bring different background assumptions to bridge the explicit content.
Is cohesion enough for good writing?
No. Cohesion helps, but it is not sufficient, and since a text with few explicit ties can still be coherent, it is not strictly necessary either. A well-edited text needs cohesive ties strong enough to guide the reader, plus an underlying coherence: a logical and conceptual progression. Cramming in "however", "therefore", and "in conclusion" will not rescue an incoherent argument.