Evolution
Phylogenetic Trees
Reading the branching history of life
A phylogenetic tree is a branching diagram that shows the inferred evolutionary relationships among species, where the tips are the organisms being compared, the internal nodes are their common ancestors, and the branches trace lines of descent through time. You read it at the nodes — two species are close relatives because they share a recent ancestor, not because they look alike or sit near each other on the page. Trees are reconstructed from morphology or, increasingly, from aligned DNA and protein sequences using parsimony, maximum likelihood, and Bayesian inference, and they form the backbone of modern taxonomy, epidemiology, and the entire concept of the tree of life.
- First drawn by: Darwin's notebook sketch, 1837
- Read at: Nodes (ancestors), not tips
- Clade: Ancestor + all its descendants
- Branch length: Substitutions/site or millions of years
- Built by: Parsimony, likelihood, Bayesian inference
- Confidence: Bootstrap / posterior > 95% = strong
What a phylogenetic tree actually shows
A phylogenetic tree is a hypothesis about history. It claims that the species sitting at its tips descend, through a series of splitting events, from shared ancestors that no longer exist. Every fork in the tree — every node — represents a speciation event: a moment when one ancestral lineage divided into two that then evolved independently. The lines between nodes are branches, each tracing an unbroken chain of parent-to-offspring inheritance. The single deepest node is the root, the common ancestor of everything on the tree, and it orients the whole diagram in time: everything flows from root toward tips, from the past toward the present.
The single most common mistake is reading the tree along the tips instead of through the nodes. The left-to-right order of the leaves carries no information — any node can be spun around its branch like a hanging mobile without changing a single relationship. A human does not become more closely related to a mushroom by being drawn next to it. Relatedness is purely about how recently two lineages last shared a node. Humans and chimpanzees are sister taxa because their lineages split most recently, roughly 6–7 million years ago; both connect to gorillas through a deeper node near 9 million years ago, and to orangutans through one deeper still, near 16 million years ago. The tree encodes nested sets of relationship, and those sets are the only thing that matters.
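The "read through the nodes" rule can be made concrete with a small sketch. The tree below is a hypothetical encoding of the great-ape example, stored as child-to-parent links; the node labels (`HC`, `HCG`, `root`) and the helper names are invented for illustration, and the ages are just the rounded figures quoted above.

```python
# Hypothetical great-ape tree stored as child -> parent links.
# Node labels are invented; ages (millions of years) are the rounded
# figures quoted in the text, not independent estimates.
PARENT = {
    "human": "HC", "chimpanzee": "HC",
    "HC": "HCG", "gorilla": "HCG",
    "HCG": "root", "orangutan": "root",
}
NODE_AGE_MYA = {"HC": 6.5, "HCG": 9.0, "root": 16.0}


def ancestors(tip):
    """Return the nodes on the path from a tip back to the root, nearest first."""
    path = []
    node = tip
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def mrca(a, b):
    """Most recent common ancestor: the first node on a's path that b shares."""
    b_path = set(ancestors(b))
    for node in ancestors(a):
        if node in b_path:
            return node
    raise ValueError("tips are not on the same tree")


def closer_relative(focal, x, y):
    """Return whichever of x and y shares the more recent node with focal."""
    age_x = NODE_AGE_MYA[mrca(focal, x)]
    age_y = NODE_AGE_MYA[mrca(focal, y)]
    return x if age_x < age_y else y


print(mrca("human", "chimpanzee"))                       # HC
print(mrca("human", "gorilla"))                          # HCG
print(closer_relative("gorilla", "human", "orangutan"))  # human
```

Nothing in this representation records the drawing order of the tips, which is the point: relatedness is a property of shared nodes only. It also shows that a gorilla is exactly as closely related to a human as to a chimpanzee, because both comparisons meet at the same node.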
Clades: the unit of evolutionary truth
The key abstraction a tree gives you is the clade: a common ancestor plus every one of its descendants, a complete branch you could remove from the tree with a single cut. A clade is also called a monophyletic group, and it is the only kind of group that corresponds to real evolutionary history. Birds, crocodilians, and the extinct non-avian dinosaurs and pterosaurs all belong to the clade Archosauria; mammals plus their nearest fossil relatives form Synapsida. Because birds are nested deep inside the reptile branch, the familiar group "reptiles" (lizards, snakes, turtles, crocodiles, but not birds) is paraphyletic: it includes a common ancestor but quietly leaves some of that ancestor's descendants out. Worse are polyphyletic groups like "warm-blooded animals," which lumps birds and mammals together even though warm-bloodedness evolved separately on each lineage. Cladistics, the discipline of classifying organisms strictly by branching order, insists that we name only clades, because only clades carry information you can trust.
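The "single cut" test for a clade is easy to express in code. The sketch below uses a hypothetical, heavily simplified amniote tree (turtles and all fossil groups are omitted, and the node names are invented) and checks whether a proposed group equals the full set of tips under its most recent common ancestor. It reports which descendants are missing, but it does not try to distinguish paraphyly from polyphyly, which would need information about ancestral traits.

```python
# Hypothetical, simplified amniote tree: internal node -> list of children.
# Anything not listed as a key is a tip.
CHILDREN = {
    "amniotes": ["mammals", "reptile_branch"],
    "reptile_branch": ["lizards_and_snakes", "archosaurs"],
    "archosaurs": ["crocodilians", "birds"],
}


def tips_below(node):
    """Return the set of tips descended from a node (a tip returns itself)."""
    if node not in CHILDREN:
        return {node}
    result = set()
    for child in CHILDREN[node]:
        result |= tips_below(child)
    return result


def smallest_clade(group, root="amniotes"):
    """Walk down from the root to the deepest node that still contains the group."""
    node = root
    while True:
        narrower = [c for c in CHILDREN.get(node, []) if group <= tips_below(c)]
        if not narrower:
            return node
        node = narrower[0]


def check_group(group):
    """Return (is_clade, missing_tips) for a proposed set of tips."""
    group = set(group)
    full = tips_below(smallest_clade(group))
    return full == group, full - group


print(check_group({"crocodilians", "birds"}))               # a clade
print(check_group({"lizards_and_snakes", "crocodilians"}))  # "reptiles": birds left out
print(check_group({"mammals", "birds"}))                    # "warm-blooded animals"
```

On this toy tree, "reptiles" fails because one descendant (birds) is left out, while "warm-blooded animals" fails because the only node uniting its members is the root of the whole tree.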
This is why molecular phylogenetics quietly rewrote the textbooks. When ribosomal RNA sequences were first compared across life in the late 1970s, Carl Woese discovered that the prokaryotes were not one group but two deeply divergent clades — Bacteria and Archaea — and that Archaea are, surprisingly, our closer cousins. The old five-kingdom scheme gave way to the three-domain tree of life. The lesson is general: superficial resemblance is a treacherous guide, and the only reliable map of relationship is the one drawn by shared descent.
Homology, convergence, and how the signal is read
Trees are inferred from characters — heritable features that vary among taxa. The useful ones are shared derived characters (synapomorphies): novel traits that arose in a common ancestor and were inherited by all of its descendants. Hair is a synapomorphy that unites mammals; feathers unite birds and many dinosaurs; an amniotic egg unites the amniotes. Such traits are homologous — similar because of common ancestry. The danger is homoplasy, similarity that arose independently: the streamlined body of a shark, a dolphin, and an extinct ichthyosaur is convergent, not inherited, and a naïve analysis that treated "fish-shaped body" as a single character would wrongly pull them together. Distinguishing homology from convergence is the central craft of building a good tree, and it is exactly why molecular data are so powerful — a genome offers millions of independent characters, drowning out the handful that happen to converge.
Most reconstruction now runs on aligned DNA or protein sequences. Three families of method dominate. Maximum parsimony picks the tree that explains the data with the fewest evolutionary changes, an elegant idea that stumbles when lineages evolve at very different rates ("long-branch attraction"). Maximum likelihood and Bayesian inference instead adopt an explicit statistical model of how nucleotides substitute over time (accounting for the fact that transitions outnumber transversions, and that some sites mutate fast while others are frozen) and search for the tree (and branch lengths) that best fit the observed sequences. Likelihood reports a single best tree; Bayesian methods sample a distribution of trees and report each clade's posterior probability. Either way, a measure of confidence is essential: a bootstrap analysis resamples the alignment columns hundreds of times, rebuilds the tree each time, and reports how often each clade reappears. A node with 99% bootstrap support is solid; one with 55% is weakly supported and could easily dissolve with more data, and a tree drawn without support values should be read with suspicion.
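To make "fewest evolutionary changes" concrete, here is a minimal sketch of parsimony scoring using the Fitch algorithm, one standard way to count the minimum number of changes a fixed rooted binary tree requires. The four taxa, the four-site alignment, and the two candidate trees are invented for illustration; a real analysis would also search over many topologies rather than compare two by hand.

```python
def fitch(tree, states):
    """Score one character on a rooted binary tree.

    tree is a tip name (str) or a (left, right) tuple.
    Returns (possible ancestral states, minimum number of changes).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch(tree[0], states)
    right_set, right_cost = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two children can agree on a state: no extra change needed.
        return shared, left_cost + right_cost
    # No state in common: at least one change on a branch below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch cost over every column of the alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for site in range(length):
        column = {tip: seq[site] for tip, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Hypothetical alignment and candidate trees, for illustration only.
ALIGNMENT = {"A": "AAGT", "B": "AAGC", "C": "CTGC", "D": "CTAC"}
TREE_1 = (("A", "B"), ("C", "D"))
TREE_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(TREE_1, ALIGNMENT))  # 4
print(parsimony_score(TREE_2, ALIGNMENT))  # 6
```

Parsimony would prefer the first tree here because it explains the same four columns with fewer changes. The last two columns cost one change on either tree, so only the first two columns actually discriminate between the candidates.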
What branch length means — and what it costs
Not every tree uses its branches the same way, and conflating the three types is a frequent error.
| Tree type | What branch length encodes | What you can read off it |
|---|---|---|
| Cladogram | Nothing — lengths are arbitrary | Only the branching order (who is related to whom) |
| Phylogram | Inferred genetic change (substitutions per site) | How much a lineage has evolved; rate differences between lineages |
| Chronogram (time tree) | Elapsed time, in millions of years | When lineages diverged; ages of clades |
Turning genetic distance into calendar time requires a molecular clock: the assumption that mutations accumulate at a roughly steady rate, so the number of substitutions between two sequences is proportional to the time since they diverged. Real clocks are not strictly constant, because rates vary across lineages and genes, so modern "relaxed-clock" models let the rate drift along the tree, then anchor the absolute scale using calibration points, typically dated fossils or known geological events (the rise of the Isthmus of Panama, the breakup of Gondwana). This is how we estimate that placental mammals began radiating around the Cretaceous–Paleogene boundary 66 million years ago, or that SARS-CoV-2 accumulates roughly two substitutions per genome per month (about one every two weeks), a clock fast enough that public-health labs build new phylogenetic trees in near real time to track which variants are spreading.
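The arithmetic behind a strict molecular clock is short enough to sketch. Two lineages that split t years ago have each been accumulating changes for t years, so their pairwise distance is d = 2rt for a per-lineage rate r. The function names and all numbers below are hypothetical, chosen only to make the calculation easy to follow; the sketch assumes a single constant rate and does not model the rate variation that relaxed-clock methods handle.

```python
def calibrate_rate(distance, known_age_years):
    """Per-lineage rate (substitutions/site/year) from one calibration point.

    distance is the pairwise distance between two taxa whose split has a
    known age; the factor 2 accounts for change along both lineages.
    """
    return distance / (2.0 * known_age_years)


def divergence_time(distance, rate):
    """Years since two lineages split, under a strict clock: t = d / (2r)."""
    return distance / (2.0 * rate)


# Hypothetical calibration: a pair 0.032 substitutions/site apart whose
# split is dated to 16 million years ago by a fossil.
rate = calibrate_rate(0.032, 16e6)

# Hypothetical target pair, 0.013 substitutions/site apart.
age = divergence_time(0.013, rate)
print(f"{age / 1e6:.1f} million years")  # 6.5 million years
```

The calibration step is what turns a relative scale into an absolute one: without the dated split, the distances only say that the second pair diverged more recently than the first, not when.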
Why trees matter: from epidemics to conservation
Phylogenetics is not a museum exercise. During an outbreak, sequencing pathogen genomes and placing them on a tree — phylodynamics — reveals who infected whom, how many separate introductions seeded an epidemic, and how fast transmission is accelerating; this is how Ebola, influenza, HIV, and COVID-19 transmission chains have been reconstructed. In medicine, the same logic tracks how a tumor's cell lineages diverge as cancer evolves and acquires drug resistance, and how antibiotic-resistance genes spread across bacterial clades. In conservation, trees quantify phylogenetic diversity: protecting a lone species perched on a long, isolated branch — a tuatara, a coelacanth, an echidna — preserves far more unique evolutionary history than protecting one of many near-identical tips. Trees also underpin comparative biology itself: any claim that a trait "evolved to do X" is really a claim about where on a tree that trait appeared, and how many times.
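One common way to put a number on phylogenetic diversity is Faith's PD: the total length of all branches needed to connect a set of tips to the root. The sketch below computes it on a hypothetical three-tip time tree (tip names and branch lengths are invented) to show why losing a lone long-branch species costs more than losing one of two close relatives.

```python
# Hypothetical time tree: child -> (parent, branch length in millions of years).
BRANCH = {
    "lone": ("root", 100.0),
    "pair": ("root", 95.0),
    "twin_a": ("pair", 5.0),
    "twin_b": ("pair", 5.0),
}


def faith_pd(tips):
    """Sum the lengths of every branch on a path from any listed tip to the root."""
    used = set()
    for tip in tips:
        node = tip
        while node in BRANCH:
            used.add(node)  # each branch is named by its child end
            node = BRANCH[node][0]
    return sum(BRANCH[node][1] for node in used)


def loss_if_extinct(all_tips, victim):
    """Evolutionary history lost if one tip disappears."""
    survivors = [t for t in all_tips if t != victim]
    return faith_pd(all_tips) - faith_pd(survivors)


TIPS = ["lone", "twin_a", "twin_b"]
print(faith_pd(TIPS))                   # 205.0
print(loss_if_extinct(TIPS, "lone"))    # 100.0
print(loss_if_extinct(TIPS, "twin_a"))  # 5.0
```

Losing `twin_a` removes only its short terminal branch, because the shared 95-million-year branch survives in `twin_b`; losing `lone` removes a branch nothing else represents.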
The image goes back to a single page in Darwin's 1837 notebook, where a branching diagram of descent with modification is sketched beneath the words "I think." The only figure in On the Origin of Species (1859) is a tree. Nearly two centuries later the metaphor has been complicated (horizontal gene transfer braids the bacterial branches into a partial web, and hybridization adds reticulations even among animals), but the core insight is unchanged: all of life is connected by an actual, physical genealogy, and the tree is our best map of it.
Frequently asked questions
How do you read a phylogenetic tree?
Read it at the nodes, not the tips. The tips (leaves) are the taxa being compared; each internal node is a hypothesized common ancestor where one lineage split into two. Two tips are more closely related the more recently they share a node, regardless of how close they sit on the page. The horizontal left-to-right order of tips is arbitrary — you can rotate any node around its branch like a mobile without changing the relationships. To find the relatives of a species, trace back to its most recent node and read off everything descended from it.
What is a clade?
A clade (or monophyletic group) is a single common ancestor plus every one of its descendants: a complete branch you could snip off the tree with one cut. Birds, crocodilians, and the extinct non-avian dinosaurs all belong to the clade Archosauria. Groups that leave descendants out are paraphyletic (e.g., "reptiles" excluding birds), and groups assembled from unrelated lineages by superficial resemblance are polyphyletic (e.g., "warm-blooded animals"). Modern classification aims to name only clades, because only clades reflect real evolutionary history.
Do branch lengths mean anything?
It depends on the tree. In a cladogram, branch lengths are meaningless — only the branching order matters. In a phylogram, branch length is proportional to inferred genetic change (substitutions per site). In a chronogram (a time-calibrated tree), branch length is proportional to elapsed time in millions of years, usually derived from a molecular clock anchored to fossil or biogeographic calibration points. Always check the scale bar before interpreting how long a branch looks.
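The unit "substitutions per site" can be illustrated with the simplest substitution model, Jukes-Cantor (JC69). The raw proportion of differing sites, p, undercounts change because some sites have mutated more than once; JC69 corrects for this with d = -(3/4) ln(1 - 4p/3). The two sequences below are invented, and real analyses usually fit richer models than this one.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites that differ, skipping columns with a gap."""
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jc69_distance(p):
    """Jukes-Cantor corrected distance, in expected substitutions per site."""
    if p >= 0.75:
        # At 75% difference the sequences are saturated and d is undefined.
        raise ValueError("sequences too divergent for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical pair of aligned sequences differing at 1 site in 10.
p = p_distance("ACGTACGTAC", "ACGTACGTAT")
d = jc69_distance(p)
print(p)      # 0.1
print(d > p)  # True
```

The corrected distance is always at least as large as the observed proportion, and the gap widens as sequences diverge, which is why long branches on a phylogram are longer than a naive count of differences would suggest.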
How are phylogenetic trees built?
First a data matrix is assembled — aligned DNA or protein sequences, or scored morphological characters. Then a tree-building method searches the space of possible topologies: maximum parsimony picks the tree requiring the fewest evolutionary changes; maximum likelihood and Bayesian inference fit an explicit model of how sequences mutate and find the tree that best explains the data. Confidence is measured with bootstrap percentages (resampling the data) or Bayesian posterior probabilities, with values above ~95% considered strong support.
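The bootstrap step can be sketched end to end for the smallest interesting case, four taxa, where only three unrooted trees exist. In this hypothetical example the "tree builder" is a bare parsimony count of informative site patterns, standing in for whatever method a real analysis would use; the taxon names `A` to `D`, the function names, and the alignments are all invented for illustration.

```python
import random


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement, keeping the original length."""
    names = list(alignment)
    length = len(alignment[names[0]])
    picks = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][i] for i in picks) for n in names}


def best_quartet_split(alignment):
    """Pick the unrooted four-taxon tree favoured by parsimony.

    Only two-state sites shared in pairs are informative. Ties fall to the
    first split listed; a real analysis would break them at random.
    """
    a, b, c, d = (alignment[n] for n in "ABCD")
    counts = {"AB|CD": 0, "AC|BD": 0, "AD|BC": 0}
    for w, x, y, z in zip(a, b, c, d):
        if w == x and y == z and w != y:
            counts["AB|CD"] += 1
        elif w == y and x == z and w != x:
            counts["AC|BD"] += 1
        elif w == z and x == y and w != x:
            counts["AD|BC"] += 1
    return max(counts, key=counts.get)


def bootstrap_support(alignment, split, replicates=200, seed=0):
    """Percentage of resampled alignments whose best tree contains the split."""
    rng = random.Random(seed)
    hits = sum(
        best_quartet_split(resample_columns(alignment, rng)) == split
        for _ in range(replicates)
    )
    return 100.0 * hits / replicates


# Hypothetical alignment in which every site groups A with B.
CLEAN = {"A": "AACCGG", "B": "AACCGG", "C": "TTGGAA", "D": "TTGGAA"}
print(bootstrap_support(CLEAN, "AB|CD"))  # 100.0

# Hypothetical alignment with conflicting signal; support will be lower.
MIXED = {"A": "AACCGA", "B": "AACTAT", "C": "TTGCGT", "D": "TTGTAA"}
print(bootstrap_support(MIXED, "AB|CD"))
```

The support value measures how consistently the data point to a grouping, not the probability that the grouping is true: if a few columns carry all the signal, replicates that happen to miss them will return a different tree.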
Why do gene trees sometimes disagree with species trees?
Because the history of a single gene is not always the history of the species carrying it. Incomplete lineage sorting (ancestral polymorphism persisting through successive speciation events), horizontal gene transfer (genes jumping between lineages, common in bacteria), and hybridization can all make one gene's tree differ from the species tree. Phylogenomics addresses this by combining hundreds or thousands of genes and using coalescent methods that explicitly model how gene trees vary within the species tree.
What is the root of a tree and how is it placed?
The root is the single deepest node — the common ancestor of everything on the tree — and it sets the direction of time. Most tree-building methods produce an unrooted tree (relationships without a starting point); the root is added separately, usually by including an outgroup, a taxon known to lie outside the study group. The branch connecting the outgroup to the rest marks the root. Without a root, you can see who is related to whom but not who came first.
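Outgroup rooting can be sketched as a small graph operation. An unrooted tree is just a set of connections with no direction; placing the root on the branch that leads to the outgroup lets every other branch be oriented away from it. The adjacency list below is a hypothetical four-ape example with invented node labels (`n1`, `n2`), and treating the orangutan as the outgroup is an assumption made for illustration.

```python
# Hypothetical unrooted tree as an adjacency list (tips have one neighbour).
UNROOTED = {
    "human": ["n1"], "chimpanzee": ["n1"],
    "gorilla": ["n2"], "orangutan": ["n2"],
    "n1": ["human", "chimpanzee", "n2"],
    "n2": ["n1", "gorilla", "orangutan"],
}


def root_with_outgroup(adjacency, outgroup):
    """Return child -> parent links with a new root on the outgroup's branch."""
    (neighbor,) = adjacency[outgroup]  # a tip touches exactly one node
    parent = {outgroup: "root", neighbor: "root"}
    stack = [(neighbor, outgroup)]     # (node, node we arrived from)
    while stack:
        node, came_from = stack.pop()
        for nxt in adjacency[node]:
            if nxt != came_from:
                parent[nxt] = node     # orient the branch away from the root
                stack.append((nxt, node))
    return parent


rooted = root_with_outgroup(UNROOTED, "orangutan")
print(rooted["n1"])  # n2: the human-chimpanzee node descends from the deeper node

misrooted = root_with_outgroup(UNROOTED, "human")
print(misrooted["n2"])  # n1: same connections, opposite direction of time
```

The two results share exactly the same connections, which is why choosing a well-justified outgroup matters: the unrooted tree cannot say which of these readings of history is correct.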