Microbiology

Viral Capsid Self-Assembly

Dozens to thousands of identical proteins spontaneously snap into a symmetric icosahedral shell that packages the genome — no enzymes, no ATP

Viral capsid self-assembly is the spontaneous process by which many copies of one or a few kinds of coat-protein subunit associate into a closed, highly symmetric protein shell (almost always an icosahedron) that packages and protects the viral genome. A tiny virus genome can't afford to encode a giant single-molecule container, so it encodes one small subunit (often 25–60 kDa) and uses it 60 × T times, where the triangulation number T = 1, 3, 4, 7, 13, 16… fixes the allowed sizes (60 subunits in parvovirus up to 960 in herpesvirus). In simple capsids, assembly is driven purely by weak, reversible interactions such as hydrophobic burial, hydrogen bonds, and salt bridges, so mis-built intermediates fall apart and try again until the shell reaches its error-free, lowest-energy state. Fraenkel-Conrat and Williams demonstrated the point in 1955 by reconstituting infectious tobacco mosaic virus from nothing but purified RNA and coat protein, and the same principle now underlies vaccine virus-like particles and gene-therapy vectors in the clinic.

  • Subunit count: 60 × T (T = 1, 3, 4, 7…)
  • Smallest shell: ~18–20 nm (parvovirus, T=1)
  • Largest icosahedral: 100+ nm (herpesvirus, T=16)
  • Energy / enzymes: None (spontaneous)
  • In-vitro timescale: Seconds to minutes
  • Geometry rule: Caspar–Klug 1962

Why viral capsid self-assembly matters

  • It is the cheapest way to wrap a large volume in a small amount of code. A poliovirus genome is about 7,400 nucleotides — barely enough for a dozen proteins. Encoding one 33 kDa subunit and reusing it 60 times (the principle of genetic economy, Crick & Watson 1956) is the only way to build a 30 nm container around that genome without the gene for the container being bigger than the genome itself.
  • It is a vaccine platform worth billions. The HPV vaccines Gardasil and Cervarix are nothing but the L1 coat protein self-assembled into empty virus-like particles (VLPs) — no DNA inside, no infectivity, but the same surface the immune system sees on a real virion. The hepatitis B surface-antigen vaccine works the same way. VLP vaccines have been given to hundreds of millions of people.
  • It is the engine of gene therapy. Adeno-associated virus (AAV) capsids self-assemble around a therapeutic gene and deliver it to human cells; AAV vectors carry the approved therapies for spinal muscular atrophy (Zolgensma) and inherited retinal disease (Luxturna).
  • It is a brand-new antiviral target. Capsid assembly modulators are a 2020s drug class that don't block an enzyme — they poison the geometry. HIV's lenacapavir (approved 2022) binds the capsid protein and forces it to mis-assemble; hepatitis B core-protein modulators do the same. Break the lattice and the virus can't build a working shell.
  • It is a model system for all of structural self-organization. The same physics — many identical parts, weak reversible bonds, an energy landscape with one deep minimum — explains protein folding, microtubule assembly, and designed protein nanocages. Capsids are the textbook case because their symmetry makes the geometry exactly solvable.
  • It defined an era of structural biology. Tomato bushy stunt virus (Harrison, 1978) and the HK97 bacteriophage chainmail capsid (Wikoff & Johnson, 2000) were landmark atomic structures. Today single-particle cryo-EM resolves entire 960-subunit capsids to better than 3 Å, turning the abstract triangulation rules into visible atoms.

How a capsid assembles itself

Start with a soup of freshly translated coat-protein subunits in the host cytoplasm. Each subunit is small — typically 200–400 amino acids — and folds into a compact domain, very often the eight-stranded jelly-roll β-barrel that recurs across unrelated virus families (picornaviruses, plant viruses, parvoviruses, adenovirus). The barrel presents one flat-ish face and edges studded with complementary chemical patches: hydrophobic ridges, charged residues for salt bridges, and backbone donors and acceptors for hydrogen bonds.

Assembly proceeds through a nucleation-and-growth pathway, like crystallization. First a small unstable nucleus forms, often a trimer or a pentamer of subunits, or a coat-protein dimer bound to a packaging signal on the genome. Below a critical nucleus size the cluster is more likely to fall apart than grow; above it, adding the next subunit is downhill, and the shell sweeps to completion cooperatively. Because no single interface is strong (often just 3–8 kcal/mol of free energy per contact), wrongly placed subunits dissociate and re-bind. This error correction by reversibility is the whole reason a process with no proofreading enzyme reliably yields one closed, correctly built shell rather than a heap of mis-assembled junk.
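To get a feel for how weak a 3–8 kcal/mol contact is, the free energy can be converted into an equivalent dissociation constant with K_d = exp(−ΔG/RT). The sketch below is illustrative only: it assumes a single isolated contact, a 1 M standard state, and 25 °C, and the function name is made up for this example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def dissociation_constant(delta_g_kcal, temp_k=298.15):
    """Kd in mol/L for one contact worth delta_g_kcal of favourable free energy.

    Assumes a 1 M standard state and treats the contact in isolation.
    """
    return math.exp(-delta_g_kcal / (R_KCAL * temp_k))

# Range quoted in the text: roughly 3-8 kcal/mol per contact.
for dg in (3.0, 5.0, 8.0):
    print(f"{dg:.0f} kcal/mol -> Kd ~ {dissociation_constant(dg):.1e} M")
```

Under these assumptions the range spans millimolar to micromolar affinities, so a subunit held by a single contact comes off easily. A subunit held by several contacts at once is bound much more tightly, which is why correct placements persist while wrong ones do not.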

The closure is forced by geometry. To close a sheet of hexamers into a finite shell you must include exactly 12 pentamers, 12 points where five subunits meet instead of six. This is a hard constraint from Euler's polyhedron formula (the same reason a soccer ball has exactly 12 pentagonal panels). The protein therefore has to adopt at least two slightly different conformations: a flatter one at the hexamers and a more curved one at the pentamers. Many capsid proteins do this with a flexible "molecular switch" (a movable N-terminal or C-terminal arm, or a wedge of ordered RNA) that toggles the local curvature. That switch is what lets quasi-equivalence work: chemically identical subunits sit in geometrically non-identical seats by bending a hinge.
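The count of 12 can be checked with a short derivation. It assumes an idealized shell made of P pentagonal and H hexagonal faces with exactly three faces meeting at every vertex (the symbols P, H, V, E, F are introduced here for illustration), and applies Euler's formula V − E + F = 2:

```latex
\begin{aligned}
F &= P + H, \qquad E = \frac{5P + 6H}{2}, \qquad V = \frac{5P + 6H}{3},\\[4pt]
V - E + F &= \frac{5P + 6H}{3} - \frac{5P + 6H}{2} + (P + H) = \frac{P}{6} = 2
\quad\Longrightarrow\quad P = 12 .
\end{aligned}
```

The number of hexagons H cancels out, so the shell can grow by adding hexamers while the number of pentamers stays fixed at 12.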

Two big classes of virus deviate from purely spontaneous assembly. Tailed dsDNA bacteriophages and herpesviruses first build an empty procapsid around internal scaffolding proteins, then eject the scaffold, then run a powerful ATP-driven terminase motor at a unique portal vertex to pump a single naked genome into the rigid shell against pressures of tens of atmospheres. Helical viruses like tobacco mosaic virus skip the icosahedron entirely: subunits stack in a helix directly along the RNA, each subunit gripping three nucleotides, so the genome's length sets the rod's length exactly.
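The claim that genome length fixes rod length in TMV can be checked with a few lines of arithmetic. The constants below are commonly quoted textbook values for TMV, not figures from this article, so treat them as assumptions: a 6,395-nucleotide genome, three nucleotides per subunit, 49 subunits per three helical turns, and a 2.3 nm pitch.

```python
GENOME_NT = 6395             # assumed TMV genome length in nucleotides
NT_PER_SUBUNIT = 3           # assumed: each coat protein binds three nucleotides
SUBUNITS_PER_TURN = 49 / 3   # assumed: 49 subunits per three turns of the helix
PITCH_NM = 2.3               # assumed rise per helical turn, in nanometres

def tmv_rod(genome_nt=GENOME_NT):
    """Return (subunit count, helical turns, rod length in nm) for a given genome."""
    subunits = genome_nt / NT_PER_SUBUNIT
    turns = subunits / SUBUNITS_PER_TURN
    length_nm = turns * PITCH_NM
    return subunits, turns, length_nm

subunits, turns, length_nm = tmv_rod()
print(f"{subunits:.0f} subunits, {turns:.0f} turns, {length_nm:.0f} nm")
```

With these inputs the estimate lands close to the roughly 2,130 subunits and 300 nm rod length usually quoted for TMV. A shorter or longer RNA would give a proportionally shorter or longer rod.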

The players and conditions

  • Coat / capsid protein (CP). The one repeated subunit. Usually one gene product; some viruses use two or three different proteins (poliovirus uses VP1–VP4 cut from one precursor). The jelly-roll β-barrel fold is the most common architecture.
  • The genome. ssRNA, dsRNA, ssDNA, or dsDNA. In many RNA viruses the genome is not cargo loaded last — it is a structural scaffold that nucleates and templates assembly through packaging signals.
  • Packaging / nucleation signal. A specific sequence and fold that the CP recognizes to start assembly on the right nucleic acid: TMV's origin-of-assembly loop, MS2's operator stem-loop, the Ψ (psi) element of HIV, and the pac and cos sites of DNA phages.
  • Scaffolding proteins (large viruses only). Transient internal chaperones that template the procapsid shape, then leave or are proteolyzed before DNA enters.
  • The portal and terminase (tailed phages, herpesviruses). A 12-subunit ring at one special vertex plus an ATPase motor that translocates DNA.
  • Solution conditions. pH, ionic strength, and divalent cations (Mg²⁺, Ca²⁺) tune the interface strength. TMV protein forms disks at pH 7 and the helical "lock-washer" form at pH ~5–6; many capsids assemble only below a threshold pH or above a threshold protein concentration.

Caspar–Klug geometry: the allowed shells

In 1962 Donald Caspar and Aaron Klug explained why capsid sizes are quantized. An icosahedral shell is built from 60 × T subunits, where the triangulation number T counts the quasi-equivalent positions per asymmetric unit. T is not free: it must equal h² + hk + k² for non-negative integers h and k, not both zero. Each capsid has exactly 12 pentamers and 10 × (T − 1) hexamers.

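| T number | (h, k) | Subunits (60 × T) | Pentamers | Hexamers | Example virus | Diameter |
|---|---|---|---|---|---|---|
| T = 1 | (1,0) | 60 | 12 | 0 | Parvovirus, STNV, AAV | ~18–26 nm |
| pseudo T = 3 | – | 60 of each (3 distinct CPs) | 12 | – | Poliovirus, rhinovirus | ~30 nm |
| T = 3 | (1,1) | 180 | 12 | 20 | Cowpea chlorotic mottle virus, nodavirus | ~28–30 nm |
| T = 4 | (2,0) | 240 | 12 | 30 | Hepatitis B core, Sindbis | ~32–45 nm |
| T = 7 | (2,1) | 420 | 12 | 60 | HK97 phage; papillomavirus (T = 7 lattice built from 72 pentamers) | ~55–60 nm |
| T = 13 | (3,1) | 780 | 12 | 120 | Rotavirus, bluetongue inner | ~70–80 nm |
| T = 16 | (4,0) | 960 | 12 | 150 | Herpesvirus capsid | ~125 nm |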
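The numeric columns of the table follow mechanically from the Caspar–Klug rule. The sketch below regenerates them; the function name and the dictionary layout are illustrative choices, and when several (h, k) pairs give the same T it simply reports the first one found.

```python
from math import isqrt

def caspar_klug_shells(t_max=16):
    """List allowed triangulation numbers up to t_max with their shell counts."""
    pairs = {}
    # With h >= k >= 0 we have h*h <= T, so h never needs to exceed isqrt(t_max).
    for h in range(1, isqrt(t_max) + 1):
        for k in range(0, h + 1):
            t = h * h + h * k + k * k
            if t <= t_max:
                pairs.setdefault(t, (h, k))  # keep the first (h, k) found for each T
    rows = []
    for t in sorted(pairs):
        h, k = pairs[t]
        rows.append({
            "T": t, "h": h, "k": k,
            "subunits": 60 * t,        # 60 x T coat-protein copies
            "pentamers": 12,           # always 12, from Euler's formula
            "hexamers": 10 * (t - 1),  # remaining capsomers
        })
    return rows

for row in caspar_klug_shells(16):
    print(row)
```

Running it up to T = 16 also yields T = 9 and T = 12, which are geometrically allowed but have no example row in the table.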

Note the special cases. Pseudo-T = 3 picornaviruses look like a T = 3 shell but use three different proteins (VP1, VP2, VP3) in the three positions instead of one protein in three conformations: same lattice, more genes. Giant viruses such as Mimivirus push T to roughly 1,000 or more, and HIV breaks icosahedral symmetry entirely: its core is a fullerene cone of about 250 capsid (CA) hexamers closed by exactly 12 pentamers, 7 at the wide end and 5 at the narrow end, which is why it's a cone rather than a sphere.

Quantified figures: sizes, forces, and timescales

  • Subunit size: coat proteins are commonly 200–400 residues, ~25–60 kDa. TMV CP is 158 residues (17.5 kDa); HBV core is 183 residues.
  • Copy number: 60 (T=1) to 960 (T=16) for icosahedral capsids; ~2,130 identical subunits in a single TMV helical rod.
  • Shell dimensions: 18 nm (parvovirus) to 125 nm (herpesvirus) across; capsid walls are typically 2–5 nm thick. TMV is a hollow rod 300 nm long, 18 nm wide, with a 4 nm central channel.
  • Interface energy: roughly 3–8 kcal/mol of free energy per subunit–subunit contact — deliberately weak so errors self-correct; total shell stabilization is the sum over hundreds of such contacts.
  • Speed: in vitro reassembly of TMV or simple RNA virus capsids completes in seconds to a few minutes once nucleated; HBV capsids assemble on the order of seconds at physiological protein concentration.
  • Packaging pressure (DNA phages): the φ29 and T4 terminase motors pump DNA against internal pressures of ~50–60 atmospheres, and the φ29 motor is one of the strongest known, stalling near 50–60 piconewtons of force.
  • Packaging density: dsDNA inside a phage head reaches ~500 mg/mL — near the density of crystalline DNA — which is why so much pressure is needed to load it and why it ejects so fast on infection.
  • Symmetry: the icosahedral rotation group contains 60 rotational symmetry operations, organized around 6 five-fold, 10 three-fold, and 15 two-fold axes. That is the maximum for a finite closed assembly of asymmetric subunits.
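The figure of 60 in the symmetry bullet can be recovered by counting rotations axis by axis. Each n-fold axis contributes n − 1 rotations other than the identity, and the identity is counted once:

```latex
\underbrace{1}_{\text{identity}}
+ \underbrace{6 \times 4}_{\text{five-fold axes}}
+ \underbrace{10 \times 2}_{\text{three-fold axes}}
+ \underbrace{15 \times 1}_{\text{two-fold axes}}
= 1 + 24 + 20 + 15 = 60 .
```

Each of the 60 rotations maps one asymmetric unit onto another, which is why a T = 1 shell holds exactly 60 subunits.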

Self-assembly vs motor-driven assembly

| Property | Spontaneous self-assembly (simple capsids) | Scaffold + motor assembly (large dsDNA viruses) |
|---|---|---|
| Examples | TMV, STNV, MS2, poliovirus, HBV, AAV | T4, λ, φ29 phages; herpesvirus |
| Energy source | None; thermodynamic (entropy + weak bonds) | ATP-burning terminase motor |
| Genome timing | Co-assembled; often nucleates the shell | Empty procapsid first, genome pumped in last |
| Scaffolding proteins | Usually none | Required, then removed/proteolyzed |
| Genome state inside | Loosely condensed with the protein | Tightly spooled, ~50–60 atm pressure |
| In-vitro reconstitution | Yes: RNA + protein in a tube (TMV, 1955) | No: needs portal, motor, and ATP |
| Error correction | Reversible binding; mis-assembly falls apart | Portal geometry + headful sensing |
| Typical T / size | T = 1–4, 18–45 nm | T = 7–16+, 55–125 nm |

Real organisms, diseases, and named examples

  • Tobacco mosaic virus (TMV). The original self-assembly proof: Fraenkel-Conrat and Williams (1955) mixed purified TMV RNA with purified coat protein and recovered infectious rod-shaped virus, showing the assembly instructions live in the protein, not in any enzyme. Klug's later work on TMV and capsids won the 1982 Nobel Prize in Chemistry.
  • Poliovirus and rhinovirus (picornaviruses). Pseudo-T = 3 shells of VP1–VP4. Pleconaril and related "WIN compounds" wedge into a hydrophobic pocket under the canyon floor and rigidify the capsid so it can't uncoat — capsid-targeted antivirals that predate the modern assembly-modulator class.
  • Human papillomavirus (HPV). The L1 protein self-assembles into 72-pentamer T = 7 VLPs; expressed in yeast or insect cells with no viral DNA, these empty shells are Gardasil and Cervarix.
  • Hepatitis B virus (HBV). The core antigen self-assembles into T = 3 and T = 4 capsids; core-protein allosteric modulators (CpAMs) speed or mis-direct assembly and are in trials as functional-cure antivirals.
  • HIV-1. The Gag-derived capsid (CA) protein builds a non-icosahedral fullerene cone; lenacapavir (Sunlenca, 2022) binds CA and disrupts both assembly and uncoating, a long-acting twice-yearly drug.
  • Bacteriophage HK97. Its T = 7 capsid forms covalent isopeptide "chainmail" rings that catenate the subunits like molecular knights' armor — a unique stabilization strategy discovered from its 2000 crystal structure.
  • Adeno-associated virus (AAV). A T = 1 parvovirus capsid (VP1/VP2/VP3) self-assembled around a therapeutic transgene — the workhorse of approved gene therapies.

Common misconceptions and pitfalls

  • "The subunits are glued by strong, permanent bonds." No — the interfaces are deliberately weak (a few kcal/mol each). Weakness is the feature: it lets mis-placed subunits unbind and the shell reach its one correct minimum. Strong irreversible bonds would freeze in errors.
  • "Capsids are spheres." They look round at low resolution but are icosahedra (or helices, or HIV's cone). The 12 pentamers and the discrete T-number sizes are the giveaway that the geometry is faceted, not spherical — you cannot tile a true sphere with identical units.
  • "Any number of subunits can form a shell." Only 60 × T with T = h² + hk + k² gives a stable icosahedral closure. You can't build a stable 100-subunit or 300-subunit icosahedral capsid; the math forbids it.
  • "Quasi-equivalent subunits are different proteins." Usually they are the same protein in slightly different conformations, switched by a flexible arm or a bound RNA wedge. (Pseudo-T = 3 picornaviruses are the exception that uses genuinely different proteins.)
  • "Assembly needs ATP and enzymes." Simple capsids need neither — they are spontaneous. Only large dsDNA viruses use an ATP motor, and even then only to pump the genome, not to build the protein shell.
  • "The genome is loaded into a finished sphere like filling a balloon." True for tailed phages and herpesviruses, but for most RNA viruses the genome co-assembles and actively nucleates and templates the shell — protein and nucleic acid build each other simultaneously.
  • "A virus-like particle is infectious." A VLP has the same protein shell but no genome inside, so it cannot replicate — which is exactly why empty self-assembled capsids make safe vaccines.
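The "any number of subunits" misconception is easy to test numerically. The helper below is a minimal sketch with a made-up name: it returns the triangulation number for a proposed subunit count, or None when no icosahedral Caspar–Klug shell has that many subunits.

```python
from math import isqrt

def triangulation_number(n_subunits):
    """Return T if n_subunits == 60*T with T = h^2 + h*k + k^2, otherwise None."""
    if n_subunits <= 0 or n_subunits % 60 != 0:
        return None  # not a multiple of 60, so no icosahedral shell
    t = n_subunits // 60
    # Search h >= k >= 0; h*h <= T bounds the search.
    for h in range(1, isqrt(t) + 1):
        for k in range(0, h + 1):
            if h * h + h * k + k * k == t:
                return t
    return None  # multiple of 60, but T is not of the allowed form

for n in (60, 100, 180, 300, 420, 960):
    print(n, triangulation_number(n))
```

A count of 100 fails because it is not a multiple of 60. A count of 300 fails for a subtler reason: it would need T = 5, and no pair (h, k) gives h² + hk + k² = 5.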

Frequently asked questions

Why do viruses build their shells from identical repeated subunits?

It comes down to genetic economy, an argument made by Crick and Watson in 1956. A virus genome is tiny: poliovirus is about 7,400 nucleotides, enough to code for only a handful of proteins. A protein large enough to enclose that genome as a single molecule would need a gene far larger than the genome it had to protect, which is impossible. So instead of coding one giant container, the virus codes one small subunit (often 200–400 amino acids, roughly 25–60 kDa) and uses it dozens to hundreds of times. Building a shell from many identical copies of one gene product is the only way to wrap a large volume with a small amount of genetic information. The repeated use of one shape also forces symmetry: identical objects bonding in identical ways naturally tile into a regular, closed structure, which is why almost all spherical viruses are icosahedral.
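The argument can be put in numbers. The sketch below takes a 33 kDa subunit used 60 times around a 7,400-nucleotide genome as an illustrative case, and assumes an average residue mass of about 110 Da and three nucleotides per codon; the function name is hypothetical.

```python
DA_PER_RESIDUE = 110   # assumed average mass of one amino-acid residue, in daltons
NT_PER_CODON = 3       # three nucleotides encode one residue

def gene_length_nt(protein_kda):
    """Approximate coding length in nucleotides for a protein of the given mass."""
    residues = protein_kda * 1000 / DA_PER_RESIDUE
    return residues * NT_PER_CODON

subunit_kda = 33       # illustrative subunit mass
copies = 60            # T = 1 style reuse
genome_nt = 7400       # poliovirus-sized genome

subunit_gene = gene_length_nt(subunit_kda)                # gene for one subunit
single_chain_gene = gene_length_nt(subunit_kda * copies)  # gene for a one-piece shell

print(f"one subunit gene: {subunit_gene:.0f} nt")
print(f"single-molecule shell gene: {single_chain_gene:.0f} nt")
print(f"ratio to genome: {single_chain_gene / genome_nt:.1f}x")
```

Under these assumptions the reusable subunit costs about 900 nucleotides, while a hypothetical one-piece shell of the same total mass would need about 54,000, several times the whole genome.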

Why are most viral capsids icosahedral rather than spherical?

An icosahedron is the closed shell with the most identical positions you can build from repeating units while keeping every subunit in a near-identical (quasi-equivalent) chemical environment. It has 20 triangular faces, 12 vertices, and 60 rotational symmetry operations, the highest point-group symmetry possible for a finite object made of asymmetric units. A perfect sphere cannot be tiled by identical flat subunits without distortion, but 60 identical subunits (3 per triangular face) close exactly into an icosahedron, which is why the smallest capsids (T = 1) have exactly 60 copies. Larger viruses keep the icosahedral framework but add more subunits in groups of 60, accepting slightly different local environments. This is the quasi-equivalence that Caspar and Klug described in 1962.

What is the triangulation number T?

The triangulation number T counts how many quasi-equivalent subunit positions sit in each icosahedral asymmetric unit; a capsid with triangulation number T is built from exactly 60 × T protein subunits arranged as 12 pentamers and 10 × (T − 1) hexamers. T is not an arbitrary integer: it must equal h² + hk + k² for non-negative integers h and k (not both zero), giving the allowed series 1, 3, 4, 7, 9, 12, 13, 16, 19, 21, 25, and so on. So a T = 1 capsid has 60 subunits (parvovirus), T = 3 has 180 (many plant viruses; picornaviruses are pseudo-T = 3), T = 7 has 420 (HK97 bacteriophage; papillomavirus uses the T = 7 lattice but fills it with 72 pentamers), and T = 16 has 960 (herpesvirus). Larger T means a bigger shell that can package a larger genome, but it also requires the subunits to flex between flat (hexamer) and curved (pentamer) conformations, a built-in switch that the protein achieves with a conformational arm or a wedge-shaped insertion.

Does capsid assembly require energy or enzymes?

For most simple capsids, no. Spontaneous self-assembly is driven by thermodynamics — the burial of hydrophobic side chains at the subunit-subunit interfaces releases ordered water molecules and increases the entropy of the system, while hydrogen bonds and salt bridges add enthalpic stability. The total free-energy gain per subunit interface is modest, often only a few kcal/mol (roughly 3–8 kcal/mol), which is exactly the point: bonds weak enough to break let mis-assembled intermediates fall apart and try again, so the shell finds its lowest-energy, error-free state. Purified tobacco mosaic virus coat protein and RNA reassemble into infectious virus in a test tube with nothing but the right pH and salt. Some large or complex viruses are different: tailed bacteriophages and herpesviruses build an empty procapsid with scaffolding proteins, then use an ATP-burning terminase motor — one of the strongest molecular motors known, generating tens of piconewtons — to pump the DNA inside against enormous internal pressure.

How does the capsid know which genome to package?

Many viruses use a packaging signal — a specific RNA or DNA sequence and secondary structure that the coat protein recognizes and binds, nucleating assembly around the correct genome and excluding host nucleic acid. In tobacco mosaic virus an internal RNA loop called the origin-of-assembly sequence inserts into the first protein disk and zips the helix outward in both directions. Bacteriophage MS2 uses a stem-loop hairpin (the operator) that binds a coat-protein dimer with nanomolar affinity. In many small RNA viruses, dozens of weak, sequence-degenerate packaging signals studded along the genome act together as a cooperative network, so the genome behaves like a folded scaffold that the coat protein wraps. dsDNA phages and herpesviruses instead build the empty shell first and then thread a single genome through a unique portal vertex, cutting the DNA at defined sequences (pac or cos sites) to load exactly one genome length.

Why does self-assembly matter for medicine and nanotechnology?

Because the instructions for building a perfect nanoscale container are encoded entirely in one protein's amino-acid sequence, capsids are a template for designed nanomaterials and a target for drugs. Virus-like particles (VLPs), capsids self-assembled without any genome, are the basis of the human papillomavirus vaccines Gardasil and Cervarix (self-assembled L1 pentamers) and the hepatitis B vaccine, and self-assembling particle scaffolds of the same kind were used as antigen carriers in some COVID-era vaccine platforms. Adeno-associated virus (AAV) capsids self-assemble around therapeutic genes and are the leading delivery vehicle in approved gene therapies. On the drug side, capsid assembly modulators are a new antiviral class: HIV's capsid inhibitor lenacapavir (approved 2022) binds the capsid protein and either over-stabilizes or distorts the lattice so the virus mis-assembles, and similar compounds target hepatitis B core protein assembly.