Molecular Biology
Spliceosome
5 snRNPs + ~150 proteins — the catalytic ribonucleoprotein that excises introns
The spliceosome is a ~3-megadalton ribonucleoprotein assembly that removes introns from pre-mRNA. It contains five small nuclear RNAs (U1, U2, U4, U5, U6), each packaged with seven Sm or Lsm proteins as an snRNP, plus roughly 150 additional proteins that join and leave during the assembly cycle. Splicing proceeds through two transesterification steps: first, the 2'-OH of a conserved branch point adenosine attacks the 5' splice site, releasing the upstream exon and forming a lariat intermediate; second, the freed 3'-OH of the upstream exon attacks the 3' splice site, joining the two exons and releasing the lariat intron. The catalytic core is RNA: U6 snRNA holds the two catalytic Mg2+ ions in a geometry analogous to that of group II self-splicing introns, making the spliceosome one of the few known cellular ribozymes (alongside the ribosome and RNase P).
- Mass: ~3 MDa
- snRNAs: 5 (U1, U2, U4, U5, U6)
- Proteins: ~150
- Catalysis: 2 transesterifications, RNA-based
- Splicing rate: ~30 s per intron in vivo
- Discovered: Sharp & Roberts 1977 (Nobel 1993)
Why the spliceosome matters
- It is essential for almost every human gene. ~95% of human protein-coding genes contain introns; the average gene has ~8 exons. Without splicing, almost no functional mRNA reaches the cytoplasm. A typical human cell splices ~5,000 introns per minute during steady-state transcription.
- It generates much of the proteome's diversity. The human genome encodes ~20,000 protein-coding genes but produces >100,000 distinct protein isoforms via alternative splicing. The Drosophila Dscam gene alone has 38,016 possible isoforms via mutually exclusive exon choice, more than the total number of genes in the fly genome.
- It is a ribozyme — RNA does the chemistry. Cryo-EM structures from Yigong Shi, Kiyoshi Nagai, and Reinhard Lührmann (2015-2018) confirmed that the two catalytic Mg2+ ions are coordinated by U6 snRNA atoms, in a geometry that mirrors group II self-splicing introns. Proteins control fidelity and conformational change; RNA does catalysis.
- It is a major disease target. SF3B1 mutations occur in ~25% of myelodysplastic syndromes; U2AF1 mutations in ~10%. Spinal muscular atrophy, caused by SMN1 loss, is treated with nusinersen, an antisense oligonucleotide that redirects SMN2 splicing to restore exon 7 inclusion. At a list price of about $750,000 for the first year of treatment, it was one of the highest-priced drugs ever launched.
- Cotranscriptional splicing is the rule. ~75-80% of introns are spliced before RNA Pol II finishes transcribing the gene. The C-terminal domain of Pol II recruits splicing factors directly. Slowing Pol II elongation (e.g., with low-dose α-amanitin) shifts splice site choice toward more proximal sites.
- It evolved from a self-splicing intron. Group II introns in mitochondria and chloroplasts use the same chemistry (2'-OH attack from a branch point adenosine, lariat intermediate) without any external machinery. The spliceosome is widely thought to descend from a group II intron that fragmented in an early ancestor of eukaryotes, with U6 and U2 corresponding to catalytic domains V and VI, respectively.
- Splicing is one of the major evolutionary drivers in vertebrates. ~7% of human alternatively spliced exons are species-specific; alternative splicing differences contribute more proteome divergence between humans and chimps than amino-acid sequence differences.
Common misconceptions
- Splicing is rare. The opposite: almost every human gene is spliced. Replication-dependent histone genes and a handful of other intronless genes are the exceptions. Single-exon genes total <5% of the protein-coding genome.
- The spliceosome is one stable complex. It assembles fresh on every intron and disassembles after release. Five snRNPs, plus dozens of accessory proteins, recruit and depart in a defined order through E, A, B, B-activated, C, C*, P, and ILS complexes — eight distinct assembly states resolved by cryo-EM.
- Proteins do the catalysis. Cryo-EM and earlier biochemistry (Manley, Steitz, Padgett labs) place the catalytic Mg2+ ions on RNA atoms — specifically U6 snRNA. Ablating U6 stops splicing; ablating any single protein typically only stalls assembly. The spliceosome is fundamentally a ribozyme.
- Introns are junk. Many introns harbor regulatory elements: enhancers, miRNAs (one-third of human miRNAs are encoded in introns), snoRNAs, and entire alternative exons. Intron length correlates with developmental regulation; long introns (~50-100 kb) provide time and substrate for cotranscriptional regulation.
- The 5' splice site is always GU and 3' is always AG. The vast majority follow this: the GT/AG rule (DNA) or GU/AG (RNA). But ~0.7% of human introns are processed by a minor U12-dependent spliceosome built from U11, U12, U4atac, U5, and U6atac. This class includes the AT/AC introns as well as introns with GT/AG termini. Mutations in U4atac cause microcephalic osteodysplastic primordial dwarfism type 1 (MOPD1).
- Alternative splicing is just exon skipping. Five major modes exist: cassette exon (skipped or included), alternative 5' splice site, alternative 3' splice site, mutually exclusive exons, and intron retention. Intron retention is now recognized as a major regulated mode (10-20% of human transcripts have at least one retained intron), often coupling splicing to nuclear export and translation efficiency.
How a single intron is excised
Assembly starts when U1 snRNP base-pairs with the 5' splice site (GU at the intron start) of the nascent pre-mRNA, forming the E (early) complex. SF1 binds the branch point sequence, and U2AF65/35 binds the polypyrimidine tract and 3' splice site. ATP-dependent recruitment of U2 snRNP to the branch point via SF3a/b displaces SF1, forming the A complex with the catalytic adenosine bulged out of a U2/branch-point duplex. The pre-assembled U4/U6.U5 tri-snRNP joins next to form the B complex. A series of ATP-dependent rearrangements then follows, driven by DExD/H-box RNA helicases (the DEAD-box proteins Prp5 and Prp28, the Ski2-like helicase Brr2, and the DEAH-box proteins Prp2, Prp16, Prp22, and Prp43). Prp28 ejects U1 so that U6 can base-pair with the 5' splice site; Brr2 unwinds U4 from U6, releasing U4 and freeing U6 to fold into its catalytic internal stem-loop (ISL); U2 and U6 then form the catalytic core, and the spliceosome reaches the B-activated state.
The first transesterification fires in the C complex: the 2'-OH of the branch-point adenosine attacks the phosphodiester bond at the 5' splice site, releasing the upstream exon (now bearing a free 3'-OH) and forming a lariat intermediate in which the intron 5' end is linked to the branch A via an unusual 2'-5' phosphodiester bond. After the Prp16 ATPase remodels the spliceosome (the C-to-C* transition), the second transesterification fires in the C* complex: the freed 3'-OH of the upstream exon attacks the 3' splice site, joining the two exons and releasing the intron as a lariat. The mRNA exits, the intron is debranched by Dbr1 and degraded by the exosome, and the snRNPs are released for recycling by the Prp43 helicase. The whole cycle takes ~30 seconds per intron in vivo, with hydrolysis of ~6-8 ATP molecules. Fidelity is enforced by kinetic proofreading: each DExD/H-box helicase acts as a timed checkpoint, so a correct substrate that reacts quickly commits before the helicase acts, while a suboptimal substrate that reacts slowly is rejected.
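The ordered progression described above can be sketched as a simple linear state machine. This is only an illustrative model: the state names come from the text, while `CYCLE`, `EVENTS`, `next_state`, and `walk` are hypothetical names, and the one-line event summaries are a simplification of the biology (the descriptions of P and ILS are the standard meanings of those complex names, not details given above).

```python
# Ordered complexes of one splicing cycle, as named in the text.
CYCLE = ["E", "A", "B", "B-act", "C", "C*", "P", "ILS"]

# Hypothetical one-line summaries of what characterizes each state.
EVENTS = {
    "E": "U1 pairs with the 5' splice site; SF1 and U2AF bind",
    "A": "U2 snRNP binds the branch point, displacing SF1",
    "B": "U4/U6.U5 tri-snRNP joins",
    "B-act": "U1 and U4 released; U2/U6 catalytic core forms",
    "C": "first transesterification (lariat formation)",
    "C*": "after Prp16 remodeling; second transesterification (exon ligation)",
    "P": "post-catalytic complex, ligated exons still bound",
    "ILS": "mRNA released; intron lariat complex awaits disassembly",
}


def next_state(state):
    """Return the complex that follows `state`, or None after the last one."""
    i = CYCLE.index(state)
    return CYCLE[i + 1] if i + 1 < len(CYCLE) else None


def walk(start="E"):
    """List every complex visited from `start` to the end of the cycle."""
    path = []
    state = start
    while state is not None:
        path.append(state)
        state = next_state(state)
    return path


if __name__ == "__main__":
    for name in walk():
        print(f"{name:6s} {EVENTS[name]}")
```

A real spliceosome is not strictly linear (helicase checkpoints can reject a substrate and send it to disassembly), so a fuller model would add discard transitions from each checkpoint.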
Major U2-dependent vs minor U12-dependent spliceosome
| Property | Major (U2-dependent) | Minor (U12-dependent) |
|---|---|---|
| Fraction of human introns | ~99.3% | ~0.7% (~700 introns) |
| 5' splice site consensus | GU (GT/AG class) | AU (AT/AC class) and GU |
| 3' splice site consensus | AG | AC and AG |
| Branch point | YNCURAC | UCCUUAAC (more invariant) |
| snRNAs | U1, U2, U4, U5, U6 | U11, U12, U4atac, U5, U6atac |
| Common snRNA | — | U5 (shared with major) |
| Splicing rate | Fast (~30 s) | ~10x slower |
| Disease example | SMA (SMN1), MDS (SF3B1) | MOPD1 (U4atac), Roifman syndrome |
| Discovered | 1977 (Sharp, Roberts) | 1996 (Tarn & Steitz) |
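As a rough illustration of how the consensus elements in the table distinguish the two intron classes, here is a toy classifier. It is a sketch under strong assumptions: it uses only the terminal dinucleotides and an exact match to the U12-type branch point consensus from the table, whereas real classification tools score degenerate motifs. The function name `classify_intron` and the `window` parameter are hypothetical.

```python
MINOR_BRANCH = "UCCUUAAC"  # U12-type branch point consensus from the table


def classify_intron(intron, window=40):
    """Toy heuristic for an intron given as an RNA string (5' to 3').

    Calls the intron U12-type if its termini are GU-AG or AU-AC and the
    minor branch point consensus occurs within `window` nt of the 3' end.
    """
    ends = (intron[:2], intron[-2:])
    if ends not in {("GU", "AG"), ("AU", "AC")}:
        return "unclassified"
    if MINOR_BRANCH in intron[-window:]:
        return "minor (U12-type)"
    if ends == ("GU", "AG"):
        return "major (U2-type)"
    # AU-AC termini without the minor branch point: leave undecided.
    return "unclassified"


if __name__ == "__main__":
    # Synthetic sequences invented for this example.
    u2_like = "GUAAGU" + "A" * 30 + "UACUAAC" + "U" * 12 + "CAG"
    u12_like = "AUAUCCUUU" + "A" * 30 + "UCCUUAAC" + "UUUUAC"
    print(classify_intron(u2_like))
    print(classify_intron(u12_like))
```

The exact-match test is deliberately strict; it will miss genuine U12-type introns whose branch point deviates from the consensus.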
Famous experiments
- Phillip Sharp & Richard Roberts, 1977. Discovered split genes in adenovirus — viral mRNA hybridized to DNA showed loops of unhybridized DNA corresponding to introns. Won the 1993 Nobel for the discovery that genes are not contiguous.
- Joan Steitz, 1980. Identified U1 snRNP as a complex of RNA and protein that base-pairs with the 5' splice site, using antibodies from lupus patients (anti-Sm autoantibodies). Founded the entire field of snRNP biology.
- Christine Guthrie, John Abelson & colleagues, 1980s-1990s. Genetic dissection of yeast splicing factors (Prp1-Prp45) defined the assembly pathway and the role of DExD/H-box helicases as ATP-driven proofreading checkpoints.
- Yigong Shi & Kiyoshi Nagai labs, 2015-2018. First near-atomic-resolution cryo-EM structures of the assembled spliceosome across the steps of the cycle (B, B-act, C, C*, P, ILS), confirming the RNA-only catalytic core and resolving how ATP-dependent helicases drive conformational change.
- Tom Cech & Sidney Altman, 1980s. Cech discovered that the Tetrahymena group I intron splices itself without protein, and Altman showed that the RNA subunit of RNase P is the catalyst. They shared the 1989 Nobel Prize for establishing the ribozyme concept, which prefigured the spliceosome's RNA-based catalysis.
Frequently asked questions
What does the spliceosome actually do?
It excises introns from pre-mRNA and joins the flanking exons. Most human genes are split: the dystrophin gene has 79 exons across 2.4 Mb, of which only ~14 kb survive as mRNA. The spliceosome reads three short consensus elements (the 5' splice site, GU at the intron start; the branch point adenosine, ~20-50 nt upstream of the 3' splice site; and the 3' splice site, AG at the intron end) and brings them into precise alignment for two transesterification reactions. The result is a mature mRNA whose exons are joined to single-nucleotide accuracy, with the intron released as a lariat that is debranched by Dbr1 and then degraded by the nuclear exosome.
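The net outcome of splicing (introns out, exons joined) can be illustrated on a string. This is a minimal sketch that assumes the intron coordinates are already known and checks only the GU...AG termini; it does not model branch point selection or the chemistry. The function name `splice` and the example sequences are invented for illustration.

```python
def splice(pre_mrna, introns):
    """Remove introns from an RNA string and join the flanking exons.

    `introns` is a list of (start, end) half-open index pairs. Each intron
    must begin with GU and end with AG, the major-class consensus.
    """
    exons = []
    pos = 0
    for start, end in sorted(introns):
        if start < pos:
            raise ValueError("introns overlap")
        intron = pre_mrna[start:end]
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG termini")
        exons.append(pre_mrna[pos:start])  # exon preceding this intron
        pos = end
    exons.append(pre_mrna[pos:])  # final exon
    return "".join(exons)


if __name__ == "__main__":
    # Synthetic two-exon transcript invented for this example.
    exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCUAACUUUUUUCAG", "GGAUAA"
    pre = exon1 + intron + exon2
    start = len(exon1)
    print(splice(pre, [(start, start + len(intron))]))
```

Shifting either boundary by one nucleotide changes the joined sequence, which is why the single-nucleotide accuracy mentioned above matters for preserving the reading frame.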
How many proteins and RNAs are in a spliceosome?
Over a full cycle the spliceosome uses five small nuclear RNAs (U1, U2, U4, U5, U6; collectively ~700 nt) and approximately 150 proteins, with a total mass of about 3 megadaltons. Not all are present at once: U1 and U4 leave during activation, so only U2, U5, and U6 remain at the catalytic steps. Each snRNA travels as an snRNP, decorated with seven Sm or Lsm proteins plus snRNP-specific factors: U1 has 10 protein partners, U2 has 17, U5 has 11, and U4/U6 share one chaperone protein on top of their respective sets. The full assembly is comparable in mass to the ribosome (~3-4 MDa) but passes through eight distinct states (E, A, B, B-act, C, C*, P, ILS) during a single splicing cycle, making it one of the most dynamic machines in the cell.
Is the spliceosome a ribozyme?
Yes. The catalytic core is RNA, not protein. Cryo-EM structures from the Shi, Nagai, and Lührmann labs (2015-2018) showed that the two catalytic magnesium ions sit on the U6 snRNA, coordinated by an RNA helix called the ISL (internal stem-loop) and a triplex with U2. The geometry mirrors group II self-splicing introns from organelles and bacteria, supporting the long-standing hypothesis that the spliceosome evolved from a self-splicing group II intron in an early ancestor of eukaryotes. Like the ribosome, the spliceosome is fundamentally a ribozyme that recruits proteins for substrate selection, fidelity, and conformational control, but the chemistry of bond breaking and bond making is performed by RNA atoms.
Why does the spliceosome use a branch point adenosine?
Geometric necessity. The first chemical step is a transesterification: the 2'-OH of a nucleotide must attack the phosphodiester bond at the 5' splice site. Adenosine is by far the most commonly used branch nucleotide, and its 2'-OH is positioned for attack when the base is bulged out of a duplex with U2 snRNA. The branch point sequence, YNCURAC in metazoans (the A at the penultimate position is the branch point), is recognized by SF1 in the early E complex, then by U2 snRNP in the A complex via base-pairing with U2's GUAGUA region, which bulges the A out for catalysis. Branch point recognition is also a disease mechanism: mutations in the U2 snRNP protein SF3B1 alter branch point selection and occur in ~25% of MDS patients.
How does alternative splicing work?
Most human pre-mRNAs (~95%) are alternatively spliced: the same gene produces multiple mRNA isoforms by including or excluding specific exons, retaining introns, or using alternative 5' or 3' splice sites. The choice is regulated by splicing factor binding to exonic and intronic enhancers (ESE/ISE) and silencers (ESS/ISS). SR proteins (SRSF1-12) typically promote inclusion when bound to enhancers; hnRNPs typically suppress it. Tissue-specific factors like NOVA1/2 (neurons), MBNL1 (muscle), and PTBP1/2 add cell-type-specific patterns. The titin gene produces hundreds of isoforms; the Drosophila Dscam gene can generate over 38,000 different isoforms via mutually exclusive exons. Alternative splicing expands the proteome several-fold compared to a one-gene-one-protein model (~20,000 genes yielding >100,000 isoforms).
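The Dscam figure is a product of independent choices, which a few lines of arithmetic make concrete. The cluster sizes below (12, 48, 33, and 2 alternatives for the exon 4, 6, 9, and 17 clusters) are taken from the Dscam literature rather than from this article, and the names `DSCAM_CLUSTERS`, `mutually_exclusive_isoforms`, and `cassette_isoforms` are hypothetical.

```python
from math import prod

# Number of alternatives in each mutually exclusive exon cluster of
# Drosophila Dscam (values from the literature, not from this article).
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}


def mutually_exclusive_isoforms(clusters):
    """One exon is chosen per cluster, so the counts multiply."""
    return prod(clusters.values())


def cassette_isoforms(n_cassette_exons):
    """Each independent cassette exon is either included or skipped."""
    return 2 ** n_cassette_exons


if __name__ == "__main__":
    print(mutually_exclusive_isoforms(DSCAM_CLUSTERS))  # 12 * 48 * 33 * 2 = 38016
    print(cassette_isoforms(3))                         # 2 ** 3 = 8
```

These are upper bounds on combinatorial potential: they assume every choice is independent, which regulated splicing in real tissues does not guarantee.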
Are there diseases caused by spliceosome defects?
Many. Spinal muscular atrophy is caused by loss of SMN1, whose product assembles snRNPs. The paralog SMN2 mostly produces an unstable protein because a single C-to-T change disrupts SRSF1 binding and causes exon 7 skipping; the drug nusinersen is an antisense oligonucleotide that restores exon 7 inclusion. Retinitis pigmentosa types RP11, RP13, and RP18 are caused by mutations in core spliceosome proteins (PRPF31, PRPF8, and PRPF3, respectively), with effects that are, strikingly, largely restricted to photoreceptors. SF3B1 is mutated in ~25% of myelodysplastic syndromes and ~80% of refractory anemia with ring sideroblasts; U2AF1 in ~10% of MDS. Familial dysautonomia is caused by a 5' splice site mutation in IKBKAP (ELP1) that leads to skipping of exon 20. Spliceosome inhibitors such as spliceostatin A and pladienolide B target SF3B1, and derivatives such as E7107 and H3B-8800 have entered clinical trials.