Molecular Biology
Alternative Splicing
One gene, many proteins — the proteome multiplier
Alternative splicing is the process by which a single pre-mRNA gives rise to multiple mature mRNAs — and therefore multiple protein isoforms — by selectively including or excluding exons, retaining introns, or shifting splice-site choice. The spliceosome, a 5-snRNP RNA-protein machine, reads splice signals and decides which exons to join. Roughly 95% of human multi-exon genes are alternatively spliced. Tissue-specific patterns explain why a heart cell and a brain cell express different protein repertoires from the same genome. Drosophila Dscam can produce 38,000+ isoforms from a single locus by mutually exclusive selection across four cassettes — more isoforms than the fly has genes. Aberrant splicing causes spinal muscular atrophy, retinitis pigmentosa, and progeria; Spinraza and Evrysdi treat SMA by correcting the SMN2 splicing pattern.
- Multi-exon genes spliced alternatively: ~95% in humans
- Spliceosome composition: U1, U2, U4, U5, U6 snRNPs + ~150 proteins
- Splice signals: 5'-GU ... branch A ... AG-3'
- Dscam isoforms: 38,016 (Drosophila)
- SMA splicing drugs: Spinraza (ASO), Evrysdi (small molecule)
- Discovery of split genes: Roberts & Sharp, 1977 (Nobel 1993)
How splicing works, in chemistry
Almost every intron is bounded by GU at its 5' end and AG at its 3' end, with a branch-point adenosine ~20–50 nt upstream of the 3' splice site. The spliceosome assembles stepwise:
- E complex. U1 snRNP base-pairs with the 5' splice site; U2AF binds the polypyrimidine tract and 3' AG.
- A complex. U2 snRNP base-pairs with the branch point, bulging out the branch A.
- B complex. The U4/U6.U5 tri-snRNP joins; the spliceosome is now assembled but catalytically inactive.
- B activation. U1 leaves the 5' splice site and U6 takes its place; U6 also base-pairs with U2 to form the catalytic core. U4 dissociates.
- First transesterification. The 2'-OH of the branch-point A attacks the phosphodiester bond at the 5' splice site, freeing the upstream exon and forming a lariat intermediate in which the intron is still attached to the downstream exon.
- Second transesterification. The 3'-OH of the freed upstream exon attacks the 3' splice site, joining the two exons and releasing the lariat intron, which is then debranched and degraded.
Both steps are SN2-like nucleophilic substitutions at phosphorus, the same chemistry used by self-splicing group II introns, which are thought to be ancestors of the spliceosome.
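The boundary rules above (GU at the 5' end, AG at the 3' end, a branch-point A roughly 20–50 nt upstream of the 3' splice site) can be turned into a naive candidate scanner. This is a minimal sketch, not a splice-site predictor: `find_intron_candidates` and its `min_len` and `bp_window` parameters are hypothetical choices, and the real spliceosome also reads extended consensus sequences, the polypyrimidine tract, and regulatory context that this ignores.

```python
import re
from dataclasses import dataclass


@dataclass
class IntronCandidate:
    donor: int                # 0-based index of the G in the 5' GU
    acceptor: int             # index just past the G of the 3' AG (the 3' splice site)
    branch_points: list[int]  # indices of A's lying 20-50 nt upstream of the 3' splice site


def find_intron_candidates(rna: str, min_len: int = 60,
                           bp_window: tuple[int, int] = (20, 50)) -> list[IntronCandidate]:
    """Naive GU...AG scan (hypothetical helper); assumes the sense strand is given."""
    rna = rna.upper().replace("T", "U")  # accept DNA input by converting T to U
    donors = [m.start() for m in re.finditer(r"(?=GU)", rna)]          # overlapping matches
    acceptors = [m.start() + 2 for m in re.finditer(r"(?=AG)", rna)]   # position after AG
    near, far = bp_window
    candidates = []
    for d in donors:
        for end in acceptors:
            if end - d < min_len:
                continue  # too short to be a plausible intron in this sketch
            # The branch A must lie between `near` and `far` nt upstream of the
            # 3' splice site, and downstream of the 5' GU.
            lo = max(d + 2, end - far)
            hi = end - near
            branch = [i for i in range(lo, hi + 1) if rna[i] == "A"]
            if branch:
                candidates.append(IntronCandidate(d, end, branch))
    return candidates


if __name__ == "__main__":
    # Synthetic sequence, not a real gene.
    seq = "CCCCC" + "GUAAGU" + "C" * 40 + "A" + "C" * 25 + "AG" + "GGGGG"
    for candidate in find_intron_candidates(seq):
        print(candidate)
```

Every GU is paired with every downstream AG, so the output is an over-generous list of possibilities; that is the point, since dinucleotides alone cannot define an intron.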
Splicing decision diagram
5' exon ── GU ─────── intron ─────── A ── (Py)n ── AG ── 3' exon
           │                         │             │
           ▼                         ▼             ▼
        U1 binds                  U2 binds    U2AF / 3' SS
                                     ▼  (U4/U6.U5 joins, U1 + U4 leave, U6 + U2 form catalytic core)
            branch-A 2'-OH attacks 5' SS → lariat intermediate
                                     │
                                     ▼
        3'-OH of upstream exon attacks 3' SS → joined exons + lariat
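The two transesterifications in the diagram can be traced with a toy string model. This sketch uses hypothetical names (`splice_two_step`, `SplicingProducts`) and only tracks which RNA pieces are connected after each step; it does not model the chemistry, and it assumes the caller already knows the exon-intron boundaries and the branch A position.

```python
from dataclasses import dataclass


@dataclass
class SplicingProducts:
    free_exon1: str           # after step 1: upstream exon with a free 3'-OH
    lariat_intermediate: str  # after step 1: looped intron still attached to exon 2
    mrna: str                 # after step 2: exon 1 ligated to exon 2
    lariat_intron: str        # after step 2: released intron lariat
    branch_index: int         # branch A position in the intron (2'-5' link to the intron's 5' G)


def splice_two_step(exon1: str, intron: str, exon2: str, branch_index: int) -> SplicingProducts:
    if not (intron.startswith("GU") and intron.endswith("AG")):
        raise ValueError("this toy model expects a GU...AG intron")
    if intron[branch_index] != "A":
        raise ValueError("the branch point must be an adenosine")

    # Step 1: the branch-A 2'-OH attacks the exon1|GU bond. Exon 1 is released and
    # the intron's 5' G becomes linked to the branch A, still joined to exon 2.
    free_exon1 = exon1
    lariat_intermediate = intron + exon2

    # Step 2: the exon-1 3'-OH attacks the AG|exon2 bond, joining the exons
    # and releasing the intron as a lariat.
    mrna = free_exon1 + lariat_intermediate[len(intron):]
    lariat_intron = lariat_intermediate[:len(intron)]

    return SplicingProducts(free_exon1, lariat_intermediate, mrna, lariat_intron, branch_index)


products = splice_two_step("CAG", "GUAAGUCUAACUUUUCCAG", "GUCC", branch_index=8)
print(products.mrna)           # CAGGUCC
print(products.lariat_intron)  # GUAAGUCUAACUUUUCCAG
```

The sequences are made up; the point is the order of products: exon 1 is freed first, and the exons are only joined in step 2.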
Five canonical alternative splicing patterns, plus alternative first and last exons
| Pattern | What changes | Frequency in humans | Functional effect | Classic example | Aberrant example |
|---|---|---|---|---|---|
| Exon skipping (cassette) | An internal exon is included or excluded | ~40% of alternative events | Adds or removes a protein domain | Tropomyosin TPM1 muscle isoforms | SMN2 exon 7 skipping → SMA |
| Intron retention | An intron is kept in the mature mRNA | ~5% in mammals; majority in plants | Often introduces premature stop → NMD | Plant flowering FLC regulation | NF1 intron retention in tumors |
| Alternative 5' splice site | Different 5' GU chosen | ~8% | Shortens or extends exon | BCL2L1 Bcl-x(L) vs Bcl-x(S) | LMNA progerin in progeria |
| Alternative 3' splice site | Different 3' AG chosen | ~18% | Shortens or extends exon | FN1 fibronectin IIICS (V) region | Cryptic 3' AG use in SF3B1-mutant cancers |
| Mutually exclusive exons | Exactly one of two (or more) exons included | ~5%, but most striking | Switches functional domain | FGFR2 IIIb (epithelial) vs IIIc (mesenchymal) | — |
| Alternative promoters / poly-A | Different first or last exon | Common; often combined | Tissue-specific UTRs and translation | CALCA: calcitonin (thyroid) vs CGRP (neurons) | — |
One transcript can use several patterns at once: Drosophila Dscam combines four mutually exclusive cassettes, and mammalian fast skeletal troponin T combines five cassette exons with a pair of mutually exclusive exons.
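Combining patterns multiplies the possibilities: each cassette exon doubles the count, and each mutually exclusive group multiplies it by the group size. The sketch below enumerates isoforms for a hypothetical gene model; the segment encoding and the exon names (`c1`, `alt1`, `mxA`, ...) are invented for illustration and do not describe a real locus.

```python
from itertools import product
from typing import Iterator

# Hypothetical encoding used only in this sketch:
#   ("const", "c1")            constitutive exon, always included
#   ("cassette", "alt1")       exon that is either included or skipped
#   ("mxe", ["mxA", "mxB"])    group from which exactly one exon is included
Segment = tuple[str, object]


def enumerate_isoforms(segments: list[Segment]) -> Iterator[list[str]]:
    """Yield every exon chain allowed by independent per-segment choices."""
    choices = []
    for kind, value in segments:
        if kind == "const":
            choices.append([[value]])
        elif kind == "cassette":
            choices.append([[value], []])               # include or skip
        elif kind == "mxe":
            choices.append([[exon] for exon in value])  # pick exactly one
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    # Each isoform is one element of the Cartesian product of the per-segment choices.
    for combo in product(*choices):
        yield [exon for part in combo for exon in part]


# Hypothetical gene: five independent cassette exons and one mutually exclusive pair.
model = (
    [("const", "c1")]
    + [("cassette", f"alt{i}") for i in range(1, 6)]
    + [("mxe", ["mxA", "mxB"]), ("const", "c2")]
)
isoforms = list(enumerate_isoforms(model))
print(len(isoforms))  # 64 = 2**5 cassette choices x 2 mutually exclusive choices
```

Treating every choice as independent gives an upper bound; real genes often couple exon choices, so the isoforms actually observed can be far fewer.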
Real-world significance
- Tissue identity. Brain has the highest splicing complexity. Microexons (3–27 nt) regulated by SRRM4 are nearly brain-exclusive and misregulated in autism (Irimia, Cell 2014).
- Cancer. Mutations in splicing factors (SF3B1, SRSF2, U2AF1) are found in 60–80% of myelodysplastic syndromes. They reprogram exon usage genome-wide and can generate tumor-specific neoantigens, a potential target for immunotherapy.
- SMA treatment. Nusinersen (Spinraza, 2016) and risdiplam (Evrysdi, 2020) restore SMN2 exon 7 inclusion and have turned a fatal infantile disease into a chronically managed one, a landmark proof of concept for splicing as a drug target.
- Duchenne muscular dystrophy. Eteplirsen (Exondys 51, 2016), golodirsen, viltolarsen, and casimersen are antisense oligos that force skipping of a single dystrophin exon next to the patient's deletion, restoring the reading frame so that a shorter but in-frame protein is made.
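- Sex determination. Drosophila sex is controlled by a splicing cascade: Sex-lethal → transformer → doublesex. Sxl maintains its own female pattern by exon skipping and forces transformer to use a female-specific 3' splice site; Tra (with Tra2) then activates a weak female-specific 3' splice site in doublesex.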
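- Adaptive immunity. Membrane vs secreted IgM are produced from the same heavy-chain transcript by coupled alternative splicing and polyadenylation, one of the first alternative processing events described in a cellular gene (1980). Roberts and Sharp had discovered split genes three years earlier, in adenovirus.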
Variants and special cases
- Trans-splicing. Exons from two separate RNA molecules are joined. Standard in trypanosomes (spliced-leader addition); rare in mammals (JAZF1-JJAZ1).
- Recursive splicing. Long introns removed in successive segments — important for dystrophin.
- Back-splicing → circRNAs. A downstream 5' splice site joins an upstream 3' splice site, producing a circular RNA. CDR1as sponges miR-7 in brain.
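- Self-splicing introns. Group I and group II introns are ribozymes that catalyze their own excision without the spliceosome. Cech discovered group I self-splicing and shared the 1989 Nobel Prize in Chemistry with Altman, who found catalytic RNA in RNase P.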
- Minor (U12) spliceosome. Splices ~700 human introns with non-canonical boundaries; mutations in its U4atac snRNA cause MOPD I (microcephalic osteodysplastic primordial dwarfism type I).
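Back-splice junctions are what make circRNAs detectable in RNA-seq: a read crossing one shows the end of a downstream exon followed by the start of an upstream exon, an order the collinear mRNA does not contain. A minimal sketch, assuming made-up exon sequences and hypothetical helpers `backsplice_junction` and `linear_junctions`; real circRNA callers also handle mismatches, strand, and multimapping reads.

```python
def backsplice_junction(exons: list[str], first: int, last: int, flank: int = 10) -> str:
    """Sequence spanning the back-splice junction of a circRNA made of exons[first..last].

    The 3' end of exon `last` (at its downstream 5' splice site) is joined to the
    5' end of exon `first` (at its upstream 3' splice site), so the junction reads
    tail(last) + head(first).
    """
    if not 0 <= first <= last < len(exons):
        raise ValueError("need 0 <= first <= last < number of exons")
    return exons[last][-flank:] + exons[first][:flank]


def linear_junctions(exons: list[str], flank: int = 10) -> list[str]:
    """Canonical forward junctions of the linear mRNA, for comparison."""
    return [exons[i][-flank:] + exons[i + 1][:flank] for i in range(len(exons) - 1)]


exons = ["AUGGCUAAAC", "GGCUUCAGGA", "CCAUUGGUAA"]  # made-up exon sequences
bsj = backsplice_junction(exons, first=0, last=1, flank=4)
print(bsj)                      # AGGAAUGG
print(bsj in "".join(exons))    # False: exon order is reversed at the junction
```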
Pitfalls and clinical traps
- RNA-seq quantification. Short reads cannot resolve all isoforms. Salmon and kallisto estimate transcript abundance under model assumptions, and rMATS tests splicing differences at the level of individual events; long-read sequencing (PacBio Iso-Seq, Oxford Nanopore) reads full-length isoforms directly.
- Reference-annotation bias. If an isoform is missing from your annotation (Ensembl, GENCODE), its reads get misassigned. Always inspect novel splice junctions.
- Cryptic splice sites. Mutations can create new GT or AG dinucleotides that the spliceosome uses. Many disease alleles act this way (Tay-Sachs HEXA, β-thalassemia HBB), and in silico splice predictors such as SpliceAI flag many of them.
- Progeria. A single synonymous LMNA point mutation (c.1824C>T, p.G608G) activates a cryptic 5' splice site in exon 11, deleting 50 amino acids and producing the truncated progerin protein. It is the classic example of splicing pathology caused by an SNV outside the canonical splice sites.
- Splicing-NMD coupling. Many isoforms exist transiently and are degraded by NMD; their detection depends on inhibiting NMD (cycloheximide, UPF1 knockdown). Steady-state RNA-seq under-counts these regulated targets.
- SMA carrier testing. SMN1 and SMN2 are 99% identical; confusing them in PCR-based diagnostics has led to misdiagnoses. MLPA or droplet-digital PCR is the standard.
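The quantification pitfalls are easier to see with the usual event-level summary, percent spliced in (PSI). This sketch computes PSI for a cassette exon from raw junction-read counts only; `percent_spliced_in` is a hypothetical helper, the counts are invented, and event-level tools such as rMATS additionally normalize by each isoform's effective length.

```python
def percent_spliced_in(upstream_inc: int, downstream_inc: int, skipping: int) -> float:
    """Simplified junction-read PSI for a cassette exon.

    upstream_inc:   reads on the upstream-exon -> cassette junction
    downstream_inc: reads on the cassette -> downstream-exon junction
    skipping:       reads on the upstream-exon -> downstream-exon junction
    The inclusion isoform has two junctions and the skipping isoform one,
    so inclusion counts are averaged before comparing.
    """
    inclusion = (upstream_inc + downstream_inc) / 2
    total = inclusion + skipping
    if total == 0:
        raise ValueError("no junction reads: PSI is undefined")
    return inclusion / total


# Hypothetical counts for an exon that is skipped most of the time.
psi = percent_spliced_in(upstream_inc=30, downstream_inc=30, skipping=170)
print(f"PSI = {psi:.2f}")  # PSI = 0.15
```

Note what PSI cannot tell you: which full-length isoforms carry the exon, which is exactly the gap short reads leave open.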
Frequently asked questions
How does the spliceosome work?
The major spliceosome is built from five small nuclear ribonucleoproteins (snRNPs: U1, U2, U4, U5, U6) plus more than a hundred associated proteins. U1 base-pairs with the 5' splice site (GU); U2 binds the branch point adenosine. The U4/U6.U5 tri-snRNP joins, U1 and U4 leave, and U6 takes over base-pairing with the 5' splice site. Two transesterification reactions follow: the branch-point A attacks the 5' splice site, releasing the upstream exon and creating a lariat; the 3'-OH of the upstream exon attacks the 3' splice site, joining exons and freeing the lariat intron.
What controls alternative splicing?
Splice-site strength sets the baseline. Splicing factors then tip the balance: SR proteins (SRSF1–12) bind exonic splicing enhancers and promote exon inclusion; hnRNP proteins (hnRNPA1, PTBP1) bind silencers and promote skipping. Tissue-specific factors include NOVA in neurons, MBNL in muscle, and RBFOX in heart, brain, and muscle. Transcription rate matters too — slow Pol II favors inclusion of weak exons. RNA secondary structure can hide or expose splice sites.
What does Dscam show about isoform diversity?
Drosophila Dscam has four mutually exclusive exon cassettes — 12 alternatives in exon 4, 48 in exon 6, 33 in exon 9, and 2 in exon 17 — for a maximum of 12 × 48 × 33 × 2 = 38,016 distinct isoforms. Each neuron expresses a different combination, giving every cell a unique molecular identity tag for self-recognition during axon guidance. The fly genome has only ~14,000 genes; one Dscam locus produces nearly three times as many isoforms.
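The Dscam count is a product because each cluster is mutually exclusive: exactly one exon is chosen from each. A quick check of the arithmetic, using the cluster sizes and the ~14,000 gene estimate from the text; the split into ectodomain (exons 4, 6, 9) and transmembrane (exon 17) choices is added here for illustration.

```python
from math import prod

# Alternative-exon counts per mutually exclusive cluster, as given in the text.
DSCAM_CLUSTERS = {"exon 4": 12, "exon 6": 48, "exon 9": 33, "exon 17": 2}
FLY_GENES_APPROX = 14_000  # approximate gene count from the text

# One exon per cluster, so the counts multiply.
total_isoforms = prod(DSCAM_CLUSTERS.values())
# Exons 4, 6 and 9 vary the ectodomain; exon 17 chooses the transmembrane segment.
ectodomains = prod(n for exon, n in DSCAM_CLUSTERS.items() if exon != "exon 17")

print(total_isoforms)                               # 38016
print(ectodomains)                                  # 19008
print(round(total_isoforms / FLY_GENES_APPROX, 1))  # 2.7
```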
How is splicing therapy used in spinal muscular atrophy?
SMA is caused by loss of functional SMN1, usually by homozygous deletion. Humans also have SMN2, an almost identical paralog, but a single C-to-T transition in exon 7 of SMN2 disrupts an exonic splicing enhancer; ~85% of SMN2 transcripts skip exon 7, producing an unstable truncated SMN protein. Nusinersen (Spinraza, 2016) is an antisense oligo that binds an intronic splicing silencer in intron 7 and forces exon 7 inclusion. Risdiplam (Evrysdi, 2020) is a small molecule that stabilizes binding of the U1 snRNP to the weak 5' splice site of SMN2 exon 7.
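The ~85% skipping figure implies a back-of-envelope calculation for why raising exon 7 inclusion helps. This is a deliberately crude sketch: `full_length_smn_transcript` is hypothetical, it assumes every gene copy makes the same amount of pre-mRNA and that SMN1 always includes exon 7, it ignores protein stability and tissue differences, and the "treated" inclusion fraction is an arbitrary illustrative value, not a clinical result.

```python
def full_length_smn_transcript(smn2_copies: int, inclusion: float = 0.15,
                               smn1_copies: int = 0) -> float:
    """Crude full-length SMN transcript output, in units of one SMN1 copy.

    Assumptions (illustrative only): equal pre-mRNA output per gene copy,
    SMN1 always includes exon 7, and a fraction `inclusion` of SMN2
    transcripts includes exon 7 (~0.15 if ~85% skip).
    """
    if not 0.0 <= inclusion <= 1.0:
        raise ValueError("inclusion must be a fraction between 0 and 1")
    return smn1_copies + smn2_copies * inclusion


# Hypothetical patient with no SMN1 and two SMN2 copies.
baseline = full_length_smn_transcript(smn2_copies=2)
# Same hypothetical patient if exon 7 inclusion were raised to 60%.
treated = full_length_smn_transcript(smn2_copies=2, inclusion=0.60)
print(baseline, treated)
```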
Is splicing co-transcriptional?
Largely, yes: most splicing happens while the pre-mRNA is still being made. The CTD of RNA Pol II recruits splicing factors as it elongates; phosphorylation of CTD serine 2 marks elongating Pol II and helps recruit factors such as U2AF65. Slow elongation gives weak splice sites time to be recognized; fast elongation favors stronger sites. Drugs like camptothecin slow Pol II and shift splicing patterns.
What role does nonsense-mediated decay play?
Many alternative splicing events introduce premature stop codons. The exon junction complex (EJC), deposited 20–24 nt upstream of every exon-exon junction during splicing, serves as a marker for nonsense-mediated decay (NMD). If a stop codon sits more than ~50 nt upstream of the last exon-exon junction, the transcript is degraded by UPF1, UPF2, UPF3, SMG1, and SMG6. Roughly a third of alternative splicing events are predicted to introduce such a premature stop; because NMD depends on translation, these isoforms are largely degraded during their first rounds of translation instead of accumulating.
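The ~50-nt rule can be written as a position check on the spliced mRNA. A minimal sketch under stated assumptions: `predicted_nmd_target` is hypothetical, the distance is measured from the 3' end of the stop codon to the last exon-exon junction, and the 50-nt threshold is the approximate value from the text. Real NMD efficiency varies, and this ignores EJC-independent triggers such as long 3' UTRs.

```python
def predicted_nmd_target(exon_lengths: list[int], stop_codon_start: int,
                         threshold: int = 50) -> bool:
    """Apply the ~50-nt rule to a spliced mRNA.

    exon_lengths:     lengths (nt) of the exons in the mature mRNA, 5' to 3'
    stop_codon_start: 0-based mRNA position of the first nucleotide of the stop codon
    threshold:        distance (nt) upstream of the last exon-exon junction beyond
                      which the stop is treated as premature
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so no downstream EJC to trigger NMD
    last_junction = sum(exon_lengths[:-1])  # mRNA position where the last exon begins
    stop_end = stop_codon_start + 3         # measure from the 3' end of the stop codon
    return last_junction - stop_end > threshold


exons = [200, 150, 300]  # hypothetical exon lengths; last junction at position 350
print(predicted_nmd_target(exons, 100))  # True: stop ends 247 nt upstream of the junction
print(predicted_nmd_target(exons, 330))  # False: only 17 nt upstream
print(predicted_nmd_target(exons, 400))  # False: stop lies in the last exon
```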