Molecular Biology

DNA Methylation

A single methyl group on cytosine — the most-studied epigenetic mark and the substrate of every age clock

DNA methylation is the covalent addition of a methyl group (–CH₃) to the 5-position of a cytosine ring, producing 5-methylcytosine. In mammals it almost always happens at CpG dinucleotides — about three quarters of all CpGs in the genome carry the mark. The unmethylated quarter clusters in CpG islands at gene promoters; keeping these islands clean is what allows the gene to be expressed. The methyltransferases DNMT3A and DNMT3B place new methyl groups during embryonic development; DNMT1 maintains the pattern by re-methylating the daughter strand at every replication fork. Erasure runs through the TET enzymes, which oxidise 5-methylcytosine and feed it into base-excision repair. Aberrant methylation silences tumour suppressors in nearly every cancer, drives the imprinting disorders Prader-Willi and Angelman, and provides the read-out for the Horvath epigenetic clock — currently the most accurate molecular predictor of human age.

  • Mark: 5-methylcytosine (5mC)
  • Context (mammals): CpG dinucleotide, symmetric
  • Methyltransferases: DNMT3A / 3B (de novo); DNMT1 (maintenance)
  • Demethylases: TET1 / TET2 / TET3 (oxidative)
  • Methyl donor: S-adenosylmethionine (SAM)
  • Drugs: 5-azacytidine, decitabine (DNMT inhibitors)

A methyl group, a cytosine, and three enzymes

Cytosine is one of four DNA bases. Add a methyl group to its C5 position and you get 5-methylcytosine — the same Watson-Crick pairing with guanine, but a small bulge sitting in the major groove of the double helix. That bulge is enough to (a) physically obstruct some transcription factors, (b) recruit methyl-binding proteins like MeCP2 and MBD1-4, and (c) prevent CpG-island promoters from firing. Methylated promoters are silent; unmethylated promoters are open for business.

The mark is symmetric. CpG is self-complementary: read 5'→3', the opposite strand also spells CpG, so every CpG cytosine pairs with a guanine and has a partner cytosine diagonally opposite it on the other strand. (GpC, ApT and TpA are self-complementary too; what singles out CpG is that the mammalian methyltransferases target it.) That symmetry lets the cell copy methylation through cell division: after replication, the daughter helix is hemi-methylated (parent strand methylated, new strand unmethylated), and DNMT1 fills in the missing methyl groups within minutes to hours. The methyl pattern propagates almost like a second genetic code, except that it can be wiped and rewritten in response to development or environment.
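The strand symmetry can be checked mechanically. The sketch below is a toy model, not a description of how DNMT1 is implemented in any tool: it lists the dinucleotides that read the same on both strands, then works out which positions on a newly made strand have a cytosine to methylate. The function names and the coordinate convention (0-based positions on the top strand) are assumptions made for illustration.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

# Dinucleotides that read the same on both strands.
SELF_COMPLEMENTARY = [a + b for a, b in product("ACGT", repeat=2)
                      if reverse_complement(a + b) == a + b]

def maintenance_targets(top: str, methylated: set[int]) -> set[int]:
    """Top-strand coordinates of the guanines whose partner cytosine on the
    new strand can be methylated, given methylated cytosines on `top`."""
    targets = set()
    for i in methylated:
        # Only a CpG has a cytosine diagonally opposite on the other strand.
        if top[i:i + 2] == "CG":
            targets.add(i + 1)
    return targets

print(SELF_COMPLEMENTARY)                        # ['AT', 'CG', 'GC', 'TA']
print(maintenance_targets("ACGTCAG", {1, 4}))    # {2}
```

The methylated cytosine at position 4 sits in a CpA, so there is no cytosine on the other strand to copy the mark onto; only the CpG at position 1 yields a maintenance target.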

Three enzymes do almost all of the writing in mammals. DNMT3A and DNMT3B are de novo methyltransferases: they scan unmethylated DNA and lay down new patterns, mostly during embryonic implantation and germline reprogramming. DNMT1 is the maintenance methyltransferase: it rides PCNA at the replication fork, recognises hemi-methylated CpGs through its UHRF1 partner, and copies the parent's methyl mark onto the new strand. Erasure runs in the opposite direction through the TET family, which oxidise 5mC stepwise to 5-hydroxymethyl, 5-formyl, and 5-carboxylcytosine; the last two are excised by thymine DNA glycosylase and replaced through base-excision repair.
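The erasure route described above is a short chain of states, which can be written down as a lookup table. This is only a bookkeeping sketch: the function name and the `excise_at` parameter are hypothetical, and real cells do not follow a single deterministic path.

```python
# Stepwise TET oxidation, as listed in the paragraph above.
TET_OXIDATION = {"5mC": "5hmC", "5hmC": "5fC", "5fC": "5caC"}
# The two oxidation products that thymine DNA glycosylase excises.
TDG_SUBSTRATES = {"5fC", "5caC"}

def active_demethylation(start: str = "5mC", excise_at: str = "5caC") -> list[str]:
    """Trace one cytosine from `start` to unmodified C.

    `excise_at` picks which TDG substrate is excised (an illustrative knob).
    """
    if excise_at not in TDG_SUBSTRATES:
        raise ValueError("TDG excises only 5fC or 5caC")
    path = [start]
    state = start
    while state != excise_at:
        if state not in TET_OXIDATION:
            raise ValueError(f"cannot reach {excise_at} from {start}")
        state = TET_OXIDATION[state]
        path.append(state)
    # Excision leaves an abasic site; base-excision repair restores C.
    return path + ["abasic site", "C"]

print(active_demethylation())
print(active_demethylation(excise_at="5fC"))
```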

CpG islands and the silencing logic

CpG dinucleotides are dramatically under-represented in the mammalian genome — about a fifth of what you would expect from base composition alone. The reason is mutational: 5-methylcytosine spontaneously deaminates to thymine, so methylated CpGs convert to TpG over evolutionary time. The CpGs that survive this attrition are concentrated in short, GC-rich stretches called CpG islands, typically 200-3000 base pairs long, sitting at about 70% of human gene promoters. Islands are kept unmethylated by active mechanisms — H3K4me3 deposition, transcription factor occupancy, and CXXC-domain proteins that bind unmethylated CpGs and recruit demethylases.
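The mutational argument comes down to what deamination produces. In the toy function below (names and the 0-based indexing are illustrative assumptions), a methylated cytosine deaminates to thymine, an ordinary DNA base, while an unmethylated cytosine deaminates to uracil, which does not belong in DNA and is readily recognised and repaired.

```python
def deaminate(seq: str, methylated: set[int], pos: int) -> str:
    """Deaminate the cytosine at `pos`: 5mC gives T, unmethylated C gives U."""
    if seq[pos] != "C":
        raise ValueError("position is not a cytosine")
    product = "T" if pos in methylated else "U"
    return seq[:pos] + product + seq[pos + 1:]

before = "AACGTT"
after = deaminate(before, methylated={2}, pos=2)
print(before, "->", after)                  # AACGTT -> AATGTT
print(before.count("CG"), after.count("CG"))  # 1 0
```

If the resulting T:G mismatch is not repaired before replication, the CpG has become a TpG and is lost from the genome, which is the attrition the paragraph describes.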

When that protection fails, an island can become hypermethylated, recruiting MBD proteins, HDAC complexes, and Polycomb factors that compact the chromatin and lock the underlying gene shut. In tumour cells this is the most common way to silence a tumour-suppressor gene without mutation: BRCA1, MLH1, p16, RASSF1A are all silenced by promoter hypermethylation in many cancers, and the silencing is reversible if you knock down DNMT1.

Methylation contexts across kingdoms

| Mark | Position | Where it occurs | Sequence context | Function | Detection |
| --- | --- | --- | --- | --- | --- |
| 5-methylcytosine (5mC) | C5 of cytosine | Mammals, plants, fungi | CpG (mammals); CpG, CHG, CHH (plants) | Promoter silencing, transposon control | Bisulfite sequencing |
| 5-hydroxymethylcytosine (5hmC) | C5 of cytosine | Brain, embryos | CpG, oxidative intermediate | Demethylation pathway, gene-body activity | oxBS-seq, TAB-seq |
| N6-methyladenine (6mA) | N6 of adenine | Bacteria, archaea, some eukaryotes | GATC, GANTC etc. | Restriction defence, replication timing | SMRT sequencing |
| N4-methylcytosine (4mC) | N4 of cytosine | Bacteria, archaea | Restriction enzyme recognition motifs | Restriction-modification systems | SMRT sequencing |
| Plant CHG methylation | C5 of cytosine | Plants only | CHG (H = A, C or T) | Transposon silencing | Bisulfite sequencing |
| Plant CHH methylation | C5 of cytosine | Plants only | CHH (H = A, C or T) | Asymmetric, RdDM-driven | Bisulfite sequencing |

The take-away from the table is that "DNA methylation" means very different things across organisms. Mammals run a single dominant chemistry (5mC at CpG); plants run three contexts simultaneously and use small RNAs to direct the methyltransferases (the RdDM pathway); bacteria use methylation primarily for self/non-self discrimination through restriction-modification systems.
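The three cytosine contexts are defined purely by the two bases that follow the cytosine, so they are easy to assign in code. A minimal sketch, assuming an uppercase A/C/G/T sequence and looking at one strand only; the function name is hypothetical.

```python
from typing import Optional

def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the cytosine at index i as CG, CHG or CHH (H = A, C or T).

    Returns None if seq[i] is not C or the context runs off the end.
    """
    if seq[i] != "C":
        return None
    first = seq[i + 1:i + 2]
    second = seq[i + 2:i + 3]
    if first == "G":
        return "CG"
    if not first or not second:
        return None  # too close to the 3' end to classify
    return "CHG" if second == "G" else "CHH"

seq = "CGCAGCTTC"
contexts = {i: cytosine_context(seq, i) for i, base in enumerate(seq) if base == "C"}
print(contexts)   # {0: 'CG', 2: 'CHG', 5: 'CHH', 8: None}
```

Cytosines on the opposite strand would be classified the same way after taking the reverse complement.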

Reprogramming windows in the life cycle

Mammalian DNA methylation is not a static label; it is wiped and rewritten twice in every life. After fertilisation the paternal genome is actively demethylated by TET3 within hours, and the maternal genome demethylates more slowly through replication-coupled dilution; by the blastocyst stage the embryo has the most hypomethylated genome of its life. DNMT3A/3B then re-establish lineage-specific methylation as the embryo implants. The second wipe happens in primordial germ cells around week six of human development, ensuring that imprints are reset to the appropriate parental state before they are re-laid in sperm or egg. Failures in either round are linked to imprinting disorders and, in mouse models, to embryonic lethality.

DNA methylation vs histone modification

| Property | DNA methylation | Histone modification |
| --- | --- | --- |
| Substrate | Cytosine base in DNA | Histone tail residues |
| Number of marks | One dominant mark (5mC) | Dozens, combinatorial |
| Reversibility timescale | Hours (TET) to cell divisions (passive) | Minutes (HDACs / KDMs) |
| Inheritance through replication | DNMT1 copies the parent strand directly | Reader-writer feedback re-establishes pattern |
| Direction of effect | Mostly silencing in promoters | Activating or silencing depending on residue |
| Drugs | 5-azacytidine, decitabine | Vorinostat, tazemetostat, JQ1 |

The two systems are interlocked. H3K9me3 recruits DNMT3A; DNMT3L (a methyltransferase-like cofactor) reads unmethylated H3K4 and recruits DNMT3A there; in the other direction MeCP2 reads 5mC and recruits HDAC complexes. Cells use the redundancy as belt-and-braces silencing — a transposon should not depend on a single mark to stay quiet.

Where methylation lands in the clinic

  • Cancer. Promoter hypermethylation silences BRCA1 in 10-15% of sporadic breast and ovarian cancers, MLH1 in most sporadic microsatellite-unstable colorectal cancers, and MGMT in a subset of glioblastomas; MGMT methylation status is the standard biomarker for predicting benefit from temozolomide. Counterintuitively, the rest of the tumour genome is usually hypomethylated, which destabilises chromosomes and reactivates transposons.
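  • Imprinting disorders. At 15q11-13 the imprinting centre is normally methylated on the maternal copy and unmethylated on the paternal copy. Loss of the maternal methylation imprint can cause Angelman syndrome; aberrant methylation of the paternal copy can cause Prader-Willi. Most cases of both syndromes arise from deletions or uniparental disomy, with primary imprinting defects accounting for a minority. Beckwith-Wiedemann (overgrowth) and Silver-Russell (undergrowth) syndromes can both arise from methylation defects at 11p15.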
  • Hypomethylating drugs. Azacitidine and decitabine are first-line therapies for myelodysplastic syndromes and for older patients with acute myeloid leukaemia; they work by trapping DNMT1 on DNA, triggering its degradation, and reawakening silenced tumour suppressors.
  • Epigenetic age and biomarkers. The Horvath, PhenoAge, GrimAge, and DunedinPACE clocks each read methylation at a panel of a few hundred to about a thousand CpGs to estimate chronological age, biological age, or mortality risk. Methylation in cell-free DNA also reveals tissue of origin, which underpins early cancer detection assays such as Galleri.
  • Environmental epigenetics. The Dutch Hunger Winter cohort showed that prenatal famine is associated with altered methylation at IGF2 decades later. Smoking leaves a methylation signature that remains detectable years after quitting. The effects are real but small at the level of any single CpG.

Variants of the chemistry beyond 5mC

  • 5-hydroxymethylcytosine (5hmC). Generated by TET enzymes; abundant in brain (≈0.7% of cytosines) and embryonic stem cells. Probably an active mark in its own right at gene bodies, not just a demethylation intermediate.
  • Non-CpG methylation. Mammalian neurons accumulate substantial CpA methylation (mCH) in adulthood; this reads as a neuron-specific mark and helps tell cell types apart in single-cell methylomes.
  • N6-methyladenine. Once thought to be bacterial only, low-level adenine methylation has been reported in mammalian DNA; the biology and even the existence are still debated.
  • Plant RdDM pathway. Small interfering RNAs guide the DNA methyltransferase DRM2 to specific loci, allowing plants to inherit acquired methylation patterns over generations — a real Lamarckian-flavoured channel that is well documented in Arabidopsis but largely absent in mammals.

Pitfalls and easy misreadings

  • "Methylation always silences." Promoter methylation usually does, but gene-body methylation in mammals is positively correlated with expression, and 5hmC in gene bodies tracks with active genes. Always specify the genomic context.
  • "Bisulfite sequencing reads 5mC." It reads 5mC and 5hmC together; you need oxidative or TAB-seq to separate them. Many published methylation tracks therefore conflate the two.
  • "Epigenetic age is reversible." Yamanaka-factor reprogramming resets methylation age in vitro. In vivo studies in mice and human trials are ongoing but the clinical claim is far ahead of the data.
  • "Lifestyle changes inheritable methylation." Methylation in primordial germ cells is largely (though not entirely) erased; transgenerational epigenetic inheritance in mammals is real for a handful of imprints but headlines routinely overstate the size of the effect.
  • "DNA methylation rewrites the genome." It does not change a single base of sequence — and when 5mC deaminates to T, that is a sequence change, not a methylation effect.
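The bisulfite point in the list above can be made concrete with a toy read-out table. The sketch assumes idealised chemistry with complete conversion: bisulfite turns unmodified C into U (sequenced as T) and leaves both 5mC and 5hmC reading as C, while oxidative bisulfite first oxidises 5hmC so that it too converts and reads as T. The token names and helper function are illustrative, not taken from any pipeline.

```python
# What each cytosine state is sequenced as under each protocol
# (idealised: complete conversion, no errors).
BS_READOUT = {"C": "T", "5mC": "C", "5hmC": "C"}
OXBS_READOUT = {"C": "T", "5mC": "C", "5hmC": "T"}

def sequenced_as(bases: list[str], table: dict[str, str]) -> str:
    """Return the read obtained from a template of base tokens."""
    return "".join(table.get(base, base) for base in bases)

template = ["A", "5mC", "G", "C", "G", "5hmC", "G"]
bs = sequenced_as(template, BS_READOUT)
oxbs = sequenced_as(template, OXBS_READOUT)
print(bs)     # ACGTGCG
print(oxbs)   # ACGTGTG

# Positions that read C with bisulfite but T with oxBS carried 5hmC.
print([i for i, (a, b) in enumerate(zip(bs, oxbs)) if a == "C" and b == "T"])  # [5]
```

With bisulfite alone, positions 1 and 5 both read as C, which is why a standard methylation track cannot say which of them was 5mC and which was 5hmC.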

Frequently asked questions

Why does mammalian methylation cluster at CpG dinucleotides?

Because a CpG can carry the mark on both strands. CpG is self-complementary: the opposite strand, read 5'→3', also spells CpG, so a methylated cytosine on one strand has a partner cytosine diagonally opposite it, and DNMT1 can use the methylated parental strand as a template for methylating the new one. A methylated cytosine in an asymmetric context such as CpA has no such partner, so the mark there cannot be copied this way. Plants and some fungi additionally methylate CHG and CHH contexts; in mammals non-CpG methylation is largely confined to a few cell types such as neurons.

What is a CpG island?

A short stretch of DNA — typically 200-3000 bp — that is unusually CpG-rich (observed/expected CpG ratio above 0.6) and high in GC content (above 50%). About 70% of human gene promoters sit in CpG islands. Crucially, these islands are usually unmethylated in healthy cells; they are kept open because methylation here would silence the gene. Tumour cells frequently hypermethylate CpG islands at tumour-suppressor genes, mimicking the loss-of-function effect of a deletion.
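The three criteria in this answer translate directly into a check on a sequence. The sketch below uses the usual observed/expected definition, where the expected CpG count is (number of C × number of G) / length; the function names and the default thresholds simply restate the figures above, and real island callers scan in sliding windows rather than testing one fixed sequence.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c, g = seq.count("C"), seq.count("G")
    observed = seq.count("CG")
    expected = c * g / n
    obs_exp = observed / expected if expected else 0.0
    return (c + g) / n, obs_exp

def looks_like_island(seq: str, min_len: int = 200,
                      min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the length, GC-content and obs/exp thresholds together."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc > min_gc and obs_exp > min_obs_exp

print(cpg_stats("CG" * 100))            # (1.0, 2.0)
print(looks_like_island("CG" * 100))    # True
print(looks_like_island("CAGT" * 50))   # False
```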

How is the methylation pattern copied through cell division?

After DNA replication the daughter molecule is hemi-methylated — the parental strand still carries 5-methylcytosines, the newly synthesised strand does not. DNMT1, with its accessory factor UHRF1, slides along behind the replication fork, recognises hemi-methylated CpGs, and adds methyl groups to the new strand to restore symmetry. The whole pattern is copied within minutes to hours of replication.

How is methylation removed?

Two routes. Passive demethylation: block DNMT1 and the mark dilutes by half every cell division. Active demethylation: TET enzymes (TET1-3) oxidise 5-methylcytosine through 5-hydroxymethyl, 5-formyl, and 5-carboxyl intermediates; thymine DNA glycosylase then excises the modified base and base-excision repair restores an unmodified cytosine. The active route is how zygotes erase the paternal methylome in hours.
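The passive route is simple arithmetic: with maintenance fully blocked, each division halves the methylated fraction, so after n divisions a fraction (1/2)^n of the starting level remains. A minimal sketch, assuming complete DNMT1 inhibition and no de novo methylation; the function name is hypothetical.

```python
def passive_dilution(start_fraction: float, divisions: int) -> list[float]:
    """Methylated fraction after 0..divisions rounds of replication
    with no maintenance methylation."""
    return [start_fraction * 0.5 ** n for n in range(divisions + 1)]

for n, level in enumerate(passive_dilution(0.8, 3)):
    print(n, level)
```

Starting from 80% methylation, three divisions without maintenance leave 10%.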

What is the epigenetic clock?

Steve Horvath showed in 2013 that the methylation level at a small panel of CpGs predicts a person's chronological age within a few years across nearly all human tissues. Newer clocks (PhenoAge, GrimAge, DunedinPACE) predict biological age and mortality risk. They are the most accurate molecular age predictors we have, although the underlying biology — why these specific CpGs change with age — remains partly unresolved.
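At its core, a clock of this kind is a weighted sum of methylation levels plus an intercept, with the weights learned by penalised regression on a training cohort. The sketch below shows only that arithmetic. The probe names, weights, and intercept are invented for illustration and do not come from any published clock, and published clocks may also transform the output before reporting an age.

```python
# Hypothetical coefficients: NOT taken from any real clock.
INTERCEPT = 30.0
WEIGHTS = {"cpg_A": 40.0, "cpg_B": -25.0}

def predict_age(betas: dict[str, float]) -> float:
    """Weighted sum of methylation beta values (0 to 1) plus an intercept."""
    missing = set(WEIGHTS) - set(betas)
    if missing:
        raise KeyError(f"missing CpGs: {sorted(missing)}")
    return INTERCEPT + sum(w * betas[cpg] for cpg, w in WEIGHTS.items())

print(predict_age({"cpg_A": 0.5, "cpg_B": 0.2}))   # 45.0
```

A CpG with a positive weight gains methylation with age in the training data, and one with a negative weight loses it.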

Are demethylating drugs used clinically?

Yes. 5-azacytidine (azacitidine) and 5-aza-2'-deoxycytidine (decitabine) are nucleoside analogues that replace cytosine in DNA, trap DNMT1, and trigger its degradation. Genome-wide methylation drops, silenced tumour suppressors re-express, and abnormal myeloid cells differentiate or die. Both drugs are first-line therapy for myelodysplastic syndromes and older patients with acute myeloid leukaemia.