Question 1

What is the formula for linkage disequilibrium?

Accepted Answer

For two biallelic loci with alleles A/a and B/b, allele frequencies p_A, p_a, p_B, p_b, and A-B haplotype frequency p_AB, the standard pairwise LD coefficient is D = p_AB − p_A · p_B. D ranges from −0.25 to +0.25, and its attainable range depends strongly on the allele frequencies, so two normalised statistics are typically reported instead. D' = D / D_max, where D_max is the largest |D| possible given the marginal allele frequencies: min(p_A · p_b, p_a · p_B) when D > 0 and min(p_A · p_B, p_a · p_b) when D < 0. D' ranges from −1 to +1, and |D'| = 1 means that at least one of the four possible haplotypes is unobserved. r² = D² / (p_A · p_a · p_B · p_b) is the squared correlation between the indicator variables for the alleles at the two loci. r² is the statistic that governs statistical power in tag-SNP association studies, because the sample size needed to detect a causal variant through a tag SNP scales as 1/r².
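
The three statistics can be computed directly from one haplotype frequency and the two allele frequencies. The following is a minimal Python sketch, not a reference implementation: the function name ld_stats is hypothetical, and D_max uses the usual sign-dependent normalisation (the smaller of p_A · p_b and p_a · p_B when D is positive, the smaller of p_A · p_B and p_a · p_b when D is negative).

```python
def ld_stats(p_AB, p_A, p_B):
    """Return (D, D_prime, r2) for two biallelic loci.

    p_AB is the frequency of the A-B haplotype; p_A and p_B are the
    frequencies of allele A and allele B. The name ld_stats is
    illustrative only.
    """
    p_a, p_b = 1.0 - p_A, 1.0 - p_B
    D = p_AB - p_A * p_B

    # D_max is the largest |D| compatible with the allele frequencies;
    # which bound applies depends on the sign of D.
    if D >= 0:
        D_max = min(p_A * p_b, p_a * p_B)
    else:
        D_max = min(p_A * p_B, p_a * p_b)
    D_prime = D / D_max if D_max > 0 else 0.0

    # r2 is undefined when either locus is monomorphic; return 0.0 then.
    denom = p_A * p_a * p_B * p_b
    r2 = D * D / denom if denom > 0 else 0.0
    return D, D_prime, r2


if __name__ == "__main__":
    # Allele B (frequency 0.1) occurs only on A-bearing haplotypes.
    print(ld_stats(p_AB=0.1, p_A=0.5, p_B=0.1))
```

In this example D' is 1 because the a-B haplotype is absent, yet r² is only about 0.11 because the allele frequencies at the two loci differ. That gap is why r², not D', is the quantity used for tagging and power calculations.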

Question 2

How fast does LD decay?

Accepted Answer

Each generation, recombination reduces LD between two loci with recombination fraction c: D_{t+1} = (1 − c) · D_t, so after t generations D_t = (1 − c)^t · D_0. For loci 1 cM apart (c ≈ 0.01), LD decays by about 1 percent per generation; after 100 generations only ~37 percent of the initial LD remains, and after 300 generations ~5 percent. Loci 0.1 cM apart (about 100 kb in humans on average) lose only 0.1 percent per generation, so half of the initial LD is still present after roughly 700 generations, which is well over ten thousand years. This is why LD blocks in non-African human populations extend 10–100 kb and still reflect demographically recent events; African populations, with a larger long-term effective population size and no recent bottleneck, show markedly shorter LD blocks.
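
The decay figures quoted above follow directly from D_t = (1 − c)^t · D_0. Here is a small Python sketch for reproducing them, assuming random mating and no other force acting on LD; the function names are hypothetical.

```python
import math


def ld_remaining(c, t):
    """Fraction of the initial D left after t generations: (1 - c) ** t."""
    return (1.0 - c) ** t


def generations_to_fraction(c, fraction):
    """Generations needed for D to fall to `fraction` of its initial value."""
    return math.log(fraction) / math.log(1.0 - c)


if __name__ == "__main__":
    for c, t in [(0.01, 100), (0.01, 300), (0.001, 100)]:
        print(f"c={c}, t={t}: {ld_remaining(c, t):.3f} of D_0 remains")
    half_life = generations_to_fraction(0.001, 0.5)
    print(f"half-life at c=0.001: {half_life:.0f} generations")
```

For c = 0.001 the half-life comes out at just under 700 generations, which is the basis for saying that LD between loci 0.1 cM apart outlasts most recent demographic events.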

Question 3

Why is LD essential for GWAS?

Accepted Answer

GWAS typically genotype 0.5 to 2 million tagging SNPs across the roughly 3 billion bp of the human genome, far fewer than a complete inventory of human variation: sequencing catalogues such as 1000 Genomes list more than 80 million variants, of which on the order of 10 million are common. The strategy works because tagging SNPs are chosen so that any common (minor allele frequency > 0.05) untyped variant is in high LD (r² > 0.8) with at least one typed SNP. A causal variant that the chip does not measure directly will still produce an association signal at any tag SNP in LD with it, although reaching the same power as direct genotyping requires a sample larger by a factor of 1/r². The HapMap and 1000 Genomes projects measured LD across populations to allow chip designs that maximise tagging efficiency. As a side effect, fine-mapping the causal variant within an associated LD block requires either dense sequencing or trans-ethnic comparison.
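
The two quantitative rules in this answer, the r² > 0.8 tagging criterion and the 1/r² sample-size penalty, can be written as a short Python sketch. The function names are hypothetical and the code only illustrates the arithmetic; it is not how any particular chip or tagging tool is implemented.

```python
def required_sample_size(n_direct, r2):
    """Sample size needed at a tag SNP to match the power that
    n_direct samples would give if the causal variant were typed."""
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    return n_direct / r2


def is_tagged(r2_to_typed, threshold=0.8):
    """True if an untyped variant has r2 above `threshold` with at
    least one typed SNP. r2_to_typed holds its r2 with each typed SNP."""
    return max(r2_to_typed, default=0.0) > threshold


if __name__ == "__main__":
    print(required_sample_size(10_000, 0.8))
    print(is_tagged([0.20, 0.85]), is_tagged([0.20, 0.60]))
```

At the design threshold of r² = 0.8 the penalty is modest (a 25 percent larger sample), whereas at r² = 0.2 the required sample is five times larger.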

Question 4

Why is LD different across populations?

Accepted Answer

LD is shaped by population history. Three forces matter most. First, effective population size N_e: drift generates more LD in smaller populations, so the expected equilibrium level, E[r²] ≈ 1 / (1 + 4 N_e c), is higher when N_e is small. Second, bottlenecks: a recent severe bottleneck (e.g., the out-of-Africa migration ~60,000 years ago) elevates LD across the entire genome for thousands of generations afterward. Third, admixture: recent mixing of two diverged populations creates extensive LD even between physically unlinked loci, which decays with each post-admixture generation. Empirically, African genomes show short LD blocks (~5–10 kb), East Asian and European genomes show longer blocks (~20–60 kb), and Finns and Ashkenazi Jews show even longer blocks (>100 kb in some regions), consistent with their founder bottlenecks.
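
To see how strongly N_e enters the equilibrium approximation r² ≈ 1 / (1 + 4 N_e c), it helps to evaluate it at one fixed recombination fraction. Below is a minimal Python sketch; the function name and the N_e values are illustrative assumptions, not population estimates.

```python
def expected_r2(n_e, c):
    """Approximate equilibrium r2 between two loci under drift and
    recombination: 1 / (1 + 4 * N_e * c)."""
    return 1.0 / (1.0 + 4.0 * n_e * c)


if __name__ == "__main__":
    c = 0.001  # about 0.1 cM
    for n_e in (1_000, 10_000, 100_000):  # illustrative values only
        print(f"N_e={n_e}: expected r2 = {expected_r2(n_e, c):.4f}")
```

At c = 0.001, a tenfold drop in N_e from 10,000 to 1,000 raises the expected r² from about 0.02 to 0.2, which is the qualitative pattern seen when comparing large outbred populations with founder populations.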

Question 5

What causes high LD between distant loci?

Accepted Answer

Several forces other than tight physical linkage generate or sustain LD. Recent admixture between populations with different allele frequencies creates LD even between unlinked loci; it decays at rate c per generation, so for unlinked loci (c = 0.5) it halves every generation. Selection on a haplotype carrying linked variants (a selective sweep) drags the entire haplotype to high frequency, generating long-range LD that persists for thousands of years afterward; the lactase persistence haplotype in Europeans, for example, extends LD over more than 1 Mb. Population structure has a similar effect: if a sample mixes two subpopulations with different allele frequencies, alleles common in one subpopulation but rare or absent in the other appear correlated even when physically unlinked, and failing to control for this is a leading cause of false-positive GWAS hits. Epistatic selection (favouring particular allele combinations) and inversions that suppress recombination locally also create persistent long-range LD.
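
The admixture and population-structure effects can be checked with a few lines of arithmetic. The Python sketch below assumes two subpopulations that are each in linkage equilibrium (so the A-B haplotype frequency within each is the product of its allele frequencies) and computes D in the pooled sample. The function name and the example frequencies are hypothetical.

```python
def mixture_D(m, pA1, pB1, pA2, pB2):
    """D in a sample made of a fraction m from subpopulation 1 and
    1 - m from subpopulation 2, each in linkage equilibrium."""
    # Pooled haplotype and allele frequencies.
    p_AB = m * pA1 * pB1 + (1.0 - m) * pA2 * pB2
    p_A = m * pA1 + (1.0 - m) * pA2
    p_B = m * pB1 + (1.0 - m) * pB2
    return p_AB - p_A * p_B


if __name__ == "__main__":
    # Equal mixture; both alleles common in population 1, rare in 2.
    print(mixture_D(0.5, 0.8, 0.8, 0.2, 0.2))
```

The result equals m · (1 − m) · (pA1 − pA2) · (pB1 − pB2), so D is non-zero whenever both loci differ in frequency between the subpopulations, regardless of where the loci sit in the genome. If the two groups then mate at random, this D shrinks by the factor (1 − c) each generation, which for unlinked loci means halving.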

Question 6

What is a haplotype block?

Accepted Answer

A haplotype block is a stretch of the genome over which most pairs of common SNPs are in high LD with each other and only a few haplotype configurations are observed. The HapMap consortium showed that human chromosomes can be partitioned into roughly 100,000 to 300,000 such blocks, separated by short recombination hotspots that account for the majority of crossovers. Within a block, typically 4 to 10 distinct haplotypes account for >95 percent of all observed chromosomes, so a small number of tagging SNPs (often 3 to 6) can capture most of the variation. Block boundaries largely coincide with the recombination hotspots that PRDM9 specifies during meiosis, where recombination rates are 10 to 1,000 times the genome-wide average and are concentrated in 1–2 kb windows.
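
The statement that a handful of haplotypes covers most chromosomes in a block is easy to check on phased data. The following Python sketch assumes each chromosome's alleles within the block are given as a string; the function name and the toy data are hypothetical.

```python
from collections import Counter


def haplotypes_for_coverage(haplotypes, coverage=0.95):
    """Smallest number of distinct haplotypes, taken from most to least
    frequent, that together account for at least `coverage` of the
    observed chromosomes."""
    counts = Counter(haplotypes)
    total = sum(counts.values())
    covered = 0
    for n, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        if covered >= coverage * total:
            return n
    return len(counts)


if __name__ == "__main__":
    # Toy block of three SNPs observed on 100 chromosomes.
    block = ["ACG"] * 50 + ["ATG"] * 30 + ["GCG"] * 16 + ["GTA"] * 4
    print(haplotypes_for_coverage(block, 0.95))
```

In the toy data, three of the four observed haplotypes already cover 96 of the 100 chromosomes, so the function reports 3 for the 95 percent threshold.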

Linkage vs LD vs association

Concept	Linkage	Linkage disequilibrium	Association
Scale	Within a single family	Across a population	Across a population (case-control or cohort)
Measure	Recombination fraction c (LOD score)	D, D', r²	Odds ratio, β coefficient, P-value
Time horizon	One meiosis (one generation)	Thousands of generations	Current population
Detects	Co-segregation of marker and trait in pedigrees	Allelic correlation independent of phenotype	Correlation between genotype and phenotype
Sample size	10–100 families	1,000–100,000 chromosomes for fine LD	10,000–1,000,000 individuals (modern GWAS)
Resolution	~1–10 cM	Down to single-SNP via dense LD blocks	Locus-level; needs LD/fine-mapping for causal variant
Classic study	Sturtevant 1913 Drosophila	HapMap 2002, 1000 Genomes	WTCCC 2007, UK Biobank 2018

Linkage Disequilibrium
