Genetics

Quantitative Trait Loci (QTL)

Genome regions whose allelic variation contributes to a continuous phenotype — height, yield, blood pressure

A quantitative trait locus (QTL) is a region of the genome where allelic variation contributes to a continuous phenotype — height, blood pressure, crop yield, milk production, gene-expression levels. QTL mapping was formalised by Eric Lander and David Botstein in 1989 with RFLP markers and an interval-mapping likelihood framework, and modern GWAS extends the same logic to large outbred populations using LD-tagged SNPs. Most individual QTLs explain less than 5 percent of phenotypic variance; human adult height is roughly 80 percent heritable but split across hundreds of loci, with the largest single SNP accounting for only ~0.4 percent of variance. The GWAS Catalog records more than 600,000 trait-SNP associations across more than 6,000 traits.

  • Formalised: Lander & Botstein 1989
  • Significance: LOD > 3 (P < 1e-4) per locus
  • Typical effect: < 5 % of variance per QTL
  • Height heritability: ~80 % · hundreds of loci
  • GWAS hits (2024): 600,000+ in catalog
  • eQTL coverage: GTEx 49 tissues, ~17k genes

Why QTLs matter

  • Most diseases are complex, not Mendelian. Type 2 diabetes, coronary artery disease, hypertension, schizophrenia, and the other major psychiatric disorders are polygenic: influenced by hundreds to thousands of small-effect loci rather than a single Mendelian gene. Of the ~600,000 GWAS Catalog associations, roughly 90 percent are for quantitative or complex traits rather than monogenic disorders.
  • Plant and animal breeding gains. Marker-assisted selection using QTLs identified in pedigreed breeding populations has accelerated genetic gain in dairy cattle (milk yield gains of 100+ kg/cow/year via genomic selection since 2008), pigs, chickens, and crops including maize, rice, soybean, and tomato. Genomic estimated breeding values (GEBVs) have replaced classical pedigree-based BLUP in major commercial breeding programs.
  • Heritability quantifies the genetic share. Twin studies put heritability of human height at ~80 percent, BMI ~70 percent, schizophrenia ~80 percent, IQ ~50–80 percent depending on age. QTL/GWAS work translates these abstract heritability fractions into specific genomic loci.
  • Polygenic risk scores stratify disease risk. Summing weighted risk alleles across a genome produces a polygenic risk score (PRS) that predicts disease liability. Individuals in the top 5 percent of the coronary artery disease PRS distribution have ~3-fold higher risk than the population average, comparable to monogenic familial hypercholesterolemia. A PRS for breast cancer (~310 SNPs) is being clinically piloted alongside BRCA1/2 testing.
  • Drug-target validation. A drug targeting a gene whose QTL alleles already shift the disease phenotype in the human population has roughly 2× higher chance of clinical success — the "human genetic evidence" multiplier observed by Nelson et al 2015 across pharma pipelines. PCSK9 inhibitors (alirocumab, evolocumab) are the canonical example: loss-of-function alleles lower LDL cholesterol and protect against heart disease, motivating drugs with ~$8B in 2023 sales.
  • eQTLs link non-coding GWAS hits to genes. Roughly 90 percent of GWAS-significant SNPs are in non-coding regions. Colocalising them with cis-eQTLs from GTEx tissues identifies the affected gene in roughly 60–70 percent of cases, providing the first mechanistic step beyond the locus association.
  • Cross-species portability. Many human disease genes were first identified as QTLs in mouse, rat, or zebrafish models. The TCF7L2 locus, the strongest type 2 diabetes association in humans, was independently confirmed by mouse Tcf7l2 knockouts showing impaired glucose-stimulated insulin secretion, closing the loop from human GWAS to mechanism.
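The polygenic score described in the list above is, at its core, a weighted sum of allele counts. A minimal sketch, assuming per-allele weights taken from GWAS summary statistics; the SNP names, weights, and dosages below are hypothetical and chosen only for illustration:

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Sum of (weight x risk-allele dosage) over SNPs present in both maps."""
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)


# Hypothetical per-allele weights (for example, log odds ratios); not real data.
weights = {"rs_a": 0.12, "rs_b": -0.05, "rs_c": 0.30}

# Risk-allele dosages (0, 1, or 2 copies) for one hypothetical individual.
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}

score = polygenic_score(person, weights)
print(f"polygenic score: {score:.2f}")
```

A raw score like this only becomes a risk estimate after it is standardised against a reference population, which is one reason scores transfer poorly between ancestries.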

Common misconceptions

  • A QTL is a gene. A QTL is a chromosomal region, typically 1–30 cM in classical mapping or 1–100 kb in fine-mapped GWAS, that may contain dozens to hundreds of genes. Identifying the actual causal gene and variant within a QTL requires fine-mapping, expression analysis, or functional assays. The historical literature is full of QTLs whose causal gene was only confirmed decades later.
  • High LOD score means strong biological effect. LOD score depends on sample size as well as effect size: a QTL with 1 percent variance explained can show LOD > 10 if the population is large enough. Conversely, a 10-percent-variance QTL can be undetectable in a small sample. Always report variance explained alongside LOD or P-value.
  • QTL effects are additive only. Many QTLs show dominance (heterozygote effect not the average of homozygotes) or epistasis (effect depends on genotype at another locus). Modern QTL mapping software (R/qtl2, GEMMA) tests for non-additive effects, but additive variance still dominates in published estimates of heritability.
  • Heritability is fixed. Heritability is a population-specific ratio of genetic to total variance — it depends on the environment. Heritability of height in the United States today (~80 percent) was lower a century ago when nutritional variance was larger. Heritability does not measure how much of an individual's trait is genetic; only how much variation across a population is genetic.
  • Polygenic risk scores work equally well across populations. They don't. PRS derived from European-ancestry GWAS lose 40–80 percent of predictive accuracy in African ancestry samples because of LD differences and allele-frequency differences. Multi-ancestry GWAS (e.g., Pan-UK Biobank, AFR-GWAS in TOPMed) is essential for equitable clinical use.
  • Missing heritability proves heritability is wrong. Missing heritability narrowed substantially from 2010 to 2022 as sample sizes grew: height GWAS hits explained ~5 percent of variance in 2010, ~20 percent in 2014, and ~40 percent in 2022 (Yengo et al, 5.4M individuals). The remaining gap is plausibly explained by rare variants, weaker effects below significance thresholds, and assortative mating — not a fundamental flaw in heritability estimation.
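The point above about LOD and sample size can be made concrete with a back-of-envelope calculation. For a single-QTL regression the LOD score is (n/2)·log10(RSS0/RSS1), so a locus explaining a fraction R² of variance has an expected LOD of roughly −(n/2)·log10(1 − R²). The sketch below uses that approximation and ignores sampling noise; the function names are illustrative, not from any QTL package:

```python
import math


def expected_lod(n: int, var_explained: float) -> float:
    """Approximate expected LOD for a QTL explaining `var_explained` of variance."""
    return -(n / 2.0) * math.log10(1.0 - var_explained)


def n_for_lod(target_lod: float, var_explained: float) -> int:
    """Smallest sample size whose approximate expected LOD reaches `target_lod`."""
    return math.ceil(-2.0 * target_lod / math.log10(1.0 - var_explained))


# A 1-percent-variance QTL reaches LOD 10 once the sample is large enough.
n_small_effect = n_for_lod(10.0, 0.01)
print("individuals needed for LOD 10 at 1% variance:", n_small_effect)

# A 10-percent-variance QTL in only 100 individuals stays below LOD 3.
lod_large_effect = expected_lod(100, 0.10)
print(f"expected LOD at 10% variance, n=100: {lod_large_effect:.2f}")
```

Under this approximation a few thousand individuals suffice for the small-effect case, while the large-effect locus in 100 individuals falls short of the conventional threshold.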

How QTL mapping works

Classical QTL mapping starts with two parental inbred lines that differ for the trait of interest — for example, a tall and a short maize line. Crossing them produces an F1 generation that is uniformly heterozygous at every locus differing between the parents. Selfing or sibling-mating the F1 generates an F2 population (or recombinant inbred lines after several more generations) where each individual is a unique mosaic of parental haplotypes due to meiotic recombination. Genotyping all individuals at hundreds to thousands of markers spanning the genome — historically RFLPs, then SSRs, today SNP arrays or low-coverage sequencing — assigns parental ancestry at each locus. For each marker (single-marker analysis) or genomic interval (interval mapping), one tests whether parental ancestry predicts the trait value, computing a LOD score. Significant peaks identify QTL locations, with the LOD threshold typically set at 3–5 to control the genome-wide false positive rate.
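The single-marker test described above can be sketched on a simulated F2 population. This is a simplified illustration, not the procedure of any particular package: it fits only an additive effect (ignoring dominance), uses made-up effect sizes, and computes LOD as (n/2)·log10(RSS0/RSS1) from a simple linear regression of phenotype on allele count.

```python
import math
import random


def lod_single_marker(genotypes, phenotypes):
    """LOD for an additive single-marker regression: (n/2) * log10(RSS0 / RSS1)."""
    n = len(phenotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, phenotypes))
    rss0 = sum((y - mean_y) ** 2 for y in phenotypes)  # null model: mean only
    if sxx == 0:
        return 0.0  # monomorphic marker carries no information
    rss1 = rss0 - sxy ** 2 / sxx  # residual sum of squares after fitting the marker
    return (n / 2.0) * math.log10(rss0 / rss1)


rng = random.Random(1)
n = 300

# F2 genotypes coded as the number of alleles from the tall parent (1:2:1 ratio).
causal = [rng.choice([0, 1, 1, 2]) for _ in range(n)]
unlinked = [rng.choice([0, 1, 1, 2]) for _ in range(n)]

# Hypothetical trait: each tall-parent allele adds 1 unit, plus environmental noise.
height = [1.0 * g + rng.gauss(0.0, 1.5) for g in causal]

lod_causal = lod_single_marker(causal, height)
lod_unlinked = lod_single_marker(unlinked, height)
print(f"LOD at causal marker:   {lod_causal:.1f}")
print(f"LOD at unlinked marker: {lod_unlinked:.1f}")
```

Scanning every marker with the same function and plotting LOD against map position gives the familiar peak profile; a full F2 analysis would also fit a dominance term.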

Lander and Botstein 1989 introduced interval mapping, which uses maximum likelihood to estimate both the position of a putative QTL between two flanking markers and its additive and dominance effects. The method explicitly models recombination between the QTL and the flanking markers, gaining power over single-marker analysis. Composite interval mapping (Zeng 1993, 1994) adds covariates from other markers to control for confounding by linked QTLs and to reduce false-positive peaks at correlated locations. Multiple-QTL methods (R/qtl, R/qtl2) extend this to simultaneous fitting of many QTLs and their interactions.
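The step of modelling recombination between a putative QTL and its flanking markers can be illustrated with the conditional genotype probability that interval mapping relies on. The sketch below assumes a backcross (two genotype classes, coded 0 and 1) and no crossover interference; the function name and coding are illustrative choices, not taken from any QTL package.

```python
def qtl_prob_given_flanks(m1: int, m2: int, r1: float, r2: float) -> float:
    """P(QTL genotype = 1 | flanking marker genotypes) in a backcross.

    m1, m2: genotypes (0 or 1) at the left and right flanking markers.
    r1, r2: recombination fractions marker1-QTL and QTL-marker2.
    Assumes no crossover interference, so the two intervals are independent.
    """
    def p_marker(marker: int, qtl: int, r: float) -> float:
        # Marker matches the QTL genotype unless a recombination occurred.
        return 1.0 - r if marker == qtl else r

    # Prior 0.5 for each QTL genotype in a backcross, then Bayes' rule.
    weight_1 = 0.5 * p_marker(m1, 1, r1) * p_marker(m2, 1, r2)
    weight_0 = 0.5 * p_marker(m1, 0, r1) * p_marker(m2, 0, r2)
    return weight_1 / (weight_1 + weight_0)


# Both flanking markers agree: the QTL genotype is almost certainly the same.
print(round(qtl_prob_given_flanks(1, 1, 0.1, 0.1), 4))
# Markers disagree and the QTL sits closer to the left marker.
print(round(qtl_prob_given_flanks(1, 0, 0.05, 0.15), 4))
```

In a likelihood-based scan these probabilities act as mixture weights for each individual at each tested position, which is how information from both flanking markers is combined.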

GWAS replaces the controlled experimental cross with a cohort of unrelated individuals: tens of thousands to millions of people genotyped at common variants. The statistical model regresses phenotype on each SNP one at a time, with population stratification controlled via principal components or mixed-effect models (BOLT-LMM, SAIGE). The genome-wide significance threshold is conventionally 5 × 10⁻⁸, derived from a Bonferroni correction of α = 0.05 for ~1 million effectively independent tests. Effect-size estimation requires careful handling of "winner's curse" (selected loci have inflated effect-size estimates in the discovery sample) and population-specific LD reference panels for fine-mapping. The resulting catalogue (NHGRI-EBI GWAS Catalog) holds >600,000 entries across 6,000+ traits as of 2024.
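The genome-wide threshold is simple arithmetic: a family-wise error rate of 0.05 divided by the approximate number of independent tests. A short sketch, with hypothetical SNP names and p-values used only to show the filter:

```python
import math

ALPHA = 0.05                   # family-wise error rate
N_EFFECTIVE_TESTS = 1_000_000  # approximate number of independent common-variant tests
THRESHOLD = ALPHA / N_EFFECTIVE_TESTS  # Bonferroni-corrected per-SNP threshold


def genome_wide_significant(p_values: dict) -> list:
    """Return the SNPs whose p-value falls below the Bonferroni threshold."""
    return sorted(snp for snp, p in p_values.items() if p < THRESHOLD)


# Hypothetical association results; not real data.
p_values = {"rs_x": 3e-9, "rs_y": 2e-7, "rs_z": 4.9e-8}
hits = genome_wide_significant(p_values)

print(f"threshold: {THRESHOLD:.1e}")
print("genome-wide significant:", hits)
```

Note that rs_y, with P = 2 × 10⁻⁷, would pass most single-test standards yet fails here, which is why sub-threshold loci feature in the missing-heritability discussion.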

QTL mapping vs GWAS

Feature | Classical QTL mapping | Genome-wide association (GWAS)
Sample | F2 / RIL / backcross from defined parents | Unrelated individuals from a population
Sample size | 100–1,000 individuals | 10,000 to several million
Recombination resolution | 1–30 cM (a few generations) | 1–100 kb via population LD
Statistical method | LOD interval mapping (Lander-Botstein 1989) | Mixed-model linear regression at each SNP
Significance threshold | LOD > 3 (locus-wise) or empirical permutation | P < 5 × 10⁻⁸ (genome-wide Bonferroni)
Strengths | High per-individual power, allele frequency 0.5 by design | High resolution, captures common-variant architecture
Weaknesses | Confined to parental allele set, low resolution | Misses rare variants, needs huge cohorts, population stratification
Typical organism | Mouse, rat, maize, Drosophila, Arabidopsis, yeast | Human (UK Biobank), large outbred animal populations
Output | 10–30 cM QTL peaks, dozens per genome | 1–100 kb associated regions, hundreds–thousands per trait

Famous experiments

  • Lander & Botstein 1989. Introduced interval mapping with RFLP markers and the LOD-score framework for QTL detection. The methods paper has been cited >15,000 times and laid the statistical foundation for the field.
  • Paterson et al 1988 tomato fruit-mass QTLs. First mapped a quantitative trait, fruit weight in an interspecific tomato (Solanum) cross, to specific chromosome regions using RFLPs, identifying six QTLs with effects ranging from 5 to 30 percent of phenotypic variance.
  • Mackay 2014 Drosophila Genetic Reference Panel. Sequenced 205 inbred fly lines and mapped QTLs for hundreds of behavioural and physiological traits — established the modern multi-trait QTL benchmark.
  • HapMap Project 2002–2010. Catalogued LD and common variation across populations, providing the reference panel that enabled commercial GWAS arrays starting in 2005.
  • Yengo et al 2022 height GWAS. Meta-analysis of 5.4 million individuals identified ~12,000 independent SNPs associated with adult height, jointly explaining ~40 percent of variance from common variants — the largest single-trait GWAS to date and a benchmark for the polygenic architecture of complex traits.

Frequently asked questions

What is a quantitative trait locus?

A quantitative trait locus, or QTL, is a region of the genome where allelic variation correlates with variation in a continuous phenotype across a population or experimental cross. Unlike Mendelian genes that produce discrete categorical phenotypes (round versus wrinkled), each QTL typically contributes a small fraction of the phenotypic variance, often less than 5 percent, and most traits are influenced by tens to thousands of QTLs operating together. The term applies to any continuous trait: height, blood pressure, blood glucose, crop yield, milk fat percentage, behavioural latency, or even gene-expression levels (where the loci are called eQTLs). The loci themselves are not different in kind from Mendelian genes; they differ in effect size and in the statistical methodology required to find them.

How does QTL mapping work?

Classical QTL mapping in experimental species takes an F2 or recombinant inbred line (RIL) population derived from two divergent parental strains, genotypes its members at hundreds to thousands of markers spanning the genome, and tests at each marker (or in each interval between markers) whether genotype is associated with the phenotype. Lander and Botstein 1989 introduced interval mapping, which uses maximum likelihood to estimate the position and effect size of a putative QTL between two flanking markers, reporting a LOD score (logarithm of odds). LOD > 3 (corresponding to roughly P < 1e-4 per locus) is the conventional significance threshold, often raised to LOD > 4 or higher to control for multiple testing across the genome. Composite interval mapping (Zeng 1994) adds covariates from other markers to remove spurious peaks. The resolution of mapping is set by the number of recombination events in the population, typically yielding 10–30 cM intervals, which means hundreds of genes per QTL window.
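The correspondence between LOD 3 and P of about 1e-4 can be checked directly. A LOD score is a base-10 log likelihood ratio, so the usual likelihood-ratio statistic is 2·ln(10)·LOD. The sketch below assumes that statistic follows a chi-square distribution with one degree of freedom (one tested effect), which is an approximation; the helper name is illustrative.

```python
import math


def lod_to_pvalue(lod: float, one_sided: bool = False) -> float:
    """Pointwise p-value for a LOD score, assuming a 1-df chi-square reference."""
    lrt = 2.0 * math.log(10.0) * lod
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(lrt / 2.0))
    return p / 2.0 if one_sided else p


for lod in (3.0, 4.0, 5.0):
    print(f"LOD {lod:.0f}: two-sided P = {lod_to_pvalue(lod):.1e}, "
          f"one-sided P = {lod_to_pvalue(lod, one_sided=True):.1e}")
```

Under these assumptions LOD 3 gives about 2e-4 two-sided and about 1e-4 one-sided, consistent with the rough figure quoted above. Models with more parameters, such as an F2 scan fitting additive and dominance effects, need more degrees of freedom and give a larger p-value for the same LOD.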

How is QTL mapping different from GWAS?

QTL mapping uses experimental crosses (F2, backcross, RIL) with known parents and a small number of generations of recombination, exploiting the meiotic recombination within the cross to localise variants. GWAS uses unrelated individuals from a population and exploits historical recombination embedded in linkage-disequilibrium structure across the genome. QTL mapping has more statistical power per individual (because allele frequencies are 0.5 by design and family structure is known) but lower resolution (1–30 cM intervals). GWAS has lower per-individual power but vastly higher resolution, often pinpointing variants to within a few kilobases when LD blocks are short. The two approaches have converged: large biobank cohorts (UK Biobank, FinnGen, Million Veteran Program) are essentially GWAS at scale, while in model organisms diversity outbred mice and multi-parent advanced generation intercross (MAGIC) lines combine both strategies.

Why are QTL effect sizes typically small?

Most quantitative traits are polygenic — controlled by many loci with each contributing a small effect. Mathematical reasons: if many loci affect a trait additively, the total heritable variance is split among them, so each individual locus accounts for a small fraction. Evolutionary reasons: for a trait under stabilising selection (like body mass index in many environments), large-effect alleles are pulled to extreme frequencies (fixed or lost) by selection, leaving only small-effect variation segregating in the population. Empirically, in humans the largest height-associated SNPs explain roughly 0.4 percent of variance, the largest BMI SNPs around 0.3 percent, and the largest blood-pressure SNPs around 0.1 percent. Hundreds to thousands of variants each contribute a small slice, summed in polygenic risk scores.
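The link between per-allele effect and variance explained follows from a standard formula: for an additive biallelic locus in Hardy-Weinberg equilibrium, variance explained is 2p(1 − p)β², with p the allele frequency and β the per-allele effect in phenotypic standard deviations. A short sketch with hypothetical values (not estimates for any real SNP):

```python
def variance_explained(allele_freq: float, beta_sd: float) -> float:
    """Fraction of phenotypic variance explained by one additive biallelic locus.

    allele_freq: frequency p of the effect allele (Hardy-Weinberg assumed).
    beta_sd: per-allele effect in units of phenotypic standard deviations.
    """
    return 2.0 * allele_freq * (1.0 - allele_freq) * beta_sd ** 2


# Hypothetical common variant: p = 0.5, effect of 0.09 SD per allele.
common = variance_explained(0.5, 0.09)
# Same per-allele effect at a rare variant: p = 0.01.
rare = variance_explained(0.01, 0.09)

print(f"common variant: {100 * common:.3f} % of variance")
print(f"rare variant:   {100 * rare:.3f} % of variance")
```

The same formula shows why experimental crosses are powerful per individual: fixing allele frequency at 0.5 maximises the 2p(1 − p) term.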

What is missing heritability?

Missing heritability is the gap between heritability estimated from twin or family studies (e.g., 80 percent for adult human height) and the variance explained by the sum of all genome-wide-significant GWAS hits (which until ~2018 amounted to less than 30 percent for height even with hundreds of thousands of subjects). Several explanations have been validated. First, many small-effect loci are below the genome-wide significance threshold and contribute when summed with weaker filters. Second, rare variants (MAF < 1 percent) are poorly tagged on common-variant arrays. Third, gene-gene and gene-environment interactions add variance not captured in additive models. Fourth, some heritability estimates from twin studies are inflated (by shared environment or assortative mating). The largest height GWAS to date (5.4 million individuals, Yengo 2022) recovered ~40 percent of variance from common variants alone, with rare-variant studies and biobank-scale designs continuing to close the gap.

What are eQTLs?

An expression QTL (eQTL) is a locus where genotype is associated with the expression level of a gene measured by RNA sequencing or microarray. cis-eQTLs sit close to the gene whose expression they affect (typically within 1 Mb), and trans-eQTLs sit elsewhere in the genome. The GTEx Project (2010–2017) measured eQTLs across 49 human tissues in ~1,000 donors and identified cis-eQTLs for nearly every protein-coding gene. eQTLs help interpret GWAS hits in non-coding regions: a disease-associated SNP whose effect on a nearby gene's expression is consistent with the disease biology gives a strong candidate causal mechanism. About 60–70 percent of GWAS-significant loci colocalise with at least one significant eQTL in a relevant tissue, supporting regulatory effects as the dominant mechanism for non-coding common-variant disease associations.