Organic Chemistry

Retrosynthetic Analysis

E. J. Corey's disconnection logic — work backward from target, identify synthons, simplify to commercial starting materials (Nobel 1990)

Retrosynthetic analysis is E. J. Corey's logic for designing organic syntheses by working backward from a target molecule. The chemist identifies strategic bonds, mentally cleaves them in reverse-arrow steps to yield "synthons" (idealized cation, anion, or radical fragments), maps each synthon to a real reagent ("synthetic equivalent"), and recurses on each precursor until all starting materials are commercially available. Corey received the 1990 Nobel Prize for this framework and built the LHASA computer program for it in 1969. Modern AI tools — IBM RXN, Synthia, AiZynthFinder — train neural networks on millions of literature reactions to suggest disconnections automatically, but the underlying logic is still Corey's.

  • Founder: E. J. Corey, 1960s
  • Nobel: 1990 (Corey)
  • First software: LHASA, 1969
  • Modern AI tools: Synthia, IBM RXN, AiZynthFinder
  • Key concept: Synthon ≠ reagent
  • Best yield trick: Convergence over linearity

Why retrosynthesis matters

  • Transformed synthesis into a discipline. Before Corey, synthesis was apprenticed intuition — a student watched a master and absorbed taste. After 1969, retrosynthetic analysis turned route design into an explicit, teachable, debuggable problem-solving framework with rules, conventions, and notation. Today every undergraduate organic class teaches it.
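  • Convergence is the yield multiplier. A 10-step linear synthesis at 80% per step gives 0.8¹⁰ ≈ 11% overall. A convergent route that builds two 5-step branches and joins them in one coupling step has a longest linear sequence of only 6 steps, so its overall yield (quoted along that sequence, as is conventional) is 0.8⁶ ≈ 26%, roughly 2.4 times as much. Lipitor (atorvastatin, Pfizer, $13B/yr peak) is made convergently; pharmaceutical process scientists favor convergent routes largely because they cut cost of goods.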
  • LHASA was the first synthesis-AI. Corey's 1969 LHASA program predated today's machine-learning route planners by decades. It encoded ~250 transforms and could propose multi-step disconnections for steroid-class targets. Today's AI descendants (Synthia, IBM RXN, AiZynthFinder) train on 5-50 million literature reactions and propose viable routes for ~70% of benchmark drug targets.
  • Scaffold-based drug discovery. Pharmaceutical companies maintain in-house libraries of 10-100 million compounds; retrosynthesis is used in reverse to suggest which scaffolds can be made from which available building blocks, mapping the "synthesizable space" of a project. Pfizer, Merck, and Novartis each run dedicated retrosynthesis teams of 5-20 chemists.
  • Total synthesis benchmarks. R. B. Woodward's strychnine (1954, 28 steps), Woodward and Eschenmoser's vitamin B12 (1973, ~100 steps), and K. C. Nicolaou's Taxol (1994, 51 steps) are landmark syntheses that showed the framework scales with molecular complexity. Strychnine has been re-synthesized 16+ times since, generally in fewer steps as retrosynthetic understanding improved.
  • Process chemistry cost reductions. A successful retrosynthetic redesign can cut step count from 12 to 6, dropping cost of goods by 5-10x. The Sitagliptin (Januvia, Merck) process chemistry route was redesigned via biocatalytic retrosynthesis to remove a Rh-catalyzed step, saving Merck ~$50M/yr in 2010.
  • Teaches polarity matching. Disconnection forces the chemist to think in synthon terms — every bond-forming reaction in the forward direction has a polar logic (electrophile + nucleophile, or radical pair). Retrosynthesis exposes this logic and trains chemists to predict reactivity even for unfamiliar substrates.

Common misconceptions

  • Disconnect any bond and call it retrosynthesis. No. Disconnections must yield synthons whose synthetic equivalents are realistic. Disconnecting a C-C bond between two unfunctionalized saturated carbons of equal substitution is rarely useful, because neither carbon carries a functional group to set the polarity or to serve as a handle for a known reaction. Strategic bonds are typically those next to a functional group (for example, a C-C bond alpha to a C=O) or in rings.
  • The shortest synthesis is always best. Not at scale. A 12-step convergent route with cheap, non-toxic reagents and no chromatography can beat a 6-step linear synthesis that requires Pd at 5 mol% and a chiral auxiliary. Process chemistry at scale optimizes cost per kg, not step count.
  • Synthons must always be ions. Radical synthons are equally valid. The methyl radical CH3• is a synthon for a methyl reagent in radical chemistry; AIBN-initiated radical addition is the synthetic equivalent. Modern photoredox catalysis has expanded the radical-synthon palette dramatically (since ~2010), and retrosynthesis tools like IBM RXN now include radical disconnections.
  • Functional groups don't count as disconnections. They do — functional-group interconversions (FGI) are first-class retrosynthetic moves, drawn with a different arrow. An alcohol can be retrosynthetically traced to an aldehyde via NaBH4-reduction FGI, even though no C-C bond is broken. FGI mastery is what unlocks disconnections that would otherwise require impossible reagents.
  • You disconnect in the order you'll synthesize. It is the reverse: retrosynthesis goes target → starting materials, while synthesis goes starting materials → target. This is the entire point of the framework: planning runs backward, execution runs forward.
  • AI retrosynthesis is a solved problem. Even the best 2024 tools (Synthia, IBM RXN) match expert chemists on ~40-70% of test cases but underperform on stereochemistry, ring-strain handling, and convergence-aware planning. Human expert taste is still required for non-trivial targets.

The disconnection workflow

The retrosynthetic workflow has four stages:

  • Stage 1: Analyze the target. Identify functional groups, stereogenic centers, ring systems, and connectivity. The chemist annotates the target with key structural features and notes any symmetry that suggests a midpoint disconnection.
  • Stage 2: Identify strategic bonds. These are bonds whose disconnection maximally simplifies the target: typically bonds in rings (especially newly closed rings), bonds adjacent to functional groups that can act as handles, and bonds at the molecular midpoint that yield two roughly equal precursors. Corey enumerated five rules for strategic-bond selection.
  • Stage 3: Disconnect each strategic bond and write the synthons. The retrosynthetic arrow ⇒ is drawn pointing from the target to the precursors. Each disconnection produces two (or more) synthons with assigned polarities, typically one cation (synthetic equivalent: electrophile) and one anion (synthetic equivalent: nucleophile). The chemist writes the real synthetic equivalent below each synthon: e.g., the acetyl cation CH3CO⁺ corresponds to acetyl chloride or acetic anhydride; the methyl carbanion CH3⁻ corresponds to MeMgBr or MeLi.
  • Stage 4: Recurse. Treat each precursor as a new target, identify its strategic bonds, disconnect, and continue until all branches reach commercial starting materials. Multiple routes are typically generated and compared on overall yield, step count, convergence, cost, and safety.

Functional-group interconversion (FGI) supplements the disconnection toolkit. When the target's functional group makes the desired disconnection impossible (e.g., a carboxylic acid where you wanted to disconnect at an aldehyde-equivalent step), the chemist applies an FGI arrow to swap the functional group for a more favorable one. Each FGI corresponds to a known forward reaction (NaBH4: aldehyde → alcohol; DIBAL-H: ester → aldehyde; PCC: alcohol → aldehyde). Iterating disconnections and FGIs builds the complete retrosynthetic tree, which is then translated to the forward synthesis plan with each retro-arrow becoming a forward step in reverse order.
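The recursion in Stage 4, and the final translation of retro-arrows into forward steps, can be sketched in a few lines of Python. This is a minimal illustration rather than a planning tool: the `RETRO_TABLE` and `COMMERCIAL` names and the single hard-coded route (1-phenylethanol back to benzene and acetyl chloride) are assumptions chosen for the example, and a real planner would generate and compare several candidate moves per target.

```python
# Hypothetical one-step retro table: target -> (move type, precursors).
# The entries are illustrative; a real planner would propose many options.
RETRO_TABLE = {
    "1-phenylethanol": ("FGI (forward: NaBH4 reduction)", ["acetophenone"]),
    "acetophenone": (
        "disconnection (forward: Friedel-Crafts acylation)",
        ["benzene", "acetyl chloride"],
    ),
}
# Assumed set of purchasable starting materials for this toy example.
COMMERCIAL = {"benzene", "acetyl chloride"}


def retro_tree(target, depth=0, max_depth=10):
    """Recursively expand a target until every leaf is commercial."""
    if target in COMMERCIAL:
        return {"compound": target, "move": None, "precursors": []}
    if target not in RETRO_TABLE or depth >= max_depth:
        raise ValueError(f"no known retro step for {target!r}")
    move, precursors = RETRO_TABLE[target]
    return {
        "compound": target,
        "move": move,
        "precursors": [retro_tree(p, depth + 1, max_depth) for p in precursors],
    }


def forward_plan(node):
    """Read the tree leaves-first so each retro arrow becomes a forward step."""
    steps = []
    for child in node["precursors"]:
        steps.extend(forward_plan(child))
    if node["move"] is not None:
        inputs = " + ".join(c["compound"] for c in node["precursors"])
        steps.append(f"{inputs} -> {node['compound']}")
    return steps


tree = retro_tree("1-phenylethanol")
plan = forward_plan(tree)
for step in plan:
    print(step)
```

The tree is built target-first and read leaves-first, which mirrors the point that planning runs backward while execution runs forward.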

Convergent vs linear synthesis — yield and cost comparison

Synthesis style | Step count | Overall yield at 80% per step | Overall yield at 90% per step | Cost-of-goods scaling | Best use case
Linear (single chain) | 5 steps | 0.8⁵ ≈ 33% | 0.9⁵ ≈ 59% | Linear in steps × material price | Simple targets, <5 steps
Linear | 10 steps | 0.8¹⁰ ≈ 11% | 0.9¹⁰ ≈ 35% | Material accumulates through chain | Rare for >6 steps in pharma
Linear | 20 steps | 0.8²⁰ ≈ 1.2% | 0.9²⁰ ≈ 12% | Prohibitive at scale | Academic curiosities only
Convergent (2 branches) | 5 + 5 steps, plus 1 coupling (longest linear sequence: 6) | 0.8⁶ ≈ 26% | 0.9⁶ ≈ 53% | ~½ of linear cost | Pharma APIs (Lipitor, Crizotinib)
Convergent (3 branches) | 5 + 5 + 5 steps, plus 2 sequential couplings (longest linear sequence: 7) | 0.8⁷ ≈ 21% | 0.9⁷ ≈ 48% | ~⅓ of linear cost | Complex natural products
Iterative coupling | 10 building-block additions | At 95% per step: 0.95¹⁰ ≈ 60% (peptide-grade) | At 99% per step: 0.99¹⁰ ≈ 90% (DNA synthesis) | Linear + automation | Peptides, oligonucleotides, MIDA boronates
Total synthesis (Woodward strychnine, 1954) | 28 linear | ~0.0001% overall (reported) | n/a | Heroic; not for production | Methodology development
Modern Vanderwal strychnine (2011) | 6 steps | ~13% overall (reported) | n/a | 1000× simpler than 1954 | Demonstrates retrosynthetic progress

Note: convergent yields are quoted along the longest linear sequence, i.e., the longest branch plus the coupling steps that follow it.
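The modeled rows of the table can be reproduced with a short calculation. The sketch below assumes the common convention that overall yield is quoted along the longest linear sequence (the longest branch plus the coupling steps after it) and that every step has the same yield; the function names `linear_yield` and `convergent_yield` are illustrative, not taken from any library.

```python
def linear_yield(steps, per_step):
    """Overall yield of a linear route with a uniform per-step yield."""
    return per_step ** steps


def convergent_yield(branch_lengths, coupling_steps, per_step):
    """Overall yield along the longest linear sequence of a convergent route.

    Assumes the branches are built in parallel and that `coupling_steps`
    steps follow the longest branch.
    """
    longest_linear_sequence = max(branch_lengths) + coupling_steps
    return per_step ** longest_linear_sequence


for per_step in (0.80, 0.90):
    lin = linear_yield(10, per_step)
    conv = convergent_yield([5, 5], 1, per_step)
    print(f"{per_step:.0%} per step: linear 10 steps = {lin:.1%}, "
          f"convergent 5 + 5 + 1 coupling = {conv:.1%}")
```

At 80% per step this gives about 10.7% for the linear route and about 26.2% for the convergent one; the same functions cover the three-branch case by passing `[5, 5, 5]` and two coupling steps.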

Famous syntheses

  • R. B. Woodward's strychnine (1954, 28 steps). The benchmark that demonstrated total synthesis of a complex alkaloid was possible. Woodward chose his disconnections by intuition; modern retrosynthesis (Vanderwal 2011, 6 steps) cuts the route by 4×. Corey explicitly cited Woodward's syntheses as the inspiration for formalizing retrosynthetic logic.
  • R. B. Woodward and Albert Eschenmoser's vitamin B12 (1973, ~100 steps). The most ambitious synthesis ever attempted, requiring 100 graduate students over 11 years. Demonstrated that retrosynthetic disconnection could plan a target with 9 stereocenters and a unique cobalt-corrin macrocycle. Many of the disconnections (the Eschenmoser sulfide contraction, A-B ring annulation) became standard methodology.
  • K. C. Nicolaou's Taxol (1994, 51 steps). Anticancer drug originally isolated from Pacific yew bark. The retrosynthetic plan treats the strained 8-membered B ring as the strategic disconnection: two convergent fragments (the A and C rings) are joined by a Shapiro reaction, and a McMurry coupling then closes the B ring. The total synthesis demonstrated that natural products could be re-engineered when supply was limited.
  • E. J. Corey's prostaglandin syntheses (1969-1980s). Corey's group built nearly all major prostaglandins from a common bicyclic intermediate (the "Corey lactone") via convergent disconnections. This showcased the power of identifying a single strategic bicyclic precursor and elaborating outward — a unifying retrosynthetic insight that produced ~50 different prostaglandins from one platform.
  • Industrial Lipitor (atorvastatin, Pfizer). The world's best-selling drug ($13B/yr peak). Process chemistry uses convergent retrosynthesis: the chiral side-chain (2 stereocenters) is built enzymatically in one branch; the heterocyclic core in another; they unite via Paal-Knorr and Knoevenagel-Doebner condensations. Cost of goods at multi-tonne scale <$100/kg.

Frequently asked questions

What is a synthon and how does it differ from a reagent?

A synthon is an idealized cation, anion, or radical fragment that captures the polarity needed for a bond-forming reaction in the forward direction. The acetyl cation CH3CO⁺ is a synthon for an acyl electrophile; the corresponding real reagent is acetyl chloride CH3COCl or acetic anhydride. The methyl carbanion CH3⁻ is a synthon for a methyl nucleophile; the real reagent is methylmagnesium bromide CH3MgBr or methyllithium CH3Li. Synthons are conceptual: they let you reason about polarity matching at the disconnection, but you actually run the reaction with the corresponding "synthetic equivalent." This translation step is where most retrosynthetic mistakes happen. A clever disconnection on paper might require a synthon with no direct reagent (e.g., the formyl anion HC(=O)⁻, which is available only through a masked equivalent such as a lithiated 1,3-dithiane), and if no workable equivalent exists at all, the disconnection is impractical.
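The synthon-to-reagent translation is essentially a lookup, and a disconnection is only as good as its weakest lookup. The sketch below encodes the two pairs named above; the table name `SYNTHETIC_EQUIVALENTS`, the plain-text synthon labels, and the helper functions are assumptions made for illustration, and the formyl anion is deliberately left out of this toy table to show how an untranslatable synthon blocks a disconnection.

```python
# Toy lookup table using the synthon/reagent pairs from the text.
# Keys are plain-text labels chosen for this example, not a standard notation.
SYNTHETIC_EQUIVALENTS = {
    "CH3CO+": ["acetyl chloride", "acetic anhydride"],
    "CH3-": ["methylmagnesium bromide", "methyllithium"],
}


def equivalents_for(synthon):
    """Return the tabulated reagents for a synthon, or an empty list."""
    return SYNTHETIC_EQUIVALENTS.get(synthon, [])


def is_practical(disconnection):
    """A disconnection is workable only if every synthon maps to a reagent."""
    return all(equivalents_for(synthon) for synthon in disconnection)


print(is_practical(("CH3CO+", "CH3-")))     # both synthons are tabulated
print(is_practical(("CH3CO+", "HC(=O)-")))  # formyl anion is not in the table
```

Extending the table (for example with a masked or umpolung reagent) is what turns a paper disconnection into a usable one.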

What are the rules for choosing a strategic bond to disconnect?

Corey's strategic-bond rules are: (1) Disconnect bonds in or near rings rather than in chains — ring closures are usually the hardest steps so removing them up front simplifies most. (2) Disconnect bonds that maximize size reduction — break C-C bonds at the symmetric midpoint or near a quaternary carbon. (3) Disconnect bonds adjacent to heteroatoms or functional groups that can act as 'handles' for known reactions (C=O, OH, NH, halide). (4) Disconnect bonds whose forward formation has many literature precedents (Suzuki coupling, aldol, Diels-Alder are favored disconnections). (5) Disconnect bonds whose disconnection produces enantiopure or stereodefined precursors when the target is chiral. The bond meeting the most rules simultaneously is the 'strategic bond.' On a complex polycyclic target there are typically 3-10 candidate strategic bonds; comparing routes rooted in each is the core skill of retrosynthesis.
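One way to make "the bond meeting the most rules" concrete is to flag each candidate bond against the five rules and sort by the count. The sketch below does only that; the `CandidateBond` class, its field names, and the three example bonds are hypothetical, and equal weighting of the rules is an assumption (a practising chemist would weight them by context).

```python
from dataclasses import dataclass


@dataclass
class CandidateBond:
    """One candidate disconnection, flagged against the five rules above."""
    label: str
    in_or_near_ring: bool           # rule 1
    large_size_reduction: bool      # rule 2
    adjacent_to_handle: bool        # rule 3
    well_precedented: bool          # rule 4
    stereodefined_precursors: bool  # rule 5

    def rules_met(self) -> int:
        """Count how many of the five rules this bond satisfies."""
        return sum([
            self.in_or_near_ring,
            self.large_size_reduction,
            self.adjacent_to_handle,
            self.well_precedented,
            self.stereodefined_precursors,
        ])


def rank_bonds(bonds):
    """Sort candidates so the bond meeting the most rules comes first."""
    return sorted(bonds, key=lambda b: b.rules_met(), reverse=True)


# Hypothetical candidates on an unnamed polycyclic target.
candidates = [
    CandidateBond("chain C-C, unfunctionalized", False, False, False, False, False),
    CandidateBond("ring C-C alpha to C=O", True, False, True, True, True),
    CandidateBond("midpoint biaryl C-C", False, True, False, True, False),
]
ranked = rank_bonds(candidates)
for bond in ranked:
    print(bond.rules_met(), bond.label)
```

The top-ranked bond is only a starting point; the routes rooted in the next few candidates still need to be compared.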

How does a chemist evaluate one retrosynthetic route against another?

Several criteria are used, most of them quantitative. (1) Step count: fewer steps means higher overall yield and less waste; a 10-step synthesis with 80% per step gives ~11% overall, while 15 steps at 80% gives ~3.5%. (2) Convergence: combining two or three branches into the target near the end gives much better overall yield than a linear assembly. Two 5-step branches joined in one coupling step have a longest linear sequence of 6 steps and give 0.8⁶ ≈ 26% overall, vs ~11% for a 10-step linear synthesis. (3) Bottleneck steps: does the longest branch contain a difficult ring closure or a stereochemistry-setting step? Such bottleneck steps should come early, when material is cheap. (4) Cost of starting materials: pennies/g vs $1000/g matters at scale. (5) Reagent toxicity, hazard, and waste: mCPBA at 100 g is fine, at 1 kg requires safety review. The route that wins on the aggregate of these is the chosen synthesis.

What is convergent synthesis and why is it superior to linear?

Linear synthesis builds the target step by step, with each intermediate increasing in mass through the route. If each step has 80% yield, a 10-step linear synthesis delivers 0.8¹⁰ ≈ 10.7% overall. Convergent synthesis builds two or more roughly equal-sized fragments separately, then couples them together in the last step or two. Overall yield is conventionally quoted along the longest linear sequence, so two 5-step branches joined by one coupling step deliver 0.8⁵ × 0.8 = 0.8⁶ ≈ 26%, about 2.4 times the linear figure. The advantage grows with longer routes: a 20-step linear synthesis at 80% per step gives 1.2% overall, while two 10-step branches plus a coupling step give 0.8¹¹ ≈ 9%. Convergence is central to industrial process chemistry: atorvastatin (Lipitor) and crizotinib are made convergently with separate fragments carried in parallel and joined at the end, and the cost of goods is 5-10x lower than a linear approach would give.

How do AI retrosynthesis tools like Synthia and IBM RXN work?

Modern AI retrosynthesis platforms train a neural network (typically a transformer or graph neural network) on a database of 5-50 million literature reactions extracted from journals and patents. Given a target SMILES, the model proposes a ranked list of disconnections by predicting which precursor pair is most likely to lead to the target. The platform recurses on each precursor, building a tree of routes. Search algorithms (Monte Carlo Tree Search, AlphaGo-style reward signals) prune the tree by estimating the cost and feasibility of each branch, terminating when commercial starting materials are reached. IBM RXN, Reaxys Retrosynthesis Planner, AiZynthFinder (open source from AstraZeneca), and Synthia (Chematica, founded by Bartosz Grzybowski) are all in production use. Performance: on benchmark targets, AI tools propose viable routes in ~70% of cases and propose at least one route that an expert chemist would also choose ~40% of the time; they exceed expert performance in raw step counts but lag in route elegance and stereoselectivity.
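The propose-recurse-prune loop described above can be illustrated with a toy best-first search. This is a simplified stand-in, not how any named platform is implemented: a hard-coded `MOCK_MODEL` dictionary replaces the trained network, plain best-first search replaces Monte Carlo Tree Search, and the molecule labels (`T`, `A`, `B`, `C`, `s1` to `s3`) and scores are invented for the example.

```python
import heapq
import itertools

# Hypothetical stand-in for a trained model:
# target -> ranked list of (score, precursors).
MOCK_MODEL = {
    "T": [(0.9, ("A", "B")), (0.6, ("C",))],
    "A": [(0.8, ("s1", "s2"))],
    "C": [(0.9, ("s3",))],
}
# Assumed catalogue of purchasable compounds.
COMMERCIAL = {"B", "s1", "s2", "s3"}


def best_first_route(target, max_expansions=100):
    """Expand the highest-scoring partial route first.

    A route's score is the product of its disconnection scores. The search
    stops when every open molecule of the best route is commercial.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares tuples of molecules
    # Heap items: (negative route score, tie, open molecules, steps so far).
    heap = [(-1.0, next(tie), (target,), ())]
    for _ in range(max_expansions):
        if not heap:
            return None
        neg_score, _tie, open_mols, steps = heapq.heappop(heap)
        pending = [m for m in open_mols if m not in COMMERCIAL]
        if not pending:
            return -neg_score, list(steps)
        mol = pending[0]
        rest = tuple(m for m in open_mols if m != mol)
        for score, precursors in MOCK_MODEL.get(mol, []):
            heapq.heappush(heap, (
                neg_score * score,
                next(tie),
                rest + precursors,
                steps + ((mol, precursors),),
            ))
    return None


result = best_first_route("T")
print(result)
```

Here the route through `A` and `B` wins because its combined score (0.9 × 0.8) beats the alternative through `C` (0.6 × 0.9); production tools add learned cost and feasibility estimates on top of this basic loop.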

What is the difference between functional-group interconversion (FGI) and disconnection?

Disconnection breaks a bond and produces two synthons; FGI changes one functional group into another without breaking the carbon skeleton. For example, a target containing a primary alcohol might be retrosynthetically traced back to an aldehyde (FGI: NaBH4 reduces aldehyde to alcohol in the forward direction), then to an ester (FGI: DIBAL-H reduces ester to aldehyde), then disconnected at the ester C-O bond (true disconnection: ester ⇒ carboxylic acid + alcohol, with Fischer esterification as the forward reaction). FGI is critical because the strategic-bond disconnection often requires the molecule to be in a specific oxidation state or with a specific functional group present; FGI lets the chemist "set up" for the disconnection. In Corey's notation, an FGI step is drawn as the retrosynthetic arrow with the label FGI over it, while a disconnection is drawn with the plain double-line retrosynthetic arrow ⇒. Mastery of FGI is what separates a workable retrosynthesis from one that gets stuck at an impossible reagent step.