Algorithms

Genetic Algorithms

Breed better answers — let bad ideas die and good ones have children

A genetic algorithm is a population-based optimizer that evolves candidate solutions over generations using selection, crossover, and mutation, trading guaranteed optimality for the ability to search huge, rugged, non-differentiable spaces.

ClassStochastic metaheuristic
Work per runO(G · P · f)
OptimalityNo guarantee
Typical population50–500
Mutation rate≈ 1/L per gene

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

How a genetic algorithm works

Imagine you can't solve a problem by reasoning, but you can score any answer someone hands you. That's the situation a genetic algorithm thrives in. Instead of computing a solution, it keeps a whole population of random candidate solutions, scores each one with a fitness function, and then breeds the next generation by favoring the high-scoring candidates — exactly the way natural selection breeds organisms that survive their environment.

The idea was formalized by John Holland at the University of Michigan in the 1960s and popularized in his 1975 book Adaptation in Natural and Artificial Systems. The loop is short and almost mechanical:

Initialize a population of P random chromosomes (each chromosome encodes one candidate solution).
Evaluate every chromosome's fitness.
Select parents, biased toward higher fitness.
Crossover pairs of parents to produce offspring that mix their genes.
Mutate offspring with small random changes.
Replace the old population with the offspring and loop back to step 2.

Each pass through steps 2–6 is a generation. Run enough generations and the average fitness of the population climbs, the way a species adapts over many lifetimes. The algorithm never knows what "good" means — it only knows which candidates scored higher than their neighbors, and lets that comparison do all the work.

Selection, crossover, and mutation in detail

A genetic algorithm is built from three operators, and the whole behavior hinges on getting their balance right.

Selection decides who reproduces. The cleanest method is tournament selection: pick k individuals at random (typically k = 2 or 3) and keep the fittest of the group. Larger k means higher selection pressure — the strong dominate faster, but diversity collapses sooner. The older roulette-wheel (fitness-proportionate) method gives each individual a slice of a wheel proportional to its fitness, but it's fragile: one super-fit individual can grab the whole wheel and freeze the search.

Crossover recombines two parents into offspring. With a binary or array chromosome of length L, single-point crossover picks a cut point and swaps tails; two-point swaps a middle segment; uniform crossover flips a coin per gene. Crossover is the exploitation operator — it assumes useful sub-structures ("building blocks") already exist in the population and tries to combine them.

parent A:  1 1 0 1 | 0 1 1 0
parent B:  0 0 1 0 | 1 1 0 1
                   ^ cut point (single-point crossover)
child  1:  1 1 0 1 | 1 1 0 1
child  2:  0 0 1 0 | 0 1 1 0

Mutation flips or perturbs individual genes with a small probability — usually around 1/L, so roughly one gene changes per chromosome. Mutation is the exploration operator: it injects genetic material no parent had, which is the only way to escape a population that has converged to a single point. Too little mutation and the search stagnates; too much and the algorithm degrades into random search.

The theoretical justification Holland offered is the schema theorem: short, low-order patterns of genes ("schemata") that correlate with above-average fitness receive exponentially increasing trials over generations. The building-block hypothesis extends this — good solutions are assembled from good building blocks via crossover. Both are useful intuition pumps rather than convergence guarantees; modern theory treats the building-block hypothesis as a rough heuristic that can fail on deceptive problems.

When to use a genetic algorithm

Black-box objectives. You can score a solution (a simulation result, a game score, a circuit layout) but you have no formula and no gradient to follow.
Discrete and combinatorial spaces. Scheduling, routing, bin packing, feature selection — domains where you can't take a derivative.
Rugged, multi-modal landscapes with many local optima, where a single hill-climber would get trapped but a diverse population can hold several basins at once.
"Good enough, fast" beats "optimal, slow." When a 95%-of-optimal answer found in seconds is worth more than the exact optimum found in hours.

Reach for something else when the structure of your problem gives you leverage. A smooth differentiable objective belongs to gradient descent. A problem with optimal substructure belongs to dynamic programming. A pure single-solution local search with a temperature schedule is often handled more simply by simulated annealing. Genetic algorithms shine precisely when none of those structural assumptions hold.

Genetic algorithm vs other optimizers

	Genetic algorithm	Simulated annealing	Hill climbing	Gradient descent	Particle swarm	Exhaustive search
State at a time	Population (many)	Single solution	Single solution	Single point	Population (many)	One at a time
Needs a gradient?	No	No	No	Yes	No	No
Escapes local optima?	Yes (diversity + mutation)	Yes (uphill moves cool)	No	No (without restarts)	Partially	N/A (finds global)
Finds global optimum?	No guarantee	Asymptotic only	No	Convex only	No guarantee	Yes, but exponential
Recombines solutions?	Yes (crossover)	No	No	No	Velocity-based, no crossover	No
Cost per generation/step	O(P · f)	O(f)	O(f)	O(∇f)	O(P · f)	O(f)
Best for	Discrete, black-box, rugged	Single rugged landscape	Smooth, near a peak	Smooth differentiable	Continuous, rugged	Tiny search spaces

The defining difference is the population plus crossover. Simulated annealing and hill climbing carry one solution and perturb it; a genetic algorithm carries many and lets them recombine, which can stitch together partial answers discovered in different corners of the space. That same population is also the cost: every generation pays P fitness evaluations, so genetic algorithms are wasteful when each evaluation is expensive and the landscape is actually smooth.

What the numbers actually say

Total work is O(G · P · f). With G generations, population P, and fitness cost f, a run with G = 500, P = 200 makes 100,000 fitness evaluations. If each evaluation is a 1 ms simulation, that's 100 seconds; if each is a 10-second physics solve, that's 11.5 days — which is why expensive fitness functions force you to shrink P and G or parallelize.
The operators themselves are cheap. Selection, crossover, and mutation are O(L) per individual, so for typical chromosome lengths they vanish next to the fitness call. Profile the fitness function first — it is almost always 99%+ of the runtime.
Mutation rate ≈ 1/L. For a length-100 chromosome that's a 1% per-gene flip probability, so on average one gene changes per child — enough to explore, rare enough to preserve good structure.
Embarrassingly parallel. The P fitness evaluations within a generation are independent, so a genetic algorithm scales nearly linearly across cores or machines — an advantage gradient descent's sequential steps don't share.
No big-O on convergence. Unlike sorting or shortest paths, there is no general bound on how many generations reach a target fitness; it depends entirely on the landscape and operator tuning. Always cap the run with a generation limit or a "no improvement for N generations" stop.

JavaScript implementation

A complete genetic algorithm for a classic toy problem — evolving a random bit string to match a target. The structure (initialize, evaluate, select, cross, mutate, elitism) is identical for any real problem; only the chromosome encoding and fitness function change.

function geneticAlgorithm(target, {
  popSize = 200, mutationRate = 1 / target.length,
  crossoverRate = 0.9, generations = 500, tournament = 3,
} = {}) {
  const L = target.length;
  const randGene = () => (Math.random() < 0.5 ? 0 : 1);
  const fitness = c => c.reduce((s, g, i) => s + (g === target[i] ? 1 : 0), 0);

  // 1. Initialize
  let pop = Array.from({ length: popSize },
    () => Array.from({ length: L }, randGene));

  const select = scored => {           // tournament selection
    let best = scored[(Math.random() * popSize) | 0];
    for (let i = 1; i < tournament; i++) {
      const c = scored[(Math.random() * popSize) | 0];
      if (c.fit > best.fit) best = c;
    }
    return best.genes;
  };

  for (let gen = 0; gen < generations; gen++) {
    // 2. Evaluate
    const scored = pop.map(genes => ({ genes, fit: fitness(genes) }));
    scored.sort((a, b) => b.fit - a.fit);
    if (scored[0].fit === L) return { solution: scored[0].genes, gen };

    // Elitism: carry the single best unchanged
    const next = [scored[0].genes.slice()];

    while (next.length < popSize) {
      const a = select(scored), b = select(scored);
      let child = a.slice();
      if (Math.random() < crossoverRate) {     // 4. single-point crossover
        const cut = 1 + ((Math.random() * (L - 1)) | 0);
        child = a.slice(0, cut).concat(b.slice(cut));
      }
      for (let i = 0; i < L; i++)               // 5. mutation
        if (Math.random() < mutationRate) child[i] ^= 1;
      next.push(child);
    }
    pop = next;
  }
  // Return best found even if target was never hit
  const final = pop.map(g => ({ g, fit: fitness(g) }))
                   .sort((a, b) => b.fit - a.fit)[0];
  return { solution: final.g, gen: generations };
}

Two details worth flagging. First, elitism — copying the best individual unchanged into the next generation — guarantees the best-so-far fitness never decreases, which is the cheapest insurance against a lucky solution being lost to crossover. Second, the early-exit on fit === L handles the case where evolution solves the problem before the generation cap; real problems usually run to the cap or a stagnation check instead.

Python implementation

The same algorithm, plus a famous application: evolving a tour for the Travelling Salesman Problem. TSP needs permutation chromosomes, so it uses order crossover (OX) and swap mutation instead of the bit-string operators — a good illustration of how the encoding drives the operator choice.

import random

def tsp_genetic(dist, pop_size=300, generations=800,
                mutation_rate=0.2, tournament=4):
    n = len(dist)
    cities = list(range(n))

    def tour_len(t):
        return sum(dist[t[i]][t[(i + 1) % n]] for i in range(n))

    # fitness = negative length (shorter is better)
    def fitness(t): return -tour_len(t)

    def order_crossover(a, b):            # OX: keep a slice of a, fill from b
        i, j = sorted(random.sample(range(n), 2))
        child = [None] * n
        child[i:j] = a[i:j]
        fill = [c for c in b if c not in child[i:j]]
        k = 0
        for p in range(n):
            if child[p] is None:
                child[p] = fill[k]; k += 1
        return child

    def mutate(t):                        # swap two cities
        if random.random() < mutation_rate:
            i, j = random.sample(range(n), 2)
            t[i], t[j] = t[j], t[i]
        return t

    pop = [random.sample(cities, n) for _ in range(pop_size)]

    def select():                         # tournament selection
        group = random.sample(pop, tournament)
        return max(group, key=fitness)

    best = min(pop, key=tour_len)
    for _ in range(generations):
        nxt = [best[:]]                   # elitism
        while len(nxt) < pop_size:
            child = order_crossover(select(), select())
            nxt.append(mutate(child))
        pop = nxt
        gen_best = min(pop, key=tour_len)
        if tour_len(gen_best) < tour_len(best):
            best = gen_best
    return best, tour_len(best)

Notice how naive single-point crossover would break a TSP tour — it would produce a sequence with duplicate and missing cities, an invalid permutation. Order crossover preserves the permutation property by construction. The lesson generalizes: in a genetic algorithm, the hard design work is choosing an encoding and operators that keep offspring valid and meaningful.

Variants worth knowing

Steady-state GA. Instead of replacing the whole population each generation, replace just one or two individuals per step (usually the worst). Lower memory churn and the best solutions stay in the gene pool longer; common in real-time and online settings.

Island model (coarse-grained parallel GA). Run several semi-isolated subpopulations on separate cores, each evolving independently, and periodically migrate a few migrants between islands. Isolation preserves diversity and fights premature convergence; it parallelizes almost perfectly.

NSGA-II. The standard multi-objective genetic algorithm (Deb et al., 2002). It ranks individuals by Pareto dominance and crowding distance, returning a whole Pareto front of trade-off solutions rather than one answer — used when you must balance, say, cost against weight against speed.

Genetic programming. The chromosome is a program tree (often Lisp-like expressions), not a fixed-length string. Crossover swaps subtrees; the algorithm evolves code or symbolic formulas. Pioneered by John Koza in the early 1990s.

CMA-ES and differential evolution. For continuous-valued optimization, these often beat a vanilla GA. CMA-ES (Covariance Matrix Adaptation Evolution Strategy) adapts a search distribution; differential evolution mutates by adding scaled differences of population members. Both are evolutionary, but neither uses bit-string crossover.

Common bugs and edge cases

Premature convergence. The whole population becomes near-identical, crossover produces clones, and progress halts. Counter with higher mutation, larger populations, lower selection pressure, fitness sharing, or the island model.
Forgetting elitism. Without carrying the best individual forward, a great solution can be destroyed by crossover or mutation and never recovered. Always keep at least one elite — or track the global best separately.
Roulette-wheel selection with raw fitness. If fitness values are huge or negative (e.g. negative tour lengths), the proportionate math breaks. Rank-based or tournament selection sidesteps the problem entirely.
Invalid offspring from naive crossover. On permutations (TSP, scheduling) a plain cut-and-swap yields chromosomes with duplicate or missing genes. Use order, partially-mapped (PMX), or cycle crossover instead.
An expensive fitness function with a huge population. The cost is O(G · P · f); a 30-second simulation × 200 individuals × 500 generations is 3 million seconds — about 35 days of serial compute. Cache repeated evaluations, parallelize the generation, or shrink P and G.
Treating it as a guaranteed solver. A genetic algorithm can return a mediocre answer and look like it "converged." Always log the best fitness per generation and stop on stagnation, not just a fixed generation count.

Frequently asked questions

Does a genetic algorithm find the global optimum?

Not guaranteed. A genetic algorithm is a stochastic metaheuristic with no convergence proof for finite runs — it can stall on a local optimum if the population loses diversity. With elitism it never loses the best-so-far solution, but reaching the true global optimum is a matter of probability, time, and good operator design, not a theorem.

What's the difference between crossover and mutation?

Crossover recombines two parents into offspring, exploiting good building blocks the population already discovered. Mutation randomly perturbs a single individual, injecting new genetic material to explore regions no parent covers. Crossover exploits; mutation explores. Drop mutation and the population converges and freezes; drop crossover and you have a parallel random walk.

Why is tournament selection preferred over roulette-wheel selection?

Roulette-wheel (fitness-proportionate) selection breaks when fitness values are skewed: one super-individual hogs all the slots and the population collapses early, or, late in a run, near-equal fitnesses make selection nearly random. Tournament selection picks k random individuals and keeps the best, so it only depends on fitness rank, not raw magnitude — it's scale-invariant, needs no normalization, and runs in O(k) per pick.

How big should the population be and how long should it run?

Common defaults are a population of 50–500, a crossover rate around 0.7–0.9, and a per-gene mutation rate near 1/L where L is the chromosome length. Total work is O(generations × population × fitness-cost), so a 200-individual population over 500 generations means 100,000 fitness evaluations — fine if each costs microseconds, painful if each costs a physics simulation.

When should I use a genetic algorithm instead of gradient descent?

Use a genetic algorithm when the search space is discrete, combinatorial, or non-differentiable, when the fitness landscape is rugged with many local optima, or when the objective is a black box (a simulation, a game score) with no gradient. If you have a smooth, differentiable objective, gradient descent is far faster — genetic algorithms are a last resort for problems where calculus doesn't apply.

What is premature convergence and how do you prevent it?

Premature convergence is when the population becomes nearly identical before finding a good solution, so crossover produces clones and the search stalls. Fixes include higher mutation rates, larger populations, fitness sharing or crowding to penalize duplicates, lower selection pressure (smaller tournaments), and the island model where semi-isolated subpopulations evolve separately and occasionally exchange migrants.