Algorithms
The Simplex Algorithm
The corner-hopping engine behind 70 years of linear programming
The simplex algorithm solves a linear program by walking from vertex to vertex along the edges of the feasible polytope, pivoting toward better objective values until no improving edge remains — the optimum sits at a corner.
- InventedDantzig, 1947
- Worst caseexponential (Klee–Minty)
- Typical pivots≈ 2–3 (m + n)
- Per-pivot costO(m·n) dense
- Optimum lives ata vertex
Interactive visualization
Press play, or step through manually. The visualization is yours to drive — try it before reading on.
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The corner-hopping intuition
Suppose a factory makes two products. Each unit of product A nets $3 profit and each unit of B nets $5, but they share limited machine hours, labor, and raw material. How many of each should you build? Write the profit as maximize 3x + 5y and the limits as linear inequalities like x + 2y ≤ 14. Those inequalities carve a convex region out of the plane — a polygon in 2-D, a polytope in higher dimensions — called the feasible region. Every point inside satisfies all the constraints.
Here is the fact that makes the whole field tractable: because both the objective and the constraints are linear, the maximum of 3x + 5y is always reached at a corner of that polytope. You never need to consider the interior. The simplex algorithm exploits this by starting at one corner and repeatedly walking along an edge to a neighboring corner that increases profit. When no adjacent corner is better, you're done — that corner is the global optimum, guaranteed, because a linear function over a convex set has no local maxima that aren't also global.
George Dantzig devised this procedure in 1947 for U.S. Air Force logistics. It was named one of the top ten algorithms of the 20th century, and a slightly tuned version still ships in nearly every commercial optimizer.
The mechanics: slack variables and the tableau
To compute, simplex first turns inequalities into equalities by adding slack variables. The constraint x + 2y ≤ 14 becomes x + 2y + s₁ = 14 with s₁ ≥ 0; the slack measures how much room is left under that limit. With m constraints and n original variables you get a system of m equations in m + n variables.
A basic feasible solution (BFS) picks m of those variables to be "basic" (free to take positive values) and forces the other n to zero. Each BFS corresponds to exactly one vertex of the polytope. Starting all original variables at zero — the origin — makes the slacks basic, which is feasible whenever the origin satisfies the constraints. From there each iteration does three things:
- Pricing — pick an entering variable. Look at the objective row. A nonbasic variable with a positive reduced cost (for a maximization) will raise the objective if you let it grow from zero. Dantzig's rule picks the most positive one.
- Ratio test — pick a leaving variable. As the entering variable grows, some basic variable hits zero first. The minimum ratio test finds which one:
min(bᵢ / aᵢ)over rows with a positive coefficientaᵢ. This is how far you can slide along the edge before bumping the next constraint. - Pivot. Swap the entering and leaving variables in the basis using Gaussian elimination on the tableau. You've now moved to the adjacent vertex.
Repeat until no nonbasic variable has a positive reduced cost. That tableau encodes the optimal vertex, the optimal objective value, and — for free — the dual prices.
Complexity: great in practice, terrible in theory
Each pivot is one Gaussian-elimination sweep of the tableau, costing O(m·(m + n)) arithmetic in the dense textbook form, or roughly O(m·n). The open question is how many pivots you need.
In 1972 Victor Klee and George Minty built a deformed hypercube — the Klee–Minty cube — on which Dantzig's pivot rule visits all 2ⁿ vertices. So the worst case is exponential, and to this day no pivot rule is known to be polynomial in the worst case. Yet on real problems simplex almost always terminates in about 2(m + n) to 3(m + n) pivots — linear in the problem size. Smoothed analysis (Spielman and Teng, 2004) explained why: a tiny random perturbation of any instance makes the expected pivot count polynomial, so the bad cases are vanishingly rare. That gap between worst-case and observed behavior is the most famous feature of the algorithm.
When to reach for simplex
- Small to medium LPs with sparse constraint matrices — scheduling, blending, transportation, network flow relaxations. The revised simplex method exploits sparsity beautifully.
- Problems you re-solve repeatedly after small edits. Simplex warm-starts from the previous optimal basis in a handful of pivots — invaluable inside branch-and-bound for integer programs, where you solve thousands of nearly-identical LPs.
- When you need an exact vertex solution and dual prices, e.g. shadow prices in economics or sensitivity analysis.
Reach for an interior-point method instead when the problem is very large and dense, when you can't afford the exponential tail risk, or when you don't need warm-starting. For integer decision variables, neither solves the problem alone — you wrap an LP solver in branch-and-bound or cutting planes.
Simplex vs other optimization methods
| Simplex | Interior-point | Ellipsoid | Gradient descent | Branch & bound | |
|---|---|---|---|---|---|
| Problem class | Linear programs | LP & convex | LP & convex | Smooth (often nonconvex) | Integer / combinatorial |
| Path | Polytope edges, vertex to vertex | Through the interior (central path) | Shrinking ellipsoids | Downhill in continuous space | Tree of LP relaxations |
| Worst case | Exponential (Klee–Minty) | Polynomial | Polynomial (first proof, 1979) | No global guarantee | Exponential |
| Typical speed | Fast on sparse small/medium | Fast on large/dense | Slow — theoretical only | Depends on conditioning | Depends on bound tightness |
| Warm-start | Excellent (reuse the basis) | Poor | Poor | Moderate | Built on warm-started LPs |
| Solution type | Exact vertex + dual prices | Interior point near a vertex | Approximate | Local optimum | Provably optimal integer point |
| Real-world use | CPLEX, Gurobi, GLPK | CPLEX/Gurobi barrier, Mosek | Mostly a theoretical landmark | ML training | MIP solvers everywhere |
The headline is that the ellipsoid method (Khachiyan, 1979) finally proved LP is polynomial-time — a theoretical breakthrough — but it's too slow to use. Karmarkar's 1984 interior-point method was both polynomial and practical, ending simplex's monopoly. Modern solvers ship both and pick per problem.
What the numbers actually say
- Klee–Minty in dimension n forces 2ⁿ − 1 pivots. At n = 30 that's over a billion pivots for a problem with just 30 variables and 30 constraints — yet equivalent random 30-variable LPs solve in roughly 60–90 pivots.
- Observed pivot count scales roughly linearly in m + n. A model with 10,000 constraints typically optimizes in tens of thousands of pivots, not the astronomically large worst case.
- The revised simplex method stores and updates an
m × mbasis factorization instead of the full tableau. For a problem with 50,000 columns but only 2,000 rows, it works with a 2,000 × 2,000 object rather than a 2,000 × 52,000 one — the dominant reason production solvers use it. - Degeneracy is common, cycling is not. Real solvers see degenerate pivots constantly but almost never cycle, because tiny numerical perturbation and anti-cycling rules break ties; a textbook pure simplex with no safeguard can loop forever.
JavaScript implementation
A compact standard-form simplex (maximize cᵀx subject to Ax ≤ b, x ≥ 0, b ≥ 0) using a dense tableau and Bland's rule for safety:
// maximize c·x s.t. A x <= b, x >= 0, assuming b >= 0 (origin feasible).
function simplex(A, b, c) {
const m = A.length, n = c.length;
// Tableau rows 0..m-1 are constraints (with slacks), last row is objective.
// Columns: n structural + m slack + 1 RHS.
const W = n + m + 1;
const T = Array.from({ length: m + 1 }, () => new Array(W).fill(0));
for (let i = 0; i < m; i++) {
for (let j = 0; j < n; j++) T[i][j] = A[i][j];
T[i][n + i] = 1; // slack
T[i][W - 1] = b[i]; // RHS
}
for (let j = 0; j < n; j++) T[m][j] = -c[j]; // objective row (minimize -c·x)
const basis = Array.from({ length: m }, (_, i) => n + i); // slacks start basic
while (true) {
// Bland's rule: lowest-index column with negative objective coefficient.
let pivCol = -1;
for (let j = 0; j < W - 1; j++) {
if (T[m][j] < -1e-9) { pivCol = j; break; }
}
if (pivCol === -1) break; // optimal: no improving column
// Minimum ratio test for the leaving row.
let pivRow = -1, best = Infinity;
for (let i = 0; i < m; i++) {
if (T[i][pivCol] > 1e-9) {
const ratio = T[i][W - 1] / T[i][pivCol];
if (ratio < best - 1e-12 ||
(Math.abs(ratio - best) <= 1e-12 && basis[i] < basis[pivRow])) {
best = ratio; pivRow = i;
}
}
}
if (pivRow === -1) return { status: 'unbounded' };
// Pivot: normalize the pivot row, eliminate the column elsewhere.
const pv = T[pivRow][pivCol];
for (let j = 0; j < W; j++) T[pivRow][j] /= pv;
for (let i = 0; i <= m; i++) {
if (i === pivRow) continue;
const f = T[i][pivCol];
if (f !== 0) for (let j = 0; j < W; j++) T[i][j] -= f * T[pivRow][j];
}
basis[pivRow] = pivCol;
}
const x = new Array(n).fill(0);
for (let i = 0; i < m; i++) if (basis[i] < n) x[basis[i]] = T[i][W - 1];
return { status: 'optimal', x, value: T[m][W - 1] };
}
// maximize 3x + 5y s.t. x <= 4, 2y <= 12, 3x + 2y <= 18
console.log(simplex([[1,0],[0,2],[3,2]], [4,12,18], [3,5]));
// => { status:'optimal', x:[2,6], value:36 }
Two details carry the correctness. The objective row stores -c so that "no negative entry" means optimal, and the RHS cell T[m][W-1] accumulates the objective value automatically as you pivot. Using the lowest-index tie-break in both the pricing and the ratio test is Bland's rule, which guarantees termination even on degenerate problems.
Python implementation
The same algorithm with NumPy, plus the practical advice that you should normally call a library:
import numpy as np
def simplex(A, b, c, tol=1e-9):
"""maximize c·x s.t. A x <= b, x >= 0, b >= 0."""
A, b, c = map(np.asarray, (A, b, c))
m, n = A.shape
W = n + m + 1
T = np.zeros((m + 1, W))
T[:m, :n] = A
T[:m, n:n + m] = np.eye(m) # slacks
T[:m, -1] = b
T[m, :n] = -c # objective row
basis = list(range(n, n + m)) # slacks start basic
while True:
col = next((j for j in range(W - 1) if T[m, j] < -tol), None)
if col is None: # optimal
x = np.zeros(n)
for i, bi in enumerate(basis):
if bi < n: x[bi] = T[i, -1]
return {"status": "optimal", "x": x, "value": T[m, -1]}
ratios = [(T[i, -1] / T[i, col], basis[i], i)
for i in range(m) if T[i, col] > tol]
if not ratios:
return {"status": "unbounded"}
_, _, row = min(ratios) # min ratio, Bland tie-break on basis index
T[row] /= T[row, col]
for i in range(m + 1):
if i != row and abs(T[i, col]) > tol:
T[i] -= T[i, col] * T[row]
basis[row] = col
# maximize 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18
print(simplex([[1, 0], [0, 2], [3, 2]], [4, 12, 18], [3, 5]))
# {'status': 'optimal', 'x': array([2., 6.]), 'value': 36.0}
# In production, prefer the battle-tested solver:
from scipy.optimize import linprog
res = linprog(c=[-3, -5], # linprog minimizes
A_ub=[[1, 0], [0, 2], [3, 2]],
b_ub=[4, 12, 18], method="highs")
print(-res.fun, res.x) # 36.0 [2. 6.]
This teaching version assumes b ≥ 0 so the origin is feasible. The moment a constraint makes the origin infeasible — a ≥ constraint or an equality — you need Phase I to find a starting vertex, which is why real code reaches for HiGHS or Gurobi rather than the loop above.
Variants worth knowing
Revised simplex. Instead of carrying the full m × (m + n) tableau, maintain only a factorization of the m × m basis matrix and recompute columns on demand. This is the form every serious solver uses; it slashes both memory and the per-pivot cost on sparse problems.
Two-phase and Big-M methods. Both manufacture an initial feasible vertex when the origin isn't one. Two-phase minimizes a sum of artificial variables first; Big-M folds a huge penalty for artificials into a single objective. Two-phase is numerically safer because Big-M's giant constant wrecks floating-point conditioning.
Dual simplex. Keeps the objective row optimal and works toward feasibility, the mirror image of the primal. It's the workhorse inside branch-and-bound: after you add a cutting plane or branching bound, the old solution stays dual-feasible, so dual simplex warm-starts in a few pivots.
Pivot rules. Dantzig's "most negative reduced cost" is greedy and fast but can cycle. Bland's rule (lowest index) guarantees termination but is slow. Steepest-edge and Devex rules estimate the true objective gain per step and are what production solvers actually default to.
Network simplex. A specialization for min-cost-flow problems where the basis is always a spanning tree, so pivots are tree updates rather than matrix operations — orders of magnitude faster for transportation and assignment problems.
Common bugs and edge cases
- Cycling on degenerate problems. Without an anti-cycling rule, a naive Dantzig implementation can pivot forever among bases that all describe the same vertex. Use Bland's rule, lexicographic ordering, or a perturbation to break ties.
- Assuming the origin is feasible. The simple tableau only works when
b ≥ 0. A single≥or=constraint needs Phase I, or your loop starts at an infeasible point and produces garbage. - Missing the unbounded case. If the entering column has no positive entry, no basic variable ever leaves and the objective can grow forever. The ratio test must detect this and report "unbounded," not crash with an empty min.
- Numerical drift in the tableau form. Repeated Gaussian elimination accumulates floating-point error; over many pivots the tableau loses accuracy. Production solvers periodically refactorize the basis and use tolerances rather than exact zero tests.
- Confusing simplex with the "downhill simplex" (Nelder–Mead). Despite the shared name, Nelder–Mead is an unrelated derivative-free method for nonlinear optimization. Dantzig's simplex is specifically for linear programs.
- Forgetting that integer answers aren't free. An LP optimum can land at a fractional vertex like x = 2.5. If your variables must be whole units, rounding is generally wrong — you need integer programming (branch-and-bound) on top.
Frequently asked questions
Why is the optimum of a linear program always at a vertex?
Because the objective is linear and the feasible region is a convex polytope. A linear function over a convex set attains its maximum on the boundary, and on a polytope the boundary's lowest-dimensional pieces are its vertices. If an edge is improving you slide along it to its endpoint; you can always keep moving to a corner without ever decreasing the objective.
Is the simplex algorithm polynomial time?
No. Klee and Minty showed in 1972 that Dantzig's pivot rule can visit all 2^n vertices of a distorted n-dimensional cube, making it exponential in the worst case. In practice it runs in roughly 2(m+n) to 3(m+n) pivots, which is why it stayed the dominant LP solver for decades despite the bad worst case.
What is degeneracy and how does cycling happen?
A basic feasible solution is degenerate when a basic variable equals zero, meaning more than n constraints are tight at one vertex. A pivot can then change the basis without moving the vertex or improving the objective, and the algorithm can loop forever through the same point. Bland's rule (always pick the lowest-index eligible variable) provably prevents cycling.
How do you find a starting vertex if the origin isn't feasible?
Use a two-phase method or the Big-M method. Phase I adds artificial variables and minimizes their sum; if that minimum is zero you have reached a feasible vertex and drop the artificials, then run Phase II on the real objective. If the minimum is positive, the original problem is infeasible.
What's the difference between the simplex method and interior-point methods?
Simplex walks along the edges of the polytope from corner to corner. Interior-point methods cut straight through the interior toward the optimum following the central path. Interior-point methods have proven polynomial worst-case bounds and dominate very large or dense problems; simplex is usually faster on small to medium sparse problems and warm-starts trivially after a small change.
Who invented the simplex algorithm?
George Dantzig devised it in 1947 while working for the U.S. Air Force on logistics planning. Linear programming and the simplex method became foundational tools in operations research, and the algorithm was later named one of the top ten algorithms of the 20th century.