Optimization
Linear Programming Duality
Every primal LP has a dual LP — and at the optimum the two objectives coincide
Every LP has a twin. The dual's variables are shadow prices, its feasible values bound the primal, and at the optimum the two objectives meet exactly.
- Pairmax c·x ⟺ min b·y
- Weak dualityc·x ≤ b·y for all feasible x, y
- Strong dualityc·x* = b·y* at optimum
- Slater's conditionStrictly feasible interior ⇒ strong duality
- Dual variablesShadow prices = marginal value
- Discoveredvon Neumann sketch 1947, Dantzig–Tucker 1948–51
Watch the 60-second explainer
A condensed visual walkthrough — narrated, captioned, under a minute.
The primal and the dual side by side
Start with a primal LP in standard form: choose an n-vector x to maximise a linear objective subject to m linear inequality constraints and non-negativity. Then the dual LP is constructed mechanically — one dual variable per primal constraint, the matrix A transposed, c and b swapped, max replaced by min, and ≤ replaced by ≥.
PRIMAL DUAL
maximise c · x minimise b · y
subject to A x ≤ b subject to Aᵀ y ≥ c
x ≥ 0 y ≥ 0
If the primal has n decision variables and m constraints, the dual has m decision variables and n constraints. The structural roles flip but the data c, A, b are the same. Dualising the dual gives back the primal — the relationship is involution-symmetric.
Weak duality — every dual value bounds every primal value
The simplest theorem first. Take any primal-feasible x and any dual-feasible y. Then:
c · x ≤ (Aᵀ y) · x (because Aᵀy ≥ c and x ≥ 0)
= yᵀ A x
≤ yᵀ b (because Ax ≤ b and y ≥ 0)
= b · y
So c·x ≤ b·y for every compatible (x, y) — primal objective values can never exceed dual objective values. Weak duality is a free bound that costs no work to produce: pick any dual feasible y and you have an upper bound on the primal optimum.
This sounds modest but it powers most of approximation algorithms. To prove a heuristic produces solutions within 5% of optimal, you do not need to know the optimum — you exhibit a dual feasible y and compare your primal value to b·y.
Strong duality — and the gap closes
The real surprise is that weak duality is tight. If the primal has a finite optimum x*, then there exists a dual feasible y* with c·x* = b·y*. The bound is exact at the optimum — there is no duality gap.
For ordinary LPs (linear objective, linear constraints, real variables) strong duality holds whenever both problems are feasible. For convex generalisations — quadratic, conic, general convex — strong duality requires a constraint qualification. Slater's condition is the cleanest: if the primal has a strictly feasible interior point (Ax < b somewhere with x > 0), strong duality holds. Without Slater's, both problems can be feasible but separated by a positive gap.
The proof routes vary. Dantzig's original argument runs the simplex method to optimality and reads the dual variables off the final basis. The modern proof uses the separating hyperplane theorem on a convex cone in (objective, constraint) space — that proof generalises to all convex optimisation and underlies Fenchel duality.
Complementary slackness — pay only for binding constraints
At any primal-dual optimum (x*, y*) two algebraic identities hold:
- For each i = 1, …, m: y*ᵢ · (bᵢ − A_i · x*) = 0.
- For each j = 1, …, n: x*ⱼ · ((Aᵀ y*)ⱼ − cⱼ) = 0.
Each product has both factors non-negative, so at least one factor must vanish. Read the first identity as: for every constraint, either the constraint binds (slack b − Ax = 0) or its dual variable is zero. Read the second identity as: for every variable, either the variable is positive or its dual constraint binds.
The economic reading is intuitive. The shadow price y*ᵢ is the marginal value of relaxing constraint i. If constraint i is not binding — there is slack, you already have more capacity than the optimum uses — its marginal value is zero. Only the constraints that actually limit the optimum get a positive shadow price.
Worked example — a tiny production LP
Take the workshop LP. Make x units of product A ($3 profit each) and y units of product B ($2 each). Labour: 1 hour per unit of A or B, 4 hours available. Material: 2 kg per A, 1 kg per B, 6 kg available. Variables non-negative.
PRIMAL DUAL
maximise 3 x + 2 y minimise 4 y₁ + 6 y₂
subject to x + y ≤ 4 subject to y₁ + 2 y₂ ≥ 3
2 x + y ≤ 6 y₁ + y₂ ≥ 2
x, y ≥ 0 y₁, y₂ ≥ 0
The dual variable y₁ is the shadow price on the labour constraint; y₂ is the shadow price on the material constraint. The first dual constraint says: a unit of A consumes one hour of labour and two kg of material, and must yield at least its $3 profit in shadow value — y₁ + 2 y₂ ≥ 3. Likewise for product B.
Solving the primal (a tiny simplex run) gives x* = 2, y* = 2, primal optimum $10. Both labour and material are fully consumed. Solving the dual gives y*₁ = 1, y*₂ = 1, dual optimum 4·1 + 6·1 = $10 — strong duality verified.
Interpretation: one extra hour of labour buys $1 of profit at the margin; one extra kg of material also buys $1. Both constraints bind, both shadow prices positive. Complementary slackness: the primal constraints are tight (slack = 0), so the y values can be positive. Both primal variables are positive, so both dual constraints are tight: 1 + 2·1 = 3 = c_A, 1 + 1 = 2 = c_B.
Primal vs dual — a structural comparison
| Primal LP | Dual LP | |
|---|---|---|
| Goal | maximise profit / output | minimise cost of resources |
| Variables | activity levels (how much to do) | shadow prices (value of each resource) |
| Number of variables | n | m |
| Number of constraints | m | n |
| Coefficients on objective | profits c | capacities b |
| Right-hand side | capacities b | profits c |
| Constraint direction | ≤ | ≥ |
| Feasible region | polytope of activity levels | polytope of prices |
| Optimum direction | upper envelope of c·x | lower envelope of b·y |
The two LPs sit on different polytopes — one in ℝⁿ, one in ℝᵐ — but the optimal values match. Geometrically, weak duality is the statement that the primal polytope is below the dual polytope's b·y level, and strong duality is the statement that their boundaries touch at a single value.
How to construct the dual mechanically
There is a recipe that always works, even for primals with mixed constraint types and free variables.
- Write the primal as max c·x subject to Ax ≤ b, x ≥ 0 (canonical form). If you have ≥ constraints, multiply by −1. If you have = constraints, split into ≥ and ≤ or introduce a free dual variable.
- Introduce one dual variable y_i ≥ 0 per primal ≤ constraint. Free dual variables correspond to primal equality constraints. Non-positive dual variables correspond to primal ≥ constraints if you don't flip signs first.
- Each primal variable x_j ≥ 0 produces a dual ≥ constraint (the j-th row of Aᵀy ≥ c). A free primal variable produces a dual equality constraint.
- Swap c and b. The primal objective vector c moves to the right-hand side of the dual; the primal right-hand side b moves to the dual objective.
- Flip max to min.
Once you have done this on a half-dozen LPs the recipe is automatic. Most textbooks include a small symmetry table — primal max with x ≥ 0 maps to dual min with y ≥ 0; primal max with x free maps to dual min with y satisfying equality.
Where the theorem came from
John von Neumann sketched LP duality to Dantzig at the Institute for Advanced Study in October 1947, only weeks after Dantzig had first described the simplex method to him. Von Neumann saw immediately that Dantzig's algorithm was related to his own 1928 minimax theorem for two-player zero-sum games — and that the LP duality theorem and the minimax theorem are essentially the same statement reframed.
Dantzig and Albert Tucker formalised the relationship between 1948 and 1951; Tucker's name attaches to a generation of duality theorems. The first textbook treatment is Gale, Kuhn and Tucker's 1951 paper Linear Programming and the Theory of Games. The modern proof via the separating hyperplane theorem on convex cones is due to Werner Fenchel (1949) and R. Tyrrell Rockafellar (1970), and that proof extends to general convex optimisation — Fenchel duality, conic duality, Lagrangian duality are all descendants.
Where duality earns its keep
- Sensitivity analysis. Managers want to know how much profit moves if the labour budget changes by ten hours. The dual variable answers that question directly: marginal profit per hour of labour = y*₁ at the optimum, valid for small perturbations that don't change the optimal basis.
- The simplex stopping rule. Modern revised-simplex implementations check optimality by maintaining a dual solution and verifying it is feasible. When dual feasibility is achieved, primal optimality is certified by strong duality — no further pivots needed.
- Approximation algorithms. LP relaxations of NP-hard integer programs are commonly rounded. The integrality gap of the relaxation is bounded above by exhibiting a dual solution — the dual gives a free upper bound for free certification.
- Network flow algorithms. The max-flow / min-cut theorem is LP duality on a specially structured LP. Hoffman, Edmonds, Karp and others built combinatorial algorithms whose correctness proofs rest on the LP dual of a flow problem.
- Support vector machines. The SVM training problem is a quadratic program with linear constraints. Its dual is the form usually solved in practice (sequential minimal optimisation, kernel SVM) because the dual variables — Lagrange multipliers — sparsify on the support vectors only.
- Mechanism design and auction theory. Vickrey-Clarke-Groves auctions, combinatorial auctions and matching markets express their efficiency proofs through LP duals — the shadow prices are the prices buyers pay.
- Column generation. Huge LPs (airline crew scheduling, vehicle routing, cutting-stock) start with a small subset of variables and use the LP dual to identify which new columns would improve the objective. Generate, solve, repeat — billion-column LPs solve in minutes because duality tells you exactly which variables matter.
- Robust optimisation. When LP coefficients are uncertain (varying within polyhedral or ellipsoidal sets), the robust counterpart problem is constructed by dualising the inner worst-case maximisation. The dual variables become decision variables of the robust LP; solving the dual IS solving the worst case.
Common mistakes
- Dualising before standardising. Skipping the conversion to a canonical max-with-≤ primal produces sign errors in the dual. Always rewrite the primal as max c·x s.t. Ax ≤ b, x ≥ 0 (or its standard min equivalent) before applying the recipe.
- Ignoring constraint qualifications in nonlinear duals. Strong duality is automatic for LPs but not for general convex programs. Without Slater's condition, the dual optimum can be strictly less than the primal — a positive duality gap that breaks any algorithm relying on closing it.
- Reading shadow prices outside their range. The marginal interpretation y*ᵢ is only locally valid: it predicts the change in optimum for small perturbations of bᵢ that keep the same optimal basis. Large perturbations cross into different bases with different shadow prices.
- Forgetting that infeasibility certifies the other side. If the primal is unbounded above, the dual is infeasible; if the primal is infeasible, the dual is either infeasible or unbounded below. The two LPs are not always both nicely solvable — the four feasibility-by-boundedness cases form a 2×2 table.
- Treating the dual variables of an integer program as the IP shadow prices. Solving an LP relaxation gives LP duals, not IP duals; IP shadow prices are well-defined only between branchings and may jump discontinuously. Use the LP duals as a heuristic, not a guarantee.
- Forgetting degenerate primal duals are not unique. On degenerate LPs (more than n + m active constraints meeting at the optimal vertex), the dual solution is not uniquely determined. Different simplex bases yield different dual vectors all of which are valid shadow prices. Sensitivity analysis must consider the full range, not a single dual value.
- Skipping sign conventions when implementing. Different textbooks adopt different sign conventions (max vs min in primal, ≤ vs ≥ on the dual). Mixing conventions inside the same codebase is the single most common bug in handwritten LP solvers. Pick one (Bertsimas-Tsitsiklis is the safest reference) and stick to it.
Farkas' lemma — the algebraic engine
Underneath LP duality sits Farkas' lemma (Gyula Farkas, 1902), a clean theorem of the alternative. For any matrix A and vector b, exactly one of these two systems has a solution:
- System I: A x = b, x ≥ 0.
- System II: Aᵀ y ≤ 0, bᵀ y > 0.
System I being infeasible is the same as saying b is not in the conic combination of columns of A. System II being feasible exhibits a separating hyperplane (with normal y) between b and the cone. Either b is in the cone (System I) or it is not (System II). Farkas' lemma is the separating hyperplane theorem in linear-algebra form.
LP duality is Farkas with bookkeeping. Strong duality says max c·x = min b·y, which decomposes into 'primal-feasible-and-optimal x exists ⟺ dual-feasible-and-optimal y exists with c·x = b·y'. The infeasibility certificates that fall out (an unbounded ray for the primal, or an unbounded direction for the dual) come directly from Farkas-style alternatives.
Duality and the minimax theorem
Von Neumann's 1928 minimax theorem for two-player zero-sum games says that for any matrix payoff M, max over mixed strategies x of min over mixed strategies y of xᵀMy equals min over y of max over x of xᵀMy. The two values coincide; the game has a value. The proof is LP duality applied to the LP whose objective is the row player's expected payoff and whose constraints encode mixed-strategy feasibility. Von Neumann recognised the connection immediately when Dantzig described the simplex method to him in 1947 — LP duality and minimax are the same theorem, framed once for optimisation and once for games.
The connection goes both ways. Modern algorithmic game theory uses LP duality to prove the existence of mixed Nash equilibria in zero-sum games and to compute them via the simplex method on the equivalent LP. Conversely, problems originally stated as games (security games, network defence, online learning's adversarial bandits) frequently dualise back into LPs that mainstream solvers can handle.
From LP duality to convex duality
LP duality is the linear case of a much wider duality theory. Convex problems with smooth constraints have Lagrangian duals; conic programs (semidefinite, second-order cone) have conic duals; Fenchel duality pairs convex functions with their convex conjugates. In each case, the same weak/strong/slackness structure holds — with the same caveat that strong duality may need a constraint qualification.
The unifying view is the Lagrangian function L(x, y) = f(x) + yᵀ(Ax − b) (for the LP case). The primal optimum is min over x of (max over y ≥ 0 of L), the dual optimum is max over y ≥ 0 of (min over x of L). Strong duality is the statement that min-max equals max-min — exactly the saddle-point condition that ties optimisation back to game theory and to the KKT system.
Modern solvers and the dual in practice
Every commercial and open-source LP solver exposes dual variables alongside primal solutions. Gurobi, CPLEX, Mosek, HiGHS, the GLPK and Clp open-source pair, and SciPy's linprog all return the dual y vector (often called π or pi) on solver exit. Accessing y_i tells you immediately the shadow price of constraint i — no extra computation, the value falls out of the final simplex basis or interior-point central path.
Modern solvers use the dual variables for warm-starting after problem edits. Adding a new constraint changes the primal feasible region but preserves dual feasibility; the dual simplex method picks up from the old dual solution and re-establishes primal feasibility in a few pivots. Conversely, adding a new variable preserves primal feasibility but changes dual feasibility — the primal simplex resumes from the old basis. Branch-and-bound for integer programs exploits both kinds of warm start systematically and is the reason simplex-based MIP solvers handle million-variable instances in commercial finance and supply chain.
Costed claims
Strong duality holds under Slater's condition (strictly feasible interior). Convex hull of N points in 2D takes O(N log N) via Graham scan and likewise in 3D. KKT generalises LP duality to nonlinear convex problems: Lagrangian gradient = 0 with λ·g(x) = 0 (complementary slackness) and λ ≥ 0. Jensen's inequality (log E[X] ≥ E[log X] for the concave case) and Cauchy–Schwarz (equality iff u, v collinear) are the inequality cousins that make convex duality work in expectation and inner-product spaces respectively. The dual of a min-cost-flow LP is the shortest-path-like potential LP; the dual of a max-flow LP is the min-cut LP — combinatorial dualities that are special cases of the general theorem.
Frequently asked questions
What exactly is the dual of a linear program?
Given the standard primal — maximise c·x subject to Ax ≤ b, x ≥ 0 — the dual is minimise b·y subject to Aᵀy ≥ c, y ≥ 0. One dual variable per primal inequality constraint and one dual constraint per primal variable. The roles of c and b swap; the matrix A transposes; max becomes min and ≤ becomes ≥. The dual of the dual is the primal again, so the relationship is genuinely symmetric.
Why is weak duality obvious?
Take any primal-feasible x and any dual-feasible y. Then c·x ≤ (Aᵀy)·x = yᵀAx ≤ yᵀb = b·y. The first inequality uses dual feasibility Aᵀy ≥ c (and x ≥ 0); the second uses primal feasibility Ax ≤ b (and y ≥ 0). So every dual feasible value is an upper bound for every primal feasible value, before you even solve anything.
When does strong duality fail?
For ordinary LPs over ℝ, strong duality holds whenever both primal and dual are feasible — and it can fail only in pathological infeasibility / unbounded cases. For nonlinear convex problems, strong duality typically needs a constraint qualification — most commonly Slater's condition, which asks for a strictly feasible interior point of the primal. Without Slater's, you can have a positive duality gap even when both problems are feasible.
What are shadow prices, in plain English?
Imagine a factory whose labour constraint is 100 hours per week, and the optimal LP says profit is $5000. If the dual variable on the labour constraint is y* = 30, then one extra hour of labour buys $30 of additional profit at the margin — provided the rest of the problem doesn't change. The dual variable IS the marginal value of the constraint. Managers use shadow prices to decide which constraint to spend money loosening.
What is complementary slackness?
At optimum, primal and dual satisfy two slackness conditions. (i) For each primal constraint, either it binds (slack = 0) or its dual variable is zero. (ii) For each primal variable, either it is positive or the corresponding dual constraint binds. Read it as 'pay only for binding constraints': constraints that don't actually limit the optimum get a zero shadow price.
Why do we ever bother solving the dual instead of the primal?
Three reasons. First, the dual is sometimes simply smaller — a primal with thousands of constraints but few variables has a small dual. Second, the dual simplex is the natural warm-starter after adding a constraint in branch-and-bound, because the new constraint preserves dual feasibility. Third, duals give certificates: any dual feasible y gives an instant upper bound on the primal optimum, which is invaluable inside heuristics and approximation algorithms.
Who proved strong duality, and when?
John von Neumann sketched the idea to Dantzig in October 1947, only weeks after Dantzig described the simplex method to him at the Institute for Advanced Study. Dantzig and Albert Tucker formalised LP duality through 1948–1951. The proof via the simplex method's optimality conditions is in Dantzig's 1963 book; the more elegant separating-hyperplane proof generalises to all of convex duality and is due to Fenchel and Rockafellar in the 1960s.