Optimization

Karush-Kuhn-Tucker Conditions

Four equations characterise every constrained optimum — necessary always, sufficient under convexity

A constrained optimum satisfies four conditions: stationarity, primal feasibility, non-negative multipliers, and complementary slackness.

ConditionsStationarity + primal + dual feasibility + slackness
Stationarity∇f + Σ λᵢ ∇gᵢ + Σ μⱼ ∇hⱼ = 0
Slacknessλᵢ · gᵢ(x) = 0 for each i
StatusNecessary always; sufficient for convex problems
DiscoveredKarush thesis 1939; Kuhn-Tucker 1951
Foundation ofConvex optimisation, SVM, interior-point methods

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The constrained-optimisation problem

Take a smooth nonlinear program with both inequality and equality constraints:

minimise   f(x)
subject to g_i(x) ≤ 0    for i = 1, …, p
           h_j(x) = 0    for j = 1, …, q
           x ∈ ℝⁿ

Lagrange multipliers tell you how to handle the equality constraints alone — write the Lagrangian and set its gradient to zero. KKT extends that to inequality constraints, where you cannot simply set g_i = 0 because some inequalities will be slack at the optimum and some will bind, and you do not know in advance which.

The four conditions

Introduce a multiplier λᵢ for each inequality constraint and a multiplier μⱼ for each equality constraint. The Lagrangian is L(x, λ, μ) = f(x) + Σ λᵢ g_i(x) + Σ μⱼ h_j(x). The KKT conditions for a point x* to be a local optimum are:

Stationarity. ∇_x L = 0, i.e. ∇f(x*) + Σ λᵢ ∇g_i(x*) + Σ μⱼ ∇h_j(x*) = 0. The objective's gradient is balanced by the constraint gradients weighted by the multipliers.
Primal feasibility. g_i(x*) ≤ 0 for every i and h_j(x*) = 0 for every j. The point is actually in the feasible region.
Dual feasibility. λᵢ ≥ 0 for every inequality. The multipliers on inequalities are non-negative (their sign is fixed by which side of the inequality is feasible). Equality multipliers μⱼ are free.
Complementary slackness. λᵢ · g_i(x*) = 0 for every i. Each term is the product of a non-negative multiplier and a non-positive constraint value; the product being zero means at least one factor vanishes.

These four together form a system of n + p + q + p equations in the n + p + q unknowns (x, λ, μ). The slackness condition is non-smooth — it partitions the inequalities into active (g = 0) and inactive (λ = 0) sets — so the system is harder to solve than a smooth one, but the four conditions are the cleanest possible characterisation of optimality.

Geometric picture

At a constrained optimum x*, let I(x*) ⊂ {1, …, p} be the set of active inequalities — those with g_i(x*) = 0. Complementary slackness says λᵢ can be non-zero only for i ∈ I(x*). Stationarity, rewritten using only the active set, becomes:

−∇f(x*) = Σ_{i ∈ I(x*)} λᵢ ∇g_i(x*) + Σ_{j} μⱼ ∇h_j(x*),    λᵢ ≥ 0

The negative gradient of the objective is a non-negative combination of the active constraint gradients plus an arbitrary combination of the equality gradients. Geometrically: the direction of steepest descent (−∇f) points into the cone spanned by the outward constraint normals. Any move you could make from x* that did not violate constraints would not decrease the objective — which is the very definition of a local minimum.

For a problem with one equality constraint and no inequalities, the cone collapses to a line and KKT becomes ∇f = −μ ∇h — pure Lagrange. Adding inequalities widens the cone into a polyhedral set; the sign restriction λᵢ ≥ 0 keeps the cone pointed outward.

Worked example — quadratic on a half-plane

Minimise f(x, y) = (x − 2)² + (y − 1)² subject to x + y ≤ 1 and x ≥ 0, y ≥ 0. Geometrically: find the point in the triangle bounded by the axes and the line x + y = 1 closest to (2, 1).

Write inequalities as g₁ = x + y − 1 ≤ 0, g₂ = −x ≤ 0, g₃ = −y ≤ 0. No equality constraints. The Lagrangian is

L = (x−2)² + (y−1)² + λ₁(x+y−1) + λ₂(−x) + λ₃(−y).

Stationarity gives

∂L/∂x = 2(x − 2) + λ₁ − λ₂ = 0
∂L/∂y = 2(y − 1) + λ₁ − λ₃ = 0

Guess that the first constraint x + y ≤ 1 is the only active one (the unconstrained minimum (2, 1) lies above x + y = 1, but is to the right of and above the axes, so axes are slack). Then λ₂ = λ₃ = 0 by complementary slackness, and

2(x − 2) + λ₁ = 0   →   x = 2 − λ₁/2
2(y − 1) + λ₁ = 0   →   y = 1 − λ₁/2
Active constraint:    x + y = 1
(2 − λ₁/2) + (1 − λ₁/2) = 1   →   λ₁ = 2.

So x = 1, y = 0. Now check the assumed slack constraints: x = 1 ≥ 0 (satisfied, with λ₂ = 0 OK because g₂ = −1 < 0 has slack), y = 0 (the axis is binding — but we assumed it was slack!). The assumption was wrong; the candidate sits exactly on the y = 0 axis.

Re-solve with both x + y = 1 and y = 0 active. Then x = 1, y = 0, and stationarity gives the multipliers

2(1 − 2) + λ₁ − 0 = 0   →   λ₁ = 2
2(0 − 1) + λ₁ − λ₃ = 0  →   λ₃ = 0.

λ₂ = 0 because g₂ = −1 < 0 still has slack. So the KKT point is x* = (1, 0) with λ₁ = 2, λ₂ = 0, λ₃ = 0 — all multipliers non-negative, all four conditions verified. Distance² from (2, 1) is (1−2)² + (0−1)² = 2, so distance √2.

Notice the algorithmic flavour: KKT is solved by guessing which constraints are active, solving the resulting smooth system, checking the guess, and re-guessing if it fails. Active-set methods for QP are systematic versions of exactly this procedure.

Convexity makes KKT sufficient

For a general nonlinear program KKT is only necessary — every local optimum at which a constraint qualification holds satisfies KKT, but KKT points can be saddle points or local maxima. For convex programs the picture flips: KKT is both necessary and sufficient. Any point satisfying the four conditions is a global minimum.

A program is convex when the objective f is convex, every inequality g_i is convex (so the feasible region cut by gᵢ ≤ 0 is convex), and every equality h_j is affine (linear plus constant). Linear programs and convex quadratic programs are obvious examples; semidefinite programs and second-order cone programs are larger ones. In all of these, KKT plus Slater's condition equals a global certificate of optimality — solve the KKT system, you have solved the problem.

Constraint qualifications

KKT applies at a local optimum x* only if some constraint qualification holds. The point of a qualification is to ensure the active constraint gradients are well-enough behaved that Lagrange-multiplier theorems apply.

LICQ — Linear Independence Constraint Qualification. The gradients of the active inequality constraints and all equality constraints at x* are linearly independent. The strongest and easiest to check; under LICQ the KKT multipliers are unique.
MFCQ — Mangasarian-Fromovitz CQ. Slightly weaker — the equality gradients are linearly independent, and there exists a direction along which all active inequality gradients have negative inner product. KKT still holds; multipliers form a bounded set but may not be unique.
Slater's condition. Convex programs only — there exists a strictly feasible interior point (gᵢ(x) < 0 for all i, equalities satisfied). Implies strong duality and KKT sufficiency in one stroke.
Linearity CQ. If all constraints are affine (linear + constant), KKT holds automatically without any further qualification. This is why LP duality works without explicit Slater's.

Without any constraint qualification you can construct optima at which no KKT multipliers exist — typically the active constraint gradients are linearly dependent in a degenerate way. Real-world problems almost always satisfy LICQ; constraint-qualification failure is mostly a theoretical concern.

KKT and the dual

Concept	Pure Lagrange (equality only)	KKT (inequality + equality)	LP duality
Stationarity	∇f + Σ μⱼ ∇hⱼ = 0	∇f + Σ λᵢ ∇gᵢ + Σ μⱼ ∇hⱼ = 0	Aᵀy − c ≥ 0 (read as ∇L = 0)
Multiplier sign	μⱼ free	λᵢ ≥ 0, μⱼ free	y ≥ 0
Slackness	None (all constraints active)	λᵢ · gᵢ = 0	yᵢ · (bᵢ − Aᵢx) = 0
Active set	All	Subset, found automatically	Binding constraints
Sufficient when	Convex problem	Convex problem	Always (LP is convex)
Solved by	Substitution	Active-set, interior point	Simplex, interior point
Reduces to	—	Pure Lagrange (no inequalities)	KKT (linear case)

KKT, Lagrange, and LP duality are three views of the same idea at increasing generality. The simplex method's optimality conditions are a discrete special case of KKT; interior-point methods are gradient flows on the KKT system.

Where KKT shows up

Support vector machines. Soft-margin SVM is a quadratic program with linear inequality constraints. The dual formulation drops out of KKT — αᵢ ≥ 0 (dual feasibility), αᵢ · (1 − yᵢ(w·xᵢ + b) − ξᵢ) = 0 (slackness). Support vectors are the training points with αᵢ > 0; everything else has αᵢ = 0 by slackness.
Portfolio optimisation. Mean-variance portfolios with no-short-sale constraints (xᵢ ≥ 0) and exposure caps are quadratic programs. KKT tells you which constraints bind at the optimum — and the shadow prices on those constraints are the cost of the regulatory restrictions.
Power flow. AC and DC optimal power flow in electricity networks are constrained nonlinear programs. KKT multipliers on the line-capacity constraints are the locational marginal prices that ISOs use to compensate generators and charge consumers.
Interior-point methods. Every primal-dual interior-point algorithm solves a perturbed version of the KKT system, with the slackness condition λᵢ · gᵢ = 0 replaced by λᵢ · gᵢ = −μ (a small positive μ). As μ → 0 the iterate converges to the exact KKT point.
Reinforcement learning with constraints. Constrained Markov decision processes (safety constraints on the policy) are solved by Lagrangian methods that use KKT to balance reward and constraint violation. The constraint multipliers update like dual variables.
Robotics — inverse kinematics and contact dynamics. Joint limits, contact non-penetration and friction cones are all inequality constraints. The KKT multipliers are the joint torques and contact forces — they have direct physical meaning.
Sparsity-inducing regularisation (LASSO, group LASSO). The L¹ penalty on regression coefficients becomes a non-smooth constraint that yields KKT conditions with subgradients. Solving the KKT system identifies which coefficients are exactly zero — the sparsity pattern emerges from complementary slackness.
Economic equilibrium. Walrasian general equilibrium in Arrow-Debreu economies is characterised by the KKT conditions of each agent's utility-maximisation problem plus market-clearing constraints. The shadow prices on the market-clearing equations are the equilibrium prices.

Common mistakes

Forgetting dual feasibility. Setting up the Lagrangian and stationarity correctly, then accepting any solution without checking λᵢ ≥ 0. Negative multipliers indicate the wrong active set — you have hit a saddle, not a minimum.
Treating all constraints as active. Pure Lagrange-style substitution where every g_i = 0 is enforced gives an over-constrained system that usually has no solution. The whole point of KKT is that you do not know the active set in advance — it is solved for.
Skipping the constraint qualification check. If the active constraint gradients are linearly dependent (LICQ fails) the multipliers can be ill-defined; in pathological cases KKT does not apply at all. Most failures of KKT in software come from degenerate constraint geometry.
Confusing necessary and sufficient. KKT is necessary for any regular optimum but sufficient only under convexity. A KKT point of a non-convex program may be a local max or saddle — second-order conditions or global search are needed to certify.
Sign errors when flipping inequality direction. Writing g(x) ≥ 0 instead of g(x) ≤ 0 flips the sign of the multiplier. Decide once on a convention (textbooks almost universally use ≤ 0 with λ ≥ 0) and convert all inequalities into it before writing the Lagrangian.
Reading the KKT multiplier of a redundant constraint. If a constraint is redundant (implied by the others), KKT may still assign it a non-zero multiplier in the absence of LICQ — but the value has no economic meaning. Detect redundancy by checking the rank of active constraint gradients before interpreting.
Using equality KKT for inequalities and vice versa. Equality-constraint multipliers μⱼ are unrestricted in sign; inequality multipliers λᵢ are non-negative. Mixing the two leads to subtle bugs in solver implementations where the multiplier update step uses the wrong sign rule.

The Lagrangian function as a saddle point

The Lagrangian L(x, λ, μ) = f(x) + Σ λᵢ gᵢ(x) + Σ μⱼ hⱼ(x) is the central object of KKT. At a primal-dual optimum (x*, λ*, μ*) under convexity, the Lagrangian has a saddle point: L(x*, λ, μ) ≤ L(x*, λ*, μ*) ≤ L(x, λ*, μ*) for any feasible x, λ ≥ 0, μ. The primal player minimises over x; the dual player maximises over (λ, μ); the saddle point is the equilibrium.

This saddle-point characterisation lets you solve KKT via gradient ascent-descent (sometimes called primal-dual gradient methods). It also connects KKT directly to game theory: the Lagrangian saddle is a Nash equilibrium of the primal-vs-dual zero-sum game. Many machine learning algorithms — generative adversarial networks, robust learning with adversarial perturbations, fictitious play in self-play reinforcement learning — implement primal-dual ascent-descent on Lagrangian-flavoured objectives.

Active-set methods — KKT, solved iteratively

Active-set methods solve KKT by guessing which subset of inequality constraints is active at the optimum, treating those as equalities, and dropping the rest. The resulting equality-constrained problem is solved by pure Lagrange (a linear system in the KKT residual). The guessed active set is then checked: if any guessed-active constraint has a negative multiplier, drop it; if any guessed-inactive constraint is violated, add it; repeat.

Concrete algorithms: for QPs, Goldfarb-Idnani and Wright-Nocedal active-set methods are standard. For LPs, the simplex method is itself an active-set algorithm — each basis is a guessed active set of n constraints. For general nonlinear KKT, sequential quadratic programming (SQP) linearises the constraints around the current iterate, solves the linearised KKT, and iterates. SQP is what powers Ipopt, Knitro, fmincon (MATLAB) and SciPy's minimize when constraints are present.

Second-order conditions

KKT is first-order — it uses only gradients. For a non-convex problem, KKT identifies stationary points of the Lagrangian but does not distinguish minima from maxima or saddles. The second-order necessary condition at a KKT point x* asks that the Hessian of the Lagrangian, restricted to the tangent cone of the active constraints, is positive semidefinite:

dᵀ ∇²L(x*) d ≥ 0    for all d in the tangent cone.

The sufficient condition strengthens semidefinite to strictly positive: dᵀ ∇²L(x*) d > 0 for every non-zero d in the cone. Under this second-order sufficient condition together with strict complementary slackness (λᵢ > 0 on every active constraint), x* is a strict local minimum. Solvers like Ipopt and Knitro verify these second-order conditions to certify their answers, distinguishing local minima from suspect KKT points.

KKT inside interior-point methods

Every primal-dual interior-point method (Mehrotra's predictor-corrector, the algorithms in MOSEK, Gurobi's barrier, Ipopt) solves a perturbed KKT system. The slackness condition λᵢ · gᵢ = 0 is replaced by λᵢ · gᵢ = −μ for a small positive μ, which moves the iterate into the interior of the feasible region. As μ → 0 the iterate traces a smooth curve — the central path — that terminates at the exact KKT point.

Each interior-point iteration linearises the perturbed KKT system around the current iterate and solves the resulting Newton step, then takes a damped step that stays interior. Polynomial-time complexity comes from the fact that the central path is well-conditioned even when the constraint set is degenerate, sidestepping the active-set guessing that plagues simplex on highly degenerate LPs. The trade-off is that interior-point methods are poor at warm starting — adding a constraint pulls the iterate off the central path and recovery takes most of the original iteration count.

Strong duality and the zero duality gap

Whenever KKT holds at a primal-feasible x* and dual-feasible (λ*, μ*), the Lagrangian dual function g(λ, μ) = inf_x L(x, λ, μ) achieves its maximum at (λ*, μ*) and the primal objective at x* equals that maximum. The duality gap — the difference between primal and dual optima — is zero. KKT is therefore a certificate of zero duality gap; the multipliers (λ*, μ*) are the dual variables.

For convex problems with Slater's condition, the duality gap is zero automatically. For general nonconvex problems, KKT at a local minimum does not certify the global gap; the local Lagrangian dual can be strictly less than the local primal. Detecting and closing the duality gap is the central task of branch-and-bound for global optimisation — the LP/SDP relaxation gives a dual lower bound, and branching seeks to close the gap one subproblem at a time.

Costed claims

KKT: Lagrangian gradient = 0 + λ·g(x) = 0 (slackness). Strong duality holds under Slater's condition (strictly feasible interior). For convex problems, KKT is necessary and sufficient. For non-convex problems, KKT identifies candidate local optima but not global ones; the second-order condition dᵀ∇²L(x*)d ≥ 0 on the tangent cone is the missing classifier. The convex hull of N points in 2D is O(N log N); Jensen's inequality (log E[X] ≥ E[log X], the concave version) and Cauchy-Schwarz (equality iff u, v collinear) are the inequality companions.

Frequently asked questions

What are the four KKT conditions in one line each?

Stationarity: ∇f(x*) + Σ λᵢ ∇gᵢ(x*) + Σ μⱼ ∇hⱼ(x*) = 0. Primal feasibility: gᵢ(x*) ≤ 0 and hⱼ(x*) = 0. Dual feasibility: λᵢ ≥ 0 on every inequality constraint. Complementary slackness: λᵢ · gᵢ(x*) = 0 for every i — multiplier and constraint cannot both be active. The four together are necessary at any regular local optimum and are sufficient when the problem is convex.

How is KKT different from Lagrange multipliers?

Lagrange multipliers handle equality constraints only. KKT generalises the same idea to inequality constraints, which forces the new structure: dual feasibility (the inequality multipliers must be non-negative) and complementary slackness (a multiplier can be non-zero only when its inequality is tight). For pure equality-constrained problems KKT collapses back to Lagrange — the inequality conditions are trivially satisfied with empty index sets.

Why must the inequality multipliers be non-negative?

Geometric reasoning. Writing the inequality as g(x) ≤ 0, the feasible side is g ≤ 0 and the gradient ∇g points outward into infeasibility. To prevent improving the objective by stepping outward, the projection of ∇f on the feasible side must be non-positive — and that translates algebraically into a non-negative multiplier. If you flip the sign of the inequality to g ≥ 0, the multiplier sign flips too.

What does complementary slackness mean intuitively?

Each inequality either binds at the optimum (g = 0, the constraint is active) or has slack (g < 0, the constraint is not limiting anything). Complementary slackness says: the multiplier is non-zero only on active constraints. Slack constraints have zero shadow price — relaxing them by a tiny amount changes nothing. So KKT automatically figures out which subset of inequalities matters at the optimum, without you specifying it.

When is KKT sufficient, not just necessary?

When the problem is convex — convex objective, convex inequality constraints (gᵢ convex with ≤), affine equality constraints. For convex problems KKT is necessary and sufficient for global optimality. For non-convex problems KKT remains necessary at any regular optimum but only locates candidate local optima, not certified global ones. You need second-order conditions or global search to distinguish.

What is a constraint qualification?

A regularity condition that guarantees the KKT conditions actually apply at the optimum. The simplest is LICQ (Linear Independence Constraint Qualification): the gradients of the active constraints are linearly independent. Slater's condition is another: there exists a strictly feasible interior point (gᵢ(x) < 0). Without some constraint qualification, you can construct examples where the optimum exists but no KKT multipliers exist — a pathological corner of the feasible region.

Who actually discovered KKT?

William Karush proved the conditions in his 1939 University of Chicago master's thesis under Lawrence Graves, and they were lost in the gray literature. Harold Kuhn and Albert Tucker independently rediscovered them and presented them at the 1950 Berkeley Symposium, publishing in 1951 — that paper made the conditions famous. Karush's priority was recognised much later; the modern name 'KKT' (instead of 'Kuhn-Tucker') became standard only in the 1980s.