Lagrange Multipliers

Optimize a function subject to constraints — gradients must be parallel

Lagrange multipliers find the maximum or minimum of a function subject to one or more constraint equations. The key insight: at a constrained extremum, the gradient of the objective is parallel to the gradient of the constraint. Solving ∇f = λ∇g together with the constraint g = 0 yields the constrained critical points. The method is used in economics, physics, machine learning, and any optimization where you can't move freely in all directions.

Setup: Maximize f(x) subject to g(x) = 0
Lagrangian: L(x, λ) = f(x) − λ · g(x)
Optimum condition: ∇f = λ · ∇g; g = 0
λ is called: Lagrange multiplier; has an economic interpretation as marginal value
Multiple constraints: ∇f = ∑ λᵢ ∇gᵢ
Year: Joseph-Louis Lagrange, 1788

The setup and the formula

To find the maximum or minimum of f(x, y, ...) subject to a constraint g(x, y, ...) = 0:

Set up the Lagrangian — L(x, y, λ) = f(x, y) − λ · g(x, y).
Find critical points of L by setting all partial derivatives to zero — ∂L/∂x = 0, ∂L/∂y = 0, ∂L/∂λ = 0.
The first two give ∂f/∂x = λ · ∂g/∂x, ∂f/∂y = λ · ∂g/∂y — combined as ∇f = λ · ∇g.
The third gives g(x, y) = 0 — the constraint itself.
Solve this system for x, y, λ.

The "Lagrangian" is a clever bookkeeping trick — it bundles the objective and constraint into one function whose unconstrained critical points correspond to constrained critical points of the original.
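As a minimal sketch of that bookkeeping, the snippet below builds the Lagrangian for an assumed illustrative problem (f = xy with x + y = 10, the same one worked as Example 1 in this article) and checks numerically that all three partial derivatives vanish at (x, y, λ) = (5, 5, 5). The helper name `gradient3` and the finite-difference step are choices made for this illustration only.

```javascript
// Assumed illustrative problem: f(x, y) = x*y with g(x, y) = x + y - 10 = 0.
const f = (x, y) => x * y;
const g = (x, y) => x + y - 10;

// The Lagrangian bundles objective and constraint into one function of (x, y, lambda).
const lagrangian = (x, y, lambda) => f(x, y) - lambda * g(x, y);

// Central-difference gradient of a function of three variables (hypothetical helper).
function gradient3(fn, point, h = 1e-6) {
  return point.map((_, i) => {
    const plus = [...point];
    const minus = [...point];
    plus[i] += h;
    minus[i] -= h;
    return (fn(...plus) - fn(...minus)) / (2 * h);
  });
}

// At (5, 5) with lambda = 5, all three partials vanish up to rounding.
const atCritical = gradient3(lagrangian, [5, 5, 5]);
// At the feasible but non-optimal point (4, 6), the x and y partials do not vanish.
const atOther = gradient3(lagrangian, [4, 6, 5]);

console.log(atCritical.map((v) => v.toFixed(6)));
console.log(atOther.map((v) => v.toFixed(6)));
```

At (4, 6) the third partial is still zero because the point satisfies the constraint; only the first two fail, which is the "gradients not parallel" signal.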

Geometric intuition

At a constrained extremum:

You can't increase f by moving along the constraint surface (otherwise you'd keep moving).
So the component of ∇f along the constraint surface is zero.
Equivalently, ∇f is perpendicular to the constraint surface.
The gradient ∇g is also perpendicular to the surface (gradients are always perpendicular to level surfaces, and the constraint is the level surface g = 0).
Two vectors perpendicular to the same surface must be parallel, so ∇f = λ · ∇g (assuming ∇g ≠ 0).

The multiplier λ is the proportionality constant. Geometrically, parallel gradients are the signature of constrained extrema.
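One way to see the parallel-gradient signature numerically is to walk along a constraint and watch the 2D cross product of the two gradients. The sketch below assumes the illustrative problem f = xy on the line x + y = 10; the sample range is arbitrary.

```javascript
// Assumed example: f(x, y) = x*y on the line g(x, y) = x + y - 10 = 0.
const gradF = (x, y) => [y, x];
const gradG = (x, y) => [1, 1];

// 2D cross product: zero exactly when the two vectors are parallel.
const cross = ([ax, ay], [bx, by]) => ax * by - ay * bx;

// Walk along the constraint (y = 10 - x) and record how far from parallel the gradients are.
const samples = [];
for (let x = 3; x <= 7; x += 1) {
  const y = 10 - x;
  samples.push({ x, y, f: x * y, cross: cross(gradF(x, y), gradG(x, y)) });
}
console.table(samples);
```

In this example the cross product changes sign as the walk passes x = 5, which is also where f stops increasing along the line.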

Worked examples

Example 1 — Maximize f(x, y) = xy subject to x + y = 10

Constraint — g(x, y) = x + y − 10 = 0. Gradients:

∇f = (y, x)
∇g = (1, 1)

Setting ∇f = λ∇g gives y = λ and x = λ, so x = y. The constraint x + y = 10 then gives 2x = 10, so x = y = 5.

Maximum is f(5, 5) = 25. The Lagrange multiplier λ = 5 — meaning if we relaxed the constraint to x + y = 11, the maximum would increase by approximately 5 to 30 (in fact, 5.5² = 30.25, close to the linearized prediction).
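The sensitivity reading of λ can be checked directly. The sketch below uses the closed form implied by the worked example (the optimum of xy on x + y = c sits at x = y = c/2) and compares the finite-difference slope of the optimal value with λ; the function name `maxValue` and the step size are illustrative choices.

```javascript
// Optimal value of x*y on x + y = c, using x = y = c/2 from the worked example.
const maxValue = (c) => (c / 2) ** 2;

const c = 10;
const lambda = c / 2; // multiplier at the optimum, since x = y = lambda

// Finite-difference slope of the optimal value with respect to the constraint level c.
const h = 1e-4;
const slope = (maxValue(c + h) - maxValue(c - h)) / (2 * h);

// Actual change when the constraint is relaxed by a full unit (10 to 11).
const actualChange = maxValue(c + 1) - maxValue(c);

console.log({ lambda, slope, actualChange });
```

The slope matches λ, while the full-unit change is slightly larger because the optimal value is curved in c; λ is a first-order (linearized) prediction.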

Example 2 — Maximum-volume box with fixed surface area

Find the box with surface area 96 that has maximum volume.

Variables — x, y, z (length, width, height). Volume V = xyz. Surface area S = 2(xy + xz + yz).

Maximize V subject to 2(xy + xz + yz) = 96, i.e., g = xy + xz + yz − 48 = 0.

∇V = (yz, xz, xy)
∇g = (y+z, x+z, x+y)

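Setting ∇V = λ∇g gives yz = λ(y + z), xz = λ(x + z), and xy = λ(x + y). Subtracting the first two gives z(y − x) = λ(y − x), so either x = y or z = λ. The case z = λ turns the second equation into λ² = 0, which forces z = 0 and zero volume, so x = y. The same argument applied to another pair gives y = z, hence x = y = z. From the constraint, 3x² = 48, so x = 4. The optimal box is a cube of side 4 with volume 64 (and λ = 2).

This is intuitive: for fixed surface area, the cube has maximum volume. Lagrange multipliers identify it as the only critical point with positive dimensions.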
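A rough numerical cross-check of the cube result, written as a sketch: the surface constraint fixes z once x and y are chosen, so a coarse grid over (x, y) can be scanned for the largest volume. The grid bounds and the 0.25 step are arbitrary choices, and the last lines check that ∇V / ∇g is the same ratio in every component at (4, 4, 4).

```javascript
// For given x, y the constraint xy + xz + yz = 48 fixes z.
const heightFor = (x, y) => (48 - x * y) / (x + y);

let best = { x: NaN, y: NaN, z: NaN, volume: -Infinity };
// Coarse grid over the base dimensions (bounds and step are arbitrary choices).
for (let i = 1; i <= 24; i++) {
  for (let j = 1; j <= 24; j++) {
    const x = i * 0.25;
    const y = j * 0.25;
    const z = heightFor(x, y);
    if (z <= 0) continue; // infeasible: the base alone uses too much area
    const volume = x * y * z;
    if (volume > best.volume) best = { x, y, z, volume };
  }
}
console.log(best);

// Lagrange condition at the cube: every component of gradV / gradG should equal lambda.
const [x, y, z] = [4, 4, 4];
const gradV = [y * z, x * z, x * y];
const gradG = [y + z, x + z, x + y];
const ratios = gradV.map((v, k) => v / gradG[k]);
console.log(ratios);
```

A grid scan is not a proof, but it is a cheap sanity check that the critical point found by the multiplier equations is a maximum rather than a minimum or saddle.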

Example 3 — Distance from a point to a curve

Find the closest point on the parabola y = x² to the point (0, 1).

Minimize d² = x² + (y − 1)² subject to y − x² = 0 (i.e., y = x²).

Use Lagrangian — L = x² + (y − 1)² − λ(y − x²).

∂L/∂x = 2x + 2λx = 2x(1 + λ) = 0
∂L/∂y = 2(y − 1) − λ = 0
∂L/∂λ = −(y − x²) = 0

From the first equation — x = 0 or λ = −1. Case x = 0 gives y = 0 from the constraint, and 2(0 − 1) − λ = 0 means λ = −2. Distance² = 1.

Case λ = −1 — from the y equation, 2(y − 1) + 1 = 0 — y = 1/2. From the constraint, x² = 1/2 — x = ±1/√2. Distance² = 1/2 + (1/2 − 1)² = 1/2 + 1/4 = 3/4. So minimum distance = √(3/4) = √3/2.

Two candidates — (0, 0) at distance 1, and (±1/√2, 1/2) at distance √3/2 ≈ 0.866. The latter is closer; the closest points are (±1/√2, 1/2).
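The candidate comparison can be reproduced in a few lines. This sketch evaluates the squared distance at the three critical points from the Lagrange system and, as an independent check, brute-force scans the parabola; the scan range and step are arbitrary.

```javascript
// Squared distance from (0, 1) to the point (x, x^2) on the parabola.
const dist2 = (x) => x * x + (x * x - 1) ** 2;

// Critical points found from the Lagrange system.
const candidates = [
  { x: 0, y: 0 },
  { x: 1 / Math.SQRT2, y: 0.5 },
  { x: -1 / Math.SQRT2, y: 0.5 },
].map((p) => ({ ...p, dist2: dist2(p.x) }));

// Independent check: brute-force scan along the parabola (range and step are arbitrary).
let scanBest = { x: 0, dist2: dist2(0) };
for (let k = -2000; k <= 2000; k++) {
  const x = k / 1000;
  if (dist2(x) < scanBest.dist2) scanBest = { x, dist2: dist2(x) };
}

console.log(candidates);
console.log(scanBest, Math.sqrt(scanBest.dist2));
```

The scan lands next to one of the two symmetric minimizers, and the point (0, 0) turns out to be a local maximum of the distance along the curve, which is why comparing candidate values matters.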

Multiple constraints

For optimization subject to multiple constraints g₁ = 0, g₂ = 0, ..., gₘ = 0, generalize to:

∇f = λ₁ ∇g₁ + λ₂ ∇g₂ + ... + λₘ ∇gₘ

One Lagrange multiplier per constraint. The number of equations grows correspondingly. Solve the system simultaneously.

Example — find the closest point on the line of intersection of two planes (x + y + z = 1 and x − y + z = 0) to the origin. Two constraints, three variables, two multipliers — five equations to solve.
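Because the objective here can be taken as the squared distance f = x² + y² + z² (an assumption; the text does not write f out), the five equations are linear and can be solved directly. The sketch below sets them up with unknowns ordered as [x, y, z, λ₁, λ₂] and solves them with a small Gaussian-elimination helper; `solveLinear` is a hypothetical name for this illustration.

```javascript
// Gauss-Jordan elimination with partial pivoting for a small dense system A·u = b.
function solveLinear(A, b) {
  const n = b.length;
  const M = A.map((row, i) => [...row, b[i]]);
  for (let c = 0; c < n; c++) {
    let p = c;
    for (let r = c + 1; r < n; r++) {
      if (Math.abs(M[r][c]) > Math.abs(M[p][c])) p = r;
    }
    if (Math.abs(M[p][c]) < 1e-12) throw new Error("Singular system");
    [M[c], M[p]] = [M[p], M[c]];
    for (let r = 0; r < n; r++) {
      if (r === c) continue;
      const k = M[r][c] / M[c][c];
      for (let j = c; j <= n; j++) M[r][j] -= k * M[c][j];
    }
  }
  return M.map((row, i) => row[n] / row[i]);
}

// Unknowns ordered as [x, y, z, lambda1, lambda2].
const A = [
  [2, 0, 0, -1, -1], // 2x = lambda1 + lambda2
  [0, 2, 0, -1, 1],  // 2y = lambda1 - lambda2
  [0, 0, 2, -1, -1], // 2z = lambda1 + lambda2
  [1, 1, 1, 0, 0],   // x + y + z = 1
  [1, -1, 1, 0, 0],  // x - y + z = 0
];
const b = [0, 0, 0, 1, 0];

const [x, y, z, lambda1, lambda2] = solveLinear(A, b);
console.log({ x, y, z, lambda1, lambda2, dist2: x * x + y * y + z * z });
```

Working by hand gives the same answer: subtracting the two plane equations gives y = 1/2, and symmetry in x and z gives x = z = 1/4.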

Inequality constraints — KKT

For inequality constraints gᵢ(x) ≤ 0, the Lagrange technique generalizes to the Karush-Kuhn-Tucker (KKT) conditions. For maximizing f, they read:

Stationarity: ∇f = ∑ μᵢ ∇gᵢ (for minimizing f, the sign flips to ∇f = −∑ μᵢ ∇gᵢ).
Primal feasibility: gᵢ(x) ≤ 0.
Dual feasibility: μᵢ ≥ 0.
Complementary slackness: μᵢ · gᵢ(x) = 0 (for each i, either the constraint binds or its multiplier is zero).

KKT is the foundation of convex optimization, support vector machines, and constrained nonlinear programming. Complementary slackness is the key condition: at an optimum, each inequality is either active (binding, gᵢ = 0, with μᵢ ≥ 0) or inactive (gᵢ < 0, with μᵢ = 0). Solving the conditions determines which subset of constraints actually matters.
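The active/inactive split can be sketched on a deliberately tiny problem. The example below is hypothetical: maximize f(x) = −(x − c)² subject to g(x) = x − 1 ≤ 0, using the maximization sign convention ∇f = μ∇g. The function `solveKkt` simply tries the inactive case first and falls back to the active case.

```javascript
// Hypothetical 1D problem: maximize f(x) = -(x - c)^2 subject to g(x) = x - 1 <= 0.
function solveKkt(c) {
  const fPrime = (x) => -2 * (x - c);
  const gPrime = 1;

  // Case 1: mu = 0, so stationarity reduces to f'(x) = 0, giving x = c.
  const xFree = c;
  if (xFree - 1 <= 0) return { x: xFree, mu: 0, active: xFree - 1 === 0 };

  // Case 2: constraint active (g = 0), so x = 1 and mu follows from f'(x) = mu * g'(x).
  const xBound = 1;
  const mu = fPrime(xBound) / gPrime;
  if (mu < 0) throw new Error("No KKT point found"); // dual feasibility would fail
  return { x: xBound, mu, active: true };
}

console.log(solveKkt(0)); // unconstrained peak is feasible
console.log(solveKkt(2)); // unconstrained peak is infeasible
```

With several inequalities the same idea becomes a search over which constraints are active, which is the combinatorial part that practical solvers handle for you.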

JavaScript — solving Lagrange numerically

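// Solve a 2D, one-constraint Lagrange system numerically.
// Example: maximize f(x, y) = xy subject to x + y = 10.
// Newton's method is applied to the residual system:
//   r1 = ∂f/∂x − λ · ∂g/∂x = 0
//   r2 = ∂f/∂y − λ · ∂g/∂y = 0
//   r3 = g(x, y) = 0

// Solve the 3×3 linear system A·d = b by Gauss-Jordan elimination with partial pivoting.
function solve3(A, b) {
  const M = A.map((row, i) => [...row, b[i]]);
  for (let c = 0; c < 3; c++) {
    let p = c;
    for (let r = c + 1; r < 3; r++) {
      if (Math.abs(M[r][c]) > Math.abs(M[p][c])) p = r;
    }
    if (Math.abs(M[p][c]) < 1e-14) throw new Error("Singular Jacobian");
    [M[c], M[p]] = [M[p], M[c]];
    for (let r = 0; r < 3; r++) {
      if (r === c) continue;
      const k = M[r][c] / M[c][c];
      for (let j = c; j < 4; j++) M[r][j] -= k * M[c][j];
    }
  }
  return M.map((row, i) => row[3] / row[i]);
}

function solveLagrange(initial, fGrad, gGrad, gFunc, maxIter = 50) {
  let z = [...initial]; // [x, y, lambda]
  const eps = 1e-8;     // tolerance on each residual
  const h = 1e-6;       // finite-difference step for the Jacobian

  const residual = ([x, y, lambda]) => {
    const [fx, fy] = fGrad(x, y);
    const [gx, gy] = gGrad(x, y);
    return [fx - lambda * gx, fy - lambda * gy, gFunc(x, y)];
  };
  const isSolved = (r) => r.every((v) => Math.abs(v) < eps);

  for (let iter = 0; iter < maxIter && !isSolved(residual(z)); iter++) {
    const r = residual(z);
    // Jacobian by central differences: J[i][j] = ∂r_i/∂z_j
    const J = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
    for (let j = 0; j < 3; j++) {
      const zp = [...z];
      const zm = [...z];
      zp[j] += h;
      zm[j] -= h;
      const rp = residual(zp);
      const rm = residual(zm);
      for (let i = 0; i < 3; i++) J[i][j] = (rp[i] - rm[i]) / (2 * h);
    }
    // Newton step: solve J·d = −r, then update z ← z + d
    const d = solve3(J, r.map((v) => -v));
    z = z.map((v, i) => v + d[i]);
  }

  // Newton's method finds critical points; it does not tell a maximum from a minimum.
  const [x, y, lambda] = z;
  return { x, y, lambda, converged: isSolved(residual(z)) };
}

// Maximize xy subject to x + y = 10
const result = solveLagrange(
  [4, 6, 1],
  (x, y) => [y, x],       // ∇f
  (x, y) => [1, 1],       // ∇g
  (x, y) => x + y - 10,   // g
);
console.log(result); // approximately { x: 5, y: 5, lambda: 5, converged: true }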

Where Lagrange multipliers appear

Economics: utility maximization. Maximize utility subject to a budget constraint. The multiplier is the marginal utility of money, that is, how much utility one extra dollar buys.
Physics: Lagrangian mechanics. Newton's laws can be reformulated in terms of the Lagrangian L = T − V (kinetic minus potential energy). Constraints (rigid rods, fixed lengths) enter through Lagrange multipliers, which correspond to the constraint forces. Goldstein's Classical Mechanics spends chapters on this.
Machine learning: SVM and regularization. Support vector machines maximize margin subject to classification constraints. The dual formulation uses Lagrange multipliers (the αᵢ); the kernel trick happens in the dual.
Engineering: design optimization. Maximize structural strength subject to weight or material limits. Lagrangian-based methods, such as augmented Lagrangian and sequential quadratic programming solvers, are common workhorses.
Statistics: maximum entropy. The maximum-entropy distribution subject to known moment constraints is found with Lagrange multipliers. Used in machine learning's maximum-entropy classifier and statistical mechanics.
Operations research: linear programming. The simplex method's "shadow prices" are Lagrange multipliers (specifically, the multipliers of the binding constraints at the optimum).

Common mistakes

Forgetting to include the constraint. ∇f = λ ∇g gives multiple equations but no closed system without g(x) = 0. Always include the constraint as the last equation.
Solving the wrong system. The Lagrangian's critical points satisfy ∂L/∂(everything) = 0. Setting only some derivatives to zero misses constraints or solutions.
Confusing maximum and minimum. Lagrange multipliers identify critical points, not their type. Compare values to determine which is the max vs min, or use second-order conditions (the bordered Hessian).
Treating λ as a known constant. λ is an unknown that is solved for together with x and y; when the system has multiple solutions, the value of λ generally differs across them. Sometimes λ has economic meaning; sometimes it's just a bookkeeping device.
Constraint qualification failure. When ∇g = 0 at the optimum, the standard Lagrange formulation breaks down because ∇f = λ · ∇g may have no solution there. Check that ∇g ≠ 0 along the constraint surface; if it can be zero, examine those points separately as additional candidates.
Trying to optimize without constraints when constraints exist. If you ignore constraints, you'll find unconstrained extrema that don't satisfy your problem. Always include all relevant constraints.
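The constraint qualification item in the list above is easiest to appreciate on a concrete case. The sketch below uses a standard textbook illustration that is not from this article: minimize f(x, y) = x subject to g(x, y) = y² − x³ = 0. Feasible points satisfy x³ = y² ≥ 0, so the minimum is at the origin, yet ∇g vanishes there and no λ can satisfy ∇f = λ∇g.

```javascript
// Textbook illustration (an assumption, not from this article):
// minimize f(x, y) = x subject to g(x, y) = y^2 - x^3 = 0.
const gradF = (x, y) => [1, 0];
const gradG = (x, y) => [-3 * x * x, 2 * y];

// Brute-force the minimum along the curve, parametrized as (t^2, t^3).
let best = { x: Infinity, y: 0 };
for (let k = -100; k <= 100; k++) {
  const t = k / 100;
  const x = t * t;
  const y = t * t * t;
  if (x < best.x) best = { x, y };
}

const [gx, gy] = gradG(best.x, best.y);
const [fx, fy] = gradF(best.x, best.y);
// gradG vanishes at the minimizer, so gradF = lambda * gradG has no solution for any lambda.
const gradGIsZero = gx === 0 && gy === 0;
console.log(best, { fx, fy, gradGIsZero });
```

A solver that only looks for solutions of ∇f = λ∇g would miss this minimum entirely, which is why points where ∇g = 0 need to be checked by hand.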

Frequently asked questions

Why must the gradients be parallel at the optimum?

Geometric intuition: at a constrained max/min, you can't increase f by moving along the constraint surface. So the component of ∇f along the constraint surface is zero, meaning ∇f is perpendicular to the constraint surface. But ∇g is also perpendicular to the constraint surface, because the surface is a level set of g and gradients are perpendicular to level sets. So ∇f and ∇g are both perpendicular to the same surface and therefore parallel to each other. The constant of proportionality is λ.

What does the multiplier λ mean?

Economic interpretation: λ is the rate at which the optimal value changes as you relax the constraint. If a budget constraint is "spend at most $100" and λ = 2, then loosening the budget by $1 (to $101) increases the maximum profit by approximately $2. Lagrange multipliers are marginal values in economics, "shadow prices" in linear programming, and constraint forces in physics.

How is this different from unconstrained optimization?

Unconstrained: find x where ∇f(x) = 0. Constrained: find x where ∇f(x) = λ · ∇g(x) AND g(x) = 0. The latter has more equations and more unknowns (the extra λ); the former is recovered when λ = 0, meaning the constraint does not affect the optimum. Most real-world optimizations have constraints (budget, capacity, physical limits), so Lagrange multipliers are the practical tool.

What are the KKT conditions?

The Karush-Kuhn-Tucker conditions generalize Lagrange multipliers to inequality constraints. For minimizing f(x) subject to gᵢ(x) ≤ 0, a minimum satisfies ∇f + ∑ μᵢ ∇gᵢ = 0 with μᵢ ≥ 0 and μᵢ · gᵢ = 0 (complementary slackness), provided a constraint qualification holds. Either the constraint binds (gᵢ = 0, μᵢ ≥ 0) or it doesn't (gᵢ < 0, μᵢ = 0). KKT is the foundation of nonlinear optimization.

How are Lagrange multipliers used in machine learning?

They appear in constrained formulations such as support vector machines, maximum-entropy models, and some neural network regularization schemes. Loosely, an autoencoder's bottleneck dimension also acts as a constraint, although it is imposed by the architecture rather than through explicit multipliers. The dual problem (built from the Lagrangian) is sometimes more tractable than the primal; the SVM "kernel trick" and much of convex programming rely on this.

Can Lagrange multipliers handle multiple constraints?

Yes — one multiplier per constraint. ∇f = λ₁ ∇g₁ + λ₂ ∇g₂ + ... + λₘ ∇gₘ. The number of equations grows with both the number of variables and the number of constraints. For m constraints, the optimization happens on the intersection of m surfaces in the variable space.

What does it mean if no Lagrange multiplier exists?

The constraint qualification has failed, typically meaning ∇g = 0 at the optimum (or, with several constraints, that the gradients ∇gᵢ are linearly dependent there). Standard Lagrange multipliers don't apply; you may need KKT with more general qualification conditions, or to reformulate the problem.