Control Systems

LQR Control

The unique state-feedback law that minimizes integrated quadratic state error and control effort — with stability for free

The Linear-Quadratic Regulator is the optimal state-feedback law for a linear system under a quadratic cost. Pick Q to penalise state error, R to penalise control effort, solve the algebraic Riccati equation for P, and the gain K = R⁻¹B'P drops out. Provided (A, B) is stabilisable and (A, √Q) is detectable, the closed loop is guaranteed stable, with at least 60° of phase margin and infinite upward gain margin. The same law that flies Boeing autopilots, balances Segways, and stabilises Tesla yaw.

Introduced: Kalman, 1960
Cost: J = ∫₀^∞ (x'Qx + u'Ru) dt
Optimal gain: K = R⁻¹B'P
Riccati equation: A'P + PA − PBR⁻¹B'P + Q = 0
Gain margin: −6 dB to +∞
Phase margin: ≥ 60°

The setup — plant, cost, and the question

Take a linear time-invariant plant in state-space form,

ẋ = A x + B u
y = C x

where x ∈ ℝⁿ is the state, u ∈ ℝᵐ is the control input, y is the measured output, and A, B, C are constant matrices. We want to drive the state to zero (regulation) and we want to do it without wasting actuator effort. Both goals can be written as quadratic penalties along the trajectory, so we encode them in a single scalar cost:

J = ∫₀^∞ ( x(t)' Q x(t)  +  u(t)' R u(t) ) dt

Q ∈ ℝⁿˣⁿ is positive semidefinite (Q ≽ 0) — the state penalty. R ∈ ℝᵐˣᵐ is strictly positive definite (R ≻ 0) — the control penalty. Both are usually diagonal in practice; the diagonal entries say "how much do I care about errors in this state" and "how much does it cost me to use this actuator". The LQR question is: among all control laws u(·) that admit a finite J, which one minimises it?

The answer is striking. The minimiser is a static linear state feedback — no integral, no derivative, no memory, just a constant matrix multiplying the current state:

u*(t) = − K x(t)         with    K = R⁻¹ B' P

where P ∈ ℝⁿˣⁿ is the unique positive-semidefinite solution of the continuous-time algebraic Riccati equation (CARE):

A' P + P A − P B R⁻¹ B' P + Q = 0

That is the entire core of the theory. Four matrices in (A, B, Q, R), one gain matrix out, and under mild conditions on the plant and the weights the closed loop ẋ = (A − BK) x is automatically stable. The result is due to Rudolf Kalman's 1960 paper "Contributions to the theory of optimal control," which is one of the founding documents of modern control engineering.
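As a minimal sketch of that recipe, the snippet below solves the CARE for a double integrator and forms the gain. The plant, the weights, and the helper name `lqr_gain` are assumptions chosen here for illustration; they are not part of the original text.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def lqr_gain(A, B, Q, R):
    """Return (K, P) for the continuous-time LQR problem."""
    P = solve_continuous_are(A, B, Q, R)      # stabilising CARE solution
    K = np.linalg.solve(R, B.T @ P)           # K = R^-1 B' P
    return K, P


# Illustrative plant (an assumption of this sketch): double integrator x'' = u
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)            # equal weight on position and velocity
R = np.array([[1.0]])    # scalar control penalty

K, P = lqr_gain(A, B, Q, R)
closed_loop_eigs = np.linalg.eigvals(A - B @ K)

print("K =", K)
print("closed-loop eigenvalues =", closed_loop_eigs)
```

For this particular plant the CARE can also be solved by hand, giving K = [1, √3], which makes it a convenient sanity check for any solver.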

Why a quadratic cost?

The choice of x'Qx + u'Ru is not arbitrary. Three reasons it dominates:

Tractable. Quadratic cost plus linear dynamics give a quadratic value function V(x) = x'Px. Plugging this into the Hamilton-Jacobi-Bellman equation reduces an infinite-dimensional problem to a finite-dimensional one — the Riccati equation in P.
Convex. The integrated cost is convex in u, so any stationary point is the global minimum. There are no local optima to worry about.
Faithful. Around any operating point, a smooth cost is approximated quadratically to leading order. So LQR is what you would do anyway in a small neighbourhood of the regulation target.

The flip side: a quadratic cost can never encode "I want this state to stay below a hard limit" or "this actuator saturates at ±1". Those become Model Predictive Control problems instead.

Where the Riccati equation comes from

Hamilton-Jacobi-Bellman for the infinite-horizon problem says that the optimal cost-to-go V(x) satisfies

0 = min_u  [ x' Q x  +  u' R u  +  ∇V(x)' (A x + B u) ]

Guess V(x) = x' P x for some symmetric P. Then ∇V = 2 P x. Differentiate the bracket with respect to u and set to zero:

2 R u + 2 B' P x = 0    →    u* = − R⁻¹ B' P x = − K x

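Substitute u* back into HJB. The control terms combine into −x' P B R⁻¹ B' P x, and because x' (2PA) x = x' (A'P + PA) x for symmetric P, the bracket becomes x' (A'P + PA − PBR⁻¹B'P + Q) x. This must vanish for every x, so the result is the algebraic Riccati equation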

A' P + P A − P B R⁻¹ B' P + Q = 0.

The four terms have a clean interpretation: A'P + PA is the open-loop quadratic energy rate, +Q is the state penalty added at every instant, and −PBR⁻¹B'P is what the optimal feedback subtracts off. The Riccati equation says that these balance exactly along the optimal trajectory.
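Written out term by term with the same symbols, the substitution of u* into the HJB bracket looks like this (a worked restatement of the step above, not an additional result):

```latex
\begin{aligned}
0 &= x^\top Q x + u^{*\top} R\, u^{*} + 2\, x^\top P\,(A x + B u^{*}),
   \qquad u^{*} = -R^{-1} B^\top P x \\[2pt]
u^{*\top} R\, u^{*} &= x^\top P B R^{-1} B^\top P x \\
2\, x^\top P B u^{*} &= -2\, x^\top P B R^{-1} B^\top P x \\
2\, x^\top P A x &= x^\top \left( A^\top P + P A \right) x \\[2pt]
\Rightarrow\quad 0 &= x^\top \left( A^\top P + P A - P B R^{-1} B^\top P + Q \right) x
   \quad \text{for all } x .
\end{aligned}
```

The matrix in parentheses is symmetric, so a quadratic form that vanishes for every x forces the matrix itself to be zero.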

The four free guarantees

Provided (A, B) is stabilisable and (A, √Q) is detectable, the unique P ≽ 0 satisfying the CARE delivers four classical properties that no other linear controller delivers from first principles:

Stability. A − BK is Hurwitz. Every closed-loop eigenvalue has negative real part. The Lyapunov function is V(x) = x'Px.
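Gain margin −6 dB to +∞. The loop transfer at the plant input tolerates any scalar gain factor greater than 1/2, with no upper limit, without losing stability. For multi-input plants the guarantee holds independently in each input channel when R is diagonal.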
Phase margin ≥ 60°. The corresponding phase margin is bounded below by 60°.
Optimality. No other admissible control law achieves a smaller J for the chosen Q and R.

The robustness margins are the so-called "Anderson-Moore" or "LQR" margins, after the proof in Anderson and Moore's 1971 textbook. They apply at the loop break at the plant input, in the full-state-feedback case. Crucially, they do not survive the introduction of a Kalman filter; that is what Doyle's celebrated 1978 paper "Guaranteed margins for LQG regulators" pointed out, in an abstract that reads in full: "There are none."
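The margins can be probed numerically. The sketch below checks, on a frequency grid, that the return difference |1 + L(jω)| at the plant input never drops below 1 (the inequality from which the single-input margins follow), and then scales the loop gain to see where stability is lost. The plant is an illustrative open-loop-unstable example chosen for this sketch, and the helper names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative open-loop-unstable plant (an assumption of this sketch):
# x'' + x' - 2x = u, with open-loop poles at +1 and -2.
A = np.array([[0.0, 1.0],
              [2.0, -1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)


def return_difference(omega):
    """|1 + K (jwI - A)^-1 B| for the loop broken at the plant input."""
    resolvent_B = np.linalg.solve(1j * omega * np.eye(2) - A, B)
    return abs(1.0 + (K @ resolvent_B).item())


omegas = np.logspace(-3, 3, 2000)
min_return_difference = min(return_difference(w) for w in omegas)


def is_stable(gain_factor):
    """True if A - gain_factor * B K is Hurwitz."""
    eigs = np.linalg.eigvals(A - gain_factor * (B @ K))
    return bool(np.all(eigs.real < 0))


print("min |1 + L(jw)| on the grid:", min_return_difference)
for factor in (0.4, 0.51, 1.0, 10.0, 1000.0):
    print(f"gain x{factor}: stable = {is_stable(factor)}")
```

A frequency grid is only a spot check, not a proof, but it is a quick way to confirm that a computed K really came from a Riccati solution.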

Tuning — Q and R in practice

Once you fix the model (A, B, the relevant states), choosing Q and R is the entire design freedom. There are two layers of practice.

Bryson's rule for the starting point

Arthur Bryson and Yu-Chi Ho's 1969 textbook Applied Optimal Control prescribes diagonal weighting matrices with

Q_ii = 1 / (max acceptable | x_i | )²
R_jj = 1 / (max acceptable | u_j | )²

This normalises every state and every input to roughly unit weight at its allowable excursion. It is rarely the final answer but it is almost always a sane starting point — far better than typing in identity matrices.
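Bryson's rule is mechanical enough to put in a helper. The function name `bryson_weights` and the numerical limits below are hypothetical, chosen only to show the shape of the calculation.

```python
import numpy as np


def bryson_weights(max_state, max_input):
    """Diagonal Q and R from the largest acceptable excursions (Bryson's rule)."""
    max_state = np.asarray(max_state, dtype=float)
    max_input = np.asarray(max_input, dtype=float)
    if np.any(max_state <= 0) or np.any(max_input <= 0):
        raise ValueError("maximum excursions must be strictly positive")
    Q = np.diag(1.0 / max_state**2)   # Q_ii = 1 / (max |x_i|)^2
    R = np.diag(1.0 / max_input**2)   # R_jj = 1 / (max |u_j|)^2
    return Q, R


# Hypothetical limits: 0.5 m of travel, 2 m/s, 0.1 rad of tilt, 1 rad/s, 10 N of force
Q, R = bryson_weights(max_state=[0.5, 2.0, 0.1, 1.0], max_input=[10.0])
print("diag(Q) =", np.diag(Q))
print("diag(R) =", np.diag(R))
```

States with a tight tolerance (here the 0.1 rad tilt) automatically receive the largest weight.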

The Q/R ratio is the only knob that matters

Scale both weights together, Q → λQ and R → λR with λ > 0, and the Riccati solution scales to λP while the optimal gain K = R⁻¹B'P is unchanged. What sweeps the design space is the size of Q relative to R, together with the relative sizes of the diagonal entries within each matrix.
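The scaling property is easy to confirm numerically. The plant and weights below are illustrative assumptions; the point is only that K stays put while P picks up the common factor.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # illustrative double integrator
B = np.array([[0.0], [1.0]])
Q = np.diag([4.0, 1.0])
R = np.array([[0.5]])


def lqr(A, B, Q, R):
    """Return (K, P) for the continuous-time LQR problem."""
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P), P


K1, P1 = lqr(A, B, Q, R)
scale = 7.0                               # common factor applied to Q and R
K2, P2 = lqr(A, B, scale * Q, scale * R)

print("K unchanged:       ", np.allclose(K1, K2))
print("P scaled by lambda:", np.allclose(P2, scale * P1))
```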

Increase Q relative to R	Decrease Q relative to R (or raise R)
Larger gains K	Smaller gains K
Closed-loop poles further left	Closed-loop poles closer to imaginary axis
Tighter tracking, smaller state errors	More relaxed regulation
More actuator effort and bandwidth	Less actuator effort
More sensitivity to noise and modelling error	Robust to noise, smoother control

A typical workflow: start with Bryson, simulate, look at the actuator effort vs the tracking error, scale Q (or one diagonal of Q) up or down by a factor of 3 or 10 to push the trade-off, repeat. Three iterations usually nail it.
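The sweep in that workflow can be sketched in a few lines. The example uses a double integrator with only position penalised (an assumption made here so that the trend is clean); on a real plant the same loop would wrap a simulation rather than just the pole locations.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # illustrative double integrator
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 0.0])                  # penalise position only

results = []
for r in (10.0, 1.0, 0.1, 0.01):         # sweep the control penalty downward
    R = np.array([[r]])
    P = solve_continuous_are(A, B, Q, R)
    K = np.linalg.solve(R, B.T @ P)
    gain_norm = np.linalg.norm(K)
    slowest_pole = np.max(np.linalg.eigvals(A - B @ K).real)
    results.append((r, gain_norm, slowest_pole))
    print(f"R = {r:<5}  |K| = {gain_norm:8.3f}  slowest pole real part = {slowest_pole:7.3f}")
```

Each tenfold reduction in R raises the gain and pushes the poles further left, which is the trade-off summarised in the table above.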

Worked example — the inverted pendulum on a cart

Linearising the cart-pendulum around the upright equilibrium gives a 4-state system: cart position p, cart velocity ṗ, pole angle θ from vertical, pole angular rate θ̇. The control u is the force on the cart. With m, M, ℓ, g chosen, the linearised dynamics are

A = [ 0   1     0           0
      0   0   −mg/M         0
      0   0     0           1
      0   0   (M+m)g/(Mℓ)   0 ]

B = [ 0,  1/M,  0,  −1/(Mℓ) ]'

Open loop, A has an eigenvalue at +√((M+m)g/(Mℓ)) — the pole falls over. For M = 1 kg, m = 0.1 kg, ℓ = 0.5 m, g = 9.81 m/s², take

Q = diag( 1,  0,  10,  0 )      penalise position and pole angle
R = 0.1                         single-input, scalar

Solving the CARE (a 4×4 matrix Riccati — a one-line solver call in MATLAB or SciPy) gives the optimal gain

K ≈ [ −3.16,  −4.25,  −38.7,  −8.22 ]

With the sign convention of B above (a positive force reduces θ̈), every entry of K is negative, and the first entry is exactly −√(Q₁₁/R) = −√10. The closed-loop A − BK has all four eigenvalues in the left half-plane, so the pendulum balances. Crank R down (to e.g. 0.01) and K grows: the cart slams the pendulum upright much faster, but jitters more in the presence of noise. Crank R up (to 1) and the cart is gentler at the cost of larger settling time.
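The worked example can be reproduced directly. The sketch below builds A and B exactly as written above, with the stated parameter values, and prints the gain and eigenvalues so the quoted numbers and their signs can be checked against a solver; only the variable names are assumptions of this sketch.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

M, m, ell, g = 1.0, 0.1, 0.5, 9.81   # parameter values from the text

# State: [p, p_dot, theta, theta_dot]; input: force on the cart
A = np.array([
    [0.0, 1.0, 0.0,                     0.0],
    [0.0, 0.0, -m * g / M,              0.0],
    [0.0, 0.0, 0.0,                     1.0],
    [0.0, 0.0, (M + m) * g / (M * ell), 0.0],
])
B = np.array([[0.0], [1.0 / M], [0.0], [-1.0 / (M * ell)]])

Q = np.diag([1.0, 0.0, 10.0, 0.0])   # penalise cart position and pole angle
R = np.array([[0.1]])

open_loop_eigs = np.linalg.eigvals(A)
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
closed_loop_eigs = np.linalg.eigvals(A - B @ K)

print("open-loop eigenvalues:  ", np.round(open_loop_eigs, 3))
print("K =", np.round(K, 3))
print("closed-loop eigenvalues:", np.round(closed_loop_eigs, 3))
```

Note that the signs of the angle gains depend on the sign convention chosen for θ and for B, so gains copied from a model with a different convention cannot be reused as they are.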

Solving the Riccati equation numerically

You almost never solve the CARE by hand. Three families of numerical methods are standard:

Schur method (Laub 1979). Form the 2n×2n Hamiltonian matrix and compute an ordered Schur decomposition. The stable invariant subspace gives P directly. Numerically robust, the default in MATLAB lqr and care.
Eigenvalue method (Potter 1966). The original — diagonalise the Hamiltonian, partition the eigenvectors, P = V₂₁ V₁₁⁻¹. Fragile near multiple eigenvalues; superseded by Schur.
Newton iteration / matrix sign function. Useful for very large problems where you start from a known approximate solution; convergence is quadratic once close.
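To make the Schur approach concrete, here is a minimal sketch using SciPy's ordered real Schur decomposition. It illustrates the idea only and is not a reproduction of any library's internal implementation; the function name `care_schur` and the test plant are assumptions of this sketch.

```python
import numpy as np
from scipy.linalg import schur, solve_continuous_are


def care_schur(A, B, Q, R):
    """Solve the CARE via an ordered real Schur form of the Hamiltonian."""
    n = A.shape[0]
    G = B @ np.linalg.solve(R, B.T)           # B R^-1 B'
    H = np.block([[A, -G],
                  [-Q, -A.T]])                # 2n x 2n Hamiltonian matrix
    # sort="lhp" moves the left-half-plane eigenvalues to the leading block
    _, Z, sdim = schur(H, output="real", sort="lhp")
    if sdim != n:
        raise np.linalg.LinAlgError("no n-dimensional stable invariant subspace")
    U11, U21 = Z[:n, :n], Z[n:, :n]           # basis of the stable subspace
    P = np.linalg.solve(U11.T, U21.T).T       # P = U21 U11^-1
    return (P + P.T) / 2                      # symmetrise away round-off


# Illustrative unstable plant (an assumption of this sketch)
A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P_schur = care_schur(A, B, Q, R)
P_ref = solve_continuous_are(A, B, Q, R)
print("max difference from SciPy's solver:", np.max(np.abs(P_schur - P_ref)))
```

Production solvers add balancing and scaling steps around this core to improve conditioning.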

One-liners:

% MATLAB / Octave
K = lqr(A, B, Q, R);

# Python / SciPy (A, B, Q, R are 2-D NumPy arrays)
import numpy as np
from scipy.linalg import solve_continuous_are
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)   # K = R⁻¹ B' P without forming the inverse

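For a 10-state plant a dense solver returns almost instantly, on the order of a millisecond or less on a desktop CPU. For 10⁴-state plants (PDE discretisations) the Schur method, which works on dense matrices and scales as O(n³), becomes expensive and you switch to ADI / projection schemes.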

LQG — adding a Kalman filter

The Achilles heel of full-state LQR is that it assumes you measure every state. In practice you have

ẋ = A x + B u + w        w ~ N(0, W)   process noise
y = C x + v               v ~ N(0, V)   measurement noise

The separation principle of stochastic optimal control says you can design two pieces independently and bolt them together:

Build a Kalman filter that produces x̂(t), the minimum-variance estimate of x given the noisy y.
Apply the LQR control law to the estimate: u = − K x̂.

The composite controller is optimal for the joint linear-quadratic-Gaussian problem. The Kalman gain L = ΣC'V⁻¹ is built from the solution Σ of a second Riccati equation, dual to the LQR one:

A Σ + Σ A' − Σ C' V⁻¹ C Σ + W = 0

Note the structural symmetry with the control Riccati: A swaps for A', B becomes C', Q becomes W, R becomes V. The duality of estimation and control is one of the elegant facts of the field. Because full state measurement is rare, LQG, usually with gain scheduling, is the form in which the theory is typically deployed, for example in inertial-aided autopilots.
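The duality means the same CARE solver handles both halves of an LQG design. The sketch below designs the regulator and the filter for a double integrator with position measurement, then confirms that the closed-loop characteristic polynomial is the product of the regulator and estimator polynomials. The plant and the noise intensities W and V are hypothetical values chosen for illustration.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Illustrative plant (an assumption of this sketch): double integrator, position measured
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])

Q, R = np.eye(2), np.array([[1.0]])      # control weights
W, V = np.eye(2), np.array([[0.1]])      # noise intensities (hypothetical values)

# Regulator: control Riccati equation
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

# Estimator: same solver on the dual problem (A -> A', B -> C', Q -> W, R -> V)
Sigma = solve_continuous_are(A.T, C.T, W, V)
L = Sigma @ C.T @ np.linalg.inv(V)       # Kalman gain L = Sigma C' V^-1

# In (state, estimation error) coordinates the closed loop is block triangular
A_lqg = np.block([[A - B @ K, B @ K],
                  [np.zeros((2, 2)), A - L @ C]])

print("regulator poles:", np.linalg.eigvals(A - B @ K))
print("estimator poles:", np.linalg.eigvals(A - L @ C))
print("LQG poles:      ", np.linalg.eigvals(A_lqg))
```

The block-triangular structure is the algebraic face of the separation principle: the filter gain never moves the regulator poles, and vice versa.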

Variants and extensions

Finite-horizon LQR. Replace ∫₀^∞ with ∫₀^T. The Riccati becomes a differential equation P(t), integrated backward from a terminal P(T) = Q_f. Used in trajectory optimisation and as the terminal cost of MPC.
Discrete-time LQR. For x[k+1] = A x[k] + B u[k] with cost Σ(x'Qx + u'Ru), P solves the discrete algebraic Riccati equation P = A'PA − A'PB(R + B'PB)⁻¹B'PA + Q, and the gain is K = (R + B'PB)⁻¹B'PA. Identical structure, slightly different formulas. MATLAB dlqr.
LQR tracking. To follow a reference r(t) rather than regulate to zero, augment the state with the integral of error and apply LQR to the augmented system. Gives "Type-1" tracking with no steady-state error.
LQI (LQR with integral action). The same idea — explicit integral state — is the practical default in industry. Eliminates steady-state error to step references and constant disturbances.
iLQR / DDP. Iterative LQR. Linearise a nonlinear system along a candidate trajectory, solve the resulting time-varying LQR, update the trajectory, repeat. The workhorse of modern robotics motion planning; the core inside many MPC implementations.
Loop-Transfer Recovery (LTR). A design trick that tunes the Kalman filter so the LQG loop transfer recovers the full-state LQR loop transfer — partly restoring the robustness margins that Doyle showed LQG can otherwise lack.
H∞ control. Different cost (∞-norm rather than 2-norm), explicit robustness to worst-case disturbances. The state-feedback H∞ Riccati equation reduces to the LQR one as the disturbance-attenuation bound γ → ∞.
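Two of these variants fit in one short sketch: the discrete-time gain from the algebraic equation, and the finite-horizon backward recursion, which converges to the same gain as the horizon grows. The sample time, plant, and terminal weight Q_f = 0 are assumptions made here for illustration.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

dt = 0.1   # sample time (hypothetical value)
# Exact zero-order-hold discretisation of a double integrator
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.eye(2)
R = np.array([[1.0]])

# Infinite horizon: discrete algebraic Riccati equation
P_inf = solve_discrete_are(A, B, Q, R)
K_inf = np.linalg.solve(R + B.T @ P_inf @ B, B.T @ P_inf @ A)

# Finite horizon: backward Riccati recursion from the terminal weight Q_f = 0
P = np.zeros((2, 2))
for _ in range(2000):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)   # P = Q + A'PA - A'PB (R + B'PB)^-1 B'PA

spectral_radius = np.max(np.abs(np.linalg.eigvals(A - B @ K_inf)))
print("K (algebraic) =", K_inf)
print("K (recursion) =", K)
print("spectral radius of A - BK:", spectral_radius)
```

In a true finite-horizon problem the gains computed near the end of the horizon differ from the steady-state value, which is why the finite-horizon controller is time-varying.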

Where LQR is in the wild

Aircraft autopilots. Pitch, roll, and yaw damper loops on the Boeing 777, 787, and the Airbus A350 are LQR-derived. Gains are scheduled with Mach and altitude, and the resulting system fits an LQG framework with inertial-measurement filtering.
Spacecraft attitude control. Reaction-wheel loops on communications satellites and space observatories are a standard application of LQR and LQG design, with pointing requirements that reach sub-arcsecond accuracy on precision telescopes such as the Hubble Space Telescope.
Tesla Model S yaw and roll. The electronic stability program uses LQR-style state-feedback in the yaw plane, with vehicle speed and tyre-grip estimates feeding gain scheduling.
Segway and balancing robots. The textbook example. The inverted-pendulum-on-wheels system is precisely the cart-pendulum linearisation; an LQR designed in five minutes balances it indefinitely.
Hard-disk-drive head positioning. The voice-coil motor that positions the read/write head is regulated by an LQG controller running at tens of kHz. Nanometre-scale track-following depends on it.
Semiconductor wafer-stage motion. The litho stages in ASML scanners are nm-scale-precision LQG controllers under the hood, with hundreds of states and modal compensation built into Q.
Magnetically-levitated trains. Maglev gap controllers stabilise an open-loop-unstable plant (in attractive-type levitation the magnetic pull grows as the gap closes, so the vehicle tends to snap onto the rail) with an LQR designed on the linearised model.

Common pitfalls

Forgetting the rank conditions. (A, B) must be stabilisable and (A, √Q) must be detectable for the Riccati equation to have a unique positive-semidefinite solution that also stabilises the loop. If Q is rank-deficient and a mode that Q does not see is unstable in A, the cost never penalises that mode: the positive-semidefinite solution is no longer unique, and the cost-minimising one leaves the mode unstable. Cure: add a small ε I to Q or weight the offending state explicitly.
Treating absolute Q and R values as meaningful. Only their relative size matters. Doubling both leaves K unchanged.
Hand-tuning K instead of Q and R. Engineers familiar with PID sometimes type gains directly. You lose every guarantee — the Riccati equation may have no solution that matches your K. Tune Q and R; let the solver give you K.
Believing the margins survive LQG. Adding a Kalman filter can shred the gain and phase margins. If you need the margins back, use Loop-Transfer Recovery, or switch to mixed-sensitivity H∞.
Linearising at the wrong operating point. LQR is local. If the state drifts far enough from the linearisation point that the linear model no longer predicts the plant, you need to gain-schedule or move to iLQR / MPC.
Ignoring actuator saturation. A quadratic cost happily commands u = ±100 if Q is large. Real actuators saturate, and the resulting trajectory is no longer the LQR trajectory. Add anti-windup logic or scale R aggressively.
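The first pitfall can be caught before calling the solver. The sketch below applies the Popov-Belevitch-Hautus (PBH) rank test to the eigenvalues of A that are not strictly stable. The function names are hypothetical, and Q is used in place of √Q because the two have the same null space, so the detectability verdict is identical.

```python
import numpy as np


def _pbh_rank_ok(A, M, stack, tol=1e-9):
    """PBH rank test restricted to eigenvalues of A with non-negative real part."""
    n = A.shape[0]
    for lam in np.linalg.eigvals(A):
        if lam.real >= -tol:
            pencil = lam * np.eye(n) - A
            test = np.hstack([pencil, M]) if stack == "h" else np.vstack([pencil, M])
            if np.linalg.matrix_rank(test, tol=1e-8) < n:
                return False
    return True


def is_stabilisable(A, B):
    """rank [lam*I - A, B] = n for every eigenvalue that is not strictly stable."""
    return _pbh_rank_ok(A, B, "h")


def is_detectable(A, C):
    """rank [lam*I - A; C] = n for every eigenvalue that is not strictly stable."""
    return _pbh_rank_ok(A, C, "v")


# Illustrative plant: one unstable mode (x1) and one stable mode (x2)
A = np.diag([1.0, -1.0])
B = np.array([[1.0], [0.0]])

print("stabilisable:", is_stabilisable(A, B))
print("Q = diag(1, 0) detectable:", is_detectable(A, np.diag([1.0, 0.0])))
print("Q = diag(0, 1) detectable:", is_detectable(A, np.diag([0.0, 1.0])))
```

The last case is the failure described above: the unstable state carries zero weight, so the cost cannot see it.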

Why LQR endures sixty-five years later

Modern control offers MPC, reinforcement learning, robust H∞, nonlinear backstepping. None of them have displaced LQR. Three reasons:

First, simplicity. The entire design procedure (pick Q and R, call lqr(A, B, Q, R), deploy K) fits in a sentence. Compare to MPC, which requires an online QP solver. At run time LQR is a single matrix-vector product, m·n multiply-adds per sample, so the same controller can stabilise a satellite for thirty years on a 200-MHz radiation-hardened CPU.

Second, transparency. The eigenvalues of A − BK are visible, the Lyapunov function is x'Px, the gain margin is provable. When a regulator on an airliner needs certification, that transparency is decisive.

Third, completeness. LQR is the optimal solution to a clean problem statement. There is no better controller for a quadratic cost on linear dynamics — anything else is solving a different problem. It is what Bode plots and root locus would have converged to, given infinite design effort.

Sixty-five years after Kalman wrote it down, the LQR template is still the first thing a control engineer reaches for when handed an A, a B, and a settling-time spec.

Frequently asked questions

What problem does LQR actually solve?

Given a linear time-invariant plant ẋ = Ax + Bu, LQR finds the unique state-feedback law u = -Kx that minimises the infinite-horizon quadratic cost J = ∫₀^∞ (x'Qx + u'Ru) dt. Q ≽ 0 weights how much you care about each state error; R ≻ 0 weights how much you care about actuator effort. The optimisation balances them — push the state to zero as fast as you can without burning too much control. Once you choose Q and R, the law is determined, not tuned by hand.

What is the Riccati equation and why does it appear?

The continuous-time algebraic Riccati equation is A'P + PA − PBR⁻¹B'P + Q = 0. It is the stationarity condition that drops out of solving the LQR optimisation via dynamic programming (the Hamilton-Jacobi-Bellman equation has a quadratic value function V(x) = x'Px). Its unique positive-semidefinite solution P encodes the optimal cost-to-go from any state, and the optimal gain follows immediately as K = R⁻¹B'P. Numerically, solvers use the Schur or eigenvalue method on the associated Hamiltonian matrix; MATLAB's lqr() and Python's scipy.linalg.solve_continuous_are wrap this.

How do I pick Q and R?

The standard starting point is Bryson's rule: make Q and R diagonal, with Q_ii = 1/(max acceptable x_i)² and R_jj = 1/(max acceptable u_j)². This normalises every state and input to roughly unit weight. After that you sweep relative magnitude: scale R up to make the controller gentler (less control effort, slower response), scale Q up to make it more aggressive (tighter tracking, more actuator use). The ratio matters, not the absolute values: multiplying both by the same constant leaves K unchanged.

Why is the closed loop automatically stable?

If (A, B) is stabilisable and (A, √Q) is detectable, the unique positive-semidefinite Riccati solution P makes A − BK Hurwitz — every closed-loop eigenvalue has negative real part. The proof uses V(x) = x'Px as a Lyapunov function and substitutes the Riccati equation into V̇. Stability falls out as a free consequence of optimality. This is one of the most useful guarantees in classical control: choose any Q and R that satisfy the rank conditions, run the solver, and you get a stabilising controller without doing a separate stability proof.
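The Lyapunov argument can be checked numerically. Substituting the Riccati equation into V̇ gives (A − BK)'P + P(A − BK) = −(Q + K'RK), and the sketch below verifies that identity and confirms that x₀'Px₀ is the cost actually incurred by u = −Kx. The plant is an illustrative unstable example chosen for this sketch.

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [2.0, -1.0]])   # illustrative unstable plant
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
A_cl = A - B @ K

# Lyapunov identity obtained by substituting the CARE into d/dt (x'Px)
lyapunov_residual = A_cl.T @ P + P @ A_cl + Q + K.T @ R @ K

# Closed-loop cost matrix: solve A_cl' X + X A_cl = -(Q + K'RK) for X
X = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))

x0 = np.array([1.0, 0.0])
print("Lyapunov residual norm:", np.linalg.norm(lyapunov_residual))
print("cost from x0 via P:", x0 @ P @ x0)
print("cost from x0 via X:", x0 @ X @ x0)
```

Because Q + K'RK is positive definite in this example, V = x'Px decreases strictly along every closed-loop trajectory.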

What are the famous LQR robustness margins?

For single-input LQR, Anderson and Moore proved in 1971 that the loop transfer at the plant input has gain margin from −6 dB to +∞ (i.e. you can multiply the gain by any factor greater than 0.5 without losing stability) and phase margin of at least 60°. These are the famous 'LQR margins'. They apply only at the loop break at the plant input, with full state feedback. If you put a Kalman filter in the loop (LQG), there is no equivalent guarantee, as John Doyle pointed out in his 1978 paper 'Guaranteed margins for LQG regulators', whose abstract reads in full: 'There are none.'

What is LQG, and how does the separation principle work?

LQG = Linear-Quadratic-Gaussian control. In practice you rarely measure every state; you have noisy outputs y = Cx + v with process noise w added to the dynamics. The separation principle says you can solve the estimation problem and the control problem independently: design a Kalman filter to produce x̂, then apply u = -K x̂ with K from the LQR Riccati equation. The composite controller is optimal for the joint problem. The two design problems decouple cleanly, which is the practical reason engineers use LQG so often — you tune the filter for noise rejection and the regulator for control performance separately.

Where is LQR actually used?

Aircraft flight control augmentation (Boeing 777 and beyond use LQR-derived gains for pitch and yaw damping), spacecraft attitude control (Hubble, ISS, virtually every commercial satellite reaction-wheel loop), missile guidance, helicopter stabilisation, balancing robots and Segways, Tesla Model S yaw and roll control for electronic stability, magnetic-levitation trains, hard-disk-drive head positioning, semiconductor wafer-stage motion control. Anywhere the plant linearises cleanly around an operating point and you have a model — LQR is the first thing engineers reach for.

What if my system is nonlinear?

Three standard moves. First, linearise around the operating point and use LQR for local stabilisation — this works for the inverted pendulum near the upright, aircraft cruise, and most industrial regulators. Second, gain-schedule: precompute LQR gains at a grid of operating points and interpolate (used in flight control across the envelope). Third, switch to one of the nonlinear successors: state-dependent Riccati equation (SDRE), iLQR (iterative LQR used in robotics and MPC), or model-predictive control with an LQR terminal cost. The LQR template — quadratic cost, Riccati, state feedback — is the seed for most of modern optimal control.