Control & Estimation

Kalman Filter

Recursive optimal state estimation — predict, measure, blend, repeat

The Kalman filter is the recursive optimal estimator for a linear dynamical system driven by Gaussian noise. Each cycle it predicts the next state and its uncertainty from a dynamics model, then folds in the latest noisy measurement weighted by the Kalman gain — pulling the estimate toward whichever side (model or sensor) is more trustworthy. Rudolf Kalman published it in 1960. NASA used it to land Apollo on the Moon in 1969. Today it runs inside every GPS receiver, every drone autopilot, every SLAM stack, and the sensor-fusion layer of every self-driving car.

InventorRudolf E. Kálmán, 1960
First major useApollo guidance (EKF), 1969
Predictx̂ ← F x̂, P ← F P Fᵀ + Q
Updatex̂ ← x̂ + K (z − H x̂)
Kalman gainK = P Hᵀ (H P Hᵀ + R)⁻¹
Optimal underLinear dynamics + Gaussian noise

Interactive visualization

Press play, or step through manually. The visualization is yours to drive — try it before reading on.

Open visualization fullscreen ↗

Watch the 60-second explainer

A condensed visual walkthrough — narrated, captioned, under a minute.

The problem in one sentence

You have a noisy model of how something moves. You have a noisy sensor that occasionally tells you where it is. Neither is right; both have useful information. How do you combine them into the best possible estimate, on the fly, without storing every past measurement and every past prediction? That is the problem Rudolf Kalman wrote down in 1960, and the recursive equations he derived are still — sixty-six years later — the default answer in every domain where state has to be inferred from data.

The setup is uncannily general. Anything that evolves in time and is observed indirectly fits the template: a moving vehicle observed by GPS, a spacecraft tracked by ground radar, a chemical reactor inferred from thermocouples, a stock price filtered from tick noise, a finger position read by a capacitive sensor. The same four equations apply.

The two-step cycle

State the linear system in standard form. Let x be the state vector (position and velocity, say), F the dynamics matrix, H the measurement matrix, Q the process noise covariance, R the measurement noise covariance, and P the estimate covariance:

x_k = F x_{k-1} + w_k     w_k ~ N(0, Q)     dynamics
z_k = H x_k     + v_k     v_k ~ N(0, R)     measurement

The recursive filter is exactly four equations, split into two phases.

Predict — push the previous estimate forward through the dynamics model:

x̂_{k|k-1} = F x̂_{k-1|k-1}
P_{k|k-1} = F P_{k-1|k-1} Fᵀ + Q

The mean evolves deterministically. The covariance grows by the model's own uncertainty (Q) plus whatever uncertainty the previous estimate had, propagated through F. Without a measurement, every predict step makes the filter less sure of itself — the ellipse of possible states expands.

Update — fold in the new measurement z_k:

K_k       = P_{k|k-1} Hᵀ (H P_{k|k-1} Hᵀ + R)⁻¹      Kalman gain
x̂_{k|k}   = x̂_{k|k-1} + K_k (z_k − H x̂_{k|k-1})     posterior mean
P_{k|k}   = (I − K_k H) P_{k|k-1}                    posterior covariance

The innovation z_k − H x̂_{k|k-1} is the surprise — how far the measurement was from what the model predicted. The Kalman gain K_k decides how much of that surprise to absorb. With a precise sensor (R small) and an uncertain model (P large), K is close to one and the estimate jumps to the measurement. With a noisy sensor (R large) and a confident model (P small), K is near zero and the measurement barely nudges the estimate. The covariance P then contracts to reflect the new, sharper belief.

That is the whole algorithm. Predict, update, predict, update — for as long as data arrives.

Why the Kalman gain is what it is

The formula K = P Hᵀ (H P Hᵀ + R)⁻¹ looks arbitrary; it is actually the unique value that minimises the trace of the posterior covariance under the linear-Gaussian assumption. In one dimension with H = 1, it collapses to

K = P / (P + R)

Read this as "the fraction of total uncertainty that lives in the model". If the model has all the uncertainty (P ≫ R), K → 1 and we replace the prediction with the measurement. If the sensor has all the uncertainty (R ≫ P), K → 0 and we ignore the measurement. The Kalman filter is, in this sense, a precision-weighted average — exactly what a Bayesian update of two Gaussians produces, derived without ever invoking Bayes explicitly.

Apollo: the algorithm's first big job

Stan Schmidt at NASA Ames was reading Kalman's 1960 paper a few months after publication and immediately recognised it as the answer to Apollo's guidance problem. The spacecraft needed to estimate its position and velocity continuously across translunar coast, lunar orbit, descent, and ascent, using a mix of inertial measurements from the onboard IMU, optical sightings from the sextant, and intermittent radar tracks from Earth. No analytical formula could fuse those heterogeneous streams in real time on a 2 MHz machine with 4 KB of erasable memory.

Schmidt's contribution was the Extended Kalman Filter: orbital dynamics are nonlinear, so he linearised f and h around the current best estimate at each step, applied the standard Kalman equations to the linearised system, then re-linearised on the next cycle. The Apollo Guidance Computer ran this loop for the entire mission. Without it, the lunar entry corridor — a few kilometres wide, a hundred-thousand-kilometre throw — would have been unreachable. The Kalman filter went from theoretical curiosity to mission-critical infrastructure in nine years.

Worked example: tracking a 1-D constant-velocity target

Consider a target moving in one dimension at roughly constant velocity. State x = [position; velocity]. Sample period Δt = 1 s. The dynamics matrix is

F = [ 1  Δt ]      H = [ 1  0 ]    (we measure position only)
    [ 0   1 ]

Q = σ_a² [ Δt⁴/4  Δt³/2 ]    R = σ_z²
         [ Δt³/2  Δt²   ]

With σ_a (process acceleration) and σ_z (measurement noise) set, the filter initialises with a wide prior P₀ — say diag(100, 100). At the first measurement z₁, the gain K₁ is large (because P₀ ≫ R) and the estimate snaps near z₁. After a dozen consistent measurements, P contracts and K shrinks; the filter starts to smooth — successive measurements barely move the estimate, but a deviation that lasts more than a few samples is interpreted as a true acceleration and is tracked. This is the canonical "alpha-beta filter" of radar tracking; the Kalman recursion is its principled, model-derived generalisation.

The family tree of Kalman-like filters

The pure linear-Gaussian Kalman filter has spawned a small industry of variants for the cases where the original assumptions break.

Variant	What it relaxes	How	Where it's used
Standard KF	—	Linear F, H; Gaussian Q, R	Tracking, prediction, smoothing
Extended KF (EKF)	Nonlinear f, h	First-order Taylor linearisation	Apollo, GPS receivers, INS, robotics
Unscented KF (UKF)	Strongly nonlinear f, h	Sigma-point unscented transform	Attitude estimation, target tracking
Particle filter	Non-Gaussian, multimodal	Sequential Monte Carlo	SLAM, visual tracking, robotics
Information filter	Many sensors, sparse coupling	Parameterise by P⁻¹ instead of P	SLAM, decentralised fusion
Ensemble KF	Very-high-dimensional state	Sample covariance from ensemble	Weather forecasting (ECMWF, NOAA)
Square-root form	Numerical conditioning	Propagate Cholesky factor of P	Embedded systems, aerospace
Rauch-Tung-Striebel	Smoothing (use future data)	Forward KF + backward pass	Offline trajectory reconstruction

The EKF was the workhorse of the 1970s–2000s. The UKF, introduced by Julier and Uhlmann in 1995, is now preferred for strongly nonlinear problems because it avoids Jacobian computation and captures third-order moments. The particle filter wins when the posterior is multimodal — for example a robot localising in a long symmetric corridor where two positions are equally plausible. The information filter dominates in distributed estimation where every sensor can simply add its contribution Hᵀ R⁻¹ H without coordinating priors.

GPS + INS: the canonical fusion problem

A GPS receiver gives you a high-accuracy position fix every second, but only when satellites are visible — useless in tunnels, urban canyons, indoors. An inertial measurement unit (IMU) gives you accelerations and angular rates at hundreds of hertz, always available, but integrating them twice to get position causes drift that grows quadratically: a 1 milli-g bias on accelerometers produces ~150 m of error after one minute. Neither sensor alone is acceptable. Together, fused by a Kalman filter, they are nearly perfect.

The classical scheme is a 15-state EKF: 3 position, 3 velocity, 3 attitude, 3 accelerometer biases, 3 gyro biases. The IMU drives the predict step at hundreds of hertz, propagating the state and growing the covariance. Each incoming GPS fix triggers an update step — H selects position, R is the GPS covariance reported by the receiver, the innovation pulls position and velocity back toward truth, and the same gain corrections also estimate the IMU biases over time, because the same biases are responsible for the drift the GPS just corrected. After a few minutes the biases are nailed and the filter can coast through GPS outages for tens of seconds with metre-level accuracy.

Variants of this same architecture run in commercial aircraft (FOG and ring-laser-gyro INS coupled to differential GPS), in iPhone location services (sensor fusion of GPS, Wi-Fi, cell, and the M-series motion coprocessor), in quadcopter autopilots (PX4, ArduPilot), and in every level-3+ self-driving car shipping today.

Where Kalman lives in modern autonomy

Vehicle state estimation. One large EKF or UKF fuses IMU, wheel encoders, GPS, and visual odometry into vehicle position, velocity, and attitude. Output rate ~100 Hz, used by every downstream module.
Object tracking. One Kalman filter per detected object (other cars, pedestrians, cyclists). State is typically [position, velocity, possibly acceleration and yaw]. Used to associate detections across frames and predict where each agent will be a second from now.
Visual-Inertial Odometry (VIO). MSCKF, ROVIO, VINS-Mono — all run an EKF or sliding-window filter that fuses camera feature tracks with IMU prediction. The foundation of AR/VR positioning (Apple ARKit, Google ARCore) and small-drone autopilots.
SLAM front-end. Real-time SLAM systems use an EKF or information filter to recursively update the joint robot-and-map state. The back-end may switch to a pose-graph or factor-graph optimiser for global consistency, but the front-end recursion remains Kalman.
Battery state-of-charge estimation. Every modern EV battery management system runs an EKF over the cell electrochemical model to estimate state of charge and state of health from voltage, current and temperature measurements.
Radar tracking. Air-traffic control, naval, missile-defence — Kalman filters (often with multi-hypothesis tracking) maintain track on every blip from primary and secondary radars.

Tuning Q and R is the whole art

The filter equations are exact; the difficulty is supplying realistic Q and R. The measurement covariance R is the easier of the two — most sensors come with datasheet noise specifications, and many modern devices report a real-time uncertainty estimate (GPS receivers output a covariance matrix per fix). The process noise Q is the harder problem: it encodes all the dynamics the model does not include. Too small and the filter ignores measurements that disagree with its overconfident prediction (a divergent filter that gets steadily wronger). Too large and the filter chases every measurement (essentially a moving average with no smoothing benefit).

Practical recipes: (1) start with Q from a physical argument — typical jerk, typical wind gust, typical engine torque ripple. (2) Run the filter on logged data and inspect the innovation sequence; under correct Q and R, the normalised innovations should be unit-variance white noise (the NIS / NEES tests). (3) When that fails, scale Q up or down until the innovations look white. Adaptive filters do this online: the IAE and MMAE algorithms tune Q and R from running innovation statistics during operation.

Numerical pitfalls that haunt practitioners

Covariance loss of positive-definiteness. The update P ← (I − K H) P is mathematically positive-semidefinite but can fail numerically. The Joseph form P ← (I − K H) P (I − K H)ᵀ + K R Kᵀ is symmetric by construction and far more robust. Aerospace code always uses Joseph form.
Loss of symmetry. Force symmetry every step via P ← (P + Pᵀ) / 2 — a tiny accumulating asymmetry will destabilise the filter over hours.
Inverse of (H P Hᵀ + R). Solving the system rather than literally inverting is faster and more stable. Square-root and UD-factored filters keep the Cholesky factor of P throughout and never form P explicitly.
Initial covariance too small. An overconfident P₀ tells the filter to ignore early measurements. Initialise P₀ pessimistically and let it contract — the filter will converge from any starting point provided it is observable.
Mismatched timesteps. If sensors arrive asynchronously, propagate F and Q to each measurement's actual timestamp rather than the nominal cycle time. Asynchronous measurement Kalman filters are a standard pattern.

Common pitfalls in real deployments

Treating P as the truth. P is the covariance the filter believes given the Q and R you supplied; it has no necessary relation to actual error unless the model is correct. Always validate by simulation against ground truth.
Ignoring outliers. A single bad GPS fix can permanently bias the bias states. Gate the innovation by χ² before applying the update, and reject outliers above a threshold.
Linearising in the wrong frame. EKF Jacobians on rotations should use Lie-algebra (so(3)) error states rather than Euler angles, to avoid singularity and parameterisation artifacts. Error-state Kalman filters (ESKF) are the modern norm in IMU work.
Forgetting unobservability. Some states are not observable from some measurement sets (the IMU bias on an axis nobody excites is unobservable). The filter will not estimate them; pretending it does leads to silent failure.
Mixing position and orientation. Quaternions and rotation matrices are not vectors — additive Kalman updates break unitarity. Use a quaternion-error state or the SO(3) Lie group framework.
One-size-fits-all Q. Q usually needs to depend on regime: a vehicle Kalman filter should use larger Q during cornering than on a straight highway. Adaptive Q via innovation matching closes the gap.

Frequently asked questions

What problem does the Kalman filter actually solve?

Given a noisy dynamical model of how a state evolves and a stream of noisy measurements of that state, recover the best (minimum mean-square-error) estimate at every instant — without storing the full history. For linear dynamics and Gaussian noise, the Kalman filter is provably optimal. It is the recursive Bayesian estimator under Gaussian assumptions: the posterior mean after one predict-update cycle equals the posterior mean of the full batch problem.

What is the Kalman gain and why does it weight model versus measurement?

The Kalman gain K = P Hᵀ (H P Hᵀ + R)⁻¹ is the optimal blending coefficient between the model-predicted state and the new measurement. When the measurement covariance R is small (precise sensor) and the prior covariance P is large (uncertain model), K is close to one and the filter trusts the measurement. When R is large and P is small, K is near zero and the filter trusts the model. K is derived by minimising the trace of the posterior covariance — it is literally the value that makes the updated estimate least uncertain in expectation.

Why was the Kalman filter chosen for Apollo guidance in 1969?

Stan Schmidt at NASA Ames adapted Kalman's 1960 paper into the Extended Kalman Filter for the Apollo guidance computer, the first major non-trivial use of the algorithm. The AGC had only 4 KB of erasable memory and a 2 MHz processor; the EKF was the only practical estimator that could fuse onboard inertial measurements with intermittent star-sightings and ground-based radar updates in real time. It propagated position and velocity through the orbital dynamics between fixes, and contracted the covariance whenever a sighting arrived. Without it Apollo could not have hit the lunar entry corridor.

What is the difference between the Extended and Unscented Kalman Filter?

The EKF linearises a nonlinear dynamics or measurement function around the current mean using a first-order Taylor expansion (the Jacobians F = ∂f/∂x and H = ∂h/∂x), then applies the standard Kalman update. It works when nonlinearity is mild but biases badly when the local linear approximation is poor. The UKF avoids linearisation: it picks 2n+1 deterministic sigma points that capture the mean and covariance exactly, propagates each one through the full nonlinear function, and reconstructs a Gaussian from the propagated points. The UKF captures up to third-order nonlinear effects for Gaussian inputs at comparable cost to the EKF, and is the default modern choice for attitude estimation and other strongly nonlinear problems.

When does a Kalman filter fail?

When its assumptions break. The KF assumes (1) linear dynamics and measurement, (2) zero-mean Gaussian process noise Q and measurement noise R, (3) correctly specified Q, R, F, H, and (4) no outliers. Real systems violate all four. Common failures: gross outliers (a glitched GPS fix) pull the estimate off and the filter takes many cycles to recover; mis-specified Q either makes the filter blindly trust a wrong model (Q too small, filter diverges) or chase noise (Q too large, no smoothing); strongly nonlinear measurements break the EKF Jacobian linearisation; non-Gaussian multimodal posteriors (think two GPS fixes that disagree) cannot be represented by a single mean and covariance — that needs a particle filter or multi-hypothesis tracker.

How is a Kalman filter different from a low-pass filter or moving average?

A moving average has a fixed weight and a fixed time constant — it smooths everything equally regardless of how trustworthy the inputs are. A Kalman filter adapts its smoothing in real time. The Kalman gain shrinks when the model is confident (after many consistent measurements) and grows when the model is uncertain (after a manoeuvre, after a measurement gap). The KF also propagates an explicit uncertainty estimate (the covariance P), which downstream consumers can use — for example, a path planner that should drive more conservatively when localisation uncertainty is high.

Where does the Kalman filter live in modern autonomous vehicles?

Almost everywhere there is sensor fusion. Tesla, Waymo, Cruise, and every aerospace IMU vendor run Kalman-family filters to fuse GPS, IMU, wheel odometry, LIDAR, and camera measurements into a unified vehicle-state estimate (position, velocity, attitude, biases). Track-level filters maintain one EKF per detected object — pedestrians, other cars — for prediction. Visual-Inertial Odometry (VIO) systems like MSCKF and VINS-Mono are Kalman filters at heart. SLAM back-ends increasingly use factor graphs (a batch generalisation of the KF) for global optimisation, but recursive Kalman filters still dominate the real-time front end because they run in bounded time with a fixed-size state.

What is the information-form Kalman filter, and when is it useful?

The information filter is the Kalman filter parameterised by the inverse covariance Y = P⁻¹ (the information matrix) and the information vector y = P⁻¹ x̂ instead of P and x̂. The predict step becomes harder (it requires a matrix inverse) but the update step becomes trivial — measurement information just adds: Y⁺ = Y + Hᵀ R⁻¹ H, y⁺ = y + Hᵀ R⁻¹ z. This makes the information form ideal for fusing many independent sensors, for SLAM (where the information matrix is sparse), and for decentralised estimation across a network where each node can contribute its own Hᵀ R⁻¹ H without coordinating priors.