Recursion as Necessity: How the Kalman Filter Got to the Moon and Into Your Drone
Source: hackernews
Rudolf Kalman published “A New Approach to Linear Filtering and Prediction Problems” in the Journal of Basic Engineering in 1960. The paper was not celebrated immediately. Control theorists trained on Wiener filter methods found it strange. The Wiener filter required the entire history of observed data and assumed the process was stationary. Kalman’s filter required only the current state estimate and a covariance matrix. That was the breakthrough, and it was as much an engineering constraint as a mathematical one.
Stanley Schmidt at NASA Ames saw the paper and recognized something immediately useful. The Apollo Guidance Computer had roughly 4 KB of RAM. There was no feasible way to store a growing history of measurements from inertial sensors and cross-check them against a Wiener model. The recursive structure was not a nice property; it was a hard requirement. Schmidt implemented what became known as the Schmidt-Kalman filter for Apollo navigation, and every crewed Apollo mission used it. The filter ran on hardware that a modern microcontroller would embarrass, doing matrix arithmetic in real time to estimate position and velocity from noisy sensor data.
The Five Equations
The standard linear Kalman filter operates in a predict-update loop. The state vector x̂ holds your best estimate of whatever you are tracking. The covariance matrix P holds your uncertainty about that estimate.
Prediction propagates the state forward using a transition matrix F:
x̂ₖ⁻ = F x̂ₖ₋₁
Pₖ⁻ = F Pₖ₋₁ Fᵀ + Q
Update incorporates a new measurement zₖ:
K = Pₖ⁻ Hᵀ (H Pₖ⁻ Hᵀ + R)⁻¹
x̂ₖ = x̂ₖ⁻ + K(zₖ - H x̂ₖ⁻)
Pₖ = (I - KH) Pₖ⁻
K is the Kalman gain. When your predicted uncertainty P⁻ is large relative to the measurement noise R, K approaches 1 and the filter trusts the measurement. When R dominates, K approaches 0 and the filter trusts the model. The term (zₖ - H x̂ₖ⁻) is the innovation: how much the actual measurement differed from what the model predicted.
This is sometimes described as “just a weighted average,” and that framing appeared in a recent Hacker News discussion of Kalman filter tutorials. The framing is not wrong, but it undersells the structure. The weights are derived from first principles given Gaussian noise assumptions, and the covariance propagation is what makes the weights self-correcting over time. The filter does not need you to specify weights manually; it computes them from the noise models you provide.
A Concrete State Model: Constant Velocity Tracking
For a 2D tracker with position and velocity, the state vector is [x, ẋ, y, ẏ]. The constant velocity assumption gives a transition matrix:
import numpy as np
dt = 0.1 # seconds
F = np.array([
[1, dt, 0, 0],
[0, 1, 0, 0],
[0, 0, 1, dt],
[0, 0, 0, 1]
])
This says: new position equals old position plus velocity times time step; velocity is constant. The model is wrong for any real object, which is exactly why Q exists.
The process noise covariance Q encodes unmodeled dynamics, typically acceleration the model does not represent. For a discrete white noise acceleration model with acceleration standard deviation σₐ:
sigma_a = 0.5 # m/s², tune this
Q = sigma_a**2 * np.array([
[dt**4/4, dt**3/2, 0, 0],
[dt**3/2, dt**2, 0, 0],
[ 0, 0, dt**4/4, dt**3/2],
[ 0, 0, dt**3/2, dt**2]
])
This form comes from integrating white noise acceleration over a time step and computing the resulting position and velocity covariances. The derivation is in Roger Labbe’s Kalman and Bayesian Filters in Python, which is the best free resource on the subject; interactive Jupyter notebooks, worked numerical examples throughout.
The Radar Problem and Why the EKF Exists
Straight Cartesian position measurements pair cleanly with the state vector above. Radar does not give you Cartesian coordinates. It gives you range r and bearing θ:
r = sqrt(x² + y²)
θ = atan2(y, x)
The observation function h(x) is nonlinear. The standard Kalman filter requires H to be a matrix, meaning h must be linear. The Extended Kalman Filter handles this by linearizing h at each time step using its Jacobian:
H = ∂h/∂x evaluated at x̂ₖ⁻
For the radar case:
H = [ x/r, y/r, 0, 0 ]
[-y/r², x/r², 0, 0 ]
where r = sqrt(x̂² + ŷ²). The EKF substitutes this linearized H into the standard update equations. Alex Becker’s kalmanfilter.net walks through this radar example numerically, step by step, which is the clearest exposition available for seeing what the Jacobian substitution actually does to the numbers.
The weakness is that linearization introduces error. With high curvature in h() or f(), or large uncertainty relative to the nonlinearity scale, the EKF can diverge. The Unscented Kalman Filter, developed by Julier and Uhlmann in the mid-1990s, avoids Jacobians by propagating a set of deterministically chosen sigma points through the nonlinear function directly. It achieves third-order accuracy for Gaussian distributions versus the EKF’s first-order. Particle filters extend to non-Gaussian distributions by representing the posterior with weighted samples, at a cost that scales poorly with state dimension.
A Python Skeleton
import numpy as np
class KalmanFilter:
def __init__(self, F, H, Q, R, P, x0):
self.F = F
self.H = H
self.Q = Q
self.R = R
self.P = P
self.x = x0
def predict(self):
self.x = self.F @ self.x
self.P = self.F @ self.P @ self.F.T + self.Q
return self.x
def update(self, z):
S = self.H @ self.P @ self.H.T + self.R
K = self.P @ self.H.T @ np.linalg.inv(S)
innovation = z - self.H @ self.x
self.x = self.x + K @ innovation
self.P = (np.eye(len(self.x)) - K @ self.H) @ self.P
return self.x, innovation
The innovation return value matters. If the filter is well-tuned, innovations should be white noise: zero mean, uncorrelated over time, with variance matching S. Plotting the innovation sequence and checking its autocorrelation is the standard diagnostic. Persistent bias in innovations means the model is wrong somewhere, usually Q.
Where the Real Engineering Lives
R is usually straightforward. Sensor manufacturers publish noise specifications; you can measure them directly with the sensor stationary. R for a GPS receiver giving horizontal position might be a 2x2 diagonal with values derived from the stated circular error probable.
Q is where engineers spend time. It encodes everything the model does not represent: wind gusts, throttle changes, road surface variation, human motion. Setting Q too small makes the filter slow to respond to real dynamics; innovations grow large and correlated. Setting Q too large makes the filter track noise and lose smoothing benefit. The σₐ parameter in the Q expression above is a tuning knob, not a measured quantity.
The innovation whiteness test is the practical tool for Q tuning. Run the filter on recorded data, plot the normalized innovations, compute their autocorrelation. If there is structure, Q needs adjustment. Some practitioners use adaptive algorithms that estimate Q online from the innovation sequence, though these add complexity and their own tuning parameters.
Modern applications of Kalman filtering are pervasive in ways that are easy to miss. GPS receivers run 8-state filters fusing pseudorange measurements from multiple satellites with a dynamics model. Drone flight controllers fuse IMU accelerometers and gyroscopes with barometric altitude and GPS using extended or unscented variants. The SORT object tracking algorithm used in many real-time video pipelines pairs a Kalman filter with the Hungarian algorithm for data association, maintaining a constant velocity state per tracked bounding box between detector frames.
The recursive structure that made the filter fit in 4 KB of Apollo memory is the same property that makes it viable in embedded systems today. Each update requires only the current state estimate and covariance, a fixed-size computation regardless of how long the filter has been running. Kalman’s insight in 1960 was that the sufficient statistic for Gaussian state estimation is exactly these two quantities. Everything after that, EKF, UKF, particle filters, is engineering around the cases where the Gaussian assumption breaks down.