The Predict-Update Loop That Landed on the Moon

The Kalman filter tutorial on kalmanfilter.net resurfaces on Hacker News with several hundred points every year or two, and each time the comments fill with people encountering the filter for the first time alongside people who have used it in production for decades. The enduring interest makes sense once you understand what the filter actually does: it is the exact Bayesian-optimal estimator for a linear system with Gaussian noise, and it achieves this recursively without storing any history. That combination of mathematical optimality and computational economy explains sixty-five years of continuous use.

The 1960 Paper and the State-Space Revolution

Rudolf Kálmán published “A New Approach to Linear Filtering and Prediction Problems” in the Journal of Basic Engineering in March 1960. The title undersells the contribution. The dominant approach before Kalman was the Wiener filter, which operated in the frequency domain and required the full history of measurements to compute an estimate. Kalman reformulated the problem in state space, representing what you want to track as a vector of variables and writing down how that vector evolves over time as a linear equation. The recursive structure that fell out of this reformulation was the essential insight: given the current estimate and its uncertainty, along with a new measurement, you can compute the updated estimate in closed form without touching any prior data.

This is not just computationally convenient. It matches how physical systems actually behave. A radar return tells you where an aircraft is now, not where it was twenty scans ago. The state-space formulation lets you encode that physics directly.

Kalman visited NASA Ames Research Center shortly after publication and met Stanley Schmidt, who recognized the implications immediately. Schmidt implemented a variant of the filter for the Apollo guidance computer, fusing inertial measurement unit data with star tracker observations and ground radar ranging. The Apollo Guidance Computer had only 4 kilobytes of RAM, and the filter’s property of requiring only the current state estimate and its covariance matrix, rather than accumulating all measurement history, made it the only viable approach on that hardware. The lunar descent navigation that put Apollo 11 on the surface ran on this algorithm.

The Predict-Update Cycle

The filter operates in two alternating steps. The predict step uses a physics model to project the state estimate forward in time; the update step corrects that prediction using a new measurement. Both are linear operations, and together they maintain a Gaussian posterior over the hidden state.

The state at time k is a vector x_k. The model says it evolves as:

x_k = F * x_{k-1} + w_k

where F is the state transition matrix encoding your physics and w_k is process noise drawn from a zero-mean Gaussian with covariance Q. Measurements arrive as:

z_k = H * x_k + v_k

where H maps the state into measurement space and v_k is measurement noise with covariance R.

The predict step:

x̂_{k|k-1} = F * x̂_{k-1|k-1}
P_{k|k-1}  = F * P_{k-1|k-1} * F^T + Q

The update step:

y_k     = z_k - H * x̂_{k|k-1}          # innovation
S_k     = H * P_{k|k-1} * H^T + R       # innovation covariance
K_k     = P_{k|k-1} * H^T * S_k^{-1}   # Kalman gain
x̂_{k|k} = x̂_{k|k-1} + K_k * y_k
P_{k|k}  = (I - K_k * H) * P_{k|k-1}

The Kalman gain K_k is where the insight lives. When R is small (a precise sensor), K grows and the update pulls the estimate strongly toward the measurement. When P_{k|k-1} is small (a confident model prediction), K shrinks and the measurement barely moves the estimate. The filter is a weighted combination of model and sensor, with weights derived automatically from the noise statistics you provide.

Why the Radar Example Works So Well

Tracking an aircraft with radar is the canonical pedagogical example because the setup maps directly onto every piece of the filter. The state is position and velocity; the radar measures position only; the flight path follows roughly constant-velocity dynamics with occasional maneuvers captured by process noise. Here is a minimal Python implementation using filterpy, Roger Labbe’s teaching library:

import numpy as np
from filterpy.kalman import KalmanFilter
from filterpy.common import Q_discrete_white_noise

dt = 1.0  # one second between radar scans

f = KalmanFilter(dim_x=4, dim_z=2)  # [px, py, vx, vy], observe [px, py]

f.F = np.array([[1, 0, dt, 0],
                [0, 1, 0, dt],
                [0, 0, 1,  0],
                [0, 0, 0,  1]])

f.H = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0]])

f.R = np.eye(2) * 100.0     # radar position noise: 10m std dev
f.Q = Q_discrete_white_noise(dim=2, dt=dt, var=1.0, block_size=2)

f.x = np.zeros((4, 1))     # unknown initial position/velocity
f.P = np.eye(4) * 500.     # high initial uncertainty

The filter infers velocity from the sequence of position measurements. After three or four radar returns, the velocity estimate has converged, and the track is smoother than the raw radar returns. This is the filter doing exactly what it was designed to do: extracting state that the sensor never directly observes by exploiting the known relationship between state variables encoded in F.

The kalmanfilter.net tutorial walks through this with concrete numerical tables, which is the right pedagogical approach. Watching actual numbers flow through the predict and update equations is more illuminating than any diagram, because it makes the covariance updates tangible rather than abstract.

The Linearization Problem and Its Solutions

The filter as described is optimal only when the state transition and measurement functions are linear. Most interesting systems are not. A robot’s heading rotates nonlinearly with angular velocity. A GPS receiver converts satellite ranges into a Cartesian position through spherical geometry. Ballistic trajectories involve quadratic drag.

The Extended Kalman Filter (EKF) addresses this by linearizing nonlinear functions at the current estimate using their Jacobians. It has been the backbone of navigation systems and SLAM implementations for decades, but it diverges when the nonlinearity is severe or when the Jacobian is expensive or error-prone to derive manually.

The Unscented Kalman Filter (UKF), introduced by Simon Julier and Jeffrey Uhlmann in their 1997 paper, takes a different approach. Instead of approximating the function, it approximates the probability distribution using a small set of deterministic sample points called sigma points. For a state of dimension n, you need 2n+1 sigma points, propagated through the true nonlinear function, with the output mean and covariance recovered from the transformed points. No Jacobians required. The UKF captures second-order terms in the nonlinearity that the EKF misses and is numerically more stable in practice.

With JAX, even the EKF Jacobian derivation can now be automated:

import jax
import jax.numpy as jnp

def f(x, dt):
    return jnp.array([x[0] + x[1]*dt, x[1]])  # constant velocity

# Exact Jacobian via autodiff, no hand derivation
F_k = jax.jacobian(f)(x_current, dt)

This removes a significant source of implementation bugs and maintenance cost from EKF deployments.

Q and R: The Hardest Part of Using the Filter

The filter equations are straightforward. Tuning them to a real system is not. R is typically the easier matrix: sensor datasheets specify noise density, and you can estimate it empirically from stationary data. Q is harder because it represents unmodeled dynamics, which by definition you do not know in advance.

A Q that is too small causes the filter to trust its model too much. When the true system deviates from the model (an aircraft maneuvers, a robot slips on a wet floor), the filter adapts slowly and the estimates lag reality while the covariance P stays deceptively small. This is filter divergence, the most common failure mode in real deployments.

A Q that is too large produces noisy estimates that never smooth; the filter behaves like a first-order low-pass filter rather than a model-based estimator.

The principled diagnostic is the Normalized Innovation Squared (NIS): the statistic y_k^T * S_k^{-1} * y_k should follow a chi-squared distribution with dim_z degrees of freedom. If the sample average is consistently larger than expected, your Q or R is too small; if smaller, you are overestimating noise. This is a real-time tuning signal that many production systems never implement, though it is clearly described in Yaakov Bar-Shalom’s tracking textbook and in Roger Labbe’s free Jupyter book on Kalman and Bayesian filters.

The Connection to Modern Sequence Models

The Kalman filter’s state-space structure has become newly relevant in the context of deep sequence modeling. The filter is fundamentally a linear dynamical system: a hidden state evolves through matrix multiplication, and observations are linear projections of that state. The S4 architecture (Gu et al., 2021) draws directly on this formulation to build sequence models with long-range dependencies by parameterizing a structured state-space model with learnable matrices. The Mamba architecture (2023) extends this with selective state spaces where the transition and projection matrices are input-dependent, making them essentially learned Kalman-like filters that adapt their gain based on the content of each token.

KalmanNet (Revach et al., 2022, IEEE Transactions on Signal Processing) is a more direct synthesis: it replaces the analytically computed Kalman gain with an RNN that learns the gain from data, handling model mismatch that the classical filter cannot tolerate. The underlying predict-update structure stays the same; the gain computation becomes learned. The result is a system that degrades gracefully when the physics model is only approximately correct, which covers most real-world deployments.

The classical filter and these learned variants are solving the same estimation problem from different starting points. The classical filter has a precise prior and requires no data, but struggles with mismatch. The learned variants adapt but require training data and lose the interpretability that makes the classical filter easy to debug. For systems where you have good physics models (navigation, radar tracking, structural monitoring), the classical filter remains the better starting point, and the NIS diagnostic tells you when you have tuned it correctly.

The radar example that opens kalmanfilter.net is not just pedagogically convenient. It is a direct ancestor of every GPS receiver, autonomous vehicle object tracker, and spacecraft navigation system built since 1960, and its predict-update structure is now showing up inside the sequence models that underpin large language models. That arc from a noisy radar blip to the architecture of modern transformers is not obvious from a single tutorial, but it is there.