Two Sources of Truth: The Principled Uncertainty Arithmetic Behind the Kalman Filter

The kalmanfilter.net tutorial making rounds on Hacker News right now uses a radar example to walk through the Kalman filter from first principles. It is a good tutorial. But the radar example, while concrete, can make the filter feel like a domain-specific trick for tracking flying objects. It is not. The Kalman filter is a general solution to a problem that appears everywhere: you have a model of how something evolves over time, you have noisy observations of that thing, and you want the best possible estimate of its true state at every moment.

Understanding why the filter works, not just how to apply it, is what makes it reusable across GPS receivers, robot localization, financial time series, and sensor fusion in autonomous vehicles.

The Problem It Solves

Consider an aircraft at some position you do not know exactly. Your radar measures its position, but radar is noisy; the measurement might be off by 50 meters. You also have a motion model: the aircraft was somewhere a second ago, it has some velocity, and physics tells you roughly where it should be now. But your model is also imperfect; the aircraft might accelerate or turn, introducing model error.

A naive approach averages measurements. A slightly better approach weights recent measurements more heavily. Neither approach is principled about how much to trust the model versus the sensor, and neither maintains a running estimate of how uncertain the estimate actually is.

The Kalman filter handles all of this simultaneously. It maintains a state vector x (position, velocity, whatever you care about) and a covariance matrix P (the uncertainty around that state). Both evolve together. The uncertainty is not an afterthought; it drives the filter’s behavior.

The State Space Representation

The filter assumes your system obeys two linear equations. The process model describes how the state evolves:

x_k = F * x_{k-1} + w_k

The observation model describes what you actually measure:

z_k = H * x_k + v_k

Here F is the state transition matrix, H is the observation matrix, and w_k and v_k are zero-mean Gaussian noise terms with known covariances Q and R respectively.

For the radar example, the state is [position_x, velocity_x, position_y, velocity_y] and the state transition matrix encodes constant-velocity motion:

import numpy as np

dt = 1.0  # one-second time step

F = np.array([
    [1, dt, 0,  0],
    [0,  1, 0,  0],
    [0,  0, 1, dt],
    [0,  0, 0,  1],
])

# We only observe position, not velocity
H = np.array([
    [1, 0, 0, 0],
    [0, 0, 1, 0],
])

The observation matrix H maps from the 4D state space to the 2D measurement space. You measure x and y position but not velocity directly; the filter infers velocity from the sequence of position measurements.

The Two-Step Cycle

Every time step runs the same two phases: predict, then update.

Prediction propagates the state estimate forward using the motion model:

x_pred = F * x
P_pred = F * P * F^T + Q

The first equation is straightforward: apply the dynamics. The second propagates the uncertainty forward. Multiplying by F rotates the covariance ellipsoid through state space; adding Q inflates it because the motion model is not perfect. Uncertainty grows during prediction.

Update incorporates the new measurement:

innovation = z - H * x_pred
S = H * P_pred * H^T + R
K = P_pred * H^T * S^{-1}
x = x_pred + K * innovation
P = (I - K * H) * P_pred

The Kalman gain K is the key quantity. It computes how much to correct the prediction based on the measurement, given the relative uncertainties of each. When R is small (precise sensor), K is large and the filter trusts the measurement. When R is large (noisy sensor), K is small and the filter trusts its own prediction. After the update, the covariance P shrinks; a measurement, even a noisy one, reduces uncertainty.

Why This Is Optimal

The filter is not heuristic. Under four assumptions, linear dynamics, Gaussian process noise, Gaussian measurement noise, and known initial state distribution, the Kalman filter produces the minimum variance unbiased estimate at every time step.

The estimate is unbiased, meaning correct on average. The covariance is minimum, meaning no other linear estimator can achieve lower expected squared error. The posterior distribution after each update step is exactly Gaussian:

p(x_k | z_1, ..., z_k) = N(x̂_k, P_k)

This means the filter is not just outputting a point estimate with an attached error bar. The full posterior is known analytically, and it is Gaussian with mean x̂_k and covariance P_k. That is a strong claim, and it is true because the product of two Gaussians is Gaussian, and linear transformations preserve Gaussianity.

The classical derivation from Roger Labbe’s free Jupyter book shows this via Bayes’ theorem applied to Gaussian distributions. The prediction step is the Chapman-Kolmogorov equation under Gaussian assumptions; the update step is Bayes’ rule applied to a Gaussian likelihood. The Kalman filter is Bayesian filtering made tractable for linear-Gaussian systems.

A Concrete Python Implementation

Using filterpy, the most comprehensive Python Kalman filter library:

from filterpy.kalman import KalmanFilter
from filterpy.common import Q_discrete_white_noise

kf = KalmanFilter(dim_x=4, dim_z=2)
kf.F = F
kf.H = H

# Radar position noise: ~50m standard deviation
kf.R = np.diag([50.0**2, 50.0**2])

# Process noise: aircraft acceleration uncertainty ~1 m/s²
kf.Q = Q_discrete_white_noise(dim=2, dt=dt, var=1.0, block_size=2)

# Initial state and uncertainty
kf.x = np.array([[0.], [100.], [0.], [10.]])  # px, vx, py, vy
kf.P = np.diag([500.**2, 30.**2, 500.**2, 30.**2])

for z in measurements:
    kf.predict()
    kf.update(z)
    print(f"Position: ({kf.x[0,0]:.1f}, {kf.x[2,0]:.1f})")
    print(f"Velocity: ({kf.x[1,0]:.1f}, {kf.x[3,0]:.1f})")

With 50-meter radar noise and 50 time steps, the filter typically converges to position estimates within 5 to 10 meters of ground truth, and correctly infers velocities it never directly observes. The covariance P converges to a steady state where the growth from Q during prediction exactly balances the shrinkage from measurements during update.

The Variants and When You Need Them

The standard filter only works for linear systems. Most real systems are not linear. Three major extensions address this.

The Extended Kalman Filter (EKF) linearizes the nonlinear dynamics around the current estimate using Jacobians. If your process or observation function is f(x) instead of F*x, you compute ∂f/∂x at the current estimate and use that as the local linear approximation. This works for mildly nonlinear systems but can diverge when the linearization is poor, which happens when the initial estimate is far from the truth or the system curves sharply between updates.

The Unscented Kalman Filter (UKF), developed by Julier and Uhlmann in 1995, avoids Jacobians entirely. It selects a small set of deterministic sigma points that capture the mean and covariance of the current estimate, propagates each point through the exact nonlinear function, and reconstructs the output distribution from the weighted results. This is second-order accurate versus EKF’s first-order, and it handles moderately nonlinear systems substantially better without requiring any analytical derivatives.

from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=-1)
ukf = UnscentedKalmanFilter(
    dim_x=4, dim_z=2, dt=dt,
    fx=nonlinear_process_fn,
    hx=nonlinear_obs_fn,
    points=points
)

Particle filters handle cases where EKF and UKF both fail: severely nonlinear dynamics or non-Gaussian noise. They represent the posterior as a weighted set of samples (particles), propagate each sample through the exact dynamics, then reweight by measurement likelihood. They are accurate for any system but scale poorly with state dimension; in high-dimensional spaces you need exponentially more particles to cover the space adequately.

For robotics, the standard reference is Thrun, Burgard, and Fox’s Probabilistic Robotics, which covers all three variants in the context of localization and SLAM. For tracking applications specifically, Bar-Shalom, Li, and Kirubarajan’s book remains the canonical treatment.

The Covariance Matrix Is Not Optional

Practitioners sometimes strip out the covariance tracking because it adds computation. This is a mistake. The covariance is not a diagnostic output; it is what the filter uses to compute the Kalman gain. Without accurate covariance tracking, the gain is wrong, and the filter degrades into something that happens to blend predictions with measurements without knowing how.

The covariance also serves as a diagnostic. If P grows without bound, the filter is diverging; the model is not capturing the dynamics or the noise parameters are miscalibrated. If P shrinks to near zero very quickly, Q is probably too small and the filter is over-trusting its model. Watching the evolution of P is how you debug a Kalman filter.

What Kalman Got Right in 1960

Rudolf Kalman’s original 1960 paper solved a batch estimation problem recursively. The insight was that you do not need all past measurements to compute the current optimal estimate; you only need the previous estimate and its covariance. That sufficiency result is what made real-time filtering possible on the hardware of the day.

NASA immediately applied the filter to the Apollo program. The Apollo Guidance Computer had 4 KB of RAM; the recursive form was the only one that fit. Stanley Schmidt implemented the first practical version for spacecraft navigation, fusing inertial measurements with star tracker observations and ground radar. The filter is what allowed Apollo to navigate to the Moon without continuous ground uplink.

Decades later, the Kalman filter is in every GPS receiver, every autonomous vehicle sensor fusion stack, and most implementations of SLAM for mobile robots. The mathematical core has not changed. What has changed is the computational budget for extending it to nonlinear and non-Gaussian domains, and the growing set of tools, from filterpy to dynamax to torchfilter, that make those extensions accessible without deriving everything from scratch.

The kalmanfilter.net tutorial is a solid entry point for the intuition. The next step is Labbe’s notebook book, which works through the derivations hands-on and covers the full variant hierarchy. After that, the original paper holds up remarkably well; it is short, precise, and still the clearest statement of what the filter actually guarantees.