The Algorithm That Navigated Apollo Is Still in Your GPS Chip

Rudolf Kalman published “A New Approach to Linear Filtering and Prediction Problems” in 1960. The paper introduced a recursive algorithm for estimating the state of a linear system from noisy observations, and it was immediately recognizable as something practically useful. NASA put it to work in the Apollo Guidance Computer less than a decade later. Today, it runs inside every GPS receiver, every inertial navigation unit, and most autonomous vehicle stacks. The kalmanfilter.net tutorial making the rounds on Hacker News uses a radar tracking scenario to build up the intuition, and that choice of example is not accidental. Radar is where the Kalman filter was first widely deployed, and the problem structure maps cleanly onto the filter’s core assumptions in a way that makes the math feel motivated rather than imposed.

The Problem the Filter Solves

You have a radar station tracking an aircraft. At each time step, the radar returns a noisy measurement of the aircraft’s range, or its 2D position, or both. The noise is real: atmospheric scattering, receiver electronics, quantization. You want to estimate where the aircraft is, where it is going, and how fast. A single measurement tells you roughly where it is but nothing reliable about velocity. Averaging recent measurements helps with noise but introduces lag. What you want is a way to combine a physical model of how the aircraft moves with the stream of noisy observations, giving each appropriate weight based on how uncertain each source is.

That is exactly what the Kalman filter does, and it does it optimally under two assumptions: the system dynamics are linear, and all noise is Gaussian with known covariance. When those assumptions hold, the Kalman filter is the minimum-variance unbiased estimator. No algorithm can do better with the same information.

State, Transition, and Measurement

The filter operates on a state vector. For the radar example, a natural choice is a two-component state: position and velocity.

x = [position, velocity]ᵀ

The state transition model describes how the state evolves over time. Under a constant-velocity assumption with time step dt:

A = [[1, dt],
     [0,  1]]

Multiplying the current state by A gives the predicted next state: position advances by velocity times dt, velocity stays constant. This is the kinematic model. It is not perfect (aircraft maneuver), but it is a useful prior.

The measurement model H maps from state space to measurement space. If the radar measures position only:

H = [[1, 0]]

Two noise terms enter the picture. Process noise Q captures how much the real system deviates from the kinematic model over each time step. Measurement noise R captures how noisy the radar returns are. Both are covariance matrices.

The Predict-Update Cycle

The filter alternates between two steps every time a new measurement arrives.

Predict: Use the state transition model to project the current estimate forward.

x̂⁻ = A · x̂
P⁻ = A · P · Aᵀ + Q

Here P is the estimate error covariance, a matrix that encodes how uncertain the current state estimate is. Applying the transition matrix grows this uncertainty (we are less sure about the future than the present), and adding Q accounts for unmodeled dynamics.

Update: When a measurement arrives, compute the Kalman gain and correct the prediction.

K = P⁻ · Hᵀ · (H · P⁻ · Hᵀ + R)⁻¹
x̂ = x̂⁻ + K · (z - H · x̂⁻)
P = (I - K · H) · P⁻

The term (z - H · x̂⁻) is the innovation: the difference between what was measured and what was predicted. The Kalman gain K determines how much of that innovation to incorporate into the updated estimate.

What the Kalman Gain Actually Does

The gain is the interpretive heart of the filter. When measurement noise R is large relative to prediction uncertainty P⁻, the denominator of K is dominated by R and K stays small. The filter leans on its own prediction and discounts the noisy measurement. When P⁻ is large (the prediction is uncertain), K grows and the filter trusts the measurement more. Over time, as the filter accumulates data, P typically converges to a steady-state value, and so does K.

This is the reason the filter is so appealing pedagogically: the balance between model and measurement is not a tuning parameter you set by hand, it emerges automatically from the relative uncertainties.

A Working Python Example

import numpy as np

dt = 1.0
A = np.array([[1, dt], [0, 1]])
H = np.array([[1, 0]])
Q = np.array([[0.1, 0], [0, 0.1]])  # process noise
R = np.array([[25.0]])              # radar measurement noise (5m std dev)

x = np.array([[0.0], [10.0]])       # initial: position=0m, velocity=10m/s
P = np.eye(2) * 500.0              # high initial uncertainty

measurements = [10, 22, 31, 44, 53, 66, 74, 90, 102, 115]

for z_val in measurements:
    z = np.array([[float(z_val)]])

    # Predict
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q

    # Kalman gain
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)

    # Update
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(2) - K @ H) @ P_pred

    print(f"meas={z_val:5.1f}m  pos={x[0,0]:6.2f}m  vel={x[1,0]:5.2f}m/s")

After a few steps, the velocity estimate stabilizes around 10 m/s even though the radar never directly measures velocity. The filter infers it from successive position measurements, weighted by their consistency with the kinematic model. That inference is not magic; it follows from the structure of A and H combined with the recursive covariance update.

Where This Came From

Before Kalman, the canonical solution to optimal linear filtering was the Wiener filter, developed by Norbert Wiener in 1949. The Wiener filter is optimal, but it is not recursive: it requires processing the entire signal history as a batch, which is computationally infeasible for real-time systems on 1960s hardware. Kalman’s insight was that the optimal estimate can be updated recursively, requiring only the current estimate and the new measurement. The history is summarized by P, not stored explicitly.

The Apollo program adopted a variant of the filter for the guidance computer’s navigation system, implemented by MIT’s Instrumentation Laboratory. The guidance computer had roughly 4 KB of RAM. A recursive estimator was not just elegant; it was the only viable option.

GPS receivers have used Kalman filters since the earliest implementations in the 1970s and 1980s. The filter fuses pseudorange measurements from multiple satellites with receiver clock models and, in tightly coupled designs, IMU data to bridge signal dropouts. The same structure, applied to a more complex state vector.

When Linearity Breaks

The constant-velocity radar model is linear: the next state is a linear function of the current state, and the measurement is a linear function of the state. Many real systems are not. A radar measuring range and bearing (polar coordinates) to an aircraft whose state is in Cartesian coordinates produces a nonlinear measurement equation. Aircraft with coordinated turns violate the constant-velocity transition model.

Two main extensions handle nonlinearity. The Extended Kalman Filter (EKF) linearizes the nonlinear functions at each step using their Jacobians. It is computationally efficient and works well when nonlinearities are mild, but it can diverge when they are severe because the linearization discards higher-order terms. The Unscented Kalman Filter (UKF), developed by Julier and Uhlmann in 1997, propagates a small set of carefully chosen sample points (sigma points) through the nonlinear functions and reconstructs the mean and covariance from the results. It captures nonlinear effects to the third order and typically outperforms the EKF for highly nonlinear systems, at modestly higher computational cost.

For problems where the noise is genuinely non-Gaussian, particle filters replace the Gaussian covariance representation with a cloud of weighted samples. They can handle arbitrary distributions but scale poorly with state dimension.

Why It Has Not Been Replaced

Deep learning has absorbed many classical signal processing pipelines, but Kalman filtering has remained stubbornly useful. Part of this is the uncertainty quantification: the covariance P is not just a side product; it is a calibrated measure of how much to trust the current estimate, which downstream systems can use for decision-making. Neural networks typically do not provide this kind of output without careful additional engineering.

The filter is also interpretable. You can look at Q and R and understand what physical assumptions are encoded. You can diagnose failures by monitoring the innovation sequence; a persistent bias suggests a bad model, not a bad implementation.

Modern autonomous vehicle stacks commonly run EKF or UKF as a backbone for sensor fusion across lidar, radar, camera, and IMU, with learning-based modules feeding into them rather than replacing them. The robot_localization package in ROS, widely used in mobile robotics, is essentially a configurable multi-sensor EKF. The filter’s assumptions are often only approximately met, and it still works well enough to be the default choice sixty-five years after it was published.

The radar example in the tutorial is effective precisely because it strips the problem to its essentials. One target, one sensor, two state variables, clean Gaussian noise. Once that case is clear, extending to GPS fusing with IMU, or a lidar point cloud fusing with wheel odometry, is a matter of expanding the matrices, not reconceiving the algorithm.