Why the Kalman Filter Is Optimal and What Happens When It Isn't

The Kalman filter sits at the intersection of control theory, statistics, and signal processing, and yet its core idea is simpler than its reputation suggests. Given a noisy measurement and an imperfect prediction, how do you blend them optimally? Rudolf Kalman answered that question in his 1960 paper “A New Approach to Linear Filtering and Prediction Problems”, and that answer has been flying spacecraft, guiding missiles, and navigating GPS receivers ever since.

kalmanfilter.net takes a sound pedagogical approach: start with a radar tracking a moving object, get the math to work in that specific case, then generalize. The algorithm is abstract enough that pure matrix notation can obscure what is actually happening, and concrete enough that a good physical example makes the abstractions fall into place.

This post follows the same path but pushes further: into why the algorithm is optimal, how the mathematics connects to Bayesian inference, and what happens when the linear assumptions break down.

The Problem: Tracking a Noisy Target

Imagine a radar system tracking an aircraft. Every second, the radar returns a position measurement, but those measurements are noisy: the aircraft is not exactly where the radar says it is. The aircraft also follows a physics-based trajectory, constrained by velocity and acceleration, so its future position is predictable to some degree. You have two sources of information: a noisy measurement grounded in reality, and a smooth physical prediction that accumulates error over time. The Kalman filter combines them at each timestep in a way that is provably optimal given the assumptions.

The state vector for this problem is:

x = [position, velocity]

At each step, the algorithm predicts where the aircraft will be based on its last known state, then corrects that prediction using the new radar measurement. The ratio of trust between prediction and measurement is determined by what the filter calls the Kalman gain, computed from the relative uncertainties in each source.

The Two Phases

The algorithm runs in two phases, prediction and update, repeated at every timestep.

Prediction uses the system’s physical model. For constant-velocity motion:

x̂(k|k-1) = A · x̂(k-1|k-1)

where A is the state transition matrix. For position and velocity with timestep dt:

A = [[1, dt],
     [0,  1]]

This encodes the physics: the new position is the old position plus velocity times time, and velocity stays constant between measurements. Along with the state prediction, the covariance matrix P, which tracks uncertainty about the state, also propagates:

P(k|k-1) = A · P(k-1|k-1) · A^T + Q

Q is the process noise covariance: how much the real world deviates from the model. For an aircraft, it accounts for wind gusts and pilot maneuvers that the constant-velocity model cannot capture.

Update uses the radar measurement to correct the prediction:

K = P(k|k-1) · H^T · (H · P(k|k-1) · H^T + R)^(-1)
x̂(k|k) = x̂(k|k-1) + K · (z - H · x̂(k|k-1))
P(k|k) = (I - K · H) · P(k|k-1)

K is the Kalman gain. H maps the state to measurement space; for a position-only radar, H = [1, 0]. R is the measurement noise covariance, characterizing the radar’s accuracy. z is the raw measurement.

The gain K does the real work. When measurement noise R is small relative to prediction uncertainty, K is large and the algorithm leans on the measurement. When the prediction is confident and measurements are noisy, K is small and the algorithm trusts the model more. The gain is computed such that the resulting estimate minimizes mean-square error, which is what makes the filter optimal rather than merely reasonable.

A Minimal Python Implementation

import numpy as np

def kalman_filter(measurements, dt=1.0, process_noise=1.0, measurement_noise=10.0):
    x = np.array([[0.0], [0.0]])  # [position, velocity]

    A = np.array([[1, dt], [0, 1]])   # State transition
    H = np.array([[1, 0]])            # Measurement matrix (position only)
    Q = process_noise * np.array([    # Process noise covariance
        [dt**4 / 4, dt**3 / 2],
        [dt**3 / 2, dt**2]
    ])
    R = np.array([[measurement_noise]])  # Measurement noise covariance
    P = np.eye(2) * 500                  # Initial covariance (high uncertainty)

    estimates = []
    for z in measurements:
        # Predict
        x = A @ x
        P = A @ P @ A.T + Q

        # Update
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (np.array([[z]]) - H @ x)
        P = (np.eye(2) - K @ H) @ P

        estimates.append(x[0, 0])

    return estimates

Feed this a sequence of noisy position readings and it returns smoother position estimates, plus a velocity estimate inferred entirely from the position sequence. The velocity state is never directly measured; the filter reconstructs it from the pattern of position observations across time. That secondary inference is one of the more satisfying things about the algorithm: you get hidden state estimation for free as a byproduct of filtering.

The Q matrix above uses the discrete-time Wiener process acceleration model, which assumes the acceleration at each timestep is an independent Gaussian random variable. This is a standard choice for tracking problems with unknown maneuvers.

Why It Is Optimal

The Kalman filter is the minimum mean-square-error estimator for linear Gaussian systems. From the Bayesian perspective, the filter maintains a Gaussian probability distribution over the state at each timestep. The prediction step propagates that distribution forward through the system model, widening it according to process noise. The update step conditions the distribution on the new measurement, producing a posterior that is also Gaussian. The mean of that posterior is the state estimate; the covariance is the uncertainty.

The Gaussian stays Gaussian because both the system and the noise are linear and Gaussian, which is precisely what enables the closed-form solution. The same equations can be derived from a least-squares perspective, and both derivations give identical results, which is a satisfying sign that the algorithm is doing something genuinely fundamental rather than merely convenient.

Tuning Q and R matters more than most introductions acknowledge. They are not free parameters to adjust until the output looks smooth; they are your probabilistic model of how the world works and how accurate your sensor is. Setting R too low tells the filter to trust measurements that are unreliable, making it jitter. Setting Q too low tells it to trust a model that cannot capture the system’s actual behavior, making it lag behind reality. In practice, R is often derivable from sensor datasheets, while Q usually requires physical modeling or empirical estimation from logged data. Getting these right is where most of the engineering effort goes in real deployments.

When the Linear Assumption Fails

Many real systems are nonlinear. A radar tracking a sharply maneuvering aircraft operates in a regime where the constant-velocity model is a poor approximation; a robot navigating with bearing-only sensors faces a nonlinear measurement relationship; GPS position estimation from raw pseudoranges involves nonlinear geometry.

Two extensions address this. The Extended Kalman Filter (EKF) linearizes the system at each timestep using Jacobian matrices, then applies the standard Kalman equations to the linearized version. It works well for mildly nonlinear systems but can diverge when the Jacobian is a poor local approximation of the true function, which happens in cases with strong curvature or large uncertainty.

The Unscented Kalman Filter (UKF), introduced by Julier and Uhlmann in 1997, takes a different approach: rather than linearizing the function, it propagates a set of carefully chosen sample points called sigma points through the nonlinear function and reconstructs Gaussian statistics from the outputs. For many nonlinear problems the UKF is more accurate than the EKF and avoids computing Jacobians, at the cost of more function evaluations per timestep. The standard formulation uses 2n + 1 sigma points for an n-dimensional state, so the cost scales linearly rather than with the Jacobian’s quadratic complexity.

For fully non-Gaussian problems, particle filters replace the Gaussian assumption with a weighted set of samples, which can represent arbitrary distributions including multimodal ones. Particle filters are powerful but computationally expensive; in high-dimensional state spaces they require exponentially more samples to maintain coverage, which limits their practical use to problems where the state dimension is manageable.

Applications That Depend on This

The Apollo Guidance Computer used an early Kalman filter implementation to estimate spacecraft trajectory in real time during the moon landings, one of the first production applications of the algorithm at scale. Battin’s implementation had to run on hardware with 4 KB of erasable memory, which required careful algorithmic choices about numerical precision and update scheduling.

Modern GPS receivers use the filter to fuse satellite timing measurements with inertial sensor data, producing smoother position estimates than either source alone can provide. Autonomous vehicle sensor fusion is essentially a large-scale Kalman problem: combining LIDAR, camera, radar, IMU, and GPS data into a unified state estimate at hundreds of hertz. The robot_localization package in ROS, widely used in robotics research and field deployments, exposes a configurable EKF and UKF implementation built around this exact structure.

Signal processing for biomedical sensors, computer vision object tracking, and financial time-series estimation all draw on the same predict-and-update structure, because the underlying problem is the same: a hidden state that evolves according to a model, observed through noisy sensors.

The Practical Takeaway

The Kalman filter is worth understanding not just as a specific tool but as a framework for reasoning about estimation under uncertainty. The predict-and-update structure captures something real about how to combine prior knowledge with new evidence, and once it is familiar, recognizing equivalent structures in other domains becomes straightforward.

The explicit modeling of Q and R forces you to quantify how much you trust the model versus the sensor, which is a discipline that clarifies thinking about any system with hidden state and noisy observations. The gap between a working Kalman filter and a well-tuned one is almost entirely in those two matrices. Getting the code right takes an afternoon; getting Q and R right takes an understanding of the physical system and the measurement process. Keeping that distinction clear is most of what filter engineering actually is.