Kalman filtering (and filtering in general) considers the following setting: we have a sequence of states $x_t$, which evolves under random perturbations over time. Unfortunately we cannot observe $x_t$; we can only observe some noisy function of $x_t$, namely, $y_t$. Our task is to find the best estimate of $x_t$ given our observations $y_1, \dots, y_t$.
Consider the equations

$$x_{t+1} = A x_t + \epsilon_t, \qquad y_t = C x_t + \eta_t,$$

where $\epsilon_t \sim N(0, Q)$ and $\eta_t \sim N(0, R)$, and where the noise vectors $\epsilon_1, \eta_1, \epsilon_2, \eta_2, \dots$ are independent. (For a normal vector $X = (X_1, X_2)$, we let $\Sigma_{11}$ be the sub-matrix of the covariance matrix corresponding to $X_1$, $\Sigma_{12}$ the sub-matrix corresponding to $X_1$ and $X_2$, and so forth; this notation is used in Prop 1 below.)
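To make the model concrete, here is a minimal NumPy sketch that simulates the state and observation sequences. The matrices `A`, `C`, `Q`, `R` correspond to the symbols above; the specific dimensions and values (a position-velocity state with noisy position measurements) are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # state transition: position-velocity model (illustrative)
C = np.array([[1.0, 0.0]])        # we observe the position coordinate only
Q = 0.01 * np.eye(2)              # Var(eps_t), state noise covariance
R = np.array([[0.5]])             # Var(eta_t), observation noise covariance

T = 100
x = np.zeros(2)                   # known, deterministic initial state x_0
xs, ys = [x.copy()], []
for _ in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)   # x_{t+1} = A x_t + eps_t
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)   # y_t = C x_t + eta_t
    xs.append(x)
    ys.append(y)
```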
The Kalman filter has two update stages: a prediction update and a measurement update. These are

$$\hat{x}_{t+1|t} = A \hat{x}_{t|t}, \qquad V_{t+1|t} = A V_{t|t} A^\top + Q$$

and

$$\hat{x}_{t+1|t+1} = \hat{x}_{t+1|t} + K_{t+1}\left(y_{t+1} - C \hat{x}_{t+1|t}\right), \qquad V_{t+1|t+1} = \left(I - K_{t+1} C\right) V_{t+1|t},$$

where

$$K_{t+1} = V_{t+1|t} C^\top \left(C V_{t+1|t} C^\top + R\right)^{-1}.$$

The matrix $K_{t+1}$ is often referred to as the Kalman Gain. Assuming the initial state $x_0$ is known and deterministic, we take $\hat{x}_{0|0} = x_0$ and $V_{0|0} = 0$ in the above.
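The two updates translate directly into code. Below is a sketch of a single filter iteration; the helper name `kalman_step` is my own, and the notation follows the equations above. With a known deterministic initial state it would be started from `x_hat = x0` and `V` equal to the zero matrix.

```python
import numpy as np

def kalman_step(x_hat, V, y, A, C, Q, R):
    """One Kalman filter iteration: prediction update, then measurement update."""
    # Prediction update: x_{t+1|t} and V_{t+1|t}
    x_pred = A @ x_hat
    V_pred = A @ V @ A.T + Q
    # Kalman gain K_{t+1}; S is the innovation covariance C V_{t+1|t} C^T + R
    S = C @ V_pred @ C.T + R
    K = V_pred @ C.T @ np.linalg.inv(S)
    # Measurement update: x_{t+1|t+1} and V_{t+1|t+1}
    x_new = x_pred + K @ (y - C @ x_pred)
    V_new = (np.eye(len(x_hat)) - K @ C) @ V_pred
    return x_new, V_new
```

(In numerical practice one would solve the linear system with, say, a Cholesky factorization rather than forming `np.linalg.inv(S)` explicitly; the explicit inverse is kept here to mirror the formula for $K_{t+1}$.)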
We will use the following proposition, which is a standard result on normally distributed random vectors, variances and covariances.

Prop 1. Let $X$ be a normally distributed vector with mean $\mu$ and covariance matrix $\Sigma$, i.e. $X \sim N(\mu, \Sigma)$.

i) For any matrix $M$ and (constant) vector $c$, we have that

$$M X + c \sim N\left(M \mu + c,\ M \Sigma M^\top\right).$$

ii) If we take $X = (X_1, X_2)$ then $X_1$ conditional on $X_2 = x_2$ gives

$$X_1 \mid X_2 = x_2 \sim N\left(\mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2),\ \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}\right).$$

iii) $\mathrm{Cov}(MX, NX) = M \Sigma N^\top$, and in particular $\mathrm{Var}(MX) = M \Sigma M^\top$.
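Prop 1 ii) translates into a few lines of linear algebra, and it is the engine behind the measurement update. The sketch below is illustrative: the helper name `condition_gaussian` and the convention that $X_1$ is the first `d1` coordinates are my own choices.

```python
import numpy as np

def condition_gaussian(mu, Sigma, d1, x2):
    """Distribution of X_1 | X_2 = x2 for X = (X_1, X_2) ~ N(mu, Sigma),
    where X_1 is the first d1 coordinates (direct translation of Prop 1 ii))."""
    mu1, mu2 = mu[:d1], mu[d1:]
    S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
    S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]
    gain = S12 @ np.linalg.inv(S22)          # Sigma_12 Sigma_22^{-1}
    mu_cond = mu1 + gain @ (x2 - mu2)        # conditional mean
    Sigma_cond = S11 - gain @ S21            # conditional covariance
    return mu_cond, Sigma_cond
```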
We can justify the Kalman filtering steps by proving that the conditional distribution of $x_t$ given $y_1, \dots, y_t$ is given by the prediction and measurement steps. Specifically we have the following.

Theorem 1.

$$x_t \mid y_1, \dots, y_t \sim N\left(\hat{x}_{t|t}, V_{t|t}\right),$$

where $\hat{x}_{t|t}$ and $V_{t|t}$ are as defined by the prediction and measurement updates above.
Proof. We show the result by induction, supposing that

$$x_t \mid y_1, \dots, y_t \sim N\left(\hat{x}_{t|t}, V_{t|t}\right).$$

(The base case holds since $x_0$ is known and deterministic, with $\hat{x}_{0|0} = x_0$ and $V_{0|0} = 0$.) Since $x_{t+1} = A x_t + \epsilon_t$ is a linear function of $x_t$ and $\epsilon_t$, we have that $x_{t+1}$ conditional on $y_1, \dots, y_t$ is normally distributed, where, by Prop 1 i), we have that

$$x_{t+1} \mid y_1, \dots, y_t \sim N\left(A \hat{x}_{t|t},\ A V_{t|t} A^\top + Q\right) = N\left(\hat{x}_{t+1|t},\ V_{t+1|t}\right).$$

This is the prediction update. Given $y_1, \dots, y_t$, we have by Prop 1 iii) (applied to $y_{t+1} = C x_{t+1} + \eta_{t+1}$, with $\eta_{t+1}$ independent of everything else) that

$$\mathrm{Cov}\left(x_{t+1}, y_{t+1}\right) = V_{t+1|t} C^\top \qquad \text{and} \qquad \mathrm{Var}\left(y_{t+1}\right) = C V_{t+1|t} C^\top + R.$$

Thus

$$\begin{pmatrix} x_{t+1} \\ y_{t+1} \end{pmatrix} \Bigg|\ y_1, \dots, y_t \ \sim\ N\left( \begin{pmatrix} \hat{x}_{t+1|t} \\ C \hat{x}_{t+1|t} \end{pmatrix},\ \begin{pmatrix} V_{t+1|t} & V_{t+1|t} C^\top \\ C V_{t+1|t} & C V_{t+1|t} C^\top + R \end{pmatrix} \right).$$

Thus applying Prop 1 ii), conditioning on $y_{t+1}$, we get that

$$x_{t+1} \mid y_1, \dots, y_{t+1} \sim N\left(\hat{x}_{t+1|t} + K_{t+1}\left(y_{t+1} - C \hat{x}_{t+1|t}\right),\ \left(I - K_{t+1} C\right) V_{t+1|t}\right) = N\left(\hat{x}_{t+1|t+1},\ V_{t+1|t+1}\right),$$

which is exactly the measurement update. $\square$
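As a numerical sanity check (not part of the proof), the following self-contained sketch simulates the model and runs the filter; all parameter values are illustrative assumptions. If Theorem 1 is doing its job, the filtered mean should track the hidden position more closely than the raw observations do.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 1.0], [0.0, 1.0]])    # position-velocity model, as before
C = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[0.5]])

x = np.zeros(2)                           # known initial state x_0
x_hat, V = x.copy(), np.zeros((2, 2))     # so x_hat_{0|0} = x_0 and V_{0|0} = 0
err_filter, err_obs = [], []
for _ in range(200):
    # Simulate one step of the model.
    x = A @ x + rng.multivariate_normal(np.zeros(2), Q)
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)
    # Prediction update.
    x_hat, V = A @ x_hat, A @ V @ A.T + Q
    # Measurement update with Kalman gain K.
    K = V @ C.T @ np.linalg.inv(C @ V @ C.T + R)
    x_hat, V = x_hat + K @ (y - C @ x_hat), (np.eye(2) - K @ C) @ V
    err_filter.append(float((x_hat[0] - x[0]) ** 2))
    err_obs.append(float((y[0] - x[0]) ** 2))

print("filtered-mean MSE:", np.mean(err_filter))
print("raw-observation MSE:", np.mean(err_obs))
```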
Literature
The Kalman filter is generally credited to Kalman and Bucy. The method is now standard in many textbooks on control and machine learning.
Kalman, Rudolph E., and Richard S. Bucy. “New results in linear filtering and prediction theory.” Journal of Basic Engineering 83.1 (1961): 95-108.