We briefly describe an online Bayesian framework, sometimes referred to as Assumed Density Filtering (ADF), and we review a heuristic proof of its convergence in the Gaussian case.
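Bayes' rule gives
$$p(\theta \mid D_{t+1}) = \frac{P(y_{t+1} \mid \theta)\, p(\theta \mid D_t)}{\int P(y_{t+1} \mid \theta')\, p(\theta' \mid D_t)\, d\theta'}$$
for data $D_t = (y_1, \dots, y_t)$, parameter $\theta$, and new data point $y_{t+1}$.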
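ADF suggests projecting the data $D_t$ at time $t$ to a parameter (vector) $\text{par}(t)$, so that the posterior is approximated by a member $p(\theta \mid \text{par}(t))$ of a parametric family. This gives a routine that consists of the following two steps (see [Opper] for the main reference article).
Update:
$$q(\theta \mid y_{t+1}, \text{par}(t)) = \frac{P(y_{t+1} \mid \theta)\, p(\theta \mid \text{par}(t))}{\int P(y_{t+1} \mid \theta')\, p(\theta' \mid \text{par}(t))\, d\theta'}$$
Project:
$$\text{par}(t+1) = \arg\min_{\text{par}}\; KL\big( q(\cdot \mid y_{t+1}, \text{par}(t)) \,\big\|\, p(\cdot \mid \text{par}) \big)$$
Here $KL(q \,\|\, p) = \int q(\theta) \ln \frac{q(\theta)}{p(\theta)}\, d\theta$ is the KL-divergence of distributions $q$ and $p$.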
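Remark. Note that for exponential families of distributions,
$$p(\theta \mid \lambda) = \frac{h(\theta)}{Z(\lambda)} \exp\{\lambda^\top \phi(\theta)\},$$
choosing $\lambda$ so that the moments of $\phi(\theta)$ under $p$ match those under the updated distribution $q$ gives the minimization of the KL-divergence above.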
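The update and project steps can be sketched numerically for a scalar parameter. The following is a minimal illustration, not the reference's implementation: it assumes a Gaussian approximating family (so projection means matching mean and variance), and the function name `adf_step_grid`, the uniform grid, and the Gaussian example likelihood are all hypothetical choices made for this sketch.

```python
import numpy as np

def adf_step_grid(grid, prior_mean, prior_var, likelihood):
    """One ADF step for a scalar parameter, evaluated on a uniform grid."""
    # Current Gaussian approximation p(theta | par(t)), unnormalized.
    prior = np.exp(-0.5 * (grid - prior_mean) ** 2 / prior_var)
    # Update: tilt the approximation by the likelihood of the new data point.
    q = likelihood(grid) * prior
    q /= q.sum()  # the uniform grid spacing cancels in the normalization
    # Project: for a Gaussian family, minimizing KL(q || p) matches mean and variance.
    mean = np.sum(grid * q)
    var = np.sum((grid - mean) ** 2 * q)
    return mean, var

# Example: Gaussian likelihood N(y; theta, noise_var), where the projection is exact.
grid = np.linspace(-10.0, 10.0, 20001)
y, noise_var = 1.0, 0.5
mean, var = adf_step_grid(
    grid, 0.0, 2.0, lambda th: np.exp(-0.5 * (y - th) ** 2 / noise_var)
)
print(mean, var)
```

For this Gaussian example the updated distribution is itself Gaussian, so the result should agree, up to grid error, with the conjugate posterior (mean 0.8, variance 0.4). For a non-Gaussian likelihood the same function returns the moment-matched Gaussian instead.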
Let's assume that $p(\theta \mid \text{par}(t))$ is normally distributed with mean $\hat\theta(t)$ and covariance matrix $C(t)$.
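Under this assumption one can argue that the mean $\hat\theta(t)$ obeys the recursion
$$\hat\theta_i(t+1) = \hat\theta_i(t) + \sum_j C_{ij}(t)\, \partial_j \ln \mathbb{E}_u\big[ P(y_{t+1} \mid \hat\theta(t) + u) \big] \tag{1}$$
and the covariance $C(t)$ obeys the recursion
$$C_{ij}(t+1) = C_{ij}(t) + \sum_{k,l} C_{ik}(t)\, C_{lj}(t)\, \partial_k \partial_l \ln \mathbb{E}_u\big[ P(y_{t+1} \mid \hat\theta(t) + u) \big] \tag{2}$$
Here $u$ is normal with mean zero and covariance $C(t)$. The partial derivative, $\partial_j$, above is taken with respect to the $j$th component of $\hat\theta(t)$.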
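As a worked check of (1) and (2), consider a scalar Gaussian likelihood $P(y \mid \theta) = N(y; \theta, \sigma^2)$. This model and the symbol $\sigma^2$ are illustrative assumptions, not taken from the reference. In the scalar case (1) and (2) read $\hat\theta(t+1) = \hat\theta(t) + C(t)\, \partial \ln Z$ and $C(t+1) = C(t) + C(t)^2\, \partial^2 \ln Z$, where $Z = \mathbb{E}_u[P(y_{t+1} \mid \hat\theta(t) + u)]$ and $u \sim N(0, C(t))$. The average over $u$ is a convolution of two Gaussians, so

$$Z = N\big(y_{t+1};\, \hat\theta(t),\, \sigma^2 + C(t)\big), \qquad \partial \ln Z = \frac{y_{t+1} - \hat\theta(t)}{\sigma^2 + C(t)}, \qquad \partial^2 \ln Z = -\frac{1}{\sigma^2 + C(t)},$$

and therefore

$$\hat\theta(t+1) = \hat\theta(t) + \frac{C(t)}{\sigma^2 + C(t)}\big(y_{t+1} - \hat\theta(t)\big), \qquad C(t+1) = C(t) - \frac{C(t)^2}{\sigma^2 + C(t)} = \frac{\sigma^2\, C(t)}{\sigma^2 + C(t)}.$$

These are the usual conjugate (Kalman-type) updates, as expected: with a Gaussian likelihood the updated distribution is already Gaussian and the projection step loses nothing. Equivalently, $C(t+1)^{-1} = C(t)^{-1} + \sigma^{-2}$, so $C(t)$ decays like $\sigma^2 / t$.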
Quick Justification of (1) and (2)
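Note that, for a Gaussian family, the projection step matches the mean and covariance of the updated distribution. Write $Z = \mathbb{E}_u[P(y_{t+1} \mid \hat\theta(t) + u)]$ for the normalizing constant of the update step, where $u$ is normal with mean zero and covariance $C(t)$. Using the Gaussian integration-by-parts identity $\mathbb{E}_u[u_i f(u)] = \sum_j C_{ij}(t)\, \mathbb{E}_u[\partial_j f(u)]$,
$$\hat\theta_i(t+1) - \hat\theta_i(t) = \frac{\mathbb{E}_u\big[u_i\, P(y_{t+1} \mid \hat\theta(t) + u)\big]}{Z} = \sum_j C_{ij}(t)\, \frac{\mathbb{E}_u\big[\partial_j P(y_{t+1} \mid \hat\theta(t) + u)\big]}{Z} = \sum_j C_{ij}(t)\, \partial_j \ln Z.$$
A similar calculation gives the other expression, (2), for $C(t)$.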
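For large $t$, we treat time as continuous and approximate $C(t+1) - C(t)$ in (2) by $\frac{dC}{dt}$.
This gives the differential equation
$$\frac{dC_{ij}}{dt} = \sum_{k,l} C_{ik}\, C_{lj}\, \partial_k \partial_l \ln \mathbb{E}_u\big[ P(y_{t+1} \mid \hat\theta(t) + u) \big], \qquad \text{i.e.} \qquad \frac{dC}{dt} = C J C, \quad J_{kl} = \partial_k \partial_l \ln \mathbb{E}_u\big[ P(y_{t+1} \mid \hat\theta(t) + u) \big].$$
This implies
$$\frac{dC^{-1}}{dt} = -J$$
because
$$\frac{dC^{-1}}{dt} = -C^{-1}\, \frac{dC}{dt}\, C^{-1}.$$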
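We assume $y_t$ is drawn IID from a distribution $Q(y)$. We assume there is an attractive fixed point $\theta^*$ satisfying
$$\mathbb{E}_{y \sim Q}\big[ \partial_i \ln P(y \mid \theta^*) \big] = 0 \quad \text{for all } i. \tag{3}$$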
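So
$$\frac{dC^{-1}}{dt} = -\partial \partial^\top \ln \mathbb{E}_u\big[ P(y_{t+1} \mid \hat\theta(t) + u) \big] \approx -\mathbb{E}_{y \sim Q}\big[ \partial \partial^\top \ln P(y \mid \theta^*) \big] = I(\theta^*) = \text{const}.$$
The approximation, which removes the normal distribution error $u$ (and replaces $\hat\theta(t)$ by $\theta^*$ and the sample by its average), needs justifying. The final equality with the Fisher information $I(\theta^*)$ assumes that $Q(y) = P(y \mid \theta^*)$ (in the case where they are not equal, i.e. when the model is misspecified, we just put in the matrix $A = -\mathbb{E}_{y \sim Q}[\partial \partial^\top \ln P(y \mid \theta^*)]$ instead of $I(\theta^*)$).
In principle $\hat\theta(t) + u$ should not be too far from $\hat\theta(t)$, because $\frac{dC^{-1}}{dt} \approx \text{const}$ implies that $C(t) \approx \text{const}^{-1} / t$, so the variance of $u$ goes to zero at rate $1/t$, justifying the approximation for large $t$. From the above we see that "const" is $I(\theta^*)$ (or $A$ if $Q \neq P(\cdot \mid \theta^*)$). So
$$C(t) \approx \frac{I(\theta^*)^{-1}}{t}.$$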
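The $1/t$ decay of $C(t)$ can be checked numerically on a model where the average over $u$ has no closed form. The sketch below is an illustration under stated assumptions, not the reference's experiment: it assumes a scalar Bernoulli model $P(y = 1 \mid \theta) = \text{sigmoid}(\theta)$ that is well specified, uses Gauss-Hermite quadrature for the Gaussian average, and takes the scalar forms of the mean and covariance recursions, $\hat\theta \leftarrow \hat\theta + C\, \partial \ln Z$ and $C \leftarrow C + C^2\, \partial^2 \ln Z$. The names `log_pred_derivs` and `run_adf` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Gauss-Hermite nodes/weights for the weight exp(-x^2 / 2), rescaled so that
# WEIGHTS @ f(NODES) approximates E[f(x)] for x ~ N(0, 1).
NODES, WEIGHTS = hermegauss(40)
WEIGHTS = WEIGHTS / np.sqrt(2.0 * np.pi)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_pred_derivs(y, mean, var):
    """First and second derivative in `mean` of ln E_u[P(y | mean + u)], u ~ N(0, var)."""
    theta = mean + np.sqrt(var) * NODES
    p = sigmoid(theta)
    sign = 1.0 if y == 1 else -1.0
    lik = p if y == 1 else 1.0 - p
    d1 = sign * p * (1.0 - p)        # d/dtheta of P(y | theta)
    d2 = d1 * (1.0 - 2.0 * p)        # d^2/dtheta^2 of P(y | theta)
    z, z1, z2 = WEIGHTS @ lik, WEIGHTS @ d1, WEIGHTS @ d2
    g = z1 / z
    return g, z2 / z - g ** 2

def run_adf(theta_star, n_steps, seed=0, mean0=0.0, var0=4.0):
    """Run the scalar mean/variance recursions on simulated Bernoulli data."""
    rng = np.random.default_rng(seed)
    ys = rng.random(n_steps) < sigmoid(theta_star)
    mean, var = mean0, var0
    for y in ys:
        g, h = log_pred_derivs(int(y), mean, var)
        mean, var = mean + var * g, var + var ** 2 * h
    return mean, var

theta_star, n_steps = 0.5, 5000
mean, var = run_adf(theta_star, n_steps)
p_star = sigmoid(theta_star)
fisher = p_star * (1.0 - p_star)     # Fisher information of the Bernoulli-sigmoid model
print("theta_hat:", mean)
print("t * C(t) * I(theta*):", n_steps * var * fisher)
```

If the heuristic argument holds, the second printed quantity should be close to 1 for large `n_steps`, and `theta_hat` should be close to `theta_star`.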
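Next we start to analyse the error $e(t) = \hat\theta(t) - \theta^*$.
Note that, by (1) (dropping $u$ as above) and then a Taylor expansion about $\theta^*$,
$$e_i(t+1) - e_i(t) \approx \sum_j C_{ij}(t)\, \partial_j \ln P(y_{t+1} \mid \hat\theta(t)) \approx \sum_j C_{ij}(t)\, \partial_j \ln P(y_{t+1} \mid \theta^*) + \sum_{j,k} C_{ij}(t)\, \partial_j \partial_k \ln P(y_{t+1} \mid \theta^*)\, e_k(t).$$
Next we take expectations, using that $y_{t+1}$ is independent of $e(t)$ and that $C(t) \approx A^{-1}/t$ with $A = -\mathbb{E}_{y \sim Q}[\partial \partial^\top \ln P(y \mid \theta^*)]$ (equal to $I(\theta^*)$ when the model is well specified):
$$\mathbb{E}[e_i(t+1)] - \mathbb{E}[e_i(t)] \approx \frac{1}{t} \sum_j (A^{-1})_{ij}\, \mathbb{E}_{y \sim Q}\big[ \partial_j \ln P(y \mid \theta^*) \big] - \frac{1}{t}\, \mathbb{E}[e_i(t)].$$
The sum on the right-hand side is zero because of (3). So we get
$$\frac{d\, \mathbb{E}[e(t)]}{dt} \approx -\frac{\mathbb{E}[e(t)]}{t}.$$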
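It is also possible to analyze the error covariance $\Sigma(t) = \mathbb{E}[e(t)\, e(t)^\top]$. The above expressions give
$$\frac{d\Sigma}{dt} \approx -\frac{2\Sigma}{t} + \frac{A^{-1} B A^{-1}}{t^2}, \qquad A = -\mathbb{E}_{y \sim Q}\big[ \partial \partial^\top \ln P(y \mid \theta^*) \big], \quad B = \mathbb{E}_{y \sim Q}\big[ \partial \ln P(y \mid \theta^*)\, \partial \ln P(y \mid \theta^*)^\top \big],$$
(again using (3)) which is solved by
$$\Sigma(t) = \frac{A^{-1} B A^{-1}}{t}.$$
This is the same rate of convergence as for maximum likelihood estimates: when the model is well specified, $A = B = I(\theta^*)$ and $\Sigma(t) = I(\theta^*)^{-1}/t$.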
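A small simulation can illustrate the difference between the filter's own covariance $C(t)$ and the actual error covariance when the model is misspecified. The sketch below is an assumption-laden illustration, not a result from the reference. It assumes the model $P(y \mid \theta) = N(y; \theta, \sigma^2)$ with $\sigma^2$ = `model_var`, while the data really have standard deviation `true_sd`. With $A = -\mathbb{E}[\partial^2 \ln P(y \mid \theta^*)] = 1/\sigma^2$ and $B = \mathbb{E}[(\partial \ln P(y \mid \theta^*))^2] = \texttt{true\_sd}^2 / \sigma^4$, the sandwich form $A^{-1} B A^{-1} / t$ predicts an error variance of $\texttt{true\_sd}^2 / t$, whereas $C(t) \approx A^{-1}/t = \sigma^2 / t$. The function name `adf_gaussian_mean` is hypothetical.

```python
import numpy as np

def adf_gaussian_mean(ys, model_var, mean0=0.0, var0=10.0):
    """ADF for the model N(theta, model_var); ys has shape (T, R) for R independent runs."""
    mean = np.full(ys.shape[1], mean0)
    var = var0
    for y in ys:
        gain = var / (model_var + var)
        mean = mean + gain * (y - mean)  # mean recursion for a Gaussian likelihood
        var = var - gain * var           # covariance recursion: var - var^2 / (model_var + var)
    return mean, var

rng = np.random.default_rng(1)
T, R = 400, 4000
theta_star, true_sd, model_var = 2.0, 3.0, 1.0
ys = theta_star + true_sd * rng.standard_normal((T, R))

mean, var = adf_gaussian_mean(ys, model_var)
mse = np.mean((mean - theta_star) ** 2)  # Monte Carlo estimate of E[e(t)^2]
print("t * C(t):      ", T * var)        # should be near model_var = 1 / A
print("t * E[e(t)^2]: ", T * mse)        # should be near true_sd**2 = B / A**2
```

In this setup the filter reports a variance roughly `true_sd**2 / model_var` times too small, which is the usual consequence of misspecification; setting `true_sd**2 == model_var` should make the two printed quantities agree.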
Literature
This is based on reading Opper and Winther:
Opper, Manfred, and Ole Winther. “A Bayesian approach to on-line learning.” On-line learning in neural networks (1998): 363-378.