We briefly describe an online Bayesian framework, sometimes referred to as the Assumed Density Filter (ADF), and review a heuristic proof of its convergence in the Gaussian case.
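Bayes' rule gives
$$p(\theta \mid D_{t+1}) = \frac{P(y_{t+1} \mid \theta)\, p(\theta \mid D_t)}{\int P(y_{t+1} \mid \theta')\, p(\theta' \mid D_t)\, d\theta'}$$
for data $D_t$, parameter $\theta$ and new data point $y_{t+1}$.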
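ADF suggests projecting the data $D_t$ at time $t$ to a parameter (vector) $\mathrm{par}(t)$. This gives a routine that consists of the following two steps (see [Opper] for the main reference article).
Update: incorporate the new data point $y_{t+1}$ by Bayes' rule, starting from the current approximation,
$$q(\theta \mid y_{t+1}) = \frac{P(y_{t+1} \mid \theta)\, p(\theta \mid \mathrm{par}(t))}{\int P(y_{t+1} \mid \theta')\, p(\theta' \mid \mathrm{par}(t))\, d\theta'}.$$
Project: choose the new parameter so that the assumed density is as close as possible to $q$,
$$\mathrm{par}(t+1) = \operatorname{argmin}_{\mathrm{par}} D\big(q \,\|\, p(\cdot \mid \mathrm{par})\big).$$
Here $D(q \,\|\, p)$ is the KL-divergence of distributions $q$ and $p$.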
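The update and project steps can be sketched numerically for a scalar parameter. The following is a minimal sketch under stated assumptions, not the reference implementation: it assumes a Gaussian approximating family, a Bernoulli observation model with a logistic link (chosen only for illustration), and Gauss-Hermite quadrature to approximate the integrals. The names `adf_step` and `bernoulli_logit` are hypothetical.

```python
import numpy as np

# Gauss-Hermite nodes/weights for the weight function exp(-x^2 / 2),
# rescaled so the weights are probabilities under a standard normal.
NODES, WEIGHTS = np.polynomial.hermite_e.hermegauss(40)
WEIGHTS = WEIGHTS / WEIGHTS.sum()

def bernoulli_logit(y, theta):
    """Illustrative likelihood P(y | theta) with P(y = 1 | theta) = sigmoid(theta)."""
    p = 1.0 / (1.0 + np.exp(-theta))
    return p if y == 1 else 1.0 - p

def adf_step(mean, var, y, likelihood):
    """One ADF step for a scalar parameter with approximation N(mean, var)."""
    theta = mean + np.sqrt(var) * NODES   # quadrature points under N(mean, var)
    # Update: tilted distribution q proportional to likelihood times current approximation.
    q = WEIGHTS * likelihood(y, theta)
    q = q / q.sum()
    # Project: for a Gaussian family, minimising KL(q || p) matches mean and variance.
    new_mean = np.sum(q * theta)
    new_var = np.sum(q * (theta - new_mean) ** 2)
    return new_mean, new_var

rng = np.random.default_rng(0)
theta_true = 1.0                           # assumed ground truth for the demo
ys = rng.random(2000) < 1.0 / (1.0 + np.exp(-theta_true))

mean, var = 0.0, 4.0                       # assumed initial approximation
for y in ys:
    mean, var = adf_step(mean, var, int(y), bernoulli_logit)

print(mean, var)
```

Only two numbers are stored between observations, which is the point of the projection: the data are summarised by the parameters of the assumed density.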
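Remark. Note that for exponential families of distributions,
$$p(\theta \mid \lambda) = h(\theta) \exp\{\lambda^\top \phi(\theta) - A(\lambda)\},$$
with natural parameter $\lambda$ and sufficient statistic $\phi$, matching the moments of $\phi(\theta)$ under $q$ and $p$ gives the minimization of the above.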
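A short derivation shows why moment matching minimizes the KL-divergence. The notation is an assumption made for illustration: $\lambda$ is the natural parameter, $\phi$ the sufficient statistic, $h$ the base measure and $A$ the log-normalizer of the exponential family.

```latex
\begin{align*}
p_\lambda(\theta) &= h(\theta)\exp\{\lambda^\top \phi(\theta) - A(\lambda)\}, \\
D(q \,\|\, p_\lambda) &= \int q(\theta) \ln\frac{q(\theta)}{h(\theta)}\,d\theta
   - \lambda^\top E_q[\phi(\theta)] + A(\lambda), \\
\nabla_\lambda D(q \,\|\, p_\lambda) &= -E_q[\phi(\theta)] + \nabla A(\lambda)
   = -E_q[\phi(\theta)] + E_{p_\lambda}[\phi(\theta)].
\end{align*}
```

Since $A$ is convex, $D(q \,\|\, p_\lambda)$ is convex in $\lambda$, so the minimum is attained where the gradient vanishes, that is, where $E_{p_\lambda}[\phi(\theta)] = E_q[\phi(\theta)]$. For a Gaussian family this means matching the mean and covariance of $q$.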
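Let's assume that $p(\theta \mid \mathrm{par}(t))$ is normally distributed with mean $\hat\theta(t)$ and covariance matrix $C(t)$.
Under this one can argue that $\hat\theta(t)$ obeys the recursion
$$\hat\theta_i(t+1) = \hat\theta_i(t) + \sum_j C_{ij}(t)\, \partial_j \ln E_u\big[P(y_{t+1} \mid \hat\theta(t) + u)\big] \tag{1}$$
and $C(t)$ obeys the recursion:
$$C_{ij}(t+1) = C_{ij}(t) + \sum_{k,l} C_{ik}(t)\, C_{lj}(t)\, \partial_k \partial_l \ln E_u\big[P(y_{t+1} \mid \hat\theta(t) + u)\big] \tag{2}$$
Here $u$ is normal with mean zero and covariance $C(t)$. The partial derivative, $\partial_j$, above is taken with respect to the $j$th component of $\hat\theta(t)$.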
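As a sanity check, the Gaussian recursions can be run in a case where everything is available in closed form. The sketch below assumes the form given in Opper and Winther: the mean moves by the covariance times the gradient of $\ln E_u[P(y \mid \hat\theta + u)]$, and the covariance changes by the covariance times the Hessian times the covariance. It also assumes a scalar parameter and a Gaussian likelihood $y \sim N(\theta, \sigma^2)$, an illustrative choice not taken from the text. In that case $E_u[P(y \mid \hat\theta + u)]$ is the $N(\hat\theta, \sigma^2 + C)$ density at $y$, and the recursion should reproduce the exact conjugate posterior. The function name `gaussian_adf_step` is hypothetical.

```python
import numpy as np

def gaussian_adf_step(theta_hat, C, y, sigma2):
    """One step of the mean/covariance recursion for y ~ N(theta, sigma2), scalar theta."""
    s = sigma2 + C                  # variance of y under the current approximation
    grad = (y - theta_hat) / s      # first derivative of ln E_u[P(y | theta_hat + u)]
    hess = -1.0 / s                 # second derivative of ln E_u[P(y | theta_hat + u)]
    return theta_hat + C * grad, C + C * C * hess

rng = np.random.default_rng(1)
theta_star, sigma2 = 2.0, 0.5       # assumed true parameter and noise variance
ys = theta_star + np.sqrt(sigma2) * rng.standard_normal(500)

theta0, C0 = 0.0, 10.0              # assumed Gaussian prior N(theta0, C0)
theta_hat, C = theta0, C0
for y in ys:
    theta_hat, C = gaussian_adf_step(theta_hat, C, y, sigma2)

# Exact conjugate posterior for comparison.
C_exact = 1.0 / (1.0 / C0 + len(ys) / sigma2)
mean_exact = C_exact * (theta0 / C0 + ys.sum() / sigma2)

print(theta_hat, mean_exact)
print(C * len(ys), sigma2)          # t * C(t) approaches sigma2, the inverse Fisher information
```

In this conjugate case the projection loses nothing, so the recursion agrees with the exact posterior, and the covariance decays at rate $1/t$.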
Quick Justification of (1) and (2)
A similar calculation gives the other expression on .
This gives the differential equation
We assume is drawn IID from a distribution . We assume there is an attractive fixed point satisfying
The last approximation that removes the normal distribution error needs justifying. The inequality with assumes that (in the case where they are not equal – i.e. when the model is misspecified – we just put in some matrix instead of )
In principle should not be too far from , because
so the variance of goes to zero at rate justifying the approximation for . From the above we see that “const” is (or if the )). So
Next we start to analyse the error:
We note that by and then a Taylor expansion that
Next we see that using
The sum on the right-hand side goes to zero because of . So we get
It is also possible to analyze
. The above expressions give
(again using ) which is solved by
This is the same rate of convergence as expected for maximum likelihood estimates (MLE).
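The comparison with maximum likelihood can be checked by simulation. The following is a minimal sketch under assumptions that are not from the text: a Bernoulli model with logistic link, a well-specified data source, and the small-covariance form of the recursion in which the average over the normal perturbation is dropped. If the heuristic argument holds, the mean squared error of both estimators, multiplied by $tJ$ with $J$ the Fisher information, should be close to 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
theta_star, T, R = 0.5, 2000, 1000   # assumed true parameter, horizon, replications
p_star = sigmoid(theta_star)
J = p_star * (1.0 - p_star)          # Fisher information of the Bernoulli-logit model
ys = (rng.random((R, T)) < p_star).astype(float)

# Small-covariance recursion, run on R independent data streams at once.
theta = np.zeros(R)
C = np.ones(R)
for t in range(T):
    p = sigmoid(theta)
    theta = theta + C * (ys[:, t] - p)   # C times the gradient of ln P(y | theta)
    C = C - C * C * p * (1.0 - p)        # C times the Hessian of ln P(y | theta) times C

# Maximum likelihood estimate from the same data: logit of the sample mean.
ybar = ys.mean(axis=1)
theta_mle = np.log(ybar / (1.0 - ybar))

scaled_mse_online = T * J * np.mean((theta - theta_star) ** 2)
scaled_mse_mle = T * J * np.mean((theta_mle - theta_star) ** 2)
print(scaled_mse_online, scaled_mse_mle)
```

The online recursion touches each data point once, whereas the MLE here uses the whole sample; the simulation is only meant to illustrate that the two error rates are of the same order.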
This is based on reading Opper and Winther:
Opper, Manfred, and Ole Winther. “A Bayesian approach to on-line learning.” On-line learning in neural networks (1998): 363-378.