Due to some projects, I’ve decided to start a set of posts on Monte-Carlo and its variants. These include Monte-Carlo (MC), Markov chain Monte-Carlo (MCMC), Sequential Monte-Carlo (SMC) and Multi-Level Monte-Carlo (MLMC). I’ll probably expand these posts further at a later point.
Here we cover “vanilla” Monte-Carlo, importance sampling and self-normalized importance sampling:
The idea of Monte-Carlo simulation is quite simple. You want to evaluate the expectation of where
is a some random variable (or set of random variables) with distribution
and
is a real-valued. Note that the expectation is really an integral:
So we can think of Monte-carlo as evaluating an integral.
Given you can sample then the expectation
can be approximated by
In particular, if the samples are i.i.d. the strong law of large numbers gives that, with probability ,
Also it holds that
(since the variance satisfies for
and
independent). So here we see the errors go down at rate
.
A Classical Example. Let where
are independent. Let
Note that the area of the quarter circle
is
. Then
The rate of convergence in this example is pretty atrocious when compared with numerical methods. 1 However the example gets the main idea across: there is some difficult to calculate quantity (namely ), we generate random variables
we do a calculation to get a random variable of interest
and then we repeat until we get a good average. The method is extremely simple and generalizable (to situations where other numerical methods are not readily available).
Importance Sampling.
If we want to calculate
we don’t need to sample from , we can sample from another distribution
instead (and this can help improve convergence). We can use
instead of
because
where
(Above is the probability density function (pdf) of
over the pdf of
for continuous random variables or is the probability mass function of
over
for discrete random variables, and in general is the Radon-Nikodym derivative.)
Thus when applying important sampling, we sample and we perform the estimate
The following lemma, although not entirely practical, gives good insights as to why importance sampling can help
Lemma [the Perfect Importance Sampler] If and we sample from
with
then the estimator is such that
Proof.
![\begin{aligned}
\mathbb E_{\nu} [ \tilde Z^2 ]
=
\mathbb E_\mu \left[
Z^2
\frac{\mathbb E_\mu [Z]}{Z}
\right]
=
\mathbb E_\mu [ Z]^2
=
\mathbb E_\nu [ \tilde Z]^2 \, ,\end{aligned}](https://appliedprobability.files.wordpress.com/2021/05/4660d97e187783dc462032ab1efc1d61.png)
where the last inequality follows by . Therefore .
The above suggest that we should choose with probability proportional to
to get low variance.2 Of course, we don’t know
in advance, so we cannot sample in this way. However, in practice, any sampling mechanism that concentrates selection around the area of interest would likely have a good impact on performance. Indeed importance sampling can substantially improve selection related to sampling from the underlying distribution
.
Self-Normalized Importance Sampling.
In importance sampling, we apply a weight to each sample, here we know that
. However, sometimes we only know these weights upto some constant (i.e. we don’t know the correct normalizing constant which happens a lot in Bayesian statistics) In that case, we can renormalize with the following self-normalized importance sample:
So long as the weights remain bounded, the rate of convergence is comparable to that of MCMC.
Proposition. If the weights are bounded then for all bounded function
it holds that
Proof. Note that for
Note that the term in square brackets is bounded by . Thus applying this equality we have that
Now notice that, since , we have that
The above inequality applies to both terms in (by taking ). Thus we have, as required, that
References
Monte-Carlo is by now a very standard method. Buffon’s Needle and the calculation of by Ulam and Von Neumann and coauthors are classical early examples, see Metropolis. The texts of Kroese et al. and Asmussen and Glynn provide good text book accounts.
Metropolis, N. “The Beginning of the Monto-Carlo Method.” Los Alamos Science 15 (1987): 125-130.
Asmussen, Søren, and Peter W. Glynn. Stochastic simulation: algorithms and analysis. Vol. 57. Springer Science & Business Media, 2007.
Kroese, Dirk P., Thomas Taimre, and Zdravko I. Botev. Handbook of monte carlo methods. Vol. 706. John Wiley & Sons, 2013.