In the *Cross Entropy Method*, we wish to estimate the likelihood

Here is a random variable whose distribution is known and belongs to a parametrized family of densities . Further is often a solution to an optimization problem.

It is assumed that is a relatively rare event, say of order . One way to estimate is to take IID samples and then average

This gives us an unbiased estimate. However, because the event is rare, we may require a large number of samples in order to provide a good estimate of . *Importance sampling* is a straightforward method that can mitigate these effects. Here sample each from a different density and perform the estimate

where W(x) = f(x)/g(x). Such a change can improve the estimation of . For instance, the choice gives the correct estimate of with certainty. Clearly cannot work in practice; it requires knowledge of , the parameter which we are estimating. Nonetheless a good estimate of may be possible. The point of the cross entropy method is to find a reasonable choice for so that importance sampling can be applied.

In particular, the *cross-entropy method* advocates estimating by according to a parameter choice that minimizes the relative entropy between and . Namely,

With a little rearrangement, we see that this is equivalent to the optimization

With importance sampling in mind, we apply a change of measure to the above integral to get

where . The above optimization may not be explicitly solvable so we replace it with the (importance sampled) optimization

where are IID samples from density .

In many examples the above optimization in concave and solvable either by descent methods or by differentiating and solving for the condition

The reason for incorporating importance sampling into the estimate is because the event could be too unlikely to form an accurate estimate under the choice . The idea behind the cross entropy method is to apply a *multi-level* approach where we sequentially increase and as follows

- We estimate the -th percentile of by taking by ordering the samples and calculating the empirical -th percentile, which gives the update

The event is chosen to be reasonably likely, e.g. . Note that since, in principle, the parameter is such that less likely under when compared with the previous choice of .

- Given this , we find as the best estimate according to our cross-entropy rule . This, in principle, should increase the likelihood of the event when compared with prior choice of .
- When passes the required target level then the have found a parameter choice that gives reasonable importance to the event and we can calculate with importance sampling rule .