Multi-Level Monte Carlo (MLMC)

Multi-Level Monte Carlo is a Monte Carlo method for computing numerically accurate estimates when fine-grained simulations are expensive but cheap coarse-grained simulations can be used to supplement them. We consider the simulation of stochastic differential equations, the application in which the method was first proposed, but we note that the approach applies in a variety of other settings.

Aim. Suppose X_t obeys the stochastic differential equation

\[ dX_t = a(X_t)\, dt + b(X_t)\, dW_t \]

and the aim is to estimate

\[ \bar f := \mathbb E\big[ f(X_T) \big] \]

at some time T for some function f. (E.g. think of estimating the expected value of a put or call option – we could consider more exotic path-dependent objects too.)

Monte-Carlo. In a “vanilla” Monte Carlo approach we would estimate \bar f with

\[ \hat f_N := \frac{1}{N} \sum_{n=1}^{N} f\big(X^{(n)}_T\big), \]

where X^{(n)}_T, n=1,...,N, are IID samples of X_T. Assuming that the variance is finite, the Central Limit Theorem gives convergence to the mean at rate \frac{1}{\sqrt{N}}. Expressed slightly differently, to get a Root-Mean-Square (RMS) error of order \epsilon, i.e.

\[ \mathbb E\big[ (\hat f_N - \bar f)^2 \big]^{1/2} \leq \epsilon, \]

we need about N = O(\epsilon^{-2}) samples.
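As a concrete sketch of the vanilla estimator (in Python, on an illustrative test problem not discussed above: geometric Brownian motion, where X_T can be sampled exactly and \mathbb E[X_T] = X_0 e^{\mu T} is known in closed form):

```python
import numpy as np

# Vanilla Monte Carlo for E[f(X_T)], here with f(x) = x and an exactly
# simulable X_T: geometric Brownian motion dX = mu*X dt + sigma*X dW,
# whose endpoint is X0 * exp((mu - sigma^2/2)*T + sigma*W_T).
rng = np.random.default_rng(0)
X0, mu, sigma, T = 1.0, 0.05, 0.2, 1.0
N = 100_000  # N = O(eps^{-2}) samples for RMS error eps

W_T = rng.normal(0.0, np.sqrt(T), size=N)            # exact samples of W_T
X_T = X0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)
estimate = X_T.mean()
std_err = X_T.std(ddof=1) / np.sqrt(N)               # shrinks like 1/sqrt(N)

print(estimate, X0 * np.exp(mu * T), std_err)
```

The standard error printed alongside the estimate is the \frac{1}{\sqrt{N}} convergence in action.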

Numerical and simulation error. However, the assumption here is that we can simulate X_T exactly, which is not in general possible: we must simulate the SDE numerically, which introduces numerical error. Suppose now that \hat X^{(n)}_T, n=1,...,N, are estimates of X_T (which may not have the same distribution as X_T and, also, may not be independent). In that case, the usual bias-variance decomposition is

\[ \mathbb E\Big[ \Big( \frac{1}{N} \sum_{n=1}^{N} f\big(\hat X^{(n)}_T\big) - \bar f \Big)^2 \Big] = \Big( \mathbb E\big[ f(\hat X_T) \big] - \bar f \Big)^2 + \mathrm{Var}\Big( \frac{1}{N} \sum_{n=1}^{N} f\big(\hat X^{(n)}_T\big) \Big). \]

As before, the variance goes down like \frac{1}{N} under Monte Carlo simulation, but we also need to care about the bias; specifically, we need to control how good our approximation of \bar f is.

Numerical Approximation of SDEs. The simplest scheme for simulating an SDE is the Euler-Maruyama scheme:

\[ \hat X_{t_{n+1}} = \hat X_{t_n} + a(\hat X_{t_n})\, h + b(\hat X_{t_n})\, \Delta W_n, \]

where t_{n+1}-t_n=h and \Delta W_n \sim N(0,h). This can be improved by the following scheme due to Milstein:

\[ \hat X_{t_{n+1}} = \hat X_{t_n} + a(\hat X_{t_n})\, h + b(\hat X_{t_n})\, \Delta W_n + \tfrac{1}{2} b(\hat X_{t_n}) b'(\hat X_{t_n}) \big( \Delta W_n^2 - h \big). \]

It can be shown that the (strong) error of the Euler-Maruyama scheme satisfies

\[ \mathbb E\big[ (\hat X_T - X_T)^2 \big]^{1/2} = O\big(h^{1/2}\big), \]

while for Milstein’s method

\[ \mathbb E\big[ (\hat X_T - X_T)^2 \big]^{1/2} = O(h). \]
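The Euler-Maruyama rate can be checked numerically; here is a sketch, again on a hypothetical GBM test problem, where the exact endpoint is available along the very same Brownian increments used by the scheme:

```python
import numpy as np

# Strong-error check for Euler-Maruyama on geometric Brownian motion
# dX = mu*X dt + sigma*X dW (illustrative test problem: X_T has the closed
# form X0 * exp((mu - sigma^2/2)*T + sigma*W_T) to compare against).
rng = np.random.default_rng(1)
X0, mu, sigma, T = 1.0, 0.05, 0.5, 1.0

def strong_error(n_steps, n_paths=20_000):
    """RMS error E[(X_hat_T - X_T)^2]^{1/2} of Euler-Maruyama with h = T/n_steps."""
    h = T / n_steps
    dW = rng.normal(0.0, np.sqrt(h), size=(n_paths, n_steps))
    X = np.full(n_paths, X0)
    for n in range(n_steps):                       # Euler-Maruyama steps
        X = X + mu * X * h + sigma * X * dW[:, n]
    # Exact endpoint driven by the *same* Brownian increments.
    X_exact = X0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum(axis=1))
    return np.sqrt(np.mean((X - X_exact) ** 2))

# Halving h should shrink the RMS error by roughly sqrt(2) (order 1/2).
e16, e32 = strong_error(16), strong_error(32)
print(e16, e32)
```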
Bias and Variance again. We can now see that the bias and variance from above are

\[ \Big( \mathbb E\big[ f(\hat X_T) \big] - \bar f \Big)^2 = O(h^2), \qquad \mathrm{Var}\Big( \frac{1}{N} \sum_{n=1}^{N} f\big(\hat X^{(n)}_T\big) \Big) = O\Big(\frac{1}{N}\Big). \]

Thus to get this error to be less than \epsilon^2 [so that the RMS error is less than \epsilon] we require N = O(\epsilon^{-2}) samples with time steps of size h = O(\epsilon). Since each path takes h^{-1} = O(\epsilon^{-1}) steps, the computational cost is N/h = O(\epsilon^{-3}).

The aim of Multi-Level Monte Carlo (MLMC) is to reduce the computational cost required to get the RMS error below \epsilon. While there are methods for reducing the number of samples N [such as quasi-Monte Carlo], MLMC aims to reduce the cost of generating paths by performing coarse-grained simulations combined with a small number of fine-grained simulations.

Simulating Brownian Motion at different levels. The easiest way to approximate a Brownian motion is to take normal random variables Z^{(l)}_1,...,Z^{(l)}_N \sim \mathcal N(0, h) with h := 2^{-l} and then take

\[ W^{(l)}_{kh} = \sum_{i=1}^{k} Z^{(l)}_i. \]

Here, for “level” l, this constructs a Brownian motion approximation with step size h = 2^{-l}. Notice that since Z^{(l)}_{2i-1} + Z^{(l)}_{2i} \sim N(0, 2h), with 2h = 2^{-(l-1)}, we also get for free a Brownian motion at level l-1, by taking

\[ Z^{(l-1)}_i = Z^{(l)}_{2i-1} + Z^{(l)}_{2i}. \]

We can also go the other way, from coarse-grained to fine-grained, via a Brownian bridge: if B_0 = 0 and B_h = b then B_{h/2} \sim b/2 + \mathcal N(0, h/4). Although the former method is perhaps simpler than this latter approach.
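The pairwise-summing construction is a one-liner in code, and we can verify that the resulting coarse path agrees with the fine path at every second grid point (the level number l below is just an illustrative choice):

```python
import numpy as np

# Coupling levels l and l-1: pairwise sums of the fine Brownian increments
# are exactly the coarse increments.
rng = np.random.default_rng(2)
l = 6
h_fine = 2.0 ** (-l)                                     # level-l step size
Z_fine = rng.normal(0.0, np.sqrt(h_fine), size=2 ** l)   # Z^{(l)}_i
Z_coarse = Z_fine[0::2] + Z_fine[1::2]                   # Z^{(l-1)}_i, step 2h

W_fine = np.cumsum(Z_fine)     # Brownian path at level l
W_coarse = np.cumsum(Z_coarse) # the same path observed at level l-1
print(np.allclose(W_coarse, W_fine[1::2]))  # prints True
```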

Numerical Approximation of SDEs with different levels. Now suppose that we wish to compute \mathbb E[f(X^{(l)}_T)] - \mathbb E[f(X^{(l-1)}_T)] under the Brownian motion construction above. Since both estimates are calculated under the same Brownian motion, this has some impact on the variance. Specifically, for the Euler-Maruyama scheme with f Lipschitz,

\[ V_l := \mathrm{Var}\Big( f\big(X^{(l)}_T\big) - f\big(X^{(l-1)}_T\big) \Big) = O(h_l). \]

(Here we assume that h_l/h_{l-1} is constant and thus h_l and h_{l-1} are of the same order.) The cost of calculating f(X^{(l)}_T) - f(X^{(l-1)}_T) is

\[ C_l = O\big(h_l^{-1}\big). \]
Multi-Level Estimation. Notice that we can form an estimator of \mathbb E[f(X^{(L)}_T)] with

\[ \hat Y = \sum_{l=0}^{L} \frac{1}{N_l} \sum_{n=1}^{N_l} \Big( \hat P^{(n)}_l - \hat P^{(n)}_{l-1} \Big), \]

where we take independent samples with (\hat P^{(n)}_l, \hat P^{(n)}_{l-1}) \sim (f(X^{(l)}_T), f(X^{(l-1)}_T)) and the convention \hat P^{(n)}_{-1} = 0. Here N_l is the number of samples that we perform at level l. Notice here we apply a differing number of samples to each term in the sum. Notice also that the estimator is an unbiased estimate at level L because the sum telescopes:

\[ \mathbb E\big[\hat Y\big] = \sum_{l=0}^{L} \mathbb E\Big[ f\big(X^{(l)}_T\big) - f\big(X^{(l-1)}_T\big) \Big] = \mathbb E\Big[ f\big(X^{(L)}_T\big) \Big]. \]

The total cost of the estimation is

\[ \sum_{l=0}^{L} N_l C_l, \]

and the total variance of the estimation is

\[ \sum_{l=0}^{L} \frac{V_l}{N_l}. \]

(Here we work on the basis that there is an independent mechanism generating the samples from level l-1 to level l, as discussed when simulating Brownian motion above.)
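Putting the pieces together, here is a minimal MLMC sketch for the GBM test problem used earlier, coupling each level's fine and coarse Euler-Maruyama paths through shared Brownian increments. The function name and the sample counts N_l are illustrative choices, not Giles' original algorithm:

```python
import numpy as np

# A minimal MLMC sketch for E[f(X_T)] with f(x) = x on geometric Brownian
# motion dX = mu*X dt + sigma*X dW (illustrative; true answer X0*exp(mu*T)).
rng = np.random.default_rng(3)
X0, mu, sigma, T = 1.0, 0.05, 0.2, 1.0

def mlmc_level(l, N_l):
    """Mean of f(X_T^{(l)}) - f(X_T^{(l-1)}) over N_l coupled sample paths."""
    n_fine = 2 ** l
    h = T / n_fine
    dW = rng.normal(0.0, np.sqrt(h), size=(N_l, n_fine))
    Xf = np.full(N_l, X0)
    for n in range(n_fine):                      # fine Euler-Maruyama path
        Xf = Xf + mu * Xf * h + sigma * Xf * dW[:, n]
    if l == 0:
        return Xf.mean()                         # convention: P_{-1} = 0
    dW_c = dW[:, 0::2] + dW[:, 1::2]             # coarse increments for free
    Xc = np.full(N_l, X0)
    for n in range(n_fine // 2):                 # coarse Euler-Maruyama path
        Xc = Xc + mu * Xc * 2 * h + sigma * Xc * dW_c[:, n]
    return (Xf - Xc).mean()

L = 5
N = [40_000 // 2 ** l + 100 for l in range(L + 1)]  # more samples when coarse
estimate = sum(mlmc_level(l, N[l]) for l in range(L + 1))
print(estimate, X0 * np.exp(mu * T))
```

Note that most of the samples are spent on the cheap coarse levels, with only a handful of fine paths at level L.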

The following result calculates the optimal number of samples to perform at each level.

The cost minimization

\[ \min_{N_0,...,N_L} \; \sum_{l=0}^{L} N_l C_l \quad \text{subject to} \quad \sum_{l=0}^{L} \frac{V_l}{N_l} \leq \epsilon^2 \]

has optimal solution

\[ N_l = \epsilon^{-2} \sqrt{\frac{V_l}{C_l}} \left( \sum_{k=0}^{L} \sqrt{V_k C_k} \right), \]

and the optimal cost is

\[ \sum_{l=0}^{L} N_l C_l = \epsilon^{-2} \left( \sum_{l=0}^{L} \sqrt{V_l C_l} \right)^2. \]
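The optimal allocation is straightforward to compute once (V_l, C_l) are known; the sketch below uses illustrative Euler-like orders V_l = 2^{-l}, C_l = 2^l rather than measured quantities, and checks that the allocation meets the variance budget:

```python
import math

# Optimal MLMC sample allocation: N_l proportional to sqrt(V_l / C_l),
# scaled so the total variance sum_l V_l / N_l equals eps^2.
# V_l, C_l below are illustrative (Euler-like: V_l ~ 2^{-l}, C_l ~ 2^{l}).
eps = 0.01
L = 6
V = [2.0 ** (-l) for l in range(L + 1)]
C = [2.0 ** l for l in range(L + 1)]

s = sum(math.sqrt(v * c) for v, c in zip(V, C))      # sum_k sqrt(V_k C_k)
N = [math.sqrt(v / c) * s / eps**2 for v, c in zip(V, C)]

total_variance = sum(v / n for v, n in zip(V, N))    # should equal eps^2
total_cost = sum(n * c for n, c in zip(N, C))        # equals s^2 / eps^2
print(total_variance, total_cost)
```

Here V_l C_l = 1 at every level, so the cost works out to (L+1)^2 \epsilon^{-2}, matching the O(\epsilon^{-2} L^2) regime.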
The proof is a straightforward convex optimization argument. It is interesting that a non-trivial solution is reached. It should be noted that without applying a multi-level approach (i.e. doing just one level) the variance of a single sample path is typically V_0 and the cost of a single path is C_L. I.e. we don’t get any benefit of coupling across levels on the variance of the estimation and we still have to pay the price of each high-accuracy simulation. This leads to an overall cost of roughly

\[ \epsilon^{-2} V_0 C_L. \]

If we think of C_l as increasing multiplicatively in l, say C_l = O(2^l), and V_l as decreasing multiplicatively in l, then the benefit of using a multilevel approach is determined by

\[ \sum_{l=0}^{L} \sqrt{V_l C_l} = \begin{cases} O(L), & V_l = O(2^{-l}) \text{ [Euler-Maruyama]}, \\ O(1), & V_l = O(4^{-l}) \text{ [Milstein]}, \end{cases} \]

giving total costs of O(\epsilon^{-2} L^2) and O(\epsilon^{-2}) respectively, compared with \epsilon^{-2} V_0 C_L = O(\epsilon^{-2} 2^L) for the single-level approach. In both cases we see a multiplicative reduction in cost for the same variance of the SDE estimate. This bound can be improved by splitting into cases more carefully. But, since L is roughly \log(\frac{1}{\epsilon}), we see that the above bound captures the main characteristics. We refer the reader to Giles’ original paper for more detail.


Multi-Level Monte Carlo was first presented in a very readable paper by Giles ’08. It has since found a wealth of applications. Giles maintains an excellent summary of the area on his webpage [link].

M.B. Giles. ‘Multi-level Monte Carlo path simulation’. Operations Research, 56(3):607-617, 2008.
