Multi-Level Monte Carlo (MLMC)

Multi-Level Monte-Carlo is a Monte-Carlo method for computing accurate numerical estimates when fine-grained estimates are expensive but cheap coarse-grained estimates can be used to supplement them. We consider the simulation of stochastic differential equations, the application for which the method was first proposed, but the approach applies in a variety of other settings.

Aim. Suppose $X_t$ obeys the stochastic differential equation

$\begin{aligned} dX_t = \mu(X_t) dt + \sigma(X_t) dB_t \end{aligned}$

and the aim is to estimate

$\bar f:=\mathbb E f(X_T)$

at some time $T$ for some function $f$. (For example, think of estimating the expected payoff of a put or call option; more exotic path-dependent quantities could be considered too.)

Monte-Carlo. In a “vanilla” Monte-Carlo approach we would estimate $\bar f$ with

$\begin{aligned} \tilde f_N= \frac{1}{N} \sum_{n=1}^N f(X^{(n)}_T) \end{aligned}$

where $X^{(n)}_T$, $n=1,...,N$, are IID samples of $X_T$. Assuming the variance of $f(X_T)$ is finite, the Central Limit Theorem gives convergence to the mean at rate $1/\sqrt{N}$. Expressed slightly differently, to get a root-mean-square (RMS) error of order $\epsilon$, i.e.

$\begin{aligned} \mathbb V(\tilde f_N)^{\frac{1}{2}} = \mathbb E [ (\tilde f_N- \bar f)^2]^{1/2} = O(\epsilon)\end{aligned}$

we need about $N =O(\epsilon^{-2})$ samples.
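As a quick illustration of the $1/\sqrt{N}$ rate, here is a minimal Python sketch. It assumes a hypothetical test problem that is not in the text: geometric Brownian motion, i.e. drift $\mu x$ and volatility $\sigma x$ for constants $\mu,\sigma$, with $f(x)=x$. For this case $X_T$ can be sampled exactly and $\bar f = x_0 e^{\mu T}$. The parameter values and the names `sample_gbm_exact` and `vanilla_mc` are illustrative choices.

```python
import math
import random
import statistics


def sample_gbm_exact(rng, x0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """Hypothetical test problem: draw X_T exactly for dX = mu*X dt + sigma*X dB."""
    z = rng.gauss(0.0, 1.0)
    return x0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * z)


def vanilla_mc(f, n, rng):
    """Plain Monte-Carlo estimate (1/N) * sum f(X_T^(n)) and its estimated standard error."""
    values = [f(sample_gbm_exact(rng)) for _ in range(n)]
    mean = statistics.fmean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)  # shrinks like 1/sqrt(N)
    return mean, std_err


rng = random.Random(0)
est_small, se_small = vanilla_mc(lambda x: x, 100, rng)
est_large, se_large = vanilla_mc(lambda x: x, 10_000, rng)
print(est_large, math.exp(0.05))  # exact value E[X_T] = x0 * exp(mu * T)
print(se_small / se_large)        # roughly sqrt(10_000 / 100) = 10
```

Multiplying $N$ by 100 should shrink the standard error by roughly a factor of 10, so an RMS error of $\epsilon$ needs on the order of $\epsilon^{-2}$ samples.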

Numerical and simulation error. However, this assumes that we can simulate $X_T$ exactly, which is not possible in general. Instead we must simulate the SDE numerically, which introduces numerical error. Suppose now that $\hat X^{(n)}_T$, $n=1,...,N$, are approximations of $X_T$ (which need not have the same distribution as $X_T$ and need not be independent), and let $\tilde f_N$ be the corresponding average of the $f(\hat X^{(n)}_T)$. In that case, the usual bias-variance decomposition is

$\begin{aligned} \mathbb E [ (\tilde f_N - \bar f)^2] = \mathbb V(\tilde f_N) + (\mathbb E [\tilde f_N ]- \bar f )^2 \,.\end{aligned}$

As before, for independent samples the variance decreases at rate $1/N$, but we must also control the bias, that is, how close $\mathbb E[\tilde f_N]$ is to $\bar f$.

Numerical Approximation of SDEs. The simplest scheme for simulating an SDE is the Euler-Maruyama scheme:

$\begin{aligned} \hat X_{t_{n+1}} = \hat X_{t_n} + \mu (\hat X_{t_n} ) h + \sigma (\hat X_{t_n}) \Delta W_{n} \end{aligned}$

where $t_{n+1}-t_n=h$ and the $\Delta W_n \sim \mathcal N(0,h)$ are independent. This can be improved by the following scheme due to Milstein:

$\begin{aligned} \hat X_{t_{n+1}} = \hat X_{t_n} + \mu (\hat X_{t_n} ) h + \sigma (\hat X_{t_n}) \Delta W_{n} +\frac{1}{2} \sigma'(\hat X_{t_n}) \sigma(\hat X_{t_n}) \big[ (\Delta W_{n})^2 - h \big] \end{aligned}$

It can be shown (under suitable smoothness conditions on $\mu$, $\sigma$ and $f$) that the errors of the Euler-Maruyama scheme are

$\begin{aligned} \mathbb E [ f(\hat X_{T})] - \mathbb E[ f(X_T)] = O(h) \qquad \text{and} \qquad \mathbb E \Big[ \sup_{t\in [0,T]} (\hat X_t - X_t)^2 \Big]^{\frac{1}{2}}= O(h^{\frac{1}{2}})\end{aligned}$

while for Milstein’s method

$\begin{aligned} \mathbb E [ f(\hat X_{T})] - \mathbb E[ f(X_T)] = O(h) \qquad \text{and} \qquad \mathbb E \Big[ \sup_{t\in [0,T]} (\hat X_t - X_t)^2 \Big]^{\frac{1}{2}}= O(h) \,.\end{aligned}$
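The two strong rates can be checked empirically with a short Python sketch. It again assumes the hypothetical geometric Brownian motion test case, where $\sigma(x)=\sigma x$ so that $\sigma'(x)\sigma(x)=\sigma^2 x$, and where the exact solution $X_T = x_0 \exp((\mu-\sigma^2/2)T + \sigma W_T)$ can be evaluated on the same Brownian path as the numerical one. It measures the RMS error at the final time only, not the supremum over $[0,T]$; the names and parameters are illustrative.

```python
import math
import random


def simulate_gbm(n_steps, rng, scheme, x0=1.0, mu=0.05, sigma=0.5, T=1.0):
    """Hypothetical GBM test case: return (numerical X_T, exact X_T) on one shared Brownian path."""
    h = T / n_steps
    x, w = x0, 0.0
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))  # Delta W_n ~ N(0, h)
        w += dw
        step = mu * x * h + sigma * x * dw  # Euler-Maruyama part
        if scheme == "milstein":
            # 1/2 sigma'(x) sigma(x) [(dW)^2 - h] with sigma(x) = sigma*x, sigma'(x) = sigma
            step += 0.5 * sigma * (sigma * x) * (dw * dw - h)
        x += step
    exact = x0 * math.exp((mu - 0.5 * sigma**2) * T + sigma * w)
    return x, exact


def strong_error(scheme, n_steps, n_paths=2000, seed=1):
    """Estimate the RMS error E[(X_hat_T - X_T)^2]^(1/2) at the final time T."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        approx, exact = simulate_gbm(n_steps, rng, scheme)
        total += (approx - exact) ** 2
    return math.sqrt(total / n_paths)


results = {}
for scheme in ("euler", "milstein"):
    e16, e64 = strong_error(scheme, 16), strong_error(scheme, 64)
    results[scheme] = (e16, e64)
    # h is 4x smaller: expect roughly 2x (Euler-Maruyama) and 4x (Milstein) reductions
    print(scheme, e16, e64, e16 / e64)
```

Shrinking $h$ by a factor of 4 should reduce the Euler-Maruyama error by about 2 and the Milstein error by about 4; the exact ratios vary with the random seed.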

Bias and Variance again. Combining these, for either scheme the mean-squared error from above is

$\begin{aligned} \mathbb E [ (\tilde f_N - \bar f)^2] = \mathbb V(\tilde f_N) + (\mathbb E \tilde f_N - \bar f )^2 = O\left( \frac{1}{N} + h^2 \right)\end{aligned}$

Thus to get this error below $\epsilon^2$ (so that the RMS error is below $\epsilon$) we require $N= O(\epsilon^{-2})$ samples with time steps of size $h=O(\epsilon)$. Since each path takes $T/h$ time steps, the computational cost is $N/h = O (\epsilon^{-3})$.
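The cost count can be made concrete with a few lines of arithmetic. This sketch assumes hypothetical constants `c_var` and `c_bias` in the variance and bias bounds and splits the budget $\epsilon^2$ equally between the two terms; only the scaling in $\epsilon$ is meaningful.

```python
import math


def standard_mc_cost(eps, c_var=1.0, c_bias=1.0, T=1.0):
    """Samples, steps per path and total cost for plain Monte-Carlo with an O(h) weak error.

    Hypothetical constants: variance = c_var / N and bias = c_bias * h.
    The MSE budget eps^2 is split equally between variance and squared bias.
    """
    n_samples = math.ceil(2 * c_var / eps**2)  # c_var / N <= eps^2 / 2
    h = eps / (math.sqrt(2) * c_bias)          # (c_bias * h)^2 <= eps^2 / 2
    n_steps = math.ceil(T / h)                 # time steps per path, about T / h
    return n_samples, n_steps, n_samples * n_steps


for eps in (0.02, 0.01):
    print(eps, standard_mc_cost(eps))  # halving eps multiplies the cost by about 8
```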

The aim of Multi-Level Monte-Carlo (MLMC) is to reduce the computational cost required to get the RMS error below $\epsilon$. While there are methods for reducing the number of samples $N$ (such as quasi-Monte Carlo), MLMC instead reduces the cost of generating paths by performing many coarse-grained simulations combined with a small number of fine-grained simulations.

Simulating Brownian Motion at different levels. The easiest way to approximate a Brownian motion is to take independent normal random variables $Z^{(l)}_1,Z^{(l)}_2,\dots\sim \mathcal N(0,h)$ with $h:=2^{-l}$ and then take

$\begin{aligned} B^{(l)}_t = \sum_{i: 2^{-l} i \leq t} Z^{(l)}_i \, .\end{aligned}$

Here, for “level” $l$, this gives an approximation of Brownian motion on a grid of step size $h=2^{-l}$. Since $Z^{(l)}_{2i-1}+Z^{(l)}_{2i}\sim \mathcal N(0,2h)$ with $2h=2^{-(l-1)}$, we also get for free a Brownian motion at level $l-1$, which agrees with $B^{(l)}$ at the coarse grid points, by taking

$\begin{aligned} B^{(l-1)}_t = \sum_{i: 2^{-l+1} i \leq t} \left( Z^{(l)}_{2i-1}+Z^{(l)}_{2i} \right) \, .\end{aligned}$
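A minimal Python sketch of this fine-to-coarse construction, assuming $T=1$ and 0-based list indices, so that each consecutive pair of increments `z[2*i], z[2*i+1]` is summed. The function names are illustrative. It checks that the level $l-1$ path agrees with the level $l$ path at every other grid point.

```python
import math
import random


def fine_increments(level, rng, T=1.0):
    """IID increments Z_i ~ N(0, h) on a grid with h = T * 2**-level."""
    n = 2**level
    h = T / n
    return [rng.gauss(0.0, math.sqrt(h)) for _ in range(n)]


def coarsen(z):
    """Level l-1 increments: sums of consecutive pairs, each N(0, 2h)."""
    return [z[2 * i] + z[2 * i + 1] for i in range(len(z) // 2)]


def partial_sums(z):
    """Brownian path on the grid: B_0 = 0 and B_{t_k} = Z_1 + ... + Z_k."""
    path = [0.0]
    for inc in z:
        path.append(path[-1] + inc)
    return path


rng = random.Random(42)
z_fine = fine_increments(10, rng)
z_coarse = coarsen(z_fine)
b_fine, b_coarse = partial_sums(z_fine), partial_sums(z_coarse)
# The coarse path agrees with the fine path at every other grid point (up to rounding).
print(max(abs(b_fine[2 * k] - b_coarse[k]) for k in range(len(b_coarse))))
```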

We can also go the other way, from coarse-grained to fine-grained: if $B_0=0$ and $B_h=b$, then $B_{h/2} \sim \mathcal N(b/2, h/4)$, so midpoints can be filled in by sampling this Brownian bridge. The former method is perhaps simpler than this latter approach, though.
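The coarse-to-fine direction can be sketched the same way. Conditioned on $B_t=a$ and $B_{t+h}=b$, the Brownian bridge midpoint $B_{t+h/2}$ is normal with mean $(a+b)/2$ and variance $h/4$; the hypothetical helper `refine` below inserts such a midpoint between each pair of consecutive points of a coarse path.

```python
import math
import random


def refine(coarse_path, h, rng):
    """Brownian-bridge refinement of a path sampled every h into one sampled every h/2.

    Given B_t = a and B_{t+h} = b, the midpoint B_{t+h/2} is N((a + b)/2, h/4).
    """
    fine = [coarse_path[0]]
    for a, b in zip(coarse_path, coarse_path[1:]):
        fine.append(0.5 * (a + b) + rng.gauss(0.0, math.sqrt(h / 4)))
        fine.append(b)
    return fine


rng = random.Random(7)
h, b = 1.0, 0.6
mids = [refine([0.0, b], h, rng)[1] for _ in range(20_000)]
print(sum(mids) / len(mids))  # close to b/2 = 0.3
```

Applying `refine` repeatedly halves the step each time while keeping all previously sampled points fixed.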

Numerical Approximation of SDEs with different levels. Now suppose that we wish to compute $\mathbb E [f(X_T^{(l)}) - f(X_T^{(l-1)})]$, where $X^{(l)}$ and $X^{(l-1)}$ are numerical solutions driven by the level $l$ and level $l-1$ Brownian motions constructed above. Since both approximations are calculated from the same Brownian motion, the variance of their difference is small. Specifically

$\begin{aligned} V_l= \mathbb V [ f(X_T^{(l)}) - f(X_T^{(l-1)})] &= \mathbb V \big[ \big( f(X_T^{(l)}) -f(X_T) \big) - \big( f(X_T^{(l-1)}) - f(X_T) \big) \big] \\ & \leq \left( \mathbb V [f(X_T^{(l)}) -f(X_T) ]^{\frac{1}{2}} + \mathbb V [ f(X_T^{(l-1)}) -f(X_T) ]^{\frac{1}{2}} \right)^{2} \\ & = \begin{cases} O(h_l) & \text{Euler-Maruyama}\\ O(h_l^2) & \text{Milstein} \end{cases}\end{aligned}$

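(Here we assume that $f$ is Lipschitz, so that each variance on the right is at most a constant times the squared strong error from above, and that $h_l/h_{l-1}$ is constant, so that the two terms are of the same order.) The cost of calculating $f(X_T^{(l)}) - f(X_T^{(l-1)})$ is proportional to the number of fine time steps, so

$\begin{aligned} C_l = O(h_l^{-1}) \, .\end{aligned}$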
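The coupling and the decay of $V_l$ can be checked numerically. The sketch below assumes the hypothetical geometric Brownian motion test problem with $f(x)=x$ and the Euler-Maruyama scheme; each coarse step of size $2h$ is driven by the sum of the two fine increments it spans. For Euler-Maruyama we expect $V_l$ to roughly halve with each level, while the cost per sample doubles.

```python
import math
import random


def coupled_em_pair(level, rng, x0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """Hypothetical GBM example: one sample of (f(X_T^(l)), f(X_T^(l-1))) with f(x) = x.

    Requires level >= 1. The coarse path (step 2h) is driven by the sum of the two fine
    Brownian increments, so both Euler-Maruyama paths use the same Brownian motion.
    """
    h = T / 2**level
    x_fine = x_coarse = x0
    for _ in range(2 ** (level - 1)):
        dw1 = rng.gauss(0.0, math.sqrt(h))
        dw2 = rng.gauss(0.0, math.sqrt(h))
        x_fine += mu * x_fine * h + sigma * x_fine * dw1  # fine step 1
        x_fine += mu * x_fine * h + sigma * x_fine * dw2  # fine step 2
        x_coarse += mu * x_coarse * 2 * h + sigma * x_coarse * (dw1 + dw2)  # one coarse step
    return x_fine, x_coarse


def level_variance(level, n_samples=4000, seed=0):
    """Sample variance V_l of f(X_T^(l)) - f(X_T^(l-1)); costs about n_samples * 2**level steps."""
    rng = random.Random(seed)
    diffs = [p - q for p, q in (coupled_em_pair(level, rng) for _ in range(n_samples))]
    mean = sum(diffs) / n_samples
    return sum((d - mean) ** 2 for d in diffs) / (n_samples - 1)


for level in (2, 4, 6):
    print(level, level_variance(level))  # expect roughly a 4x drop every two levels
```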

Multi-Level Estimation. We can now form an estimator of $\mathbb E[f(X_T^{(L)})]$ with

$\begin{aligned} \frac{1}{N_0} \sum_{n=1}^{N_0} \hat P^{(n)}_0+\sum_{l=1}^L \frac{1}{N_l} \sum_{n=1}^{N_l} \left( \hat P_l^{(n)} - \hat P^{(n)}_{l-1} \right)\end{aligned}$

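where $\hat P^{(n)}_0 \sim f(X_T^{(0)})$ and, for $l\geq 1$, $(\hat P^{(n)}_l,\hat P^{(n)}_{l-1}) \sim (f(X_T^{(l)}),f(X_T^{(l-1)}))$ are coupled pairs driven by the same Brownian motion, sampled independently across $n$ and across levels. Here $N_l$ is the number of samples that we take at level $l$; notice that we apply a different number of samples to each term in the sum. The estimator is an unbiased estimate of $\mathbb E [\hat P_L] = \mathbb E[f(X_T^{(L)})]$ because the sum telescopes: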

$\begin{aligned} \mathbb E [\hat P_L] =\mathbb E [\hat P_0] + \sum_{l=1}^L \mathbb E [ \hat P_l - \hat P_{l-1} ] \, . \end{aligned}$

The total cost of the estimator is

$\begin{aligned} C = \sum_{l=0}^L N_l C_l \, .\end{aligned}$

The total variance of the estimator is

$\begin{aligned} V = \sum_{l=0}^L \frac{V_l}{N_l}\end{aligned}$

(Here we use that samples at different levels are generated independently, so their variances add; within a level, each fine and coarse pair is coupled through the same Brownian motion, as discussed when simulating Brownian motion above.)
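Putting the pieces together, a minimal multilevel estimator might look as follows. The sampler `gbm_em_pair` is a hypothetical example (geometric Brownian motion, Euler-Maruyama, $f(x)=x$, with the convention $\hat P_{-1}=0$ at level 0), and the sample sizes are chosen by hand rather than optimally. Any function returning a coupled pair $(\hat P_l, \hat P_{l-1})$ could be passed as the sampler.

```python
import math
import random


def gbm_em_pair(level, rng, x0=1.0, mu=0.05, sigma=0.2, T=1.0):
    """Hypothetical sampler: (P_l, P_{l-1}) for f(x) = x, GBM and Euler-Maruyama; P_{-1} := 0.

    The coarse path takes one step of size 2h per pair of fine steps, driven by their summed increments.
    """
    n = 2**level
    h = T / n
    x_fine = x_coarse = x0
    dw_pair = 0.0
    for k in range(n):
        dw = rng.gauss(0.0, math.sqrt(h))
        x_fine += mu * x_fine * h + sigma * x_fine * dw
        dw_pair += dw
        if level > 0 and k % 2 == 1:
            x_coarse += mu * x_coarse * 2 * h + sigma * x_coarse * dw_pair
            dw_pair = 0.0
    return x_fine, (x_coarse if level > 0 else 0.0)


def mlmc_estimate(sampler, n_per_level, rng):
    """Sum over l of the sample mean of P_l - P_{l-1}, using N_l = n_per_level[l] samples."""
    total = 0.0
    for level, n_l in enumerate(n_per_level):
        # fresh samples at every level, so the level estimators are independent
        diffs = [p - q for p, q in (sampler(level, rng) for _ in range(n_l))]
        total += sum(diffs) / n_l
    return total


rng = random.Random(3)
estimate = mlmc_estimate(gbm_em_pair, [4000, 1000, 250, 60, 15], rng)
print(estimate, (1 + 0.05 / 16) ** 16)  # target E[X_T^(4)] = (1 + mu*h)^16 for this example
```

Most of the samples are spent at the cheap level 0, while only a handful of the expensive level-4 pairs are needed because their differences have small variance.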

The following result calculates the optimal number of samples to perform at each level.

The cost minimization

$\begin{aligned} & \text{minimize} && \sum_{l=0}^L N_l C_l \\ & \text{subject to} && \sum_{l=0}^L \frac{V_l}{N_l} \leq \epsilon^2 \\ & \text{over} && N_l > 0, \;\; l=0,...,L,\end{aligned}$

has optimal solution

$\begin{aligned} N^\star_l = \frac{1}{\epsilon^2} \sqrt{\frac{V_l}{C_l}} \left( \sum_{l'=0}^L \sqrt{V_{l'}C_{l'}} \right)\end{aligned}$

and the optimal cost is

$\begin{aligned} C^\star = \frac{1}{\epsilon^2} \left( \sum_{l'=0}^L \sqrt{V_{l'}C_{l'}} \right)^2\end{aligned}$
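The optimal allocation is easy to compute once estimates of $V_l$ and $C_l$ are available (in practice these would have to be estimated, for example from pilot runs). This sketch simply evaluates the two formulas above; the scalings $V_l = 2^{-l}$ and $C_l = 2^l$, mimicking Euler-Maruyama with all constants set to 1, are an assumption for illustration.

```python
import math


def optimal_allocation(V, C, eps):
    """N_l* = eps^-2 * sqrt(V_l / C_l) * sum_l' sqrt(V_l' C_l') and C* = eps^-2 * (sum ...)^2.

    Returns real-valued N_l; rounding up with math.ceil keeps the total variance <= eps^2.
    """
    s = sum(math.sqrt(v * c) for v, c in zip(V, C))
    N = [s * math.sqrt(v / c) / eps**2 for v, c in zip(V, C)]
    return N, s**2 / eps**2


# Assumed Euler-Maruyama-like scaling: V_l = 2^-l and C_l = 2^l (constants set to 1).
L, eps = 5, 0.01
V = [2.0**-l for l in range(L + 1)]
C = [2.0**l for l in range(L + 1)]
N, cost = optimal_allocation(V, C, eps)
print([math.ceil(n) for n in N], cost)
```

With these scalings $\sqrt{V_lC_l}=1$ at every level, so $C^\star = (L+1)^2/\epsilon^2$ and $N^\star_l$ halves from one level to the next.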

The proof is a straightforward convex optimization argument: treating the $N_l$ as real numbers, a Lagrange multiplier for the variance constraint gives $N_l \propto \sqrt{V_l/C_l}$, and the constant is fixed by making the constraint tight. It is interesting that a non-trivial solution is reached. Note that without a multi-level approach (i.e. using just one level) the variance of a single sample path is typically $V_0$ and the cost of a single path is $C_L$. That is, we get no benefit from coupling across levels on the variance of the estimate, and we still have to pay the price of each high-accuracy simulation. This leads to an overall cost of roughly

$\tilde C:=C_L V_0 / \epsilon^2$

If we think of $C_l$ as increasing geometrically in $l$ and $V_l$ as decreasing geometrically in $l$, then the benefit of using a multilevel approach is determined by

$\begin{aligned} \frac{C^\star }{\tilde C} = \frac{\left( \sum_{l'=0}^L \sqrt{V_{l'}C_{l'}} \right)^2}{C_L V_0} \leq \begin{cases} (L+1)^2 \frac{C_0}{C_L} &\text{if } \sqrt{V_{l'}C_{l'}} \text{ is decreasing} \\ (L+1)^2 \frac{V_L}{V_0} &\text{if } \sqrt{V_{l'}C_{l'}} \text{ is increasing} \end{cases}\end{aligned}$

To see this, bound each term of the sum by its largest value, $\sqrt{V_0C_0}$ or $\sqrt{V_LC_L}$ respectively. In both cases we see a multiplicative reduction in cost for the same variance: $C_0/C_L$ and $V_L/V_0$ decay geometrically in $L$, while $(L+1)^2$ grows only polynomially. This bound can be improved by splitting into cases more carefully, but since $L$ is roughly $\log (\frac{1}{\epsilon})$, the above bound captures the main characteristics. We refer the reader to Giles’ original paper for more detail.
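The ratio $C^\star/\tilde C$ can be evaluated directly for assumed scalings. In this sketch the Euler-Maruyama-like case takes $V_l = 2^{-l}$, $C_l = 2^l$ and the Milstein-like case $V_l = 4^{-l}$, $C_l = 2^l$ (constants set to 1, an illustrative assumption). For comparison it prints the crude bound $(L+1)^2 C_0/C_L$, obtained by bounding every term $\sqrt{V_lC_l}$ by $\sqrt{V_0C_0}$, which applies since $\sqrt{V_lC_l}$ is non-increasing in both cases.

```python
import math


def cost_ratio(V, C):
    """C* / C~ = (sum_l sqrt(V_l C_l))^2 / (C_L V_0); the eps^-2 factors cancel."""
    s = sum(math.sqrt(v * c) for v, c in zip(V, C))
    return s**2 / (C[-1] * V[0])


L = 10
levels = range(L + 1)
# Assumed scalings with all constants set to 1.
cases = {
    "EM-like, V_l C_l constant": ([2.0**-l for l in levels], [2.0**l for l in levels]),
    "Milstein-like, decreasing": ([4.0**-l for l in levels], [2.0**l for l in levels]),
}
for name, (V, C) in cases.items():
    bound = (L + 1) ** 2 * C[0] / C[-1]  # each sqrt(V_l C_l) <= sqrt(V_0 C_0)
    print(name, cost_ratio(V, C), bound)
```

With $L=10$ both ratios are well below 1; in the Euler-Maruyama-like case the bound is attained exactly, $121/1024 \approx 0.12$, and the Milstein-like case does roughly ten times better.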

References.

Multi-Level Monte Carlo was first presented in a very readable paper by Giles (2008). It has since found a wealth of applications. Giles maintains an excellent summary of the area on his webpage.

M.B. Giles. ‘Multi-level Monte Carlo path simulation’. Operations Research, 56(3):607-617, 2008.
