## Sequential Monte Carlo (SMC)

Sequential Monte Carlo is a general method for sampling from a sequence of probability distributions $\hat\eta_1, \dots, \hat\eta_t$.
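As a toy illustration (not taken from the post), the basic reweight-and-resample loop can be sketched in Python. The sequence of Gaussian targets and all parameters below are invented for the example, and a full sampler would also include MCMC move steps to rejuvenate the particles:

```python
import math
import random

def log_normal_pdf(x, sigma):
    # Log-density of N(0, sigma^2), up to additive constants that cancel in ratios.
    return -x * x / (2 * sigma * sigma) - math.log(sigma)

def smc(sigmas, n_particles, rng):
    """Minimal SMC sampler for a sequence of targets eta_t = N(0, sigma_t^2):
    reweight particles by the density ratio eta_{t+1} / eta_t, then resample.
    (A full sampler would also apply MCMC moves between steps.)"""
    particles = [rng.gauss(0, sigmas[0]) for _ in range(n_particles)]
    for s_prev, s_next in zip(sigmas, sigmas[1:]):
        logw = [log_normal_pdf(x, s_next) - log_normal_pdf(x, s_prev)
                for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]          # numerically stabilized weights
        particles = rng.choices(particles, weights=w, k=n_particles)  # resample
    return particles

rng = random.Random(3)
# Anneal from a diffuse N(0, 4) initial distribution down to the target N(0, 1).
particles = smc([2.0, 1.7, 1.4, 1.2, 1.0], 20_000, rng)
var = sum(x * x for x in particles) / len(particles)
# The particle variance should be close to 1, the variance of the final target.
```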

Continue reading “Sequential Monte Carlo (SMC)”

## Multi-Level Monte Carlo (MLMC)

Multi-Level Monte Carlo is a Monte Carlo method for computing numerically accurate estimates when fine-grained simulations are expensive but cheap coarse-grained simulations can be used to supplement them. We consider the simulation of stochastic differential equations, the application for which the method was first proposed, but we note that the approach applies in a variety of other settings.
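As a rough sketch (not from the post), the telescoping-sum idea can be illustrated on geometric Brownian motion with an Euler-Maruyama scheme, coupling each fine path to a coarse path through shared Brownian increments. All parameters here are invented for the example:

```python
import math
import random

def euler_pair(level, rng, s0=1.0, r=0.05, sigma=0.2, T=1.0, m=2):
    """One coupled (fine, coarse) pair of Euler-Maruyama paths for geometric
    Brownian motion dS = r S dt + sigma S dW, sharing Brownian increments."""
    n_fine = m ** level
    dt = T / n_fine
    s_fine, s_coarse, dw_coarse = s0, s0, 0.0
    for i in range(n_fine):
        dw = rng.gauss(0, math.sqrt(dt))
        s_fine += r * s_fine * dt + sigma * s_fine * dw
        dw_coarse += dw
        if (i + 1) % m == 0:  # coarse path takes one step per m fine steps
            s_coarse += r * s_coarse * (m * dt) + sigma * s_coarse * dw_coarse
            dw_coarse = 0.0
    if level == 0:
        return s_fine, 0.0  # no coarse path at the base level
    return s_fine, s_coarse

def mlmc_estimate(max_level, n_per_level, rng):
    """Telescoping sum: E[P_L] = E[P_0] + sum_l E[P_l - P_{l-1}],
    each correction estimated by independent Monte Carlo."""
    total = 0.0
    for level in range(max_level + 1):
        n = n_per_level[level]
        acc = 0.0
        for _ in range(n):
            fine, coarse = euler_pair(level, rng)
            acc += fine - (coarse if level > 0 else 0.0)
        total += acc / n
    return total

rng = random.Random(2)
# Fewer samples at the fine levels, where corrections are small but costly.
est = mlmc_estimate(4, [20_000, 10_000, 5_000, 2_500, 1_250], rng)
# The true value of E[S_T] is exp(r T) = exp(0.05).
```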

Continue reading “Multi-Level Monte Carlo (MLMC)”

## Markov Chain Monte Carlo (MCMC)

Markov chain Monte Carlo is a variant of Monte Carlo in which samples are no longer independent but are instead drawn from a Markov chain. This can be useful in Bayesian statistics, or when we sequentially adjust a small number of parameters of a more complex joint distribution.

We cover MCMC, its use in Bayesian statistics, Metropolis-Hastings, Gibbs Sampling, and Simulated Annealing.
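As a minimal sketch of one of the algorithms listed above (the target and parameters below are invented for the example), random-walk Metropolis-Hastings needs the target density only up to its normalizing constant:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step, rng):
    """Random-walk Metropolis-Hastings: propose x' = x + step * Z with Z
    standard normal, accept with probability min(1, target(x')/target(x))."""
    x, logp = x0, log_target(x0)
    samples = []
    for _ in range(n_samples):
        x_prop = x + step * rng.gauss(0, 1)
        logp_prop = log_target(x_prop)
        if math.log(rng.random()) < logp_prop - logp:  # accept/reject step
            x, logp = x_prop, logp_prop
        samples.append(x)  # on rejection, the current state is repeated
    return samples

rng = random.Random(1)
# Target: standard normal, specified only up to its normalizing constant.
samples = metropolis_hastings(lambda x: -0.5 * x * x, 0.0, 50_000, 1.0, rng)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

Note the chain's samples are correlated, so many more draws are needed for a given accuracy than with independent sampling.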

Continue reading “Markov Chain Monte Carlo (MCMC)”

## Monte-Carlo (MC)

Due to some projects, I’ve decided to start a set of posts on Monte Carlo and its variants. These include Monte Carlo (MC), Markov chain Monte Carlo (MCMC), Sequential Monte Carlo (SMC) and Multi-Level Monte Carlo (MLMC). I’ll probably expand these posts further at a later point.

Here we cover “vanilla” Monte Carlo, importance sampling and self-normalized importance sampling:
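As a small illustration (not from the post), here are the first and last of these three estimators for $\mathbb{E}[X^2]$ with $X \sim N(0,1)$; the proposal distribution and sample sizes are invented for the example:

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    # Density of N(mu, sigma^2).
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def vanilla_mc(f, n, rng):
    # Plain Monte Carlo: average f over i.i.d. samples from the target N(0, 1).
    return sum(f(rng.gauss(0, 1)) for _ in range(n)) / n

def self_normalized_is(f, n, rng, proposal_sigma=2.0):
    # Self-normalized importance sampling: sample from the proposal
    # N(0, proposal_sigma^2) and weight by the target/proposal density ratio;
    # normalizing by the weight sum means the target density can be unnormalized.
    xs = [rng.gauss(0, proposal_sigma) for _ in range(n)]
    ws = [normal_pdf(x) / normal_pdf(x, sigma=proposal_sigma) for x in xs]
    return sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)

rng = random.Random(0)
est_mc = vanilla_mc(lambda x: x * x, 100_000, rng)
est_is = self_normalized_is(lambda x: x * x, 100_000, rng)
# Both estimates should be close to E[X^2] = 1 for X ~ N(0, 1).
```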

## Neural Tangent Kernel

The Neural Tangent Kernel is a way of understanding the training performance of neural networks by relating them to kernel methods. Here we overview the results of the paper of Jacot et al. The paper considers a deep neural network with a fixed amount of data and a fixed depth. The weights applied to neurons are initially independent and normally distributed. We take a limit where the width of each layer tends to infinity.
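For orientation (a standard statement, paraphrased rather than quoted from the paper), the kernel in question is built from the parameter gradients of the network output $f(x;\theta)$:

```latex
% Neural tangent kernel of a network f(x; \theta) with parameters \theta:
\Theta(x, x') \;=\; \sum_{p}
  \frac{\partial f(x;\theta)}{\partial \theta_p}\,
  \frac{\partial f(x';\theta)}{\partial \theta_p}.
% In the infinite-width limit, Jacot et al. show the kernel concentrates at
% initialization and remains constant during gradient-flow training, so the
% network's outputs evolve as kernel regression under \Theta.
```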

## Stochastic Control Notes Update – 2021

I’ve updated the notes for this year’s stochastic control course, here:

Aside from general tidying, new material includes:

• Equilibrium distributions of Markov chains
• Occupancy measure of infinite time horizon MDPs
• Linear programming as an algorithm for solving MDPs
• Convergence of Asynchronous Value Iteration
• (s,S) Inventory Control
• POMDP (though there is still more to add)
• Calculus of Variations (though there is still more to add)
• Pontryagin’s Maximum Principle
• Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
• (some) Online Convex Optimization
• Stochastic Bandits (UCB and Lai-Robbins Lower bound)
• Gittins’ Index Theorem
• Sequential/Stochastic Linear Regression (Lai and Wei)
• More discussion on TD methods
• Discussion on double Q-learning and Dueling/Advantage updating
• Convergence proof for SARSA
• Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
• Cross Entropy Method (but still more to do)
• Several new appendices (but mostly from old notes)

Like last year, I will likely update the notes further (and correct typos) towards the end of the course.

## Inventory Control

We consider the problem where there is an amount of stock $x_t$ at time $t$, and we can order $a_t$ units of stock. The demand at time $t$ is $d_t$, which we assume is independent over $t$. The change in the amount of stock follows the dynamic $x_{t+1} = x_t + a_t - d_t$.

## Lai Robbins Lower-Bound

We continue the earlier post on finite-arm stochastic multi-arm bandits. The results so far suggest that, for independent identically distributed arms, the correct size of the regret is of the order $\log T$. We now prove this more formally with the Lai and Robbins lower bound.
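To see the matching upper bound empirically (a toy sketch, not from the post; the Bernoulli arm means and horizon are invented for the example), UCB1 achieves regret far below linear growth:

```python
import math
import random

def ucb1(means, T, rng):
    """UCB1 on Bernoulli arms: pull the arm maximizing the empirical mean plus
    a sqrt(2 log t / n) confidence bonus; its regret grows like log T."""
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(T):
        if t < n_arms:
            arm = t  # pull each arm once to initialize
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t + 1) / counts[a]))
        sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
    # Pseudo-regret: pulls of suboptimal arms weighted by their mean gaps.
    best = max(means)
    return sum((best - mu) * c for mu, c in zip(means, counts))

rng = random.Random(4)
regret = ucb1([0.5, 0.6, 0.7], 20_000, rng)
# Far below the ~0.2 * 20_000 regret of always pulling the worst arm.
```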