Multi-Level Monte Carlo (MLMC)

Multi-Level Monte Carlo is a Monte Carlo method for calculating numerically accurate estimates when fine-grained estimates are expensive but cheap coarse-grained estimates can be used to supplement them. We consider the simulation of stochastic differential equations, which is the application for which the method was first proposed, but we note that the approach applies in a variety of other settings.
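To make the idea concrete, here is a minimal sketch of the telescoping MLMC estimator (the function names and the geometric-Brownian-motion example are my own, not taken from the post), where each level's correction couples a fine and a coarse Euler discretisation:

```python
import math
import random

def mlmc_estimate(sampler, L, n_per_level):
    """Telescoping MLMC estimator:
        E[P_L] = E[P_0] + sum_{l=1}^{L} E[P_l - P_{l-1}].
    sampler(l) returns a coupled pair (P_l, P_{l-1}) built from the same
    randomness; at level 0 the second component is taken to be 0.
    """
    est = 0.0
    for l in range(L + 1):
        pairs = [sampler(l) for _ in range(n_per_level[l])]
        est += sum(fine - coarse for fine, coarse in pairs) / len(pairs)
    return est

def coupled_euler_pair(level, T=1.0, x0=1.0, mu=0.05, sigma=0.2):
    """One coupled sample for geometric Brownian motion via Euler-Maruyama.
    The coarse path (half as many steps) reuses the fine path's Brownian
    increments, which is what makes the level-l correction low-variance."""
    n = 2 ** level
    dt = T / n
    x_fine = x_coarse = x0
    dw_sum = 0.0
    for k in range(n):
        dw = random.gauss(0.0, math.sqrt(dt))
        x_fine += mu * x_fine * dt + sigma * x_fine * dw
        dw_sum += dw
        if level > 0 and k % 2 == 1:  # coarse step: twice the step size
            x_coarse += mu * x_coarse * (2 * dt) + sigma * x_coarse * dw_sum
            dw_sum = 0.0
    return (x_fine, 0.0) if level == 0 else (x_fine, x_coarse)

random.seed(1)
est = mlmc_estimate(coupled_euler_pair, L=3, n_per_level=[20000, 10000, 5000, 2000])
# est should be close to E[X_T] = x0 * exp(mu * T) = exp(0.05) ≈ 1.0513
```

The point of the decreasing sample counts is that most samples are drawn at the cheap coarse levels, while the coupled corrections at the finer, more expensive levels have small variance and so need only a few samples.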

Continue reading “Multi-Level Monte Carlo (MLMC)”

Markov Chain Monte Carlo (MCMC)

Markov chain Monte Carlo is a variant of Monte Carlo in which samples are no longer independent but are instead drawn from a Markov chain. This can be useful in Bayesian statistics, or when we sequentially adjust a small number of parameters of a more complex joint distribution.

We cover MCMC, its use in Bayesian statistics, Metropolis-Hastings, Gibbs Sampling, and Simulated Annealing.
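As a small taste of one of the topics listed, here is a minimal random-walk Metropolis sketch (a special case of Metropolis-Hastings with a symmetric Gaussian proposal); this is my own illustration, not code from the post:

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0):
    """Random-walk Metropolis: propose x' = x + N(0, step^2) and accept
    with probability min(1, pi(x') / pi(x)). The target density pi only
    needs to be known up to a normalizing constant."""
    chain = [x0]
    x, lp = x0, log_target(x0)
    for _ in range(n_samples):
        prop = x + random.gauss(0.0, step)  # symmetric proposal
        lp_prop = log_target(prop)
        # accept with probability min(1, exp(lp_prop - lp))
        if lp_prop >= lp or random.random() < math.exp(lp_prop - lp):
            x, lp = prop, lp_prop
        chain.append(x)
    return chain

# Target: standard normal; an unnormalized log-density suffices.
random.seed(0)
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
```

After a burn-in period, the (correlated) samples in `chain` have approximately the target distribution, so sample averages over the chain approximate expectations under the target.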

Continue reading “Markov Chain Monte Carlo (MCMC)”

Monte-Carlo (MC)

Due to some projects, I’ve decided to start a set of posts on Monte-Carlo and its variants. These include Monte-Carlo (MC), Markov chain Monte-Carlo (MCMC), Sequential Monte-Carlo (SMC) and Multi-Level Monte-Carlo (MLMC). I’ll probably expand these posts further at a later point.

Here we cover “vanilla” Monte-Carlo, importance sampling and self-normalized importance sampling:
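For the importance-sampling part, a minimal sketch (my own illustration, with hypothetical function names) of the identity E_p[f(X)] = E_q[f(X) p(X)/q(X)]:

```python
import math
import random

def importance_sampling(f, log_w, sample_q, n):
    """Estimate E_p[f(X)] = E_q[f(X) w(X)] with w = p/q, sampling X ~ q."""
    return sum(f(x) * math.exp(log_w(x)) for x in (sample_q() for _ in range(n))) / n

# Example: tail probability P(X > 3) for X ~ N(0,1), with proposal q = N(3,1).
# The Gaussian normalizing constants cancel, so log w(x) = -x^2/2 + (x-3)^2/2.
random.seed(0)
est = importance_sampling(
    f=lambda x: 1.0 if x > 3 else 0.0,
    log_w=lambda x: -0.5 * x * x + 0.5 * (x - 3) ** 2,
    sample_q=lambda: random.gauss(3.0, 1.0),
    n=50_000,
)
# True value is Phi(-3) ≈ 1.35e-3; vanilla Monte Carlo would need millions
# of samples to estimate this rare event with comparable accuracy.
```

Shifting the proposal towards the rare event is what makes the estimator efficient here; self-normalized importance sampling additionally divides by the average weight, which is useful when p is only known up to a constant.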

Continue reading “Monte-Carlo (MC)”

Neural Tangent Kernel

The Neural Tangent Kernel is a way of understanding the training behaviour of neural networks by relating them to kernel methods. Here we give an overview of the results of the paper by Jacot et al. The paper considers a deep neural network with a fixed dataset and a fixed depth. The weights are initially independent and normally distributed, and we take the limit where the width of each layer tends to infinity.
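At finite width one can compute the empirical tangent kernel directly, namely the inner product of parameter gradients of the network output at two inputs. The sketch below is my own construction (not code from the post), using the paper's 1/sqrt(width) scaling of the weights and a crude finite-difference gradient; the paper's result is that as the widths grow this random kernel becomes deterministic and stays constant during training:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(widths):
    """Weights are i.i.d. standard normal; the 1/sqrt(fan_in) scaling
    happens in the forward pass (the NTK parameterization)."""
    return [rng.standard_normal((m, n)) for m, n in zip(widths[1:], widths[:-1])]

def forward(params, x):
    """Fully connected network with tanh hidden layers, scalar output."""
    h = x
    for i, W in enumerate(params):
        h = W @ h / np.sqrt(W.shape[1])
        if i < len(params) - 1:
            h = np.tanh(h)
    return h.item()

def grad(params, x, eps=1e-5):
    """Central finite-difference gradient of the output w.r.t. all weights."""
    parts = []
    for W in params:
        gW = np.zeros_like(W)
        it = np.nditer(W, flags=["multi_index"])
        for _ in it:
            idx = it.multi_index
            old = W[idx]
            W[idx] = old + eps
            fp = forward(params, x)
            W[idx] = old - eps
            fm = forward(params, x)
            W[idx] = old
            gW[idx] = (fp - fm) / (2 * eps)
        parts.append(gW.ravel())
    return np.concatenate(parts)

def empirical_ntk(params, x1, x2):
    """Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>."""
    return float(grad(params, x1) @ grad(params, x2))

x1, x2 = np.array([1.0, 0.5]), np.array([-0.3, 0.8])
params = init_params([2, 32, 1])
k12 = empirical_ntk(params, x1, x2)  # symmetric: equals empirical_ntk(params, x2, x1)
```

Being a Gram matrix of gradients, the empirical kernel is symmetric and positive semi-definite, which is what allows the kernel-methods viewpoint in the first place.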

Continue reading “Neural Tangent Kernel”

Stochastic Control Notes Update – 2021

I’ve updated the notes for this year’s stochastic control course, here:

Aside from general tidying, new material includes:

  • Equilibrium distributions of Markov chains
  • Occupancy measure of infinite time horizon MDPs
  • Linear programming as an algorithm for solving MDPs
  • Convergence of Asynchronous Value Iteration
  • (s,S) Inventory Control
  • POMDP (though there is still more to add)
  • Calculus of Variations (though there is still more to add)
  • Pontryagin’s Maximum Principle
  • Linear-Quadratic Lyapunov functions (Sylvester’s equation and Hurwitz matrices)
  • (some) Online Convex Optimization
  • Stochastic Bandits (UCB and Lai-Robbins Lower bound)
  • Gittins’ Index Theorem
  • Sequential/Stochastic Linear Regression (Lai and Wei)
  • More discussion on TD methods
  • Discussion on double Q-learning and Dueling/Advantage updating
  • Convergence proof for SARSA
  • Policy Gradients (some convergence arguments from Bhandari and Russo, but still more to do)
  • Cross Entropy Method (but still more to do)
  • Several new appendices (but mostly from old notes)
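For readers new to the MDP material in the list above, value iteration is a natural entry point. Here is a minimal sketch of it for a finite MDP (my own illustration, not code from the notes):

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Value iteration for a finite MDP.

    P[a][s][t]: probability of moving from state s to t under action a.
    R[a][s]:    expected reward for taking action a in state s.
    Repeatedly applies the Bellman optimality operator, a gamma-contraction,
    and returns the (approximately) optimal values and a greedy policy.
    """
    n_states, n_actions = len(R[0]), len(R)
    V = [0.0] * n_states
    while True:
        Q = [[R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
              for s in range(n_states)] for a in range(n_actions)]
        V_new = [max(Q[a][s] for a in range(n_actions)) for s in range(n_states)]
        if max(abs(V_new[s] - V[s]) for s in range(n_states)) < tol:
            policy = [max(range(n_actions), key=lambda a: Q[a][s])
                      for s in range(n_states)]
            return V_new, policy
        V = V_new

# One state, two actions: action 0 pays reward 1, action 1 pays 0; both stay put.
V, policy = value_iteration(P=[[[1.0]], [[1.0]]], R=[[1.0], [0.0]], gamma=0.9)
# Optimal value is 1 / (1 - 0.9) = 10, achieved by always taking action 0.
```

The asynchronous variant discussed in the notes updates one state at a time rather than sweeping all states, and converges under similar contraction arguments.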

Like last year, I will likely update the notes further (and correct typos) towards the end of the course.