## Probability: a short introduction

Next semester, I will teach a short course on probability for university students who have not taken probability before and who know some basic mathematics, but who are not necessarily going on to study mathematics.

The notes for this are here:

## Neural Tangent Kernel

The Neural Tangent Kernel is a way of understanding the training performance of neural networks by relating them to kernel methods. Here we overview the results of the paper of Jacot et al. The paper considers a deep neural network with a fixed dataset and a fixed depth. The weights applied to neurons are initially independent and normally distributed. We take a limit where the width of each layer tends to infinity.
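To make the kernel concrete, here is a minimal sketch (my own illustration, not code from the paper) of the empirical NTK of a one-hidden-layer ReLU network at initialization: the kernel evaluated at two inputs is the inner product of the parameter gradients. The function name and the $1/\sqrt{\text{width}}$ scaling are illustrative choices in the spirit of the paper.

```python
import numpy as np

def empirical_ntk(x1, x2, width, seed=0):
    # Empirical NTK of f(x) = v . relu(W x) / sqrt(width) at initialization:
    # the inner product of the parameter gradients at x1 and x2.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((width, len(x1)))
    v = rng.standard_normal(width)

    def grads(x):
        pre = W @ x                                    # pre-activations
        dv = np.maximum(pre, 0.0) / np.sqrt(width)     # d f / d v
        dW = ((pre > 0) * v)[:, None] * x[None, :] / np.sqrt(width)  # d f / d W
        return np.concatenate([dv, dW.ravel()])

    return grads(np.asarray(x1)) @ grads(np.asarray(x2))

k = empirical_ntk([1.0, -0.5], [0.3, 0.7], width=256)
```

As the width grows, this random kernel concentrates around a deterministic limit, which is the object the paper studies.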

## Inventory Control

We consider the problem where there is an amount of stock $x_t$ at time $t$. You can perform the action of ordering $a_t$ units of stock. Further, the demand at time $t$ is $d_t$, which we assume is independent over $t$. The change in the amount of stock follows the dynamic $x_{t+1} = x_t + a_t - d_t$: the next stock level is the current stock, plus the order, minus the demand.

## Lai Robbins Lower-Bound

We continue the earlier post on finite-arm stochastic multi-arm bandits. The results so far have suggested that, for independent identically distributed arms, the correct size of the regret is of the order $\log T$. We now prove this more formally with the Lai and Robbins lower-bound.
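For reference, the bound (stated informally here; $P_a$ denotes the reward distribution of arm $a$, $a^\star$ the optimal arm, and $KL$ the Kullback–Leibler divergence) takes the form

$\displaystyle \liminf_{T\to\infty} \frac{R_T}{\log T} \;\geq\; \sum_{a:\, \bar r(a) < \bar r(a^\star)} \frac{\bar r(a^\star) - \bar r(a)}{KL(P_a, P_{a^\star})}$

for any uniformly good policy, so no reasonable algorithm can achieve regret below the order of $\log T$.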

## Euler-Maruyama

The Euler-Maruyama scheme is a method of approximating a stochastic differential equation. Here we investigate two forms of error for the scheme: the weak error and the strong error. The aim is that later we will cover Multi-Level Monte Carlo (MLMC) and related topics.
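As a hedged illustration (using geometric Brownian motion, where the exact solution is known in closed form), the following sketch estimates the strong error $\mathbb E|X_T^{EM} - X_T|$ by comparing the scheme against the exact solution driven by the same Brownian path; all parameter values are arbitrary choices for the demo.

```python
import numpy as np

def euler_maruyama_gbm(mu, sigma, x0, T, dW):
    # One Euler-Maruyama path for dX = mu X dt + sigma X dW,
    # on a uniform grid with len(dW) steps of size T / len(dW).
    x, dt = x0, T / len(dW)
    for dw in dW:
        x = x + mu * x * dt + sigma * x * dw
    return x

rng = np.random.default_rng(1)
mu, sigma, x0, T = 0.05, 0.2, 1.0, 1.0
errors = []
for n in (10, 100):                            # number of time steps
    errs = []
    for _ in range(2000):
        dW = rng.normal(0.0, np.sqrt(T / n), n)
        # exact GBM solution on the same Brownian path (W_T = sum of increments)
        exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum())
        errs.append(abs(euler_maruyama_gbm(mu, sigma, x0, T, dW) - exact))
    errors.append(np.mean(errs))
# the strong error should shrink as the step size decreases (roughly like sqrt(dt))
```

The weak error would instead compare expectations, e.g. $|\mathbb E f(X_T^{EM}) - \mathbb E f(X_T)|$ for a test function $f$.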

## Stochastic Finite Arm Bandits

We discuss a canonical multi-arm bandit setting: the stochastic multi-arm bandit with finite arms and bounded rewards. We let $\mathcal A=\{1,...,N\}$ be the set of arms, indexed by $a=1,...,N$. If you play arm $a$ at time $t\in\mathbb N$, you receive a reward $r_t(a)$; the rewards are independent and identically distributed in $t$, but the distribution may differ between arms.

We let $\bar r(a)$ be the mean reward of arm $a$. We want to find the arm with the highest mean reward.
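As a concrete sketch of the setting, here is UCB1 (one standard algorithm for finite-arm bandits with bounded rewards) run on a Bernoulli bandit; the means and horizon below are illustrative choices, not from the post.

```python
import numpy as np

def ucb1(means, T, seed=0):
    # UCB1 on a Bernoulli bandit with mean rewards `means` in [0, 1].
    # Returns how often each arm was played over horizon T.
    rng = np.random.default_rng(seed)
    n = np.zeros(len(means))          # play counts per arm
    s = np.zeros(len(means))          # total reward per arm
    for t in range(T):
        if t < len(means):
            a = t                      # play each arm once to initialize
        else:
            a = np.argmax(s / n + np.sqrt(2 * np.log(t) / n))
        r = float(rng.random() < means[a])   # Bernoulli reward
        n[a] += 1
        s[a] += r
    return n

counts = ucb1([0.3, 0.5, 0.7], 5000)
```

A good algorithm concentrates its plays on the arm with the highest mean, which is exactly what drives the $\log T$ regret discussed below.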

## Perturbation of Markov Matrices

We consider the impact on the stationary distribution of perturbing a Markov chain’s transition matrix. This argument is due to Seneta. We show that if $P$ and $\tilde P$ are transition matrices of irreducible finite-state Markov chains with stationary distributions $\pi$ and $\tilde \pi$, then we can show that

Theorem. $\displaystyle || \pi - \tilde \pi ||_{TV} \leq \frac{1}{1- \tau(P)} \sum_{x} \left\| P_{x, \cdot} -\tilde P_{x, \cdot} \right\|_{TV}$

where $\tau(P)$ is the maximal total variation distance between the rows of $P$.
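A quick numerical sanity check of the bound (the transition matrix and the one-row perturbation below are arbitrary illustrative choices):

```python
import numpy as np

def tv(p, q):
    # total variation distance between two distributions
    return 0.5 * np.abs(p - q).sum()

def stationary(P):
    # stationary distribution: left eigenvector of P for eigenvalue 1
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

def tau(P):
    # maximal TV distance between rows of P (Dobrushin coefficient)
    m = len(P)
    return max(tv(P[i], P[j]) for i in range(m) for j in range(m))

P = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.5, 0.2],
              [0.2, 0.3, 0.5]])
P_t = P.copy()
P_t[0] = [0.6, 0.25, 0.15]            # perturb the first row

lhs = tv(stationary(P), stationary(P_t))
rhs = sum(tv(P[i], P_t[i]) for i in range(3)) / (1.0 - tau(P))
# the theorem asserts lhs <= rhs
```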

## Cross Entropy Method

The Cross Entropy Method (CEM) is a generic optimization technique. It is a zeroth-order method, i.e. you don’t need gradients. So, for instance, it works well on combinatorial optimization problems, as well as in reinforcement learning.
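A minimal sketch of CEM minimizing a one-dimensional function: sample from a Gaussian, keep the elite samples, refit the Gaussian to them, and repeat. The quadratic objective and all hyper-parameters are illustrative choices.

```python
import numpy as np

def cross_entropy_method(f, mu=0.0, sigma=5.0, n=100, n_elite=10, iters=50, seed=0):
    # Minimize f by iteratively refitting a Gaussian to the elite samples.
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        xs = rng.normal(mu, sigma, n)
        elite = xs[np.argsort([f(x) for x in xs])[:n_elite]]   # best n_elite samples
        mu, sigma = elite.mean(), elite.std() + 1e-12          # refit the Gaussian
    return mu

x_star = cross_entropy_method(lambda x: (x - 3.0) ** 2)
```

Note that only function evaluations are used, never gradients, which is why the same recipe transfers to discrete and combinatorial problems.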

Further, all these results on policy gradient are, thus far, for deterministic models. I wrote a short paper that deals with a stochastic regret bound for a policy gradient algorithm much closer to projected gradient descent, again in the bandit setting. I’ll summarize this, as it might be possible to use it to derive stochastic regret bounds for policy gradients more generally. At the end, I discuss some issues around getting the proofs to work in the stochastic case, and emphasize the point that $1/q^\star$ is a good choice of Lyapunov function, where $q^\star$ is the probability of not choosing the optimal action.
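To give a flavour of the projected-gradient viewpoint, here is my own illustrative sketch, not the algorithm from the paper: a two-arm Bernoulli bandit, an importance-weighted gradient estimate, Euclidean projection onto the probability simplex, and a small exploration floor are all assumptions of the demo.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex (sort-based).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / (np.arange(len(v)) + 1.0))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def projected_gradient_bandit(means, T=20000, lr=0.01, floor=0.05, seed=0):
    # Stochastic gradient ascent on the expected reward q . rbar, using the
    # importance-weighted estimate r / q(a) for the sampled arm, then
    # projecting back onto the simplex; a small floor keeps every arm's
    # sampling probability bounded away from zero.
    rng = np.random.default_rng(seed)
    q = np.full(len(means), 1.0 / len(means))
    for _ in range(T):
        a = rng.choice(len(means), p=q)
        r = float(rng.random() < means[a])     # Bernoulli reward
        g = np.zeros(len(means))
        g[a] = r / q[a]                        # unbiased estimate of rbar
        q = project_simplex(q + lr * g)
        q = np.maximum(q, floor)
        q /= q.sum()
    return q

q = projected_gradient_bandit([0.2, 0.8])
```

In this sketch the iterate concentrates on the better arm, and the quantity $1/q^\star$ mentioned above tracks how close the policy is to that boundary of the simplex.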