## Cross Entropy Method

The Cross Entropy Method (CEM) is a generic optimization technique. It is a zeroth-order method, i.e. it does not require gradients. So, for instance, it works well on combinatorial optimization problems, as well as in reinforcement learning.

## Lyapunov Functions

Lyapunov functions are an extremely convenient device for proving that a dynamical system converges. We cover:

• The Lyapunov argument
• LaSalle's Invariance Principle
• An alternative argument for convex functions
• Exponential convergence rates
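
As a quick numerical illustration of the basic idea (the matrix below is our own stable example, not part of the proofs): for a linear system $x_{t+1} = A x_t$ whose largest singular value is below one, $V(x) = \|x\|^2$ is a Lyapunov function — it decreases along every trajectory, certifying convergence to the origin.

```python
import numpy as np

# A stable linear map (largest singular value ~0.64 < 1, checked offline).
A = np.array([[0.5, 0.2], [-0.1, 0.6]])
x = np.array([1.0, -2.0])

values = []
for _ in range(30):
    values.append(float(x @ x))  # record V(x_t) = ||x_t||^2
    x = A @ x                    # step the dynamics x_{t+1} = A x_t
```

Monotone decrease of `values` is exactly the Lyapunov certificate: we never need to solve the dynamics in closed form to conclude convergence.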

## Kalman Filter

Kalman filtering (and filtering in general) considers the following setting: we have a sequence of states $x_t$ which evolves under random perturbations over time. Unfortunately, we cannot observe $x_t$ directly; we can only observe some noisy function of $x_t$, namely $y_t$. Our task is to find the best estimate of $x_t$ given our observations of $y_t$.
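
A self-contained sketch of one predict/update step for the linear-Gaussian model $x_{t+1} = A x_t + w_t$, $y_t = C x_t + v_t$ (the matrices and the random-walk example below are our own illustrative choices):

```python
import numpy as np

def kalman_step(m, P, y, A, C, Q, R):
    """One predict/update step; (m, P) is the Gaussian estimate of the state."""
    # Predict: push the current estimate through the dynamics.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: correct the prediction using the new observation y.
    S = C @ P_pred @ C.T + R              # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    m_new = m_pred + K @ (y - C @ m_pred)
    P_new = (np.eye(len(m)) - K @ C) @ P_pred
    return m_new, P_new

# Track a scalar random walk observed in heavy noise.
rng = np.random.default_rng(0)
A = C = np.array([[1.0]]); Q = np.array([[0.01]]); R = np.array([[1.0]])
x_true, m, P = np.zeros(1), np.zeros(1), np.eye(1)
err_filter = err_obs = 0.0
for _ in range(500):
    x_true = x_true + rng.normal(scale=0.1)   # state drifts slowly
    y = x_true + rng.normal(scale=1.0)        # observation is very noisy
    m, P = kalman_step(m, P, y, A, C, Q, R)
    err_filter += float((m - x_true) ** 2)
    err_obs += float((y - x_true) ** 2)
```

Because the state moves slowly relative to the observation noise, the filtered estimate ends up far more accurate than the raw observations.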

## Bayesian Online Learning

We briefly describe an online Bayesian framework sometimes referred to as Assumed Density Filtering (ADF), and we review a heuristic proof of its convergence in the Gaussian case.
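
To fix ideas, here is the simplest instance of ADF (our own illustrative example, not the post's notation): maintain a Gaussian posterior $N(\mu, \tau^2)$ over an unknown mean $\theta$, and after each observation $y \sim N(\theta, \sigma^2)$ project the posterior back onto a Gaussian. In this conjugate model the projection happens to be exact, which is what makes the Gaussian case tractable.

```python
import numpy as np

def adf_update(mu, tau2, y, sigma2):
    """One ADF step for estimating a Gaussian mean from one observation y."""
    tau2_new = 1.0 / (1.0 / tau2 + 1.0 / sigma2)   # combine precisions
    mu_new = tau2_new * (mu / tau2 + y / sigma2)   # precision-weighted mean
    return mu_new, tau2_new

rng = np.random.default_rng(0)
mu, tau2 = 0.0, 10.0             # broad prior N(0, 10) over the unknown mean
for _ in range(200):
    y = 2.0 + rng.normal()       # observations drawn from N(theta = 2, 1)
    mu, tau2 = adf_update(mu, tau2, y, sigma2=1.0)
```

The posterior mean concentrates on the true parameter and the posterior variance shrinks like $1/n$, which is the behaviour the heuristic convergence argument formalizes.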

## Stochastic Linear Regression

We consider the following formulation of Lai, Robbins and Wei (1979), and Lai and Wei (1982). Consider the regression problem

$$y_n = \beta_1 x_{n1} + \cdots + \beta_p x_{np} + \epsilon_n, \qquad n = 1, 2, \dots,$$

where $\epsilon_n$ are unobservable random errors and $\beta_1,\dots,\beta_p$ are unknown parameters.

Typically, for a regression problem, it is assumed that the inputs $x_{1},\dots,x_{n}$ are given and that the errors are IID random variables. However, we now consider a setting where we sequentially choose the inputs $x_i$ and then observe the responses $y_i$, and the errors $\epsilon_i$ form a martingale difference sequence with respect to the filtration $\mathcal F_i$ generated by $\{ x_j, y_{j-1} : j\leq i \}$.
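
In this sequential setting the natural estimator is still least squares, computed recursively. A sketch (the function names and the toy data below are our own; for simplicity the inputs here are drawn i.i.d. rather than chosen adaptively):

```python
import numpy as np

def rls_update(beta, P, x, y):
    """Rank-one (Sherman-Morrison) update of the regularised least-squares fit.

    P maintains (I + sum_j x_j x_j^T)^{-1}; beta is the current estimate.
    """
    Px = P @ x
    k = Px / (1.0 + x @ Px)             # gain vector P_n x_n
    beta = beta + k * (y - x @ beta)    # correct using the prediction error
    P = P - np.outer(k, Px)             # Sherman-Morrison downdate of P
    return beta, P

rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
beta_hat, P = np.zeros(3), np.eye(3)
for _ in range(500):
    x = rng.normal(size=3)                      # input for round n
    y = x @ beta_true + 0.1 * rng.normal()      # noisy response
    beta_hat, P = rls_update(beta_hat, P, x, y)
```

The point of the Lai–Wei analysis is that consistency of such estimates survives even when the $x_i$ depend on past data and the errors are only a martingale difference sequence, not IID.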

## Robbins-Monro

We review the proof by Robbins and Monro for finding fixed points. Stochastic gradient descent, Q-learning, and many other stochastic algorithms can be seen as variants of this basic scheme; we go over the main ingredients of the original proof.
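The basic iteration is easy to state on a toy instance (the target function, noise, and step sizes below are our own choices): to find the root $\theta^*$ of $g(\theta) = \mathbb{E}[G(\theta, \xi)]$ from noisy evaluations $G$, take steps $a_n$ with $\sum a_n = \infty$ and $\sum a_n^2 < \infty$, e.g. $a_n = 1/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.0
for n in range(1, 20001):
    # Noisy evaluation G(theta, xi) of g(theta) = theta - 3, whose root is 3.
    noisy_g = (theta - 3.0) + rng.normal()
    # Robbins-Monro step: theta_{n+1} = theta_n - a_n * G, with a_n = 1/n.
    theta -= noisy_g / n
```

The step sizes are large enough to reach any root ($\sum a_n = \infty$) yet decay fast enough to average away the noise ($\sum a_n^2 < \infty$) — the same trade-off that drives the convergence proof.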

## Linear Programming

A linear program is a constrained optimization problem with a linear objective, $f(x):=c^\top x$, and linear functional constraints, $h(x):=Ax$. We could, for instance, write

$$\min_{x} \; c^\top x \quad \text{subject to} \quad Ax \le b, \; x \ge 0.$$
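
A small worked instance in Python using SciPy's LP solver (the particular $c$, $A$, $b$ below are our own toy numbers):

```python
import numpy as np
from scipy.optimize import linprog

# Maximize x1 + 2*x2 subject to x1 + x2 <= 4, x1 + 3*x2 <= 6, x >= 0,
# written as a minimization of c^T x with c = -(1, 2).
c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
```

The optimum is attained at the vertex where both inequality constraints are tight, $x = (3, 1)$, with objective value $5$ — an instance of the general fact that LP optima lie at vertices of the feasible polytope.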