Control – Applied Probability Notes

Cross Entropy Method

The Cross Entropy Method (CEM) is a generic optimization technique. It is a zeroth-order method, i.e. you don’t need gradients. So, for instance, it works well on combinatorial optimization problems, as well as in reinforcement learning.
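
As a rough illustration of the gradient-free idea, here is a minimal sketch of CEM for continuous minimization. It assumes a diagonal Gaussian sampling distribution that is refit to the best ("elite") samples at each iteration. The function name `cross_entropy_method`, the test objective `shifted_sphere`, and all parameter values are illustrative choices, not taken from the notes.

```python
import numpy as np

def cross_entropy_method(objective, dim, n_samples=100, n_elite=10,
                         n_iters=50, seed=0):
    """Minimize `objective` over R^dim using only function evaluations."""
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)          # initial sampling mean (assumed)
    std = 5.0 * np.ones(dim)      # initial sampling spread (assumed)
    for _ in range(n_iters):
        # Sample candidate solutions from the current Gaussian.
        samples = rng.normal(mean, std, size=(n_samples, dim))
        scores = np.array([objective(x) for x in samples])
        # Keep the lowest-scoring samples and refit the Gaussian to them.
        elite = samples[np.argsort(scores)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6   # small floor avoids a zero spread
    return mean

def shifted_sphere(x):
    """Toy objective with its minimum at (3, ..., 3)."""
    return float(np.sum((x - 3.0) ** 2))

best = cross_entropy_method(shifted_sphere, dim=2)
```

No derivative of `shifted_sphere` is used anywhere; the only feedback is the ranking of sampled points, which is why the same loop carries over to discrete or simulation-based objectives once the sampling distribution is changed accordingly.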

Stochastic Control Notes Update

I’ve updated the stochastic control notes here:

Stochastic_Control_2020_May.pdf

These still remain a work in progress.

(Reports of typos, errors and mishaps are welcome.)

A quick summary of what is new:

I’ve updated a section on the ODE method for stochastic approximation.
I’ve improved the discussion around Temporal Difference methods and included some proofs.
I’ve added a proof of convergence for SARSA.
I’ve added a section on Lyapunov functions in continuous time:
- La Salle, exponential convergence, and online convex optimization…
I’ve started a section on Policy Gradients, but there are more recent proofs to include.
I’ve started a section on Deep Learning for RL.

Lyapunov Functions

Lyapunov functions are an extremely convenient device for proving that a dynamical system converges. We cover:

The Lyapunov argument
La Salle’s Invariance Principle
An Alternative argument for Convex Functions
Exponential Convergence Rates
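
To make the Lyapunov argument concrete, here is a small numerical sketch. It assumes a gradient flow $\dot x = -\nabla V(x)$ with the quadratic $V(x) = \tfrac12 x^\top A x$, discretized with Euler steps. The matrix `A`, the step size, and the function names are hypothetical choices for illustration only.

```python
import numpy as np

# Symmetric positive definite matrix (an assumed example).
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

def V(x):
    """Candidate Lyapunov function V(x) = 0.5 * x^T A x."""
    return 0.5 * x @ A @ x

def simulate(x0, dt=0.01, n_steps=1000):
    """Euler discretization of dx/dt = -grad V(x) = -A x."""
    x = np.array(x0, dtype=float)
    values = [V(x)]
    for _ in range(n_steps):
        x = x - dt * (A @ x)
        values.append(V(x))
    return x, np.array(values)

x_final, values = simulate([1.0, -1.0])

# Exponential rate check: V(x(t)) <= V(x(0)) * exp(-2 * lambda_min * t).
lambda_min = np.linalg.eigvalsh(A).min()
times = 0.01 * np.arange(len(values))
bound = values[0] * np.exp(-2.0 * lambda_min * times)
```

Along the trajectory `values` is non-increasing, which is the Lyapunov argument in miniature, and it stays below `bound`, which illustrates an exponential convergence rate for this particular strongly convex example.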


Kalman Filter

Kalman filtering (and filtering in general) considers the following setting: we have a sequence of states $x_t$, which evolves under random perturbations over time. Unfortunately we cannot observe $x_t$; we can only observe some noisy function of $x_t$, namely, $y_t$. Our task is to find the best estimate of $x_t$ given our observations of $y_t$.
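
The following is a minimal sketch of one predict/update step, under the assumption of a linear Gaussian model $x_{t} = A x_{t-1} + w_t$, $y_t = C x_t + v_t$ with $w_t \sim N(0,Q)$ and $v_t \sim N(0,R)$. The excerpt above does not fix a model, so the matrices `A`, `C`, `Q`, `R`, the function name `kalman_step`, and the scalar random-walk demo are all illustrative assumptions.

```python
import numpy as np

def kalman_step(mean, cov, y, A, C, Q, R):
    """One Kalman filter step: predict the state, then correct with y."""
    # Predict: push the previous estimate through the dynamics.
    mean_pred = A @ mean
    cov_pred = A @ cov @ A.T + Q
    # Update: weight the observation by the Kalman gain.
    S = C @ cov_pred @ C.T + R                  # innovation covariance
    K = cov_pred @ C.T @ np.linalg.inv(S)       # Kalman gain
    mean_new = mean_pred + K @ (y - C @ mean_pred)
    cov_new = (np.eye(len(mean)) - K @ C) @ cov_pred
    return mean_new, cov_new

# Demo: a scalar random walk observed with much larger noise (assumed values).
rng = np.random.default_rng(0)
A = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[0.01]]); R = np.array([[1.0]])

x = np.zeros(1)                       # hidden state
mean, cov = np.zeros(1), np.eye(1)    # initial estimate and uncertainty
sq_err_filter, sq_err_obs = 0.0, 0.0
for t in range(200):
    x = A @ x + 0.1 * rng.normal(size=1)    # state noise, std sqrt(0.01)
    y = C @ x + 1.0 * rng.normal(size=1)    # observation noise, std 1
    mean, cov = kalman_step(mean, cov, y, A, C, Q, R)
    sq_err_filter += float((mean - x) @ (mean - x))
    sq_err_obs += float((y - x) @ (y - x))
```

In this demo the filtered estimate `mean` tracks the hidden state far more closely than the raw observations do, and `cov` settles to a fixed value that does not depend on the data, only on the assumed model.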

Temporal Difference Learning – Linear Function Approximation

For a Markov chain $\hat{x} = (\hat x_t : t\in\mathbb Z_+)$, consider the reward function

$R(x) := \mathbb E_{x} \left[ \sum_{t=0}^\infty r(\hat x_t) \right]$

associated with rewards given by $r = (r(x) : x\in\mathcal X)$. We approximate the reward function $R(x)$ with a linear approximation,

$R(x;\bm w) = \bm w^\top \bm \phi (x) = \sum_{j\in \mathcal J} w_j \phi_j(x) .$
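
As a hedged sketch of how the weights $\bm w$ might be learned, the code below runs TD(0) with linear features on a toy chain. Everything specific here is an assumption made for illustration: the chain has transient states 0, 1, 2 and an absorbing state 3 (so the undiscounted sum is finite), each transient state moves one step right with probability 1/2, rewards are $r(x)=1$ on transient states, features are $\bm\phi(x) = (1, x)$ on transient states and zero at the absorbing state, and the step sizes are a simple decaying schedule. For this chain $R(x) = 2(3-x)$, so the target weights are $(6, -2)$.

```python
import numpy as np

TERMINAL = 3  # absorbing state; states 0, 1, 2 are transient (assumed toy chain)

def features(x):
    """phi(x) = (1, x) on transient states, zero vector once absorbed."""
    return np.zeros(2) if x == TERMINAL else np.array([1.0, float(x)])

def reward(x):
    """Unit reward per step until absorption."""
    return 0.0 if x == TERMINAL else 1.0

def step(x, rng):
    """Move one state to the right with probability 1/2, otherwise stay."""
    return x + 1 if rng.random() < 0.5 else x

def td0_linear(n_episodes=20000, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(2)
    n = 0  # total number of updates so far
    for _ in range(n_episodes):
        x = 0
        while x != TERMINAL:
            x_next = step(x, rng)
            phi, phi_next = features(x), features(x_next)
            # Temporal difference: r(x) + R(x'; w) - R(x; w).
            td_error = reward(x) + w @ phi_next - w @ phi
            alpha = 10.0 / (1000.0 + n)   # decaying step size (assumed)
            w = w + alpha * td_error * phi
            x, n = x_next, n + 1
    return w

w = td0_linear()
```

Because the true reward function is exactly linear in these features, the learned `w` should land close to $(6, -2)$; with features that cannot represent $R$ exactly, TD(0) would instead settle on an approximation.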


Stochastic Linear Regression

We consider the formulation of Lai, Robbins and Wei (1979) and Lai and Wei (1982). Take the regression problem

$y_n = \beta_1 x_{n1} + \dots + \beta_p x_{np} + \epsilon_n$

for $n=1,2,\dots$, where $\epsilon_n$ are unobservable random errors and $\beta_1,\dots,\beta_p$ are unknown parameters.

Typically for a regression problem, it is assumed that the inputs $x_{1},\dots,x_{n}$ are given and the errors are IID random variables. Here, however, we consider a setting where we sequentially choose inputs $x_i$ and then observe $y_i$, and where the errors $\epsilon_i$ form a martingale difference sequence with respect to the filtration $\mathcal F_i$ generated by $\{ x_j, y_{j-1} : j\leq i \}$.
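
To illustrate the sequential setting, here is a minimal simulation sketch in which each input is chosen from past data and the least squares estimate is recomputed after every observation. The adaptive rule (current estimate plus Gaussian exploration), the small ridge term, the IID Gaussian errors (a special case of a martingale difference sequence), and the name `sequential_least_squares` are all assumptions made for this example, not part of the cited formulation.

```python
import numpy as np

def sequential_least_squares(beta, n_steps=2000, noise_std=0.5, seed=0):
    """Least squares with inputs chosen sequentially from past observations."""
    rng = np.random.default_rng(seed)
    p = len(beta)
    gram = 1e-3 * np.eye(p)   # sum of x x^T, with a tiny ridge for invertibility
    xy = np.zeros(p)          # sum of y * x
    beta_hat = np.zeros(p)
    for _ in range(n_steps):
        # Adaptive input: depends on earlier data through beta_hat (assumed rule).
        x = beta_hat + rng.normal(size=p)
        y = x @ beta + noise_std * rng.normal()
        gram += np.outer(x, x)
        xy += y * x
        beta_hat = np.linalg.solve(gram, xy)
    return beta_hat

beta_true = np.array([1.0, -2.0, 0.5])
beta_hat = sequential_least_squares(beta_true)
```

Here the exploration noise keeps the smallest eigenvalue of `gram` growing, so the estimate approaches `beta_true`; when the inputs are chosen adaptively without such excitation, consistency is no longer automatic, which is the question this formulation addresses.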