## Stochastic Control Notes Update

I’ve updated the stochastic control notes here:

## Stochastic_Control_2020_May.pdf

These remain a work in progress.

(Reports of any typos, errors or mishaps are welcome.)

A quick summary of what is new:

• I’ve updated a section on the ODE method for stochastic approximation.
• I’ve improved the discussion around Temporal Difference methods and included some proofs.
• I’ve added a proof of convergence for SARSA.
• I’ve added a section on Lyapunov functions in continuous time.
• La Salle, exponential convergence, and online convex optimization…
• I’ve started a section on Policy Gradients, but there are more recent proofs still to include.
• I’ve started a section on Deep Learning for RL.

## Kalman Filter

Kalman filtering (and filtering in general) considers the following setting: we have a sequence of states $x_t$, which evolves under random perturbations over time. Unfortunately, we cannot observe $x_t$ directly; we can only observe a noisy function of $x_t$, namely $y_t$. Our task is to find the best estimate of $x_t$ given our observations of $y_t$.
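To make the setting concrete, here is a minimal sketch of a scalar Kalman filter. It assumes a linear Gaussian model that the summary above does not spell out: $x_{t+1} = a x_t + w_t$ and $y_t = c x_t + v_t$, with independent noises $w_t \sim N(0,q)$ and $v_t \sim N(0,r)$. The names `kalman_step`, `simulate` and the parameters `a`, `c`, `q`, `r` are illustrative choices, not notation from the notes.

```python
import random


def kalman_step(mean, var, y, a=1.0, c=1.0, q=0.1, r=1.0):
    """One predict/update step of a scalar Kalman filter.

    (mean, var) describe the current Gaussian estimate of the previous state.
    The model x' = a*x + N(0, q), y = c*x' + N(0, r) is an assumption.
    """
    # Predict: push the estimate through the state dynamics.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: correct the prediction using the new observation y.
    gain = pred_var * c / (c * c * pred_var + r)
    new_mean = pred_mean + gain * (y - c * pred_mean)
    new_var = (1.0 - gain * c) * pred_var
    return new_mean, new_var


def simulate(steps=200, seed=0, a=1.0, c=1.0, q=0.1, r=1.0):
    """Simulate the assumed model and compare the filter with raw observations."""
    rng = random.Random(seed)
    x = 0.0                  # hidden state, never shown to the filter
    mean, var = 0.0, 1.0     # prior belief about the state
    sq_err_filter = 0.0
    sq_err_raw = 0.0
    for _ in range(steps):
        x = a * x + rng.gauss(0.0, q ** 0.5)
        y = c * x + rng.gauss(0.0, r ** 0.5)
        mean, var = kalman_step(mean, var, y, a, c, q, r)
        sq_err_filter += (mean - x) ** 2
        sq_err_raw += (y / c - x) ** 2
    return sq_err_filter / steps, sq_err_raw / steps, var


if __name__ == "__main__":
    mse_filter, mse_raw, final_var = simulate()
    print("filter MSE:", mse_filter)
    print("raw observation MSE:", mse_raw)
    print("final posterior variance:", final_var)
```

Under these assumptions the filtered estimate should track $x_t$ more closely than the raw observations $y_t$ do, and the posterior variance settles to a fixed value that does not depend on the data.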

## Temporal Difference Learning – Linear Function Approximation

For a Markov chain $\hat{x} = (\hat x_t : t\in\mathbb Z_+)$, consider the reward function

associated with rewards given by $r = (r(x) : x\in\mathcal X)$. We approximate the reward function $R(x)$ with a linear approximation,
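As a rough illustration of temporal difference learning with a linear approximation, the sketch below runs TD(0) on a small chain. Several things here are assumptions rather than content of the post: the reward function is taken to be the expected discounted sum of rewards started from $x$, with a discount factor called `discount`; the approximation is taken to be $w^\top \phi(x)$ for a feature map `features` and weights `w`; and the two-state chain, the step-size schedule and the function names are hypothetical.

```python
import random


def td0_linear(P, r, features, discount, steps, seed=0):
    """TD(0) with a linear approximation w . phi(x) of the reward function.

    P is a transition matrix (list of rows), r a list of per-state rewards,
    features[x] the feature vector phi(x). All names are illustrative.
    """
    rng = random.Random(seed)
    states = range(len(P))
    w = [0.0] * len(features[0])
    x = 0
    for t in range(steps):
        # Sample the next state of the Markov chain.
        x_next = rng.choices(states, weights=P[x])[0]
        phi, phi_next = features[x], features[x_next]
        value = sum(wi * fi for wi, fi in zip(w, phi))
        value_next = sum(wi * fi for wi, fi in zip(w, phi_next))
        # Temporal difference error for the observed transition.
        td_error = r[x] + discount * value_next - value
        # Decreasing step size (one possible schedule, chosen for this sketch).
        alpha = 10.0 / (100.0 + t)
        w = [wi + alpha * td_error * fi for wi, fi in zip(w, phi)]
        x = x_next
    return w


def exact_reward_function(P, r, discount, iters=200):
    """Fixed-point iteration R <- r + discount * P R, for comparison."""
    n = len(r)
    R = [0.0] * n
    for _ in range(iters):
        R = [r[x] + discount * sum(P[x][y] * R[y] for y in range(n))
             for x in range(n)]
    return R


if __name__ == "__main__":
    P = [[0.7, 0.3], [0.4, 0.6]]
    r = [0.0, 1.0]
    features = [[1.0, 0.0], [1.0, 1.0]]  # spans all functions on two states
    w = td0_linear(P, r, features, discount=0.5, steps=200000)
    approx = [sum(wi * fi for wi, fi in zip(w, phi)) for phi in features]
    print("TD(0) estimate:", approx)
    print("exact values:  ", exact_reward_function(P, r, 0.5))
```

In this toy example the features can represent any function on the two states, so the TD(0) estimate can be checked directly against the exact values. With fewer features than states, the iteration would instead settle on an approximation within the span of the features.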

## Stochastic Linear Regression

We consider the formulation of Lai, Robbins and Wei (1979) and Lai and Wei (1982). Consider the regression problem

$$y_n = \beta_1 x_{n1} + \dots + \beta_p x_{np} + \epsilon_n$$

for $n=1,2,...$ where $\epsilon_n$ are unobservable random errors and $\beta_1,...,\beta_p$ are unknown parameters.

Typically for a regression problem, it is assumed that inputs $x_{1},...,x_{n}$ are given and errors are IID random variables. However, we now want to consider a setting where we sequentially choose inputs $x_i$ and then get observations $y_i$, and errors $\epsilon_i$ are a martingale difference sequence with respect to the filtration $\mathcal F_i$ generated by $\{ x_j, y_{j-1} : j\leq i \}$.
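The sequential setting can be sketched with a recursive least-squares update, in which each new input is chosen using past observations. This is only an illustration under stated assumptions: the input rule (a bounded function of the previous observation plus uniform noise), the IID Gaussian errors (one special case of a martingale difference sequence), the small ridge term used to initialise the inverse matrix, and the names `rls_update` and `simulate` are all choices made here, not taken from Lai and Wei.

```python
import math
import random


def rls_update(beta_hat, V_inv, x, y):
    """Recursive least-squares update after observing the pair (x, y).

    V_inv is the inverse of (ridge * I + sum of x x^T so far), maintained
    with the Sherman-Morrison formula.
    """
    p = len(x)
    Vx = [sum(V_inv[i][j] * x[j] for j in range(p)) for i in range(p)]
    denom = 1.0 + sum(x[i] * Vx[i] for i in range(p))
    gain = [v / denom for v in Vx]
    residual = y - sum(b * xi for b, xi in zip(beta_hat, x))
    new_beta = [b + g * residual for b, g in zip(beta_hat, gain)]
    new_V_inv = [[V_inv[i][j] - gain[i] * Vx[j] for j in range(p)]
                 for i in range(p)]
    return new_beta, new_V_inv


def simulate(true_beta=(1.0, -2.0), n=5000, noise_std=0.5, seed=0):
    """Choose inputs sequentially and track the least-squares estimate (p = 2)."""
    rng = random.Random(seed)
    beta_hat = [0.0, 0.0]
    # Large initial V_inv corresponds to a small ridge term of 0.001.
    V_inv = [[1000.0, 0.0], [0.0, 1000.0]]
    y_prev = 0.0
    for _ in range(n):
        # The input depends on the previous observation, so inputs are not
        # fixed in advance; the uniform term keeps them varied.
        u = rng.uniform(-1.0, 1.0) + 0.5 * math.tanh(y_prev)
        x = [1.0, u]
        eps = rng.gauss(0.0, noise_std)
        y = sum(b * xi for b, xi in zip(true_beta, x)) + eps
        beta_hat, V_inv = rls_update(beta_hat, V_inv, x, y)
        y_prev = y
    return beta_hat


if __name__ == "__main__":
    print("estimate of (beta_1, beta_2):", simulate())
```

Because the inputs keep varying in this sketch, the estimate should approach the true parameters. When the chosen inputs stop being informative, consistency is no longer automatic, which is the situation this formulation is designed to analyse.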