The exponential family of distributions is a particularly tractable, yet broad, class of probability distributions. Exponential families are tractable because of a particularly nice [Fenchel] duality relationship between natural parameters and moment parameters: moment parameters can be estimated by taking the empirical mean of the sufficient statistics, and the duality relationship then recovers an estimate of the distribution's natural parameters.
Continue reading “Exponential Families”
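As a minimal sketch of this duality (my own toy example, using the Bernoulli distribution and illustrative helper names, not anything from the post): the moment parameter is the mean $p$, the natural parameter is the log-odds, and the empirical mean of the sufficient statistic recovers the natural parameter through the logit map.

```python
import numpy as np

def moment_to_natural(mu):
    """Bernoulli: natural parameter (log-odds) from the moment parameter (mean)."""
    return np.log(mu / (1.0 - mu))

def natural_to_moment(theta):
    """Bernoulli: moment parameter (mean) from the natural parameter (log-odds)."""
    return 1.0 / (1.0 + np.exp(-theta))

# Estimate the moment parameter by the empirical mean of the sufficient
# statistic (which is just x itself for the Bernoulli), then map it back to
# an estimate of the natural parameter via the duality relationship.
rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=10_000)
mu_hat = data.mean()
theta_hat = moment_to_natural(mu_hat)
print(mu_hat, theta_hat)   # roughly 0.3 and log(0.3 / 0.7)
```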
Kalman Filter
Kalman filtering (and filtering in general) considers the following setting: we have a sequence of states $x_0, x_1, x_2, \dots$, which evolves under random perturbations over time. Unfortunately we cannot observe $x_t$; we can only observe some noisy function of $x_t$, namely $y_t$. Our task is to find the best estimate of $x_t$ given our observations $y_1, \dots, y_t$.

Continue reading “Kalman Filter”
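As a rough illustration of this setting (a minimal sketch with a scalar linear-Gaussian model and parameters of my choosing, not the post's notation), the filter alternates a predict step through the state dynamics with an update step that corrects the prediction using the new noisy observation:

```python
import numpy as np

def kalman_filter_1d(ys, a=1.0, q=0.1, r=1.0, m0=0.0, p0=1.0):
    """Scalar Kalman filter for x_{t+1} = a*x_t + N(0, q), y_t = x_t + N(0, r).
    Returns the filtered means E[x_t | y_1..y_t] and their variances."""
    m, p = m0, p0
    means, variances = [], []
    for y in ys:
        # Predict: push the current estimate through the state dynamics.
        m_pred = a * m
        p_pred = a * p * a + q
        # Update: correct the prediction with the new noisy observation.
        k = p_pred / (p_pred + r)          # Kalman gain
        m = m_pred + k * (y - m_pred)
        p = (1.0 - k) * p_pred
        means.append(m)
        variances.append(p)
    return np.array(means), np.array(variances)

# Example: filter a noisy random walk.
rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(0.0, np.sqrt(0.1), size=100))
y = x + rng.normal(0.0, 1.0, size=100)
m, p = kalman_filter_1d(y, a=1.0, q=0.1, r=1.0)
```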
Mixing Times
In loose terms, the mixing time is the amount of time you have to wait before you can expect a Markov chain to be close to its stationary distribution. We give an upper bound on the mixing time.
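As a small numerical sketch (a toy two-state chain of my choosing, not the bound from the post), the code below tracks the total variation distance between the distribution of the chain and its stationary distribution as the chain runs; the mixing time is the first time this distance falls below a chosen threshold.

```python
import numpy as np

# Two-state chain with transition matrix P; its stationary distribution pi
# satisfies pi P = pi.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2 / 3, 1 / 3])          # stationary distribution of this P

def tv_distance(p, q):
    """Total variation distance between two distributions on a finite space."""
    return 0.5 * np.abs(p - q).sum()

# Start from a point mass and watch the distance to stationarity shrink.
mu = np.array([1.0, 0.0])
for t in range(1, 21):
    mu = mu @ P
    print(t, tv_distance(mu, pi))
```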
Bayesian Online Learning
We briefly describe an online Bayesian framework, sometimes referred to as the Assumed Density Filter (ADF), and review a heuristic proof of its convergence in the Gaussian case.
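As a minimal sketch of the Gaussian flavour of this idea (a toy model and grid-based update of my own, not the post's derivation): after each observation the exact posterior, which is no longer Gaussian, is projected back onto a Gaussian by matching its mean and variance.

```python
import numpy as np

def adf_gaussian_step(m, v, y, n_grid=2001):
    """One ADF update: multiply the Gaussian belief N(m, v) about theta by a
    non-Gaussian likelihood for observation y, then project the exact result
    back onto a Gaussian by matching its mean and variance."""
    half = 6.0 * np.sqrt(v) + 5.0                   # grid wide enough to cover the posterior
    theta = np.linspace(m - half, m + half, n_grid)
    prior = np.exp(-0.5 * (theta - m) ** 2 / v)
    # Toy likelihood: y = theta + noise, where the noise is a two-component
    # Gaussian mixture (std 1 with prob 0.9, std 10 with prob 0.1), so the
    # exact posterior is non-Gaussian.
    lik = 0.9 * np.exp(-0.5 * (y - theta) ** 2) \
        + 0.1 / 10.0 * np.exp(-0.5 * (y - theta) ** 2 / 100.0)
    post = prior * lik
    post /= post.sum()
    new_m = float(np.sum(theta * post))
    new_v = float(np.sum((theta - new_m) ** 2 * post))
    return new_m, new_v

# Run the assumed density filter over a stream of observations.
rng = np.random.default_rng(2)
m, v = 0.0, 25.0                                    # initial Gaussian belief
for y in rng.normal(3.0, 1.0, size=50):
    m, v = adf_gaussian_step(m, v, y)
print(m, v)                                         # mean near 3, variance small
```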
Stochastic Linear Regression
We consider the following formulation of Lai, Robbins and Wei (1979), and Lai and Wei (1982). Consider the following regression problem,

$$y_t = \beta^\top x_t + \epsilon_t$$

for $t = 1, 2, \dots$, where the $\epsilon_t$ are unobservable random errors and $\beta$ is a vector of unknown parameters.

Typically for a regression problem, it is assumed that the inputs are given and the errors are IID random variables. However, we now want to consider a setting where we sequentially choose inputs $x_t$ and then get observations $y_t$, and the errors $\epsilon_t$ are a martingale difference sequence with respect to the filtration $(\mathcal{F}_t)$ generated by $(x_1, y_1, \dots, x_t, y_t)$.
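As a rough sketch of this setting (with an illustrative input rule, noise level and parameter values of my own, not those of the papers), the code below sequentially picks inputs, observes responses corrupted by martingale-difference noise, and keeps a running least-squares estimate of $\beta$:

```python
import numpy as np

rng = np.random.default_rng(3)
beta = np.array([2.0, -1.0])          # unknown parameters (used only to simulate data)

# Running least squares: accumulate X^T X and X^T y as observations arrive.
d = len(beta)
xtx = 1e-6 * np.eye(d)                # tiny ridge so the first solve is well defined
xty = np.zeros(d)
beta_hat = np.zeros(d)

for t in range(1, 501):
    # Sequentially choose the next input; in general it may depend on
    # everything observed so far (here a simple time-varying design).
    x = np.array([1.0, np.sin(0.1 * t)]) + 0.1 * rng.normal(size=d)
    # Observe the response; the noise has mean zero given the past, so the
    # errors form a martingale difference sequence.
    y = beta @ x + rng.normal(0.0, 1.0)
    xtx += np.outer(x, x)
    xty += y * x
    beta_hat = np.linalg.solve(xtx, xty)

print(beta_hat)                       # should be close to [2, -1]
```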
Markov’s Inequality, Chebyshev’s Inequality and the Weak Law of Large Numbers
Robbins-Monro
We review the algorithm of Robbins and Monro for finding fixed points, and the basic ingredients of their original convergence proof. Stochastic gradient descent, Q-learning and a number of other stochastic algorithms can be seen as variants of this basic scheme.
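As a minimal sketch of the iteration itself (a toy target map and step sizes of my choosing, not the post's), each step moves the iterate a little towards a noisy evaluation of the map, with step sizes $a_n$ satisfying $\sum_n a_n = \infty$ and $\sum_n a_n^2 < \infty$:

```python
import numpy as np

rng = np.random.default_rng(4)

def noisy_T(x):
    """Noisy evaluation of T(x) = 0.5*x + 1, whose fixed point is x = 2."""
    return 0.5 * x + 1.0 + rng.normal(0.0, 1.0)

# Robbins-Monro iteration for the fixed point x = T(x):
#   x_{n+1} = x_n + a_n * (noisy_T(x_n) - x_n),
# with a_n = 1/n, so the step sizes sum to infinity while their squares do not.
x = 10.0
for n in range(1, 10_001):
    a_n = 1.0 / n
    x = x + a_n * (noisy_T(x) - x)

print(x)   # slowly approaches the fixed point at 2
```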