Zero-Order Stochastic Optimization: Kiefer-Wolfowitz

We want to optimize the expected value of some random function. This is the problem we solved with Stochastic Gradient Descent. Now, however, we assume that we no longer have access to an unbiased estimate of the gradient: we can only obtain noisy estimates of the function itself. In this case we can apply the Kiefer-Wolfowitz procedure.

The idea is to replace the random gradient estimate used in stochastic gradient descent with a finite difference built from noisy function evaluations. If the increments used in these finite differences shrink sufficiently fast, then convergence is achieved over time. The approximation error introduced by the finite difference degrades the rate of convergence relative to stochastic gradient descent.
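The update described above can be sketched as follows. This is a minimal illustration, not the notes' own implementation: the step-size sequence $a_n = a/n$ and difference-width sequence $c_n = c/n^{1/4}$ are one standard choice satisfying the usual Kiefer-Wolfowitz conditions, and the quadratic test objective is an assumption for demonstration.

```python
import numpy as np

def kiefer_wolfowitz(f, x0, n_iters=2000, a=1.0, c=1.0):
    """Minimize E[f(x)] using only noisy evaluations of f.

    a_n = a / n is the step size and c_n = c / n**0.25 the
    finite-difference width; both shrink over time so that the
    bias and the noise in the gradient estimate vanish.
    """
    x = float(x0)
    for n in range(1, n_iters + 1):
        a_n = a / n
        c_n = c / n ** 0.25
        # Central finite difference from two noisy function evaluations
        grad_est = (f(x + c_n) - f(x - c_n)) / (2 * c_n)
        x -= a_n * grad_est
    return x

# Hypothetical example: noisy evaluations of (x - 3)^2,
# whose expected value is minimized at x = 3.
rng = np.random.default_rng(0)
noisy = lambda x: (x - 3) ** 2 + rng.normal(scale=0.1)
x_star = kiefer_wolfowitz(noisy, x0=0.0)
```

Note that each iteration needs two function evaluations but no gradient information, which is exactly the zero-order setting described above.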

Discrete Probability Distributions

Some probability distributions occur frequently, either because they have a particularly natural or simple construction, or because they arise as the limit of a simpler distribution. Here we cover

• Bernoulli random variables
• Binomial distribution
• Geometric distribution
• Poisson distribution


Counting Principles


Counting in Probability. If each outcome is equally likely, i.e. $\mathbb P( \omega ) = p$ for all $\omega \in \Omega$, then since

$$1 = \mathbb P(\Omega) = \sum_{\omega \in \Omega} \mathbb P(\omega) = |\Omega| \, p$$

(where $|\Omega|$ is the number of outcomes in the set $\Omega$ ) it must be that

$$\mathbb P(\omega) = \frac{1}{|\Omega|} .$$
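The counting rule above can be checked on a small example. The two-dice setup here is an assumption chosen for illustration: with equally likely outcomes, the probability of an event $A$ reduces to $|A|/|\Omega|$.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice
omega = list(product(range(1, 7), repeat=2))  # |Omega| = 36

# Event A: the two dice sum to 7
event = [w for w in omega if sum(w) == 7]

# With equally likely outcomes, P(A) = |A| / |Omega|
prob = Fraction(len(event), len(omega))
```

Here $|A| = 6$ (the pairs $(1,6), (2,5), \dots, (6,1)$), so the probability is $6/36 = 1/6$.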