Optimal Stopping

An Optimal Stopping Problem is an Markov Decision Process where there are two actions: $a=0$ meaning to stop, and $a=1$ meaning to continue. Here there are two types of costs

This defines a stopping problem.

Assuming that time is finite, the Bellman equation is

for $s\in \mathbb N$ and $C_0(x)=k(x)$.

Def [OLSA rule] In the one step lookahead (OSLA) rule we stop when ever $x\in S$ where

We call $S$ the stopping set. In words, you stop whenever it is better stop now rather than continue one step further and then stop.

Def [Closed Stopping Set] We say the set $S\subset {\mathcal X}$ is closed, it once inside that said you cannot leave, i.e.

Prop 1. If, for the finite time stopping problem, the set $S$ given by the one step lookahead rule is closed then the one step lookahead rule is an optimal policy.

Proof. Given the set $S$ is closed, we argue that if $C_{s-1}(x)=k(x)$ for $x\in S$ then $C_s(x)=k(x)$:If $x\in S$ then since $S$ is closed $\hat{X} \in S$. In otherwords $C_{s-1}(\hat{X}) =k(\hat{X})$. Therefore, in this case, Bellman’s equation becomes

The last inequality above follows by the definition of $x\in S$.

We now proceed by induction. The OSLA rule is optimal for $s=1$ steps, since OSLA is exactly the optimal policy for one step.

Suppose that the result is holds for upto $s-1$ steps. Now consider the Optimal Stopping Problem with $s$ steps. If $x\in S$ then $C_s(x)=k(x)$. So it is better to stop. If $x\notin S$, then clearly it’s better to continue. $\square$

Ex. [The Secretary Problem].  There are $N$ candidates for a secretary job. You interview candidates sequentially. After each interview, you must either accept or reject the candidate. We assume each candidate has the rank: $1,2,...,N$ And arrive for interview uniformly at random. Find the policy that maximises the probability that you hire the best candidate.

Ans. All that matters at each time is if the current candidate is the best so far. At time $t$ let

Since is uniform random where the best candidate is

Under the chosen policy, we let

be our reward function. Now

Thus the Bellman equation for the above problem is

Notice that $R_t \geq R_{t+1}$. Let $t^*$ be the smallest $t$ such that $R_{t^*} \geq (t^*-1)/N$ . Starting from $R_{N+1}=0$ note that so long as $latex R_{t+1}<\frac{t}{N}$ holds in second case in the above expression, we have that

Thus

Thus our condition for the optimal $t^*$ is to take the smallest $t^*$ such that

In other words, the optimal policy is to interview the first $t^*$ candidates and then accept the next best candidate.

Ex [The Secretary Problem, continued] Argue that as $N\rightarrow \infty$, the optimal policy is to interview $e^{-1}$ of the candidates and then to accept the next best candidate.

Ans. From [[OS:Secretary]], the optimal condition is

We know that as $N\rightarrow \infty$

Thus $\log N - \log t^* \geq 1$ for $t^*/N =e^{-1}$.

Optimal stopping in infinite time

We now give conditions for the one step look ahead rule to be optimal for infinite time stopping problems.

Prop. If the following two conditions hold

• $K = \max_{x} k(x) < \infty$
• $C= \min_{x} c(x) > 0$

Proof. Suppose that the optimal policy $\pi$ stops at time $\tau$ then

Therefore if we follow optimal policy $\pi$ but for the $s$ time horizon problem and stop at $s$ if $\tau \geq s$ then

Thus $L_s(x) \rightarrow L(x)$.

As before (for the finite time problem), it is no optimal to stop if $x\notin S$ and for the finite time problem $L_s(x) = k(x)$ for all $x\in S$. Therefore, since $L_s(x) \rightarrow L(x)$, we have that $L(x)=k(x)$ for all $x\in S$ and there for it is optimal to stop for $x\in S$. $\square$

The one step lookahead rule is not always the correct solution to an optimal stopping problem.

Def. [Concave Majorant] For a function $r:\{0,...,N\}\rightarrow \mathbb R_+$ a concave majorant is a function $G$ such that

• $G(x)\geq \frac{1}{2} G(x-1) + \frac{1}{2} G(x+1)$
• $G(x) \geq r(x)$.

Prop 3 [Stopping a Random Walk] Let $X_t$ be a symmetric random walk on $\{0,...,N\}$ where the process is automatically stopped at $0$ and $N$. For each $x\in \{ 0,...,N\}$, there is a positive reward of $r(x)$ for stopping. We are asked to maximize where $T$ is our chosen stopping time. The optimal value function $V(x)$ is the minimal concave majorant, and that it is optimal to stop whenever $V(x)=r(x)$.

Proof. The Bellman equation is

with $R(x)=r(0)$ and $R(N)=r(N)$. Thus the optimal value function is a concave majorant.

We will show that the optimal policy is the minimal concave majorant of $r(x)$. We do so by, essentially applying induction on value iteration. First $R_0(x)=0 \leq G(x)$ for any concave majorant of $r(x)$. Now suppose that $R_{s-1}$, the function reached after $s-1$ value iterations, satisfies $R_{s-1}(x)\leq G(x)$ for all $x$, then

Since value iteration converges $R_s(x) \nearrow V(x)$, where $V(x)$ satisfies $V(x) \leq G(x)$, as required.

Finally observe that from the Bellman equation the optimal stopping rule is to stop whenever $V(x)=r(x)$ for the minimal concave majorant. $\square$