Max-Weight Scheduling

We define a discrete-time queueing network where there are restrictions on which queues can be served simultaneously. We give a policy for serving queues which is stable whenever it is possible to stabilize the queueing network.

Let ${\mathcal J}$ index the set of queues. We let ${\mathcal S}$ be the set of schedules. Each $\sigma=(\sigma_j:j\in{\mathcal J})\in{\mathcal S}$ is a $0-1$ vector where $\sigma_j=1$ indicates that queue $j$ will be served under schedule $\sigma$ and $\sigma_j=0$ indicates that the queue will not be served by schedule $\sigma$ . We let $< {\mathcal S} >$ be the convex combination of schedules in ${\mathcal S}$ .
We assume that we can always choose not to serve a queue. That is we can always set a component zero: $\sigma\in{\mathcal S}\implies(\sigma_{-j},0)\in{\mathcal S},\; \forall j\in{\mathcal J}$ . This avoids the issue of accidentally serving an empty queue and ensures $<{\mathcal S} >^\circ \neq \emptyset$ .
We let $a(t)=(a_j(t):j\in{\mathcal J})\in{\mathbb Z}_+^{\mathcal J}$ be the number of arrivals occurring at each queue at time $t$ . Also, we define $\bar{a}(t)={\mathbb E}[a(t)|a(t-1),...,a(1)]$ .
We let $Q(t)=(Q_j(t):j\in{\mathcal J})$ be the size of each queue at time $t$ , and use the shorthand $Q^{\Sigma}(t)=\sum_{j\in{\mathcal J}} Q_j(t)$ .
We define the MaxWeight scheduling policy to be the policy that solves the optimization
$\label{MW:Opt} \sigma(t)\in\argmax_{s\in{\mathcal S}} \sum_{j\in{\mathcal J}} s_j Q_j(t-1)$
We could define the above optimization as $\max_{s\in<{\mathcal S}>} \sigma\cdot Q(t-1)$ , i.e. maximizing over convex set $<{\mathcal S}>$ instead of the discrete set ${\mathcal S}$ . This is because any linear program has a solution at an extreme point which must, for these problems, be a subset of the set ${\mathcal S}$ .
We assume our network is a single-hop network: once a job is served at its queue it leaves the network.

The following result states that provided the expected arrivals $\bar{a}(t)$ are within the interior of the set of schedules $< {\mathcal S} >^\circ$ then we can expect the long run queue size to be bounded.

Theorem: If there exists $\epsilon >0$ such that for all $t\in{\mathbb Z}_+$ , $\bar{a}(t)+\epsilon\mathbf{1} \in < {\mathcal S} >^\circ$ and $\sup_{t\in{\mathbb Z}_+}{\mathbb E} ||a(t)-\bar{a}(t)||^2<\infty$ then ¹ there exists constants $c$ and $\hat{\epsilon}$ such that

$\limsup_{T\rightarrow\infty}\bE\left[ \frac{1}{T}\sum_{t=0}^{T-1} Q^\Sigma(t) \right] \leq \frac{c}{2\hat{\epsilon}}.\label{MW:ineq}$

Proof: Note the crux of this argument is that for all queue sizes $q$ , $q\cdot a$ is by some margin strictly less than the MaxWeight choice $\max_{\sigma\in <{\mathcal S}>} q\cdot \sigma$ . Given $\epsilon$ above, we aim to use $\hat{\epsilon}$ , the biggest such that $\bar{a}(t)+\hat{\epsilon}\mathbf{1} \in < {\mathcal S} >$ for all time. This can be defined as the biggest $\hat{\epsilon}$ such that $\hat{\epsilon}\leq \epsilon(t):= \max_{\sigma\in<{\mathcal S}>} \min_{j\in{\mathcal J}} (\sigma_j - \bar{a}_j(t))$ for all $t\in{\mathbb Z}_+$ i.e. so the distance of the smallest component to the boundary is maximized. Given this optimization description we observe the following

From this observation, we see one way that we can use $\hat{\epsilon}$ to bound the gap between $q\cdot \bar{a}(t)$ and $\max_{q\in<{\mathcal S}>} q\cdot \sigma$ .

$\begin{aligned} \bE ||Q(t+1)||^2 - \bE ||Q(t)||^2 & = 2 \bE\left[ Q(t)\cdot (a(t+1)-\sigma(t+1)) \right] +\bE || a(t+1)-\sigma(t+1)||^2\\ % &= 2\bE\left[ Q^\Sigma(t) \frac{Q(t)}{Q^\Sigma(t)}\cdot (\bar{a}(t+1)-\sigma(t+1))\right] + 2\bE\left[Q(t)\cdot (a(t+1)-\bar{a}(t+1))\right] + c\\ % %& = 2 \frac{Q(t)}{Q^\Sigma(t)}\cdot (\bar{a}(t+1)-\sigma(t+1)) Q^\Sigma(t)\\ % & \leq - 2\bE \left[ Q^\Sigma(t) \right]\min_{q: \sum_{j} q_j=1} \max_{\sigma\in <{\mathcal S}>} q\cdot (\sigma-\bar{a}(t+1)) + c\\ % &\leq - 2 \bE \left[ Q^\Sigma(t) \right] \max_{\sigma\in <{\mathcal S}>} \min_{q: \sum_{j} q_j=1} q\cdot (\sigma-\bar{a}(t+1)) + c\\ % &\leq - 2\hat{\epsilon} \bE \left[ Q^\Sigma(t) \right] + c.\end{aligned}$

Summing the above expression from $t=0,...,T-1$ , we gain the expression

$\begin{aligned} - \bE ||Q(0)||^2 \leq \bE ||Q(T)||^2 - \bE ||Q(0)||^2 \leq -2 \hat{\epsilon} \bE \left[ \sum_{t=0}^{T-1} Q^\Sigma(t) \right] + cT.\end{aligned}$

Rearranging, dividing by $T$ , and letting $T\rightarrow\infty$ , we obtain the desired expression. $\square$

The above schedule, result, and proof is a special case of the Blackwellâs Approachability Theorem. Here wish to approach the set ${\mathbb R}_{-}^{\mathcal J}$ and our Blackwell condition [Theorem [Blackwell:Blackwell].2] states that for every $\bar{a}$ there exist a $\sigma\in{\mathcal S}$ such that $\bar{a}\leq \sigma$ component-wise.

We now study $Q(t)$ as a Markov chain. To do this, we assume $\{a(t)\}_{t=1}^\infty$ are independent identically distributed random variables with mean $\bar{a}$ . The result below proves that this Markov chain is transient whenever $\bar{a}\notin <{\mathcal S} >$ . Thus $<{\mathcal S} >^\circ$ is the largest open region for which our Markov chain can be positive recurrent. We show that this Markov chain is positive recurrent for $\bar{a}\in <{\mathcal S}>^\circ$ .

As just described, if it is possible to find a policy to stabilize the queueing network then the MaxWeight scheduler will also stabilize the network. This property is sometimes called maximum stable.

Proposition: Given $\{a(t)\}_{t=1}^\infty$ are iid random variables with mean $\bar{a}$ and finite variance.

If $\bar{a}\in <{\mathcal S} >^\circ$ then $\{Q(t)\}_{t=0}^\infty$ is a positive recurrent Markov chain.

If $\bar{a}\notin <{\mathcal S} >$ then $\{Q(t)\}_{t=0}^\infty$ is a transient Markov chain.

Proof: If $\{a(t)\}_{t=1}^\infty$ are iid then $Q(t+1)$ is a function of $Q(t)$ and independent random variable $a(t)$ and so $\{Q(t)\}_{t=0}^\infty$ defines a discrete-time Markov chain. First suppose $\bar{a}\in <{\mathcal S} >^\circ$ . The above Theorem holds. If the Markov chain $\{Q(t)\}_{t=0}^\infty$ were not positive recurrent then $\lim_{T\rightarrow\infty}\frac{1}{T} \sum_{t=0}^{T-1} {\mathbb I}[Q(t)=q]=0$ for all $q\in{\mathbb Z}_+^{\mathcal J}$ . In particular, this would hold for all $q$ such that $q^\Sigma \leq c/2\epsilon$ which would then imply would not hold. Thus by this contradiction, this Markov chain is positive recurrent.

Now suppose $\bar{a}\notin <{\mathcal S} >$ . As $<{\mathcal S} >$ is convex, there exists a hyper-plane separating $\bar{a}$ and $<{\mathcal S} >$ , i.e. there exists a $p\in{\mathbb R}^{\mathcal J}$ and $\delta>0$ such that

$\label{MW:plane} p\cdot \bar{a} \geq p\cdot \sigma + \delta,\qquad \forall \sigma\in <{\mathcal S} >.$

We define $\hat{\sigma}(T)=\sum_{t=1}^T \sigma(t)/T$ and $\hat{a}(T)=\sum_{t=1}^T a(t)/T$ , the average schedule and arrivals by time $T$ . Now

$\liminf_{T\rightarrow\infty} p\cdot \frac{Q(T)}{T}= \liminf_{T\rightarrow\infty} p\cdot \hat{a}(T) - p\cdot \hat{\sigma}(T) = \liminf_{T\rightarrow\infty} p\cdot \bar{a}- p\cdot \hat{\sigma}(T)> \delta.$

In the second equality, we use the Strong Law of Large Numbers on $\{a(t)\}_{t=1}^\infty$ . In the final inequality, we use our separating hyper-plane . Thus as $\lim_{T\rightarrow\infty} p\cdot Q(T) =\infty$ , our Markov chain visits each set of states $\{ q\in{\mathbb Z}_+^{\mathcal J} : p\cdot q \leq c\}$ finitely many times, for each $c>0$ . So, our Markov chain is transient. $\square$

Other maximum stable schedulers exist. For instance, we could estimate $\bar{a}$ and schedule. That is record $\hat{a}(t)=\sum_{\tau=1}^t a(\tau)/t$ and chooses a random schedule with mean $\sigma(t)=\hat{a}(t-1)+\epsilon(t)\mathbf{1}$ where $\epsilon(t)\in {\mathbb R}$ is maximal so that $\hat{a}(t-1)+\epsilon(t)\in< {\mathcal S} >$ . These $\hat{a}(t)$ â and hence $\sigma(t)$ â would converge to give a stable queueing network. However, we needed to record the arrival history.

The MaxWeight scheduler has the advantage that we do not need to record any historical information beyond the queueing networks state in order to construct a schedule. Such policies are often called on-line policies.
It can be computationally expensive to calculate a solution to at each time-step. So, despite the desirable properties that we have just proven, the MaxWeight policy can be impractical.