limit with an optimal utility, and it conjectures that the utility of the actual network is close to this
fluid limit when an exponential averaging parameter is scaled. It makes a statement concerning weak
limits of scaled systems. A related primal-dual algorithm is used in (36) and shown to converge to
utility-optimality as a parameter is scaled.
Our drift-plus-penalty approach can be viewed as a dual-based approach to the stochastic
problem (rather than a primal-dual approach), and it reduces to the well-known dual subgradient
algorithm for linear and convex programs when applied to non-stochastic problems (see (37)(22)(17)
for discussions on this). One advantage of the drift-plus-penalty approach is the explicit convergence
analysis and performance bounds, resulting in the [O(1/V), O(V)] performance-delay tradeoff. This
tradeoff is not shown in the alternative approaches described above. The dual approach is also robust
to non-ergodic variations and has “universal scheduling” properties, i.e., properties that hold for sys-
tems with arbitrary sample paths, as shown in Section 4.9 (see also (38)(39)(40)(41)(42)). However,
one advantage of the primal-dual approach is that it provides local optimum guarantees for problems
of minimizing f(x) for non-convex functions f(·) (see Section 5.5 and (43)). Related dual-based
approaches are used for “infinitely backlogged” systems in (31)(44)(45)(46) using static optimization,
fluid limits, and stochastic gradients. Related algorithms for channel-aware scheduling
in wireless downlinks with different analytical techniques are developed in (47)(48)(49).
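To make the form of the drift-plus-penalty rule concrete, the following minimal Python sketch runs one queue under a toy model; the channel states, power levels, Bernoulli arrivals, and energy penalty are all illustrative assumptions rather than any specific system from the references. Each slot, the controller observes ω(t) and greedily picks the action minimizing V · penalty + Q(t) · (arrivals − service), then updates the queue.

import random

V = 20.0                   # tradeoff parameter: O(1/V) penalty gap, O(V) average backlog
ACTIONS = [0.0, 0.5, 1.0]  # hypothetical transmit power levels

def service(alpha, omega):
    return alpha * omega   # assumed service rate: power times observed channel state

def penalty(alpha, omega):
    return alpha           # assumed penalty: energy expended on the slot

Q = 0.0
total_penalty = 0.0
SLOTS = 100000
for t in range(SLOTS):
    omega = random.choice([0.0, 1.0, 2.0])            # random event from "nature"
    arrivals = 1.0 if random.random() < 0.4 else 0.0  # Bernoulli packet arrival
    # Greedy drift-plus-penalty decision (arrivals here do not depend on alpha):
    # minimize V*penalty(alpha, omega) - Q(t)*service(alpha, omega)
    alpha = min(ACTIONS, key=lambda x: V * penalty(x, omega) - Q * service(x, omega))
    total_penalty += penalty(alpha, omega)
    Q = max(Q + arrivals - service(alpha, omega), 0.0)  # queue update

print("time-average penalty:", total_penalty / SLOTS)

Raising V pushes the time-average penalty toward its optimum at the cost of a proportionally larger average backlog, which is the [O(1/V), O(V)] tradeoff discussed above.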
We note that the [O(1/V), O(V)] performance-delay tradeoff achieved by the drift-plus-
penalty algorithm on general systems is not necessarily the optimal tradeoff for particular networks.
An optimal [O(1/V), O(√V)] energy-delay tradeoff is shown by Berry and Gallager in (50) for a
single link with known channel statistics, and optimal performance-delay tradeoffs for multi-queue
systems are developed in (51)(52)(53) and shown to be achievable even when channel statistics are
unknown. This latter work builds on the Lyapunov optimization method, but it uses a more aggres-
sive drift steering technique. A place-holder technique for achieving near-optimal delay tradeoffs is
developed in (37) and related implementations are in (54)(55).
1.6 ON GENERAL MARKOV DECISION PROBLEMS
The penalties x̂_m(α(t), ω(t)), described in Section 1.2, depend only on the network control action
α(t) and the random event ω(t) (where ω(t) is generated by “nature” and is not influenced by
past control actions). In particular, the queue backlogs Q(t) are not included in the penalties. A
more advanced penalty structure would be x̂_m(α(t), ω(t), z(t)), where z(t) is a controlled Markov
chain (possibly related to the queue backlog) with transition probabilities that depend on control
actions. Extensions of Lyapunov optimization for this case are developed in Chapter 7 using a
drift-plus-penalty metric defined over renewal frames (56)(57)(58).
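As a minimal illustration of the distinction (all functions and numerical values below are hypothetical), ω(t) in the following Python sketch is drawn independently of the control, while z(t) is a two-state controlled Markov chain whose transition probabilities depend on the chosen action; it is this action-dependent state evolution that puts the extended penalty x̂_m(α(t), ω(t), z(t)) outside the basic framework of Section 1.2.

import random

def penalty(alpha, omega, z):
    # assumed penalty: action cost plus a surcharge whenever z(t) is "ON"
    return alpha * omega + (2.0 if z == "ON" else 0.0)

def next_z(z, alpha):
    # z(t) is a controlled Markov chain: its transition probabilities
    # depend on the action, unlike omega(t), which "nature" generates
    p_on = 0.8 if alpha > 0.0 else 0.1
    return "ON" if random.random() < p_on else "OFF"

z = "OFF"
for t in range(5):
    omega = random.uniform(0.0, 1.0)   # random event, action-independent
    alpha = random.choice([0.0, 1.0])  # control action (chosen arbitrarily here)
    print(t, alpha, z, penalty(alpha, omega, z))
    z = next_z(z, alpha)               # next state influenced by past action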
A related 2-timescale approach to learning optimal decisions in Markov decision problems is
developed in (59), and learning approaches to power-aware scheduling in single queues are developed
in (60)(61)(62)(63). Background on dynamic programming and Markov decision problems can be
found in (64)(65)(66), and approximate dynamic programming, neuro-dynamic programming, and
Q-learning theory can be found in (67)(68)(69). All of these approaches may suffer from large