Approximate Dynamic Programming - I: Modeling
Warren B. Powell
December 7, 2009
Abstract
The first step in solving a stochastic optimization problem is providing a mathematical model. How
the problem is modeled can impact the solution strategy. In this chapter, we provide a flexible
modeling framework based on a classic control-theoretic formulation, avoiding devices such as one-
step transition matrices. We describe the five fundamental elements of any stochastic, dynamic
program. Different notational conventions are introduced, and the types of policies that can be used
to guide decisions are described in detail. This discussion puts approximate dynamic programming in
the context of a variety of other algorithmic strategies by using the modeling framework to describe
a wide range of policies. A brief discussion of model-free dynamic programming is also provided.
1 Introduction
Stochastic optimization problems pose unique challenges in how they are represented mathematically.
These problems arise in a number of different communities, often in applications that introduce their
own computational characteristics. As a result, a number of contrasting notational styles have
evolved, complicating our ability to communicate research across communities. This is particularly
problematic in the general area of multistage stochastic optimization, where different communities
have made significant algorithmic contributions that apply to a wide variety of problems.
The range of problems that can be modeled as stochastic, dynamic optimization problems is vast.
Examples of major problem classes include:
• Optimization over stochastic graphs - This is a fundamental problem class that addresses the
problem of managing a single entity, with a finite set of actions, in the presence of different
forms of uncertainty.
• Dynamic resource allocation problems - These include scheduling people and machines, routing
vehicles, managing inventories, and investing in new facilities and technologies. These problems
arise in supply chain management, personnel management, health care, military operations,
agriculture and energy.
• Demand management - These problems include booking strategies for airlines, hotels, hospitals,
vendor-managed inventories, and incentives to control the demand for energy.
• Management of financial portfolios - How should a portfolio be spread over different investments
to strike a balance between risk and return?
• R & D portfolio problems - How should research and development portfolios be managed to
reach specific technological goals? What investment strategy should we pursue to ensure that
we will meet government targets for renewable energy in 30 years? These decisions need to be
made in the presence of uncertainty about prices, climate, technology and government policy.
• Pricing problems - How should products and services be priced to maximize total revenue?
• Engineering control problems - How much CO2 should we release into the atmosphere? What
time window should you commit to for providing service? At what speed should you fly your
aircraft?
• Sensor management problems - We would like to manage a team of technicians collecting
information about the presence of disease in the population, the concentration of pollution or
radiation in the atmosphere, or the concentration of pollutants in the water.
This list is hardly exhaustive, but it hints at the range of applications and the types of complexities
that we might encounter. In all of these problems, we face the challenge of making decisions
sequentially: we make a decision, and then observe information that we did not know when we
made that decision. We then make another decision, after which we see more information.
The goal is to make decisions over time that achieve some objective.
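
To fix ideas, the following is a minimal sketch (in Python) of this decide-then-observe cycle. The betting game and the fixed-fraction decision rule are hypothetical illustrations, not part of the framework developed in this chapter.

    import random

    wealth = 100.0
    for t in range(20):
        bet = 0.1 * wealth              # decision: committed before the outcome is known
        win = random.random() < 0.5     # exogenous information, observed only afterward
        wealth += bet if win else -bet  # the new state is what the next decision sees
    print(round(wealth, 2))

Each pass through the loop makes exactly one decision with the information available so far, then reveals new information that shapes the next decision.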
There are several ways to model these problems, and different communities have evolved mod-
eling and algorithmic strategies to deal with specific problem classes. For example, the simulation
community typically uses myopic policies (rules that do not directly consider the impact of decisions
now on the future), which might depend on one or more tunable parameters. For instance, a (q, Q)
inventory policy places an order whenever the inventory falls below q, bringing the total inventory
up to Q. In this case, q and Q are tunable parameters which can be optimized to find the best
policy, indirectly taking into account the impact of decisions now on the future.
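
As an illustration, the following is a minimal sketch (in Python) of tuning such a policy by simulation. The demand distribution, the revenue and holding-cost figures, the horizon, and the assumption of instant delivery are all hypothetical choices.

    import random

    def simulate_qQ(q, Q, horizon=10000, price=10.0, holding=0.1, seed=0):
        rng = random.Random(seed)
        inventory, profit = Q, 0.0
        for _ in range(horizon):
            if inventory < q:           # reorder point reached ...
                inventory = Q           # ... order up to Q (instant delivery assumed)
            demand = rng.randint(0, 9)  # exogenous information, seen after ordering
            sales = min(inventory, demand)
            inventory -= sales
            profit += price * sales - holding * inventory
        return profit / horizon         # average per-period profit for this (q, Q)

    # A crude grid search over the tunable parameters "optimizes the policy":
    best = max(((q, Q) for Q in range(5, 50, 5) for q in range(0, Q, 5)),
               key=lambda p: simulate_qQ(*p))
    print(best)

The policy itself never looks ahead, but searching over (q, Q) accounts for the future indirectly, through the simulated long-run performance.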
The Markov decision process community assumes that we can represent our system as being in a
state s at time t. If we choose action a, then we let p(s'|s, a) be the probability that we then land in
state s'. If C(s, a) is the contribution (reward) we earn if we choose action a when in state s, then
we can find the best action by solving Bellman's optimality equation, given by

    V(s) = \max_a \left( C(s, a) + \gamma \sum_{s'} p(s'|s, a) V(s') \right),    (1)
where γ is a discount factor. We are assuming that we are maximizing total discounted contributions
over an infinite horizon. The challenge is computing the value V(s) for each (discrete) state s. There
are powerful algorithms for solving this problem, but they require enumerating the set of potential
states. While there are many problems that can be solved with this strategy, the method breaks
down when s consists of a vector of elements. This produces the well-known curse of dimensionality
of dynamic programming.
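
For a small discrete problem, equation (1) can be solved by value iteration, as in the following minimal sketch (in Python). The two-state, two-action transition probabilities and contributions below are made up for illustration.

    # P[s][a][s2] = p(s2|s, a); C[s][a] = C(s, a). Both are hypothetical.
    P = [[[0.8, 0.2], [0.3, 0.7]],
         [[0.5, 0.5], [0.1, 0.9]]]
    C = [[5.0, 10.0],
         [-1.0, 2.0]]
    gamma = 0.9

    V = [0.0, 0.0]
    while True:
        # Bellman update: V(s) <- max_a { C(s,a) + gamma * sum_s' p(s'|s,a) V(s') }
        V_new = [max(C[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(2))
                     for a in range(2))
                 for s in range(2)]
        delta = max(abs(V_new[s] - V[s]) for s in range(2))
        V = V_new
        if delta < 1e-9:
            break
    print([round(v, 3) for v in V])

Note that the update loops over every state and every successor state; this is exactly the enumeration that becomes intractable when s is a vector.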
Approximate dynamic programming (ADP) is both a modeling and algorithmic framework for
solving stochastic optimization problems. Most of the literature has focused on approximating
V(s) to overcome the problem of multidimensional state variables. In addition to multidimensional
state variables, many problems also have multidimensional random variables and multidimensional
decision variables (most commonly referred to as actions in the dynamic programming community,
or controls in the engineering literature). These three challenges make up what have been called
the three curses of dimensionality.
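
To make the first of these curses concrete, the following minimal sketch (in Python) replaces the lookup table V(s) with a linear approximation fit from sampled states, so the state space never has to be enumerated. The basis functions, the stand-in for sampled value estimates, and the 1/n stepsize rule are all hypothetical choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def phi(s):                         # basis functions of a 2-dimensional state
        return np.array([1.0, s[0], s[1], s[0] * s[1]])

    def sampled_value(s):               # hypothetical stand-in for a sampled
        return 3.0 + 2.0 * s[0] - s[1] + rng.normal(scale=0.1)  # Bellman target

    theta = np.zeros(4)                 # Vbar(s) = theta . phi(s) replaces V(s)
    for n in range(1, 5001):
        s = rng.uniform(0.0, 1.0, size=2)    # sample states; never enumerate them
        error = theta @ phi(s) - sampled_value(s)
        theta -= (1.0 / n) * error * phi(s)  # stochastic gradient update
    print(np.round(theta, 2))                # roughly recovers [3, 2, -1, 0]

The work per iteration depends on the number of basis functions, not the number of states, which is the essential idea behind value function approximation.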
It is important in any presentation on dynamic programming to acknowledge the different com-
munities that have contributed to the field. The challenge of making good decisions over time in
the presence of uncertainty arises in a number of fields, and as a result it is not surprising to see
similar ideas being developed under different notational systems and different vocabularies. These
communities include:
• Discrete Markov decision processes (MDPs) - This covers research in computer science as well
as the MDP community in operations research. These problems are typically characterized by
discrete states (possibly very many of them) and discrete actions, but typically not very many
actions.
• Control theory - This community spans engineering in the physical sciences and economics.
Problems are often modeled in continuous time, with decision variables (controls) that are
typically continuous and low-dimensional (e.g., one to a dozen dimensions). Randomness often
arises in the form of measurement noise.
• Stochastic programming - This community deals with vector-valued (and often high-dimensional)
decision vectors and general forms of uncertainty which are represented using scenario trees.
This community typically does not use Bellman's optimality equation as an algorithmic
device.