Approximate Dynamic Programming - I: Modeling
Warren B. Powell
December 7, 2009
Abstract
The first step in solving a stochastic optimization problem is providing a mathematical model. How
the problem is modeled can impact the solution strategy. In this chapter, we provide a flexible
modeling framework based on a classic control-theoretic formulation, avoiding devices such as one-
step transition matrices. We describe the five fundamental elements of any stochastic, dynamic
program. Different notational conventions are introduced, and the types of policies that can be used
to guide decisions are described in detail. This discussion puts approximate dynamic programming in
the context of a variety of other algorithmic strategies by using the modeling framework to describe
a wide range of policies. A brief discussion of model-free dynamic programming is also provided.
1 Introduction
Stochastic optimization problems pose unique challenges in how they are represented mathematically.
These problems arise in a number of different communities, often in applications that introduce their
own computational characteristics. As a result, a number of contrasting notational styles have
evolved, complicating our ability to communicate research across communities. This is particularly
problematic in the general area of multistage stochastic optimization, where different communities
have made significant algorithmic contributions that apply to a wide variety of problems.
The range of problems that can be modeled as stochastic, dynamic optimization problems is vast.
Examples of major problem classes include:
• Optimization over stochastic graphs - This is a fundamental problem class that addresses the
problem of managing a single entity, with a finite set of actions, in the presence of different
forms of uncertainty.
• Dynamic resource allocation problems - These include scheduling people and machines, routing
vehicles, managing inventories, and investing in new facilities and technologies. These problems
arise in supply chain management, personnel management, health care, military operations,
agriculture and energy.
• Demand management - These problems include booking strategies for airlines, hotels, hospitals,
vendor-managed inventories, and incentives to control the demand for energy.
• Management of financial portfolios - How should a portfolio be spread over different investments
to strike a balance between risk and return?
• R & D portfolio problems - How should research and development portfolios be managed to
reach specific technological goals? What investment strategy should we pursue to ensure that
we will meet government targets for renewable energy in 30 years? These decisions need to be
made in the presence of uncertainty about prices, climate, technology and government policy.
• Pricing problems - How should products and services be priced to maximize total revenue?
• Engineering control problems - How much CO2 should we release into the atmosphere? What
time window should you commit to for providing service? At what speed should you fly your
aircraft?
• Sensor management problems - We would like to manage a team of technicians collecting
information about the presence of disease in the population, the concentration of pollution or
radiation in the atmosphere, or the concentration of pollutants in the water.
This list is hardly exhaustive, but it hints at the range of applications and the types of complexities
that we might encounter. In all of these problems, we face the challenge of making decisions
sequentially: we make a decision, and then observe information that we did not know when we
made that decision. We then make another decision, after which we see more information.
The goal is to make decisions over time that achieve some objective.
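
To fix ideas, the following is a minimal sketch (in Python) of this decide-then-observe cycle. The betting game and the fixed-fraction decision rule are hypothetical illustrations, not part of the framework developed in this chapter.

    import random

    wealth = 100.0
    for t in range(20):
        bet = 0.1 * wealth              # decision: committed before the outcome is known
        win = random.random() < 0.5     # exogenous information, observed only afterward
        wealth += bet if win else -bet  # the new state is what the next decision sees
    print(round(wealth, 2))

Each pass through the loop makes exactly one decision with the information available so far, then reveals new information that shapes the next decision.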
There are several ways to model these problems, and different communities have evolved mod-
eling and algorithmic strategies to deal with specific problem classes. For example, the simulation
community typically uses myopic policies (rules that do not directly consider the impact of decisions
now on the future), which might depend on one or more tunable parameters. For instance, a (q, Q)
inventory policy places an order whenever the inventory falls below q, bringing the total inventory
up to Q. In this case, q and Q are tunable parameters which can be optimized to find the best
policy, indirectly taking into account the impact of decisions now on the future.
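
As an illustration, the following is a minimal sketch (in Python) of tuning such a policy by simulation. The demand distribution, the revenue and holding-cost figures, the horizon, and the assumption of instant delivery are all hypothetical choices.

    import random

    def simulate_qQ(q, Q, horizon=10000, price=10.0, holding=0.1, seed=0):
        rng = random.Random(seed)
        inventory, profit = Q, 0.0
        for _ in range(horizon):
            if inventory < q:           # reorder point reached ...
                inventory = Q           # ... order up to Q (instant delivery assumed)
            demand = rng.randint(0, 9)  # exogenous information, seen after ordering
            sales = min(inventory, demand)
            inventory -= sales
            profit += price * sales - holding * inventory
        return profit / horizon         # average per-period profit for this (q, Q)

    # A crude grid search over the tunable parameters "optimizes the policy":
    best = max(((q, Q) for Q in range(5, 50, 5) for q in range(0, Q, 5)),
               key=lambda p: simulate_qQ(*p))
    print(best)

The policy itself never looks ahead, but searching over (q, Q) accounts for the future indirectly, through the simulated long-run performance.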
The Markov decision process community assumes that we can represent our system as being in a
state s at time t. If we choose action a, then we let p(s'|s, a) be the probability that we then land in
state s'. If C(s, a) is the contribution (reward) we earn if we choose action a when in state s, then
we can find the best action by solving Bellman's optimality equation, given by

    V(s) = \max_a \left( C(s, a) + \gamma \sum_{s'} p(s'|s, a) V(s') \right),    (1)
where γ is a discount factor. We are assuming that we are maximizing total discounted contributions
over an infinite horizon. The challenge is computing the value V(s) for each (discrete) state s. There
are powerful algorithms for solving this problem, but they require enumerating the set of potential
states. While there are many problems that can be solved with this strategy, the method breaks
down when s consists of a vector of elements. This produces the well-known curse of dimensionality
of dynamic programming.
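
For a small discrete problem, equation (1) can be solved by value iteration, as in the following minimal sketch (in Python). The two-state, two-action transition probabilities and contributions below are made up for illustration.

    # P[s][a][s2] = p(s2|s, a); C[s][a] = C(s, a). Both are hypothetical.
    P = [[[0.8, 0.2], [0.3, 0.7]],
         [[0.5, 0.5], [0.1, 0.9]]]
    C = [[5.0, 10.0],
         [-1.0, 2.0]]
    gamma = 0.9

    V = [0.0, 0.0]
    while True:
        # Bellman update: V(s) <- max_a { C(s,a) + gamma * sum_s' p(s'|s,a) V(s') }
        V_new = [max(C[s][a] + gamma * sum(P[s][a][s2] * V[s2] for s2 in range(2))
                     for a in range(2))
                 for s in range(2)]
        delta = max(abs(V_new[s] - V[s]) for s in range(2))
        V = V_new
        if delta < 1e-9:
            break
    print([round(v, 3) for v in V])

Note that the update loops over every state and every successor state; this is exactly the enumeration that becomes intractable when s is a vector.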
Approximate dynamic programming (ADP) is both a modeling and algorithmic framework for
solving stochastic optimization problems. Most of the literature has focused on approximating
V(s) to overcome the problem of multidimensional state variables. In addition to multidimensional
state variables, many problems also have multidimensional random variables and multidimensional
decision variables (most commonly referred to as actions in the dynamic programming community,
or controls in the engineering literature). These three challenges make up what have been called
the three curses of dimensionality.
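
To make the first of these curses concrete, the following minimal sketch (in Python) replaces the lookup table V(s) with a linear approximation fit from sampled states, so the state space never has to be enumerated. The basis functions, the stand-in for sampled value estimates, and the 1/n stepsize rule are all hypothetical choices.

    import numpy as np

    rng = np.random.default_rng(0)

    def phi(s):                         # basis functions of a 2-dimensional state
        return np.array([1.0, s[0], s[1], s[0] * s[1]])

    def sampled_value(s):               # hypothetical stand-in for a sampled
        return 3.0 + 2.0 * s[0] - s[1] + rng.normal(scale=0.1)  # Bellman target

    theta = np.zeros(4)                 # Vbar(s) = theta . phi(s) replaces V(s)
    for n in range(1, 5001):
        s = rng.uniform(0.0, 1.0, size=2)    # sample states; never enumerate them
        error = theta @ phi(s) - sampled_value(s)
        theta -= (1.0 / n) * error * phi(s)  # stochastic gradient update
    print(np.round(theta, 2))                # roughly recovers [3, 2, -1, 0]

The work per iteration depends on the number of basis functions, not the number of states, which is the essential idea behind value function approximation.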
It is important in any presentation on dynamic programming to acknowledge the different com-
munities that have contributed to the field. The challenge of making good decisions over time in
the presence of uncertainty arises in a number of fields, and as a result it is not surprising to see
similar ideas being developed under different notational systems and different vocabularies. These
communities include:
• Discrete Markov decision processes (MDPs) - This covers research in computer science as well
as the MDP community in operations research. These problems are typically characterized by
discrete states (possibly very many of them) and discrete actions, but typically not very many
actions.
• Control theory - This community spans engineering in the physical sciences and economics.
Problems are often modeled in continuous time, with decision variables (controls) that are
typically continuous and low-dimensional (e.g., one to a dozen dimensions). Randomness often
arises in the form of measurement noise.
• Stochastic programming - This community deals with vector-valued (and often high-dimensional)
decision vectors and general forms of uncertainty which are represented using scenario trees.
This community typically does not use Bellman's optimality equation as an algorithmic
device.