Introduction
Discrete-State and Finite-State Problems
In the preceding example, the state x_k was a continuous real variable, and it is easy to think of multidimensional generalizations where the state is an n-dimensional vector of real variables. It is also possible, however, that the state takes values from a discrete set, such as the integers.
A version of the inventory problem where a discrete viewpoint is more natural arises when stock is measured in whole units (such as cars), each of which is a significant fraction of x_k, u_k, or w_k. It is more appropriate then to take as state space the set of all integers rather than the set of real numbers. The form of the system equation and the cost per period will, of course, stay the same.
Generally, there are many situations where the state is naturally discrete and there is no continuous counterpart of the problem. Such situations are often conveniently specified in terms of the probabilities of transition between the states. What we need to know is p_ij(u, k), which is the probability at time k that the next state will be j, given that the current state is i, and the control selected is u, i.e.,

\[
p_{ij}(u, k) = P(x_{k+1} = j \mid x_k = i,\ u_k = u).
\]

This type of state transition can alternatively be described in terms of the discrete-time system equation

\[
x_{k+1} = w_k,
\]
where the probability distribution of the random parameter w_k is

\[
P(w_k = j \mid x_k = i,\ u_k = u) = p_{ij}(u, k).
\]
Conversely, given a discrete-state system in the form

\[
x_{k+1} = f_k(x_k, u_k, w_k),
\]

together with the probability distribution p_k(w_k | x_k, u_k) of w_k, we can provide an equivalent transition probability description. The corresponding transition probabilities are given by

\[
p_{ij}(u, k) = P\big(W_k(i, u, j) \mid x_k = i,\ u_k = u\big),
\]

where W_k(i, u, j) is the set

\[
W_k(i, u, j) = \{ w \mid f_k(i, u, w) = j \}.
\]

Thus a discrete-state system can equivalently be described in terms of a difference equation or in terms of transition probabilities. Depending on the given problem, it may be notationally or mathematically more convenient to use one description over the other.
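As a concrete illustration of the second direction (a minimal sketch in Python, not taken from the text), the fragment below starts from a hypothetical system function f_k(i, u, w) and a finitely supported disturbance distribution p_k(w | i, u), and accumulates the transition probabilities p_ij(u, k) by summing the probabilities of the disturbances w in W_k(i, u, j).

    from collections import defaultdict

    def transition_probabilities(f_k, p_k, i, u):
        """Return {j: p_ij(u, k)} for a fixed state i and control u.

        f_k(i, u, w) -> next state j           (hypothetical system function)
        p_k(i, u)    -> dict {w: P(w | i, u)}  (hypothetical disturbance distribution)
        """
        p_ij = defaultdict(float)
        for w, prob in p_k(i, u).items():
            # Every w with f_k(i, u, w) = j contributes to p_ij(u, k); this sums
            # the probability of the set W_k(i, u, j) = {w | f_k(i, u, w) = j}.
            p_ij[f_k(i, u, w)] += prob
        return dict(p_ij)

    # Example: an inventory-like system x_{k+1} = max(0, x_k + u_k - w_k) with an
    # assumed demand distribution (illustrative numbers only).
    f_k = lambda i, u, w: max(0, i + u - w)
    p_k = lambda i, u: {0: 0.2, 1: 0.5, 2: 0.3}

    print(transition_probabilities(f_k, p_k, i=1, u=1))   # {2: 0.2, 1: 0.5, 0: 0.3}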
The following examples illustrate discrete-state problems. The first example involves a deterministic problem, that is, a problem where there is no stochastic uncertainty. In such a problem, when a control is chosen at a given state, the next state is fully determined; that is, for any state i, control u, and time k, the transition probability p_ij(u, k) is equal to 1 for a single state j, and it is 0 for all other candidate next states. The other three examples involve stochastic problems, where the next state resulting from a given choice of control at a given state cannot be determined a priori.

Example 1.1.2 (A Deterministic Scheduling Problem)

Suppose that to produce a certain product, four operations must be performed on a certain machine. The operations are denoted by A, B, C, and D. We assume that operation B can be performed only after operation A has been performed, and operation D can be performed only after operation C has been performed. (Thus the sequence CDAB is allowable, but the sequence CDBA is not.) The setup cost C_mn for passing from any operation m to any other operation n is given. There is also an initial startup cost S_A or S_C for starting with operation A or C, respectively. The cost of a sequence is the sum of the setup costs associated with it; for example, the operation sequence ACDB has cost

\[
S_A + C_{AC} + C_{CD} + C_{DB}.
\]
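Since only six sequences satisfy the precedence constraints, the sequence costs can be checked by brute-force enumeration. The Python sketch below does so with made-up numerical values for S_A, S_C, and C_mn (the text specifies no numbers); the test allowable encodes "B only after A" and "D only after C".

    from itertools import permutations

    # Hypothetical startup and setup costs (illustrative values, not from the text).
    S = {"A": 5, "C": 3}                                  # startup costs S_A, S_C
    C = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6, ("B", "A"): 1,
         ("B", "C"): 3, ("B", "D"): 2, ("C", "A"): 2, ("C", "B"): 4,
         ("C", "D"): 3, ("D", "A"): 5, ("D", "B"): 1, ("D", "C"): 2}   # setup costs C_mn

    def allowable(seq):
        # B may appear only after A, and D only after C.
        return seq.index("A") < seq.index("B") and seq.index("C") < seq.index("D")

    def cost(seq):
        # Startup cost of the first operation plus the setup costs along the sequence.
        return S[seq[0]] + sum(C[m, n] for m, n in zip(seq, seq[1:]))

    for seq in sorted(filter(allowable, permutations("ABCD")), key=cost):
        print("".join(seq), cost(seq))    # ACDB costs S_A + C_AC + C_CD + C_DB = 13 here

For these illustrative numbers the cheapest sequence turns out to be CABD, with cost 9.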
We can view this problem as a sequence of three decisions, namely the choice of the first three operations to be performed (the last operation is determined from the preceding three). It is appropriate to consider as state the set of operations already performed, the initial state being an artificial state corresponding to the beginning of the decision process. The possible state transitions corresponding to the possible states and decisions for this problem are shown in Fig. 1.1.2. Here the problem is deterministic, i.e., at a given state, each choice of control leads to a uniquely determined state. For example, at state AC the decision to perform operation D leads to state ACD with certainty, and has cost C_CD.
Deterministic problems with a finite number of states can be conveniently represented in terms of transition graphs such as the one of Fig. 1.1.2. The optimal solution corresponds to the path that starts at the initial state and ends at some state at the terminal time and has minimum sum of arc costs plus the terminal cost. We will study systematically problems of this type in Chapter 2.
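As a preview of that machinery, here is a minimal backward dynamic programming sketch for the scheduling example, reusing the hypothetical costs from the enumeration above. One representational choice is my own assumption rather than the text's: the state is taken to be the pair (set of operations already performed, last operation performed), since the setup cost of the next arc depends on the last operation; the terminal cost is taken to be zero.

    from functools import lru_cache

    S = {"A": 5, "C": 3}                                  # hypothetical startup costs
    C = {("A", "B"): 2, ("A", "C"): 4, ("A", "D"): 6, ("B", "A"): 1,
         ("B", "C"): 3, ("B", "D"): 2, ("C", "A"): 2, ("C", "B"): 4,
         ("C", "D"): 3, ("D", "A"): 5, ("D", "B"): 1, ("D", "C"): 2}
    OPS = "ABCD"
    PREREQ = {"B": "A", "D": "C"}                         # B needs A first, D needs C first

    def choices(done):
        """Operations that may legally be performed next, given the set `done`."""
        return [op for op in OPS
                if op not in done and (op not in PREREQ or PREREQ[op] in done)]

    @lru_cache(maxsize=None)
    def J(done, last):
        """Minimum cost-to-go from the state (operations done, last operation)."""
        if len(done) == len(OPS):
            return 0                                      # no terminal cost here
        return min(C[last, op] + J(done | {op}, op) for op in choices(done))

    # The initial artificial state has nothing performed; the first arc carries
    # the startup cost S_A or S_C instead of a setup cost.
    start = frozenset()
    print(min(S[op] + J(frozenset({op}), op) for op in choices(start)))   # 9 (CABD)

The memoized recursion is exactly a shortest-path computation on the transition graph of Fig. 1.1.2.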