
Reinforcement Learning and Optimal Control (Dimitri P. Bertsekas): Extended Lecture/Summary


Ten Key Ideas for Reinforcement Learning and Optimal Control

Dimitri P. Bertsekas

Department of Electrical Engineering and Computer Science

Massachusetts Institute of Technology

and

School of Computing, Informatics, and Decision Systems Engineering

Arizona State University

August 2019

(Periodically Updated)

Bertsekas (M.I.T.) Reinforcement Learning 1 / 82

Reinforcement Learning (RL): A Happy Union of AI and Decision/Control/Dynamic Programming (DP) Ideas

[Diagram: two groups of complementary ideas, joined in the late 80s-early 90s]

Decision/Control/DP: Principle of Optimality, Markov Decision Problems, POMDP, Policy Iteration, Value Iteration

AI/RL: Learning through Experience, Simulation, Model-Free Methods, Feature-Based Representations, A*/Games/Heuristics

Historical highlights

Exact DP, optimal control (Bellman, Shannon, 1950s ...)

First impressive successes: Backgammon programs (Tesauro, 1992, 1996)

Algorithmic progress, analysis, applications, first books (mid 90s ...)

Machine Learning, BIG Data, Robotics, Deep Neural Networks (mid 2000s ...)


AlphaGo (2016) and AlphaZero (2017)

AlphaZero (Google DeepMind)

Plays different!

Learned from scratch ... with 4 hours of training!

Plays much better than all chess programs

Same algorithm learned multiple games (Go, Shogi)

Methodology:

Simulation-based approximation to a form of the policy iteration method of DP

Uses self-learning, i.e., self-generated data for policy evaluation, and Monte Carlo tree search for policy improvement
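The two steps named above, policy evaluation and policy improvement, can be sketched in their exact tabular form. This is only an illustration of the basic policy iteration loop, not AlphaZero's method, which replaces both steps with approximations (self-generated data and Monte Carlo tree search). The two-state MDP, its costs, the discount factor, and all function names are hypothetical.

```python
# Hypothetical two-state, two-action discounted MDP (costs are minimized).
# P[s][a] lists (next_state, probability) pairs; G[s][a] is the stage cost.
P = {
    0: {0: [(0, 1.0)], 1: [(0, 0.2), (1, 0.8)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}
G = {0: {0: 2.0, 1: 1.0}, 1: {0: 3.0, 1: 0.0}}
ALPHA = 0.9  # discount factor (assumed)


def q_value(s, a, J):
    """Stage cost of (s, a) plus the discounted expected cost-to-go under J."""
    return G[s][a] + ALPHA * sum(p * J[t] for t, p in P[s][a])


def evaluate(policy, tol=1e-12):
    """Policy evaluation: iterate the policy's Bellman equation to a fixed point."""
    J = [0.0 for _ in P]
    while True:
        J_new = [q_value(s, policy[s], J) for s in sorted(P)]
        if max(abs(x - y) for x, y in zip(J, J_new)) < tol:
            return J_new
        J = J_new


def improve(J):
    """Policy improvement: pick the cost-minimizing action at every state."""
    return [min(P[s], key=lambda a: q_value(s, a, J)) for s in sorted(P)]


def policy_iteration(policy):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        J = evaluate(policy)
        new_policy = improve(J)
        if new_policy == policy:
            return policy, J
        policy = new_policy
```

Calling `policy_iteration([0, 0])` starts from the policy that always picks action 0 and returns the final policy together with its cost vector. In this toy problem the loop stops after two improvement steps.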

The success of AlphaZero is due to:

A skillful implementation/integration of known ideas

Awesome computational power


Approximate DP/RL Methodology is now Ambitious and Universal

Exact DP applies (in principle) to a very broad range of optimization problems

From Deterministic to Stochastic

From Combinatorial optimization to Optimal control w/ infinite state/control spaces

From One decision maker to Two player games

... BUT is plagued by the curse of dimensionality and need for a math model

Approximate DP/RL overcomes the difficulties of exact DP by:

Approximation (use neural nets and other architectures to reduce dimension)

Simulation (use a computer model in place of a math model)
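As a rough illustration of the simulation idea, the sketch below estimates the discounted cost of a fixed policy using only a black-box simulator, without ever reading transition probabilities. The simulator, the two-state problem, the discount factor, and the function names are all hypothetical. The approximation idea (neural nets or other architectures) is not shown here.

```python
import random

ALPHA = 0.9  # discount factor (assumed)


def simulate_step(state, action, rng):
    """Hypothetical simulator: returns (next_state, stage_cost).

    The estimator below treats this as a black box and only calls it.
    """
    if state == 0:
        if action == 0:
            return 0, 2.0
        return (1 if rng.random() < 0.8 else 0), 1.0
    if action == 0:
        return 0, 3.0
    return 1, 0.0


def monte_carlo_cost(policy, start, num_trajectories=5000, horizon=100, seed=0):
    """Average the truncated discounted cost over simulated trajectories."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trajectories):
        state, discount, cost = start, 1.0, 0.0
        for _ in range(horizon):
            next_state, g = simulate_step(state, policy[state], rng)
            cost += discount * g
            discount *= ALPHA
            state = next_state
        total += cost
    return total / num_trajectories
```

For the policy `[1, 1]` started at state 0, the exact cost of this toy problem solves J(0) = 1 + 0.9 · 0.2 · J(0), so J(0) = 1/0.82. The estimate from `monte_carlo_cost([1, 1], 0)` should land close to that value, with an error that shrinks as `num_trajectories` grows.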

State of the art:

Broadly applicable methodology: Can address a broad range of challenging problems. Deterministic-stochastic-dynamic, discrete-continuous, games, etc.

There are no methods that are guaranteed to work for all or even most problems

There are enough methods to try with a reasonable chance of success for most types of optimization problems

Role of the theory: Guide the art, delineate sound ideas, filter out bad ideas


About this Talk

The purpose of this talk

To selectively explain some of the key ideas of RL and its connections with DP.

To provide a road map for further study (mostly from the perspective of DP).

To provide a guide for reading my book (abbreviated RL-OC):

Bertsekas, "Reinforcement Learning and Optimal Control," Athena Scientific, 2019; see also the monograph "Rollout, Policy Iteration and Distributed RL," 2020, which deals with rollout, multiagent problems, and distributed asynchronous algorithms.

For slides and video lectures from the 2019 and 2020 ASU courses, see my website.

References

Quite a few Exact DP books (1950s-present starting with Bellman). My books:

My two-volume textbook "Dynamic Programming and Optimal Control" was updated in 2017.

My mathematically oriented research monograph "Stochastic Optimal Control" (with S. E. Shreve) came out in 1978.

My latest mathematically oriented research monograph "Abstract DP" came out in 2018.

Quite a few Approximate DP/RL/Neural Nets books (1996-Present)

Bertsekas and Tsitsiklis, Neuro-Dynamic Programming, 1996

Sutton and Barto, 1998, Reinforcement Learning (new edition 2018)

Many surveys on all aspects of the subject
