Reinforcement Learning and Optimal Control (Dimitri P. Bertsekas): Extended Lecture/Summary
Ten Key Ideas for
Reinforcement Learning and Optimal Control
Dimitri P. Bertsekas
Department of Electrical Engineering and Computer Science
Massachusetts Institute of Technology
and
School of Computing, Informatics, and Decision Systems Engineering
Arizona State University
August 2019
(Periodically Updated)
Bertsekas (M.I.T.) Reinforcement Learning 1 / 82
Reinforcement Learning (RL): A Happy Union of AI and
Decision/Control/Dynamic Programming (DP) Ideas
[Figure: two groups of ideas joined by "Complementary Ideas" (Late 80s-Early 90s).
Decision/Control/DP: Principle of Optimality, Markov Decision Problems, POMDP, Policy Iteration, Value Iteration.
AI/RL: Learning through Experience, Simulation, Model-Free Methods, Feature-Based Representations, A*/Games/Heuristics.]
Historical highlights
Exact DP, optimal control (Bellman, Shannon, 1950s ...)
First impressive successes: Backgammon programs (Tesauro, 1992, 1996)
Algorithmic progress, analysis, applications, first books (mid 90s ...)
Machine Learning, BIG Data, Robotics, Deep Neural Networks (mid 2000s ...)
AlphaGo (2016) and AlphaZero (2017)
AlphaZero (Google DeepMind)
Plays different!
Learned from scratch ... with 4 hours of training!
Plays much better than all chess programs
Same algorithm learned multiple games (Go, Shogi)
Methodology:
Simulation-based approximation to a form of the policy iteration method of DP
Uses self-learning, i.e., self-generated data for policy evaluation, and Monte Carlo
tree search for policy improvement
The success of AlphaZero is due to:
A skillful implementation/integration of known ideas
Awesome computational power
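For readers unfamiliar with the policy iteration method named under "Methodology", the following is a minimal sketch of its exact form on a toy problem. It is not AlphaZero's algorithm: the four-state chain, the unit stage cost, the discount factor, and all function names are assumptions made for illustration. It only shows the alternation of policy evaluation and policy improvement that the simulation-based methods approximate.

```python
# Exact policy iteration on a tiny deterministic chain (hypothetical toy problem).
N_STATES = 4          # states 0..3; state 3 is an absorbing, cost-free goal
ACTIONS = (-1, +1)    # move left or move right
GAMMA = 0.9           # discount factor (assumed)

def step(state, action):
    """Toy model: return (next_state, stage_cost)."""
    if state == N_STATES - 1:
        return state, 0.0                      # goal: stay put at no cost
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, 1.0                            # every other move costs 1

def evaluate(policy, sweeps=500):
    """Policy evaluation: repeat J(s) <- cost + GAMMA * J(next) for the fixed policy."""
    J = [0.0] * N_STATES
    for _ in range(sweeps):
        new_J = []
        for s in range(N_STATES):
            nxt, cost = step(s, policy[s])
            new_J.append(cost + GAMMA * J[nxt])
        J = new_J
    return J

def improve(J):
    """Policy improvement: choose the action with the lowest one-step lookahead cost."""
    def q(s, a):
        nxt, cost = step(s, a)
        return cost + GAMMA * J[nxt]
    return [min(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES)]

def policy_iteration():
    policy = [-1] * N_STATES                   # start from a poor policy: always left
    while True:
        J = evaluate(policy)
        new_policy = improve(J)
        if new_policy == policy:               # no change: the policy is optimal
            return policy, J
        policy = new_policy

policy, J = policy_iteration()
print(policy, J)
```

On this toy chain the loop settles on moving right in every non-goal state. In the approximate setting described on the slide, the exact `evaluate` step would be replaced by an estimate built from self-generated data, and the exhaustive `improve` step by a lookahead search.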
Approximate DP/RL Methodology is now Ambitious and Universal
Exact DP applies (in principle) to a very broad range of optimization problems
From Deterministic to Stochastic
From Combinatorial optimization to Optimal control w/ infinite state/control spaces
From One decision maker to Two player games
... BUT is plagued by the curse of dimensionality and need for a math model
Approximate DP/RL overcomes the difficulties of exact DP by:
Approximation (use neural nets and other architectures to reduce dimension)
Simulation (use a computer model in place of a math model)
State of the art:
Broadly applicable methodology: can address a broad range of challenging
problems. Deterministic-stochastic-dynamic, discrete-continuous, games, etc.
There are no methods that are guaranteed to work for all or even most problems
There are enough methods to try with a reasonable chance of success for most
types of optimization problems
Role of the theory: Guide the art, delineate sound ideas, filter out bad ideas
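The two ingredients listed above, approximation and simulation, can be illustrated with a small sketch. Everything in it is an assumption chosen for illustration: a ten-state chain, a fixed "always move right" policy whose moves succeed with probability 0.8, and a two-parameter linear architecture standing in for a neural net. Costs are estimated from simulated trajectories only, then compressed into two weights by least squares.

```python
import random

N = 10            # states 0..9; state 9 is the goal (hypothetical toy problem)
P_MOVE = 0.8      # probability that a "right" move succeeds (assumed)

def simulate_cost(state, rng):
    """Simulate the 'always move right' policy from `state`; return the total cost."""
    cost = 0.0
    while state != N - 1:
        if rng.random() < P_MOVE:
            state += 1                 # the move succeeded
        cost += 1.0                    # each attempt costs 1, successful or not
    return cost

def collect_samples(rollouts_per_state, rng):
    """Simulation: produce (state, sampled cost) pairs using only the simulator."""
    return [(s, simulate_cost(s, rng))
            for s in range(N) for _ in range(rollouts_per_state)]

def fit_linear(samples):
    """Approximation: least-squares fit of J(s) ~ r0 + r1 * s (two weights, ten states)."""
    n = len(samples)
    mean_s = sum(s for s, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    cov = sum((s - mean_s) * (c - mean_c) for s, c in samples)
    var = sum((s - mean_s) ** 2 for s, _ in samples)
    r1 = cov / var
    r0 = mean_c - r1 * mean_s
    return r0, r1

rng = random.Random(0)                 # fixed seed so runs are repeatable
r0, r1 = fit_linear(collect_samples(200, rng))
print(f"J(s) ~ {r0:.2f} + {r1:.2f} * s")
```

For this particular toy problem the exact expected cost is (9 - s) / 0.8 = 11.25 - 1.25 s, so the fitted weights should land close to those values. In realistic problems no such closed form is available, which is why the fit is computed from samples.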
About this Talk
The purpose of this talk
To selectively explain some of the key ideas of RL and its connections with DP.
To provide a road map for further study (mostly from the perspective of DP).
To provide a guide for reading my book (abbreviated RL-OC):
  - Bertsekas, "Reinforcement Learning and Optimal Control," Athena Scientific, 2019; see also the monograph "Rollout, Policy Iteration and Distributed RL," 2020, which deals with rollout, multiagent problems, and distributed asynchronous algorithms.
  - For slides and videolectures from 2019 and 2020 ASU courses, see my website.
References
Quite a few Exact DP books (1950s-present starting with Bellman). My books:
  - My two-volume textbook "Dynamic Programming and Optimal Control" was updated in 2017.
  - My mathematically oriented research monograph "Stochastic Optimal Control" (with S. E. Shreve) came out in 1978.
  - My latest mathematically oriented research monograph "Abstract DP" came out in 2018.
Quite a few Approximate DP/RL/Neural Nets books (1996-Present)
  - Bertsekas and Tsitsiklis, Neuro-Dynamic Programming, 1996
  - Sutton and Barto, Reinforcement Learning, 1998 (new edition 2018)
Many surveys on all aspects of the subject