A practical textbook introducing deep reinforcement learning. Abstract: Deep reinforcement learning is the combination of reinforcement learning (RL) and deep learning. This field of research has been able to solve a wide range of complex decision-making tasks that were previously out of reach for a machine. Thus, deep RL opens up many new applications in domains such as healthcare, robotics, smart grids, finance, and many more. This manuscript provides an introduction to deep reinforcement learning models, algorithms and techniques. Particular focus is on the aspects related to generalization and how deep RL can be used for practical applications. We assume the reader is familiar with basic machine learning concepts.
An Introduction to Deep
Reinforcement Learning
Vincent François-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle
Pineau (2018), “An Introduction to Deep Reinforcement Learning”, Foundations and
Trends in Machine Learning: Vol. 11, No. 3-4. DOI: 10.1561/2200000071.
Vincent François-Lavet
McGill University
vincent.francois-lavet@mcgill.ca
Peter Henderson
McGill University
peter.henderson@mail.mcgill.ca
Riashat Islam
McGill University
riashat.islam@mail.mcgill.ca
Marc G. Bellemare
Google Brain
bellemare@google.com
Joelle Pineau
Facebook, McGill University
jpineau@cs.mcgill.ca
Boston — Delft
arXiv:1811.12560v2 [cs.LG] 3 Dec 2018
Contents
1 Introduction
1.1 Motivation
1.2 Outline
2 Machine learning and deep learning
2.1 Supervised learning and the concepts of bias and overfitting
2.2 Unsupervised learning
2.3 The deep learning approach
3 Introduction to reinforcement learning
3.1 Formal framework
3.2 Different components to learn a policy
3.3 Different settings to learn a policy from data
4 Value-based methods for deep RL
4.1 Q-learning
4.2 Fitted Q-learning
4.3 Deep Q-networks
4.4 Double DQN
4.5 Dueling network architecture
4.6 Distributional DQN
4.7 Multi-step learning
4.8 Combination of all DQN improvements and variants of DQN
5 Policy gradient methods for deep RL
5.1 Stochastic Policy Gradient
5.2 Deterministic Policy Gradient
5.3 Actor-Critic Methods
5.4 Natural Policy Gradients
5.5 Trust Region Optimization
5.6 Combining policy gradient and Q-learning
6 Model-based methods for deep RL
6.1 Pure model-based methods
6.2 Integrating model-free and model-based methods
7 The concept of generalization
7.1 Feature selection
7.2 Choice of the learning algorithm and function approximator selection
7.3 Modifying the objective function
7.4 Hierarchical learning
7.5 How to obtain the best bias-overfitting tradeoff
8 Particular challenges in the online setting
8.1 Exploration/Exploitation dilemma
8.2 Managing experience replay
9 Benchmarking Deep RL
9.1 Benchmark Environments
9.2 Best practices to benchmark deep RL
9.3 Open-source software for Deep RL
10 Deep reinforcement learning beyond MDPs
10.1 Partial observability and the distribution of (related) MDPs
10.2 Transfer learning
10.3 Learning without explicit reward function
10.4 Multi-agent systems
11 Perspectives on deep reinforcement learning
11.1 Successes of deep reinforcement learning
11.2 Challenges of applying reinforcement learning to real-world problems
11.3 Relations between deep RL and neuroscience
12 Conclusion
12.1 Future development of deep RL
12.2 Applications and societal impact of deep RL
Appendices
References