An introductory textbook on deep reinforcement learning; very practical.

An Introduction to Deep
Reinforcement Learning
Vincent François-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare and Joelle
Pineau (2018), “An Introduction to Deep Reinforcement Learning”, Foundations and
Trends in Machine Learning: Vol. 11, No. 3-4. DOI: 10.1561/2200000071.
Vincent François-Lavet
McGill University
vincent.francois-lavet@mcgill.ca
Peter Henderson
McGill University
peter.henderson@mail.mcgill.ca
Riashat Islam
McGill University
riashat.islam@mail.mcgill.ca
Marc G. Bellemare
Google Brain
bellemare@google.com
Joelle Pineau
Facebook, McGill University
jpineau@cs.mcgill.ca
Boston — Delft
arXiv:1811.12560v2 [cs.LG] 3 Dec 2018

Contents
1 Introduction
1.1 Motivation
1.2 Outline
2 Machine learning and deep learning
2.1 Supervised learning and the concepts of bias and overfitting
2.2 Unsupervised learning
2.3 The deep learning approach
3 Introduction to reinforcement learning
3.1 Formal framework
3.2 Different components to learn a policy
3.3 Different settings to learn a policy from data
4 Value-based methods for deep RL
4.1 Q-learning
4.2 Fitted Q-learning
4.3 Deep Q-networks
4.4 Double DQN
4.5 Dueling network architecture
4.6 Distributional DQN
4.7 Multi-step learning
4.8 Combination of all DQN improvements and variants of DQN
5 Policy gradient methods for deep RL
5.1 Stochastic Policy Gradient
5.2 Deterministic Policy Gradient
5.3 Actor-Critic Methods
5.4 Natural Policy Gradients
5.5 Trust Region Optimization
5.6 Combining policy gradient and Q-learning
6 Model-based methods for deep RL
6.1 Pure model-based methods
6.2 Integrating model-free and model-based methods
7 The concept of generalization
7.1 Feature selection
7.2 Choice of the learning algorithm and function approximator selection
7.3 Modifying the objective function
7.4 Hierarchical learning
7.5 How to obtain the best bias-overfitting tradeoff
8 Particular challenges in the online setting
8.1 Exploration/Exploitation dilemma
8.2 Managing experience replay
9 Benchmarking Deep RL
9.1 Benchmark Environments
9.2 Best practices to benchmark deep RL
9.3 Open-source software for Deep RL
10 Deep reinforcement learning beyond MDPs
10.1 Partial observability and the distribution of (related) MDPs
10.2 Transfer learning
10.3 Learning without explicit reward function
10.4 Multi-agent systems
11 Perspectives on deep reinforcement learning
11.1 Successes of deep reinforcement learning
11.2 Challenges of applying reinforcement learning to real-world problems
11.3 Relations between deep RL and neuroscience
12 Conclusion
12.1 Future development of deep RL
12.2 Applications and societal impact of deep RL
Appendices
References

ABSTRACT
Deep reinforcement learning is the combination of reinforcement
learning (RL) and deep learning. This field of research
has been able to solve a wide range of complex decision-making
tasks that were previously out of reach for a machine.
Thus, deep RL opens up many new applications in domains
such as healthcare, robotics, smart grids, finance, and many
more. This manuscript provides an introduction to deep
reinforcement learning models, algorithms and techniques.
Particular focus is on the aspects related to generalization
and how deep RL can be used for practical applications. We
assume the reader is familiar with basic machine learning
concepts.