Reinforcement Learning:
An Introduction
Second edition, in progress
****Complete Draft****
November 5, 2017
Richard S. Sutton and Andrew G. Barto
© 2014, 2015, 2016, 2017
The text is now complete, except possibly for one more case study to be added to Chapter 16. The
references still need to be thoroughly checked, and an index still needs to be added. Please send any
errors to rich@richsutton.com and barto@cs.umass.edu. We are also very interested in correcting any
important omissions in the “Bibliographical and Historical Remarks” at the end of each chapter. If
you think of something that really should have been cited, please let us know and we can try to get it
corrected before the final version is printed.
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
In memory of A. Harry Klopf
Contents
Preface to the First Edition ix
Preface to the Second Edition xi
Summary of Notation xv
1 Introduction 1
1.1 Reinforcement Learning .................................... 1
1.2 Examples ............................................ 4
1.3 Elements of Reinforcement Learning ............................. 5
1.4 Limitations and Scope ..................................... 6
1.5 An Extended Example: Tic-Tac-Toe ............................. 7
1.6 Summary ............................................ 10
1.7 Early History of Reinforcement Learning ........................... 11
I Tabular Solution Methods 18
2 Multi-armed Bandits 19
2.1 A k-armed Bandit Problem .................................. 19
2.2 Action-value Methods ..................................... 20
2.3 The 10-armed Testbed ..................................... 21
2.4 Incremental Implementation .................................. 23
2.5 Tracking a Nonstationary Problem .............................. 25
2.6 Optimistic Initial Values .................................... 26
2.7 Upper-Confidence-Bound Action Selection .......................... 27
2.8 Gradient Bandit Algorithms .................................. 28
2.9 Associative Search (Contextual Bandits) ........................... 31
2.10 Summary ............................................ 32
3 Finite Markov Decision Processes 37
3.1 The Agent–Environment Interface .............................. 37
3.2 Goals and Rewards ....................................... 42
3.3 Returns and Episodes ..................................... 43
3.4 Unified Notation for Episodic and Continuing Tasks .................... 45
3.5 Policies and Value Functions ................................. 46
3.6 Optimal Policies and Optimal Value Functions ....................... 50
3.7 Optimality and Approximation ................................ 54
3.8 Summary ............................................ 55
4 Dynamic Programming 59
4.1 Policy Evaluation (Prediction) ................................ 60
4.2 Policy Improvement ...................................... 62
4.3 Policy Iteration ......................................... 64
4.4 Value Iteration ......................................... 67
4.5 Asynchronous Dynamic Programming ............................ 69
4.6 Generalized Policy Iteration .................................. 70
4.7 Efficiency of Dynamic Programming ............................. 71
4.8 Summary ............................................ 71
5 Monte Carlo Methods 75
5.1 Monte Carlo Prediction .................................... 76
5.2 Monte Carlo Estimation of Action Values .......................... 79
5.3 Monte Carlo Control ...................................... 80
5.4 Monte Carlo Control without Exploring Starts ....................... 82
5.5 Off-policy Prediction via Importance Sampling ....................... 84
5.6 Incremental Implementation .................................. 89
5.7 Off-policy Monte Carlo Control ................................ 90
5.8 *Discounting-aware Importance Sampling .......................... 92
5.9 *Per-reward Importance Sampling .............................. 93
5.10 Summary ............................................ 94
6 Temporal-Difference Learning 97
6.1 TD Prediction ......................................... 97
6.2 Advantages of TD Prediction Methods ............................ 101
6.3 Optimality of TD(0) ...................................... 103
6.4 Sarsa: On-policy TD Control ................................. 105
6.5 Q-learning: Off-policy TD Control .............................. 107
6.6 Expected Sarsa ......................................... 109
6.7 Maximization Bias and Double Learning ........................... 110
6.8 Games, Afterstates, and Other Special Cases ........................ 112
6.9 Summary ............................................ 113
7 n-step Bootstrapping 115
7.1 n-step TD Prediction ...................................... 115
7.2 n-step Sarsa ........................................... 119
7.3 n-step Off-policy Learning by Importance Sampling .................... 121
7.4 *Per-reward Off-policy Methods ................................ 122
7.5 Off-policy Learning Without Importance Sampling:
The n-step Tree Backup Algorithm .............................. 124
7.6 *A Unifying Algorithm: n-step Q(σ) ............................. 126
7.7 Summary ............................................ 129
8 Planning and Learning with Tabular Methods 131
8.1 Models and Planning ...................................... 131
8.2 Dyna: Integrating Planning, Acting, and Learning ..................... 133
8.3 When the Model Is Wrong ................................... 137
8.4 Prioritized Sweeping ...................................... 139
8.5 Expected vs. Sample Updates ................................. 142
8.6 Trajectory Sampling ...................................... 144
8.7 Real-time Dynamic Programming ............................... 146
8.8 Planning at Decision Time ................................... 149
8.9 Heuristic Search ........................................ 150
8.10 Rollout Algorithms ....................................... 152
8.11 Monte Carlo Tree Search .................................... 153
8.12 Summary of the Chapter .................................... 155
8.13 Summary of Part I: Dimensions ................................ 156
II Approximate Solution Methods 160
9 On-policy Prediction with Approximation 161
9.1 Value-function Approximation ................................. 161
9.2 The Prediction Objective (VE) ................................ 162
9.3 Stochastic-gradient and Semi-gradient Methods ....................... 164
9.4 Linear Methods ......................................... 167
9.5 Feature Construction for Linear Methods .......................... 171
9.5.1 Polynomials ....................................... 172
9.5.2 Fourier Basis ...................................... 173
9.5.3 Coarse Coding ..................................... 175
9.5.4 Tile Coding ....................................... 177
9.5.5 Radial Basis Functions ................................. 181
9.6 Nonlinear Function Approximation: Artificial Neural Networks .............. 182
9.7 Least-Squares TD ....................................... 186
9.8 Memory-based Function Approximation ........................... 187
9.9 Kernel-based Function Approximation ............................ 189
9.10 Looking Deeper at On-policy Learning: Interest and Emphasis .............. 190