A Detailed Look at the 2018 Edition of Reinforcement Learning: An Introduction


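The second edition of Reinforcement Learning: An Introduction, by Richard S. Sutton and Andrew G. Barto, is a classic text in the Adaptive Computation and Machine Learning series. Published in 2018, it revises the first edition, correcting earlier errors in notation and exposition to make the text more accurate and complete. The book is an authoritative guide to the theory and practice of reinforcement learning, and a key resource for readers interested in machine learning, particularly in how intelligent decisions are made.

Chapter 1, "Introduction", outlines the basic concepts. Reinforcement learning is a machine learning approach concerned with learning, through interaction with an environment, a policy that maximizes long-term reward. The learner (usually called the agent) takes a sequence of actions, receives feedback from the environment in the form of reward or penalty, and gradually improves its behavior so as to obtain better outcomes in the future. The core elements are state, action, environment, reward, and policy. The state describes the agent's current situation in the environment, actions are the moves available to the agent, the environment responds to the current state and action, the reward indicates how good the action was, and the policy is the rule by which the agent selects actions.

Later chapters cover Markov decision processes (MDPs), value functions, temporal-difference control algorithms such as Q-learning and SARSA, deep reinforcement learning, and the challenges and case studies that arise in practical applications. The authors also discuss how reinforcement learning differs from other machine learning methods and point to potential applications in areas such as games, robot control, and natural language processing.

As a second edition, the book may include newer research results and technical advances, along with a deeper analysis of existing theory. It also offers numerous examples and exercises that help readers consolidate their understanding through practice. It is a deep, comprehensive, and practical textbook, useful to researchers, engineers, and students entering the field, whether they are beginners or experienced professionals.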
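The interaction loop described above (state, action, reward, policy) can be made concrete with a small sketch. The following is an illustrative tabular Q-learning agent on a hypothetical five-cell corridor; the environment, the function names (`step`, `greedy`, `train`), and the parameter values are assumptions chosen for this example and are not taken from the book.

```python
import random

# Hypothetical toy environment (an assumption for illustration):
# a corridor of 5 cells; the agent starts in cell 0 and receives
# reward +1 when it reaches cell 4, which ends the episode.
N_STATES = 5
ACTIONS = (-1, +1)  # index 0 = move left, index 1 = move right


def step(state, action):
    """Environment: return (next_state, reward, done) for one move."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Pick an action index with the highest value, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in range(len(ACTIONS)) if q[state][a] == best])


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Policy: explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = greedy(q, state, rng)
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action (as a move, -1 or +1) in each non-terminal cell.
policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda a: q_table[s][a])]
          for s in range(N_STATES - 1)]
print(policy)
```

In this sketch the reward arrives only at the end of the corridor, so the learned values of earlier cells reflect discounted future reward, which is the sense in which the agent maximizes long-term rather than immediate reward.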

Preface to the Second Edition

deserve our deepest gratitude for this edition as well, which would not exist were it not for their contributions to edition number one. To that long list we must add many others who contributed specifically to the second edition. Our students over the many years that we have taught this material contributed in countless ways: exposing errors, offering fixes, and, not the least, being confused in places where we could have explained things better. We especially thank Martha Steenstrup for reading and providing detailed comments throughout. The chapters on psychology and neuroscience could not have been written without the help of many experts in those fields. We thank John Moore for his patient tutoring over many many years on animal learning experiments, theory, and neuroscience, and for his careful reading of multiple drafts of Chapters 14 and 15. We also thank Matt Botvinick, Nathaniel Daw, Peter Dayan, and Yael Niv for their penetrating comments on drafts of these chapters, their essential guidance through the massive literature, and their interception of many of our errors in early drafts. Of course, the remaining errors in these chapters (and there must still be some) are totally our own. We thank Phil Thomas for helping us make these chapters accessible to non-psychologists and non-neuroscientists, and we thank Peter Sterling for helping us improve the exposition. We are grateful to Jim Houk for introducing us to the subject of information processing in the basal ganglia and for alerting us to other relevant aspects of neuroscience. José Martínez, Terry Sejnowski, David Silver, Gerry Tesauro, Georgios Theocharous, and Phil Thomas generously helped us understand details of their reinforcement learning applications for inclusion in the case-studies chapter, and they provided helpful comments on drafts of these sections. Special thanks are owed to David Silver for helping us better understand Monte Carlo Tree Search and the DeepMind Go-playing programs. We thank George Konidaris for his help with the section on the Fourier basis. Emilio Cartoni, Thomas Cederborg, Stefan Dernbach, Clemens Rosenbaum, Patrick Taylor, Thomas Colin, and Pierre-Luc Bacon helped us in a number of important ways for which we are most grateful.

Sutton would also like to thank the members of the Reinforcement Learning and Artificial Intelligence laboratory at the University of Alberta for contributions to the second edition. He owes a particular debt to Rupam Mahmood for essential contributions to the treatment of off-policy Monte Carlo methods in Chapter 5, to Hamid Maei for helping develop the perspective on off-policy learning presented in Chapter 11, to Eric Graves for conducting the experiments in Chapter 13, to Shangtong Zhang for replicating and thus verifying almost all the experimental results, to Kris De Asis for improving the new technical content of Chapters 7 and 12, and to Harm van Seijen for insights that led to the separation of n-step methods from eligibility traces and (along with Hado van Hasselt) for the ideas involving exact equivalence of forward and backward views of eligibility traces presented in Chapter 12. Sutton also gratefully acknowledges the support and freedom he was granted by the Government of Alberta and the Natural Sciences and Engineering Research Council of Canada throughout the period during which the second edition was conceived and written. In particular, he would like to thank Randy Goebel for creating a supportive and far-sighted environment for research in Alberta. He would also like to thank DeepMind for their support in the last six months of writing the book.

Finally, we owe thanks to the many careful readers of drafts of the second edition that we posted on the internet. They found many errors that we had missed and alerted us to potential points of confusion.

Preface to the First Edition

We first came to focus on what is now known as reinforcement learning in late 1979. We were both at the University of Massachusetts, working on one of the earliest projects to revive the idea that networks of neuronlike adaptive elements might prove to be a promising approach to artificial adaptive intelligence. The project explored the “heterostatic theory of adaptive systems” developed by A. Harry Klopf. Harry’s work was a rich source of ideas, and we were permitted to explore them critically and compare them with the long history of prior work in adaptive systems. Our task became one of teasing the ideas apart and understanding their relationships and relative importance. This continues today, but in 1979 we came to realize that perhaps the simplest of the ideas, which had long been taken for granted, had received surprisingly little attention from a computational perspective. This was simply the idea of a learning system that wants something, that adapts its behavior in order to maximize a special signal from its environment. This was the idea of a “hedonistic” learning system, or, as we would say now, the idea of reinforcement learning.

Like others, we had a sense that reinforcement learning had been thoroughly explored in the early days of cybernetics and artificial intelligence. On closer inspection, though, we found that it had been explored only slightly. While reinforcement learning had clearly motivated some of the earliest computational studies of learning, most of these researchers had gone on to other things, such as pattern classification, supervised learning, and adaptive control, or they had abandoned the study of learning altogether. As a result, the special issues involved in learning how to get something from the environment received relatively little attention. In retrospect, focusing on this idea was the critical step that set this branch of research in motion. Little progress could be made in the computational study of reinforcement learning until it was recognized that such a fundamental idea had not yet been thoroughly explored.

The field has come a long way since then, evolving and maturing in several directions. Reinforcement learning has gradually become one of the most active research areas in machine learning, artificial intelligence, and neural network research. The field has developed strong mathematical foundations and impressive applications. The computational study of reinforcement learning is now a large field, with hundreds of active researchers around the world in diverse disciplines such as psychology, control theory, artificial intelligence, and neuroscience. Particularly important have been the contributions establishing and developing the relationships to the theory of optimal control and dynamic programming.


The overall problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly. We can now place component ideas, such as temporal-difference learning, dynamic programming, and function approximation, within a coherent perspective with respect to the overall problem.

Our goal in writing this book was to provide a clear and simple account of the key ideas and algorithms of reinforcement learning. We wanted our treatment to be accessible to readers in all of the related disciplines, but we could not cover all of these perspectives in detail. For the most part, our treatment takes the point of view of artificial intelligence and engineering. Coverage of connections to other fields we leave to others or to another time. We also chose not to produce a rigorous formal treatment of reinforcement learning. We did not reach for the highest possible level of mathematical abstraction and did not rely on a theorem–proof format. We tried to choose a level of mathematical detail that points the mathematically inclined in the right directions without distracting from the simplicity and potential generality of the underlying ideas.

[Three paragraphs elided in favor of updated content in the second edition.]

In some sense we have been working toward this book for thirty years, and we have lots of people to thank. First, we thank those who have personally helped us develop the overall view presented in this book: Harry Klopf, for helping us recognize that reinforcement learning needed to be revived; Chris Watkins, Dimitri Bertsekas, John Tsitsiklis, and Paul Werbos, for helping us see the value of the relationships to dynamic programming; John Moore and Jim Kehoe, for insights and inspirations from animal learning theory; Oliver Selfridge, for emphasizing the breadth and importance of adaptation; and, more generally, our colleagues and students who have contributed in countless ways: Ron Williams, Charles Anderson, Satinder Singh, Sridhar Mahadevan, Steve Bradtke, Bob Crites, Peter Dayan, and Leemon Baird. Our view of reinforcement learning has been significantly enriched by discussions with Paul Cohen, Paul Utgoff, Martha Steenstrup, Gerry Tesauro, Mike Jordan, Leslie Kaelbling, Andrew Moore, Chris Atkeson, Tom Mitchell, Nils Nilsson, Stuart Russell, Tom Dietterich, Tom Dean, and Bob Narendra. We thank Michael Littman, Gerry Tesauro, Bob Crites, Satinder Singh, and Wei Zhang for providing specifics of Sections 4.7, 15.1, 15.4, 15.5, and 15.6 respectively. We thank the Air Force Office of Scientific Research, the National Science Foundation, and GTE Laboratories for their long and farsighted support.

We also wish to thank the many people who have read drafts of this book and provided valuable comments, including Tom Kalt, John Tsitsiklis, Pawel Cichosz, Olle Gällmo, Chuck Anderson, Stuart Russell, Ben Van Roy, Paul Steenstrup, Paul Cohen, Sridhar Mahadevan, Jette Randlov, Brian Sheppard, Thomas O’Connell, Richard Coggins, Cristina Versino, John H. Hiett, Andreas Badelt, Jay Ponte, Joe Beck, Justus Piater, Martha Steenstrup, Satinder Singh, Tommi Jaakkola, Dimitri Bertsekas, Torbjörn Ekman, Christina Björkman, Jakob Carlström, and Olle Palmgren. Finally, we thank Gwyn Mitchell for helping in many ways, and Harry Stanton and Bob Prior for being our champions at MIT Press.
