The overall problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly. We can now place component ideas, such as temporal-difference learning, dynamic programming, and function approximation, within a coherent perspective with respect to the overall problem.
Our goal in writing this book was to provide a clear and simple account of the key ideas and algorithms of reinforcement learning. We wanted our treatment to be accessible to readers in all of the related disciplines, but we could not cover all of these perspectives in detail. For the most part, our treatment takes the point of view of artificial intelligence and engineering. Coverage of connections to other fields we leave to others or to another time. We also chose not to produce a rigorous formal treatment of reinforcement learning.
We did not reach for the highest possible level of mathematical abstraction and did not
rely on a theorem–proof format. We tried to choose a level of mathematical detail that
points the mathematically inclined in the right directions without distracting from the
simplicity and potential generality of the underlying ideas.
...
In some sense we have been working toward this book for thirty years, and we have lots
of people to thank. First, we thank those who have personally helped us develop the overall
view presented in this book: Harry Klopf, for helping us recognize that reinforcement
learning needed to be revived; Chris Watkins, Dimitri Bertsekas, John Tsitsiklis, and
Paul Werbos, for helping us see the value of the relationships to dynamic programming;
John Moore and Jim Kehoe, for insights and inspirations from animal learning theory;
Oliver Selfridge, for emphasizing the breadth and importance of adaptation; and, more
generally, our colleagues and students who have contributed in countless ways: Ron
Williams, Charles Anderson, Satinder Singh, Sridhar Mahadevan, Steve Bradtke, Bob
Crites, Peter Dayan, and Leemon Baird. Our view of reinforcement learning has been
significantly enriched by discussions with Paul Cohen, Paul Utgoff, Martha Steenstrup,
Gerry Tesauro, Mike Jordan, Leslie Kaelbling, Andrew Moore, Chris Atkeson, Tom
Mitchell, Nils Nilsson, Stuart Russell, Tom Dietterich, Tom Dean, and Bob Narendra. We thank Michael Littman, Gerry Tesauro, Bob Crites, Satinder Singh, and Wei Zhang for providing specifics of Sections 4.7, 15.1, 15.4, 15.4, and 15.6 respectively. We thank the Air Force Office of Scientific Research, the National Science Foundation, and GTE Laboratories for their long and farsighted support.
We also wish to thank the many people who have read drafts of this book and provided valuable comments, including Tom Kalt, John Tsitsiklis, Pawel Cichosz, Olle Gällmo, Chuck Anderson, Stuart Russell, Ben Van Roy, Paul Steenstrup, Paul Cohen, Sridhar Mahadevan, Jette Randlov, Brian Sheppard, Thomas O'Connell, Richard Coggins,
Cristina Versino, John H. Hiett, Andreas Badelt, Jay Ponte, Joe Beck, Justus Piater,
Martha Steenstrup, Satinder Singh, Tommi Jaakkola, Dimitri Bertsekas, Torbjörn Ekman, Christina Björkman, Jakob Carlström, and Olle Palmgren. Finally, we thank Gwyn Mitchell for helping in many ways, and Harry Stanton and Bob Prior for being our champions at MIT Press.