Preface to the First Edition
The overall problem of learning from interaction to achieve goals is still far from being
solved, but our understanding of it has improved significantly. We can now place
component ideas, such as temporal-difference learning, dynamic programming, and function
approximation, within a coherent perspective with respect to the overall problem.
Our goal in writing this book was to provide a clear and simple account of the key
ideas and algorithms of reinforcement learning. We wanted our treatment to be accessible
to readers in all of the related disciplines, but we could not cover all of these perspectives
in detail. For the most part, our treatment takes the point of view of artificial intelligence
and engineering. Coverage of connections to other fields we leave to others or to another
time. We also chose not to produce a rigorous formal treatment of reinforcement learning.
We did not reach for the highest possible level of mathematical abstraction and did not
rely on a theorem–proof format. We tried to choose a level of mathematical detail that
points the mathematically inclined in the right directions without distracting from the
simplicity and potential generality of the underlying ideas.
...
In some sense we have been working toward this book for thirty years, and we have lots
of people to thank. First, we thank those who have personally helped us develop the overall
view presented in this book: Harry Klopf, for helping us recognize that reinforcement
learning needed to be revived; Chris Watkins, Dimitri Bertsekas, John Tsitsiklis, and
Paul Werbos, for helping us see the value of the relationships to dynamic programming;
John Moore and Jim Kehoe, for insights and inspirations from animal learning theory;
Oliver Selfridge, for emphasizing the breadth and importance of adaptation; and, more
generally, our colleagues and students who have contributed in countless ways: Ron
Williams, Charles Anderson, Satinder Singh, Sridhar Mahadevan, Steve Bradtke, Bob
Crites, Peter Dayan, and Leemon Baird. Our view of reinforcement learning has been
significantly enriched by discussions with Paul Cohen, Paul Utgoff, Martha Steenstrup,
Gerry Tesauro, Mike Jordan, Leslie Kaelbling, Andrew Moore, Chris Atkeson, Tom
Mitchell, Nils Nilsson, Stuart Russell, Tom Dietterich, Tom Dean, and Bob Narendra.
We thank Michael Littman, Gerry Tesauro, Bob Crites, Satinder Singh, and Wei Zhang
for providing specifics of Sections 4.7, 15.1, 15.4, 15.4, and 15.6 respectively. We thank
the Air Force Office of Scientific Research, the National Science Foundation, and GTE
Laboratories for their long and farsighted support.
We also wish to thank the many people who have read drafts of this book and
provided valuable comments, including Tom Kalt, John Tsitsiklis, Pawel Cichosz, Olle
Gällmo, Chuck Anderson, Stuart Russell, Ben Van Roy, Paul Steenstrup, Paul Cohen,
Sridhar Mahadevan, Jette Randlov, Brian Sheppard, Thomas O'Connell, Richard Coggins,
Cristina Versino, John H. Hiett, Andreas Badelt, Jay Ponte, Joe Beck, Justus Piater,
Martha Steenstrup, Satinder Singh, Tommi Jaakkola, Dimitri Bertsekas, Torbjörn Ekman,
Christina Björkman, Jakob Carlström, and Olle Palmgren. Finally, we thank Gwyn
Mitchell for helping in many ways, and Harry Stanton and Bob Prior for being our
champions at MIT Press.