The overall problem of learning from interaction to achieve goals is still far from being solved, but our understanding of it has improved significantly. We can now place component ideas, such as temporal-difference learning, dynamic programming, and function approximation, within a coherent perspective with respect to the overall problem.
Our goal in writing this book was to provide a clear and simple account of the key ideas and algorithms of reinforcement learning. We wanted our treatment to be accessible to readers in all of the related disciplines, but we could not cover all of these perspectives in detail. For the most part, our treatment takes the point of view of artificial intelligence and engineering. Coverage of connections to other fields we leave to others or to another time. We also chose not to produce a rigorous formal treatment of reinforcement learning.
We did not reach for the highest possible level of mathematical abstraction and did not
rely on a theorem–proof format. We tried to choose a level of mathematical detail that
points the mathematically inclined in the right directions without distracting from the
simplicity and potential generality of the underlying ideas.
...
In some sense we have been working toward this book for thirty years, and we have lots
of people to thank. First, we thank those who have personally helped us develop the overall
view presented in this book: Harry Klopf, for helping us recognize that reinforcement
learning needed to be revived; Chris Watkins, Dimitri Bertsekas, John Tsitsiklis, and
Paul Werbos, for helping us see the value of the relationships to dynamic programming;
John Moore and Jim Kehoe, for insights and inspirations from animal learning theory;
Oliver Selfridge, for emphasizing the breadth and importance of adaptation; and, more
generally, our colleagues and students who have contributed in countless ways: Ron
Williams, Charles Anderson, Satinder Singh, Sridhar Mahadevan, Steve Bradtke, Bob
Crites, Peter Dayan, and Leemon Baird. Our view of reinforcement learning has been
significantly enriched by discussions with Paul Cohen, Paul Utgoff, Martha Steenstrup,
Gerry Tesauro, Mike Jordan, Leslie Kaelbling, Andrew Moore, Chris Atkeson, Tom
Mitchell, Nils Nilsson, Stuart Russell, Tom Dietterich, Tom Dean, and Bob Narendra. We thank Michael Littman, Gerry Tesauro, Bob Crites, Satinder Singh, and Wei Zhang for providing specifics of Sections 4.7, 15.1, 15.4, 15.4, and 15.6 respectively. We thank the Air Force Office of Scientific Research, the National Science Foundation, and GTE Laboratories for their long and farsighted support.
We also wish to thank the many people who have read drafts of this book and provided valuable comments, including Tom Kalt, John Tsitsiklis, Pawel Cichosz, Olle Gällmo, Chuck Anderson, Stuart Russell, Ben Van Roy, Paul Steenstrup, Paul Cohen, Sridhar Mahadevan, Jette Randlov, Brian Sheppard, Thomas O'Connell, Richard Coggins,
Cristina Versino, John H. Hiett, Andreas Badelt, Jay Ponte, Joe Beck, Justus Piater,
Martha Steenstrup, Satinder Singh, Tommi Jaakkola, Dimitri Bertsekas, Torbjörn Ekman, Christina Björkman, Jakob Carlström, and Olle Palmgren. Finally, we thank Gwyn Mitchell for helping in many ways, and Harry Stanton and Bob Prior for being our champions at MIT Press.