Algorithms of Reinforcement Learning
Why did I write this book?
Good question! There exist a good number of really great books on Reinforcement Learning. So why a new book? I had selfish reasons: I wanted a short book, which nevertheless contained the major ideas underlying state-of-the-art RL algorithms (back in 2010), a discussion of their relative strengths and weaknesses, and hints on what is known (and not known, but would be good to know) about these algorithms. Whether I succeeded, time will tell. Or you can tell me, by sending an e-mail to csaba.szepesvari@gmail.com.
Abstract
Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications that it can be used to address, ranging from problems in artificial intelligence to operations research and control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, survey a large number of state-of-the-art algorithms, and discuss their theoretical properties and limitations.
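For orientation, here is the standard discounted formulation that underlies the dynamic-programming view (a textbook-standard statement, not quoted from the book): the learner seeks a policy maximizing the expected return

    G = \sum_{t=0}^{\infty} \gamma^t r_{t+1}, \qquad 0 \le \gamma < 1,

and dynamic programming rests on the Bellman optimality equation

    V^*(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s, a)\, V^*(s') \Big],

whose fixed point can be computed by value iteration. The learning algorithms treated in the book can be viewed as sample-based, approximate versions of such dynamic-programming recursions.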
Table of contents
Algorithms
The book, as the title suggests, describes a number of algorithms. These are the following. For algorithms whose names are boldfaced, pseudocode is also given.
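To give a flavor of the kind of algorithms covered, here is a minimal Python sketch of tabular Q-learning, one of the classics discussed in the book (this is my own illustrative code, assuming a classic Gym-style environment interface; it is not the pseudocode printed in the book):

    import random
    from collections import defaultdict

    # Tabular Q-learning with epsilon-greedy exploration.
    # Assumes a classic Gym-style env: reset() -> state,
    # step(action) -> (next_state, reward, done, info).
    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        Q = defaultdict(float)              # (state, action) -> value estimate
        actions = range(env.action_space.n)
        for _ in range(episodes):
            s, done = env.reset(), False
            while not done:
                # Explore with probability epsilon, otherwise act greedily.
                if random.random() < epsilon:
                    a = env.action_space.sample()
                else:
                    a = max(actions, key=lambda a2: Q[(s, a2)])
                s_next, r, done, _ = env.step(a)
                # Off-policy TD target: bootstrap from the best next action.
                target = r if done else r + gamma * max(Q[(s_next, a2)] for a2 in actions)
                Q[(s, a)] += alpha * (target - Q[(s, a)])
                s = s_next
        return Q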
Other unique features of the book
The book discusses the following function approximation methods:
In addition, it discusses the relative merits of “batch” (LS-type) and incremental (TD-type) algorithms, the influence of the choice of the function approximation method (can we overfit in reinforcement learning?), various theoretically well-founded online learning algorithms (ever wondered about what an efficient exploration method should do?), actor-critic algorithms, and more.
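To make the batch-versus-incremental distinction concrete, here is a rough sketch of the two update styles for linear value-function approximation (again my own illustrative code, not taken from the book; feature vectors and transitions are assumed given):

    import numpy as np

    # Incremental: one TD(0) step. O(d) per transition, needs a step size.
    def td0_update(theta, phi_s, phi_s_next, r, alpha=0.01, gamma=0.99):
        delta = r + gamma * (phi_s_next @ theta) - (phi_s @ theta)  # TD error
        return theta + alpha * delta * phi_s

    # Batch: LSTD solves the linear system A theta = b in one shot.
    # O(d^2) per transition plus one d-by-d solve; no step size to tune.
    def lstd(transitions, d, gamma=0.99, reg=1e-3):
        A = reg * np.eye(d)   # small ridge term keeps A invertible
        b = np.zeros(d)
        for phi_s, r, phi_s_next in transitions:
            A += np.outer(phi_s, phi_s - gamma * phi_s_next)
            b += r * phi_s
        return np.linalg.solve(A, b)

Roughly, the trade-off the book analyzes is this: the TD-style update is cheap per step but sensitive to the choice of step size, while the LS-style solution squeezes more out of the same data at a higher per-transition cost.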
Some connections to other parts of the literature (outside of machine learning) are mentioned. These include the connection of LSTD (and related methods) to Z-estimation (from statistics), to sample-average approximation methods (from operations research), and the connection of policy gradient algorithms to likelihood ratio methods (from simulation optimization). In general, the book has many pointers to the literature; I think it provides a good entry point into it.
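The likelihood-ratio connection can be stated in one line (a standard identity, not quoted from the book): for trajectories \tau drawn by following a policy \pi_\theta,

    \nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta} \Big[ R(\tau) \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \Big],

which is precisely the score-function (likelihood-ratio) gradient estimator studied in simulation optimization; REINFORCE-style policy gradient algorithms are Monte-Carlo approximations of this expectation.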
New results in the book
Tutorial, slides
Some people find it much easier to learn from slides. Rich and I gave a tutorial at AAAI-2010 in July that was based on the book. The tutorial webpage is here. We used the following slides:
Errata (last update: June 25, 2018)
In an ideal world, we would publish with no mistakes. The world is not ideal. The second-best thing, then, is to keep a list of mistakes (and update the pdf!). For your convenience, here I give you the errata, both as a pdf file and in html.
The above errata is based mostly on a list provided by Gabor, who deserves a big thanks for reading through the text so carefully. I am also indebted to Sotetsu Koyamada, who more recently gave me another lengthy list of typos. Earlier (and more recently), several individuals read various parts of the draft and submitted useful suggestions, which I tried to incorporate. They include:
Thank You! All the remaining errors are mine.