Page numbers refer to the printed copy. The online version (the “draft”) is up-to-date. Thanks to my PhD student, Gabor Bartok, and to Sotetsu Koyamada, who found many of these errors.

- p. xi. Section 2) should be Section 2 (no closing parenthesis)
- p. 1. The dot should be in between the bars in the definition of the infinity norm, not on the top. That is, ∥⋅∥_{∞} is the intended form and not ||_{∞}. Also, in “which, if θ, which” the part “which, if θ” should be deleted.
- p.2. The footnote from p.5 explaining the meaning of “almost surely” should be moved here.
- p.5. In the example on gambling the personal pronoun “his” should be replaced by “her”.
- p.9. In Eq. (1.14) on the right-hand side of the equation Q(y,π(x)) should be Q(y,π(y)).
- p.12. In footnote 1, add “if” before “it”.
- p. 19. line 25: “gives the the sum of rewards along the trajectory” should be “gives the difference between the return along the trajectory and the value estimate at the start state”.
- p. 21. line 1: “then” should be “than”.
- p. 22. The text “goal is to approximate the value function V underlying ” should be deleted.
- p. 23. Delete “be” from “is no longer be guaranteed”. After θ^{(λ)} in the middle of page delete “.”. The phrase “using V_{θ}” should be “ using the chosen features φ”.
- p. 24. line 19: “full control learning task” should be “full, control-learning task”
- p. 25. “some methods using which” should be “some methods that avoid”. On the same page, in the figure caption, “true” should be “two”. Also, on the figure the labels L_{n}(θ) and L(θ) are swapped.
- p. 29. “this corresponds to starting with a diagonal matrix in RLSTD” should be “this corresponds to starting with a positive diagonal matrix in RLSTD”.
- p. 32. The word “complicate” (in the middle of page) should be “complicated”.
- p. 34. “has full column rank” should be “has full row rank”. On the same page, in Eq. (2.17): Replace ∥θ_{*}^{⊤}φ - r∥^{2} with ∥θ_{*}^{⊤}φ - V∥_{μ}^{2} on the right-hand side (although note that V = r).
- p. 40. The text “Gittins (1989) has shown” should be “Gittins (1989) showed”.
- p. 43. The 4th displayed equation and the text surrounding it should be deleted. This is the equation that says that R_{T}^{UCRL2(δ)} = O(D^{2}||^{2}|| log(T∕δ)∕ε + εT). This equation holds (under the cited conditions), but it does not lend itself to a logarithmic regret bound.
- p. 43, line -5: Replace “. This happens” with “as happens to be the case when”.
- p. 45, Algorithm 10 (UCRL2): Instead of “repeat-until” the appropriate programming construct should have been “while true-end while” (the body of the loop is repeated indefinitely) [Hill Ma].
- p. 46, Algorithm 11 (OptSolve): In the repeat-until construction the conditions must be flipped. Two places [Hill Ma].
- p. 47. On line 4 of the 1st paragraph of Section 3.3., “optimalas” should be “optimal as”. On the same page, on line 3, after Eq. (3.1): “, Algorithm 12 the pseudocode of Q-learning.” should start with a full stop and is missing the word “shows”. So, the text should be “. Algorithm 12 shows the pseudocode of Q-learning.”.
- p. 48. Section 3.2 is mentioned twice in the same sentence (around the middle of the page). The second occurrence should be deleted.
- p. 56, Algorithm 16, line 7. The correct update equation is b ← b + R_{t+1}⋅z [Tom Schaul, Idsia].
- p. 57, Algorithm 16 (LSPI): Flip the condition in the until construct [Hill Ma].
- p. 58. The definition of regret should be R_{T}^{} = Tρ^{*} - _{T}^{} [Hamid Reza Maei, Stanford].
- p. 59. “likelihood ratio methods Glynn, 1990” should be “likelihood ratio methods (Glynn, 1990).”
- p. 61, Algorithm 18: Again, instead of repeat-until, one needs while(true)-endwhile [Hill Ma].
- p. 65. The definition of norm is missing the so-called homogeneity condition: For any λ ∈ ℝ, v ∈ V, f(λv) = |λ|f(v). On the same page, at the bottom, “ℓ^{∞} norms” should be “ℓ^{∞} norm”.
- p. 66. “uniformly bounded” should be “bounded” (when mentioning a single function).
- p. 66. line 8 from the bottom, “f_{n}(x) → 0 for each x” should be “Define f so that f(x) = 0 if x≠0 and f(0) = 1. Then f_{n}(x) → f(x) for each x.” Replace “However, ∥f_{n} - f∥_{∞} =_{∞} = 1 ↛ 0.” with “However, ∥f_{n} - f∥_{∞} = 1 ↛ 0.”
- p. 67. “Polish mathematicians” should be singular: “Polish mathematician”.
- p. 68. On the top of page, “Assume that T is a γ-contraction.” should go into the next line.
- p. 68. In the first displayed equation the last inequality should be removed. In the next two displayed equations, on the right-hand side, replace with . Finally, in the 4th displayed equation on the page, remove from the right-hand side.
- p. 69. In the line preceding the definition of B(), “uniformly bounded” should be “bounded”.
- p. 69. “It is easy to see that V^{π}”: At the end of the display following (in the bottom of the page) replace V with V^{π}.
- p. 72. “for the final policy π, we have TV^{π} = T^{π}V^{π} = V^{π}.” should be “for the final policy π, we have T^{*}V^{π} = T^{π}V^{π} = V^{π}.”.
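
As an illustration of why the homogeneity condition added on p. 65 matters, here is a minimal Python sketch; it is not taken from the book. It checks f(λv) = |λ|f(v) for the ℓ^{∞} norm and for a capped variant. The function names `sup_norm`, `capped` and `is_homogeneous`, and the test vector, are hypothetical choices made only for this example. The capped function is meant as an example of something that is zero only at the zero vector and obeys the triangle inequality, yet is not a norm because it fails homogeneity.

```python
def sup_norm(v):
    """The l-infinity norm: the largest absolute coordinate of v."""
    return max(abs(x) for x in v)


def capped(v):
    """Hypothetical non-norm used for contrast: the sup norm capped at 1."""
    return min(sup_norm(v), 1.0)


def is_homogeneous(f, v, lam, tol=1e-12):
    """Check f(lam * v) == |lam| * f(v) for a single pair (lam, v)."""
    scaled = [lam * x for x in v]
    return abs(f(scaled) - abs(lam) * f(v)) <= tol


v = [0.5, -1.0, 0.25]  # arbitrary test vector (an assumption of this sketch)
print(is_homogeneous(sup_norm, v, -3.0))  # the sup norm passes the check
print(is_homogeneous(capped, v, -3.0))    # the capped version fails it
```

A single passing pair (λ, v) does not prove homogeneity, of course; the check is only useful for finding counterexamples such as the capped function.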

For further information, visit http://www.ualberta.ca/~szepesva/RLBook.html.