Homepage of Csaba Szepesvári

Department of Computing Science
University of Alberta
Edmonton, Alberta
Canada T6G 2E8
Office: 311 Athabasca Hall
Email: szepesva AT cs DOT ualberta DOT ca
Phone: (780) 492-8581
Fax: (780) 492-6393
[Book cover: 'Algorithms for Reinforcement Learning']
[en/hu dict]
[CMPUT 412]
[math genealogy]


I am currently on leave at DeepMind, heading the Foundations team there. When I am not on leave, I am with the Department of Computing Science at the UofA, where I am a member of the Reinforcement Learning and Artificial Intelligence group, and I am also one of the PIs at AMII (Alberta Machine Intelligence Institute). In case it is useful, I keep a super short bio here.

While research and profession are important and fun, family trumps them! My wife is Beáta, and our kids are Dávid, Réka, Eszter and Csongor. Lately, the family has expanded with new, awesome members! I am super happy to welcome Deon Nicholas and Curtis Wendlandt as amazing sons-in-law, and an even bigger welcome goes to the smallest and newest member, Nolan Nicholas, my very first grandson. Yay!

[Photo: Csaba's family]


  • March, 2017: Bandit book, based on the bandit blog, coming soon! Partner in crime: Tor. Once ready, the PDF will be available online. If you are interested in reviewing some chapters, send me an email.
  • August, 2017: I am starting a two-year leave at DeepMind. During these two years, I won't take any interns or new graduate students at the UofA.
  • September, 2016: Bandit blog devoted to the bandit course's material and beyond.
  • August, 2016: Bandit Algorithms: My new graduate course co-developed with Tor.

For students

  • As noted above, during my leave at DeepMind I am not taking interns or new grad students. If you are interested in positions at DeepMind and have the right qualifications, DeepMind does have openings, both for interns and for researchers.
  • Prospective grad students who are interested in joining the Statistical Machine Learning degree specialization program, a joint program between our department and the MathStat department, should look here. Why should you apply?
  • Here is some advice for present and future grad students.
  • Responding to an "emergency situation" back in 2008, I spent a few hours searching the IEEE website to collect recent references on applications of RL. Here are the results, which are now linked from the page on Successes of RL. See also Satinder's similarly titled page here.

Research interests

Online learning research develops learning algorithms that show good online performance, i.e., good performance while learning. Online learning tasks are sequential: in each step of the sequential process, the learning algorithm receives some information from the environment and makes a prediction so as to minimize the prediction loss. My team and I focus on interactive online learning problems: sequential processes where the predictions influence what future information is received. Interactive online learning problems are studied in various disciplines, such as in control theory under the name "dual control", or within machine learning itself in the area of reinforcement learning. While these problems are natural, interactive online learning is perhaps the least developed area within online learning. To make progress, we explore special cases of interactive online learning, which allows us to identify and study the key issues in isolation. Besides, developing better algorithms for these special cases is of independent interest, as they often have interesting uses of their own. We also study more fundamental questions as they arise.

Big picture: I am interested in machine learning. In particular, I like to think about how to make the most efficient use of data in various situations, and also how this can be done algorithmically. I am particularly interested in sequential decision making problems, which, when learning is put into the picture, lead to reinforcement learning.

Up to 2008, the most frequently occurring keywords associated with my publications were theory (80), reinforcement learning (49), application (31), neural networks (24), stochastic approximation (17), function approximation (16), nonparametrics (15), control (15), online learning (13), adaptive control (10), performance bounds (10), vision (10), Monte-Carlo methods (8), particle filtering (8). There is a fair amount of noise in these numbers, and the chronology also matters: for example, I focused on neural networks up to around 2001, though they are coming into fashion again, so..
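
To make the interaction protocol above concrete, here is a minimal sketch (not from the original page) of one classic instance of interactive online learning: a UCB1-style stochastic bandit loop in Python. The function name, the Bernoulli reward model and all parameters are illustrative assumptions, not a description of any specific algorithm from my work.

import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Minimal UCB1 sketch: pull each arm once, then pick the arm
    maximizing empirical mean plus an exploration bonus.
    arm_means are assumed Bernoulli success probabilities."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # times each arm was pulled
    totals = [0.0] * k      # sum of observed rewards per arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # initialization: try every arm once
        else:
            # The prediction influences future information:
            # only the pulled arm's reward is ever observed.
            arm = max(range(k),
                      key=lambda i: totals[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        regret += best - arm_means[arm]
    return regret

print(ucb1([0.3, 0.5, 0.7], horizon=10000))  # regret grows roughly logarithmically with the horizon

The loop makes the feedback structure explicit: the learner's choice determines which reward it gets to see, which is exactly what distinguishes interactive online learning from prediction with full feedback.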