CMPUT 654 Fall 2016: Bandit Algorithms

Course contents

Decision making in the face of uncertainty is a significant challenge in machine learning. Which drugs should a patient receive? How should I allocate my study time between courses? Which version of a website will generate the most revenue? Which move should be considered next when playing chess or Go? All of these questions can be expressed in the multi-armed bandit framework, where a learning agent sequentially takes actions, observes rewards and aims to maximise the total reward over a period of time. The framework is now very popular: it is used in practice by large companies, and the literature is growing fast (dozens of papers every year). The course will focus on understanding the statistical ideas, mathematics and implementation details of current state-of-the-art algorithms.
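
To make the interaction loop concrete, here is a minimal sketch of an agent that sequentially takes actions, observes rewards and tries to maximise their total. Everything specific in it is an assumption made for illustration: the function name run_bandit, the Bernoulli (0/1) reward model, and the epsilon-greedy rule, which is a simple textbook baseline and not necessarily one of the algorithms covered in the course.

```python
import random


def run_bandit(means, horizon, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli bandit (illustrative only).

    means   -- hypothetical success probability of each arm (unknown to the agent)
    horizon -- number of rounds to play
    epsilon -- probability of exploring a uniformly random arm
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of times each arm has been played
    sums = [0.0] * k    # total reward collected from each arm
    total = 0.0

    for t in range(horizon):
        if t < k:
            arm = t                     # play every arm once to initialise estimates
        elif rng.random() < epsilon:
            arm = rng.randrange(k)      # explore: pick an arm uniformly at random
        else:
            # exploit: pick the arm with the highest empirical mean reward
            arm = max(range(k), key=lambda a: sums[a] / counts[a])

        # The environment returns a 0/1 reward drawn from the chosen arm.
        reward = 1.0 if rng.random() < means[arm] else 0.0

        counts[arm] += 1
        sums[arm] += reward
        total += reward

    return total, counts


if __name__ == "__main__":
    total, counts = run_bandit([0.2, 0.5, 0.8], horizon=1000)
    print("total reward:", total)
    print("plays per arm:", counts)
```

The agent never sees means directly; it only learns about an arm by playing it, which is the tension between exploring and exploiting that the questions above share.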


Students planning to complete the course should have basic knowledge of calculus (taking derivatives, maximising functions), linear algebra (matrices, matrix inversion) and probability.