Publications

Refereed Conferences

  • Conservative Bandits. Yifan Wu, Roshan Shariff, Tor Lattimore, and Csaba Szepesvári. In International Conference on Machine Learning (ICML), New York, NY, USA. June 20–22, 2016. (25% acceptance)
    Abstract

    We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies to maximize revenue whilst simultaneously maintaining its revenue above a fixed baseline, uniformly over time. While previous work addressed the problem under the weaker requirement of maintaining the revenue constraint only at a given fixed time in the future, the algorithms it proposed are unsuitable under the more stringent uniform constraint. We consider both the stochastic and the adversarial settings, propose natural yet novel strategies, and analyze the price of maintaining the constraints. Amongst other things, we prove regret bounds that hold both in high probability and in expectation, and we consider maintaining the constraints either with high probability or in expectation. For the adversarial setting the price of maintaining the constraint appears to be higher, at least for the algorithm considered. A lower bound shows that the algorithm for the stochastic setting is almost optimal. Empirical results obtained in synthetic environments complement our theoretical findings.
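    The core idea of playing a known-safe baseline arm until enough revenue slack has accumulated to risk exploration can be sketched as follows. This is a toy illustration under our own assumptions (known baseline mean, a simple slack threshold, least-played-arm exploration), not the algorithm analyzed in the paper.

```python
def conservative_bandit(arms, baseline, alpha=0.1, horizon=1000):
    """Toy sketch of a conservative bandit strategy (illustrative only).

    arms: list of functions returning a stochastic reward in [0, 1]
    baseline: index of the default arm, whose mean reward is assumed known
    alpha: fraction of baseline revenue we are allowed to risk exploring
    """
    n = len(arms)
    counts = [0] * n
    sums = [0.0] * n
    total = 0.0
    baseline_mean = 0.5  # assumed known mean reward of the baseline arm
    for t in range(1, horizon + 1):
        # Slack between revenue earned so far and the running constraint
        # (1 - alpha) * baseline_mean * t that must hold uniformly in t.
        budget = total - (1 - alpha) * baseline_mean * t
        if budget > 1.0:  # enough slack to risk one exploratory pull
            i = min(range(n), key=lambda j: counts[j])  # least-played arm
        else:
            i = baseline  # otherwise stay safe on the baseline
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total += r
    return counts, total
```

    Note how exploration is gated purely by the accumulated slack, so the revenue constraint is maintained at every round, not just at the horizon.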

  • Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM. Roshan Shariff, András György, and Csaba Szepesvári. In Artificial Intelligence & Statistics (AISTATS), San Diego, CA, USA. May 9–12, 2015. (29% acceptance)
    Abstract

    The Metropolis-Hastings (MH) algorithm is a flexible method to generate samples from a target distribution, a key problem in probabilistic inference. In this paper we propose a variation of the MH algorithm based on group moves, where the next state is obtained by first choosing a random transformation of the state space and then applying this transformation to the current state. This adds much-needed flexibility to the “textbook” MH algorithm where all measures involved must be given in terms of densities with respect to a common reference measure. Under mild conditions, our main result extends the acceptance probability formula of the textbook algorithm to MH algorithms with group moves. We work out how the new algorithms can be used to exploit a problem’s natural symmetries and apply the technique to the simultaneous localization and mapping (SLAM) problem, obtaining the first fully rigorous justification of a previous MCMC-based SLAM method. New experimental results comparing our method to existing state-of-the-art specialized methods on a standard range-only SLAM benchmark problem validate the strength of the approach.
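    A group-move MH step can be sketched in a simple special case: the transformations below (rotations about the origin and translations of the plane) preserve Lebesgue measure, and each transformation and its inverse are drawn with equal probability, so the acceptance probability reduces to the textbook target ratio. The paper's result covers far more general transformation groups and reference measures; this code is our own illustration, not the paper's implementation.

```python
import math
import random

def group_move_mh(log_target, x0, steps=5000):
    """MH sampler on the plane whose proposals apply a random rigid
    motion (a group move) to the current state. Because these maps
    preserve Lebesgue measure and g and g^{-1} are equally likely,
    acceptance is min(1, p(g x) / p(x))."""
    x = x0
    samples = []
    for _ in range(steps):
        if random.random() < 0.5:
            # Rotation about the origin; the angle distribution is
            # symmetric, so the inverse rotation is equally likely.
            theta = random.gauss(0.0, 0.3)
            c, s = math.cos(theta), math.sin(theta)
            y = (c * x[0] - s * x[1], s * x[0] + c * x[1])
        else:
            # Translation; again symmetric about zero.
            y = (x[0] + random.gauss(0.0, 0.5), x[1] + random.gauss(0.0, 0.5))
        if math.log(random.random()) < log_target(y) - log_target(x):
            x = y  # accept the group move; otherwise keep the old state
        samples.append(x)
    return samples
```

    For a rotation-invariant target the rotation moves are always accepted, which is exactly the situation where exploiting symmetry pays off.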

  • Lunar Lander: A Continuous-Action Case Study for Policy-Gradient Actor-Critic Algorithms. Roshan Shariff and Travis Dick. In Reinforcement Learning & Decision Making (RLDM), Princeton, NJ, USA. October 25–27, 2013.
    Abstract

    We empirically investigate modifications and implementation techniques required to apply a policy-gradient actor-critic algorithm to reinforcement learning problems with continuous state and action spaces. As a test-bed, we introduce a new simulated task, which involves landing a lunar module in a simplified two-dimensional world. The empirical results demonstrate the importance of efficiently implementing eligibility traces and appropriately weighting the features produced by a tile coder.
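    The two implementation points the abstract highlights can be illustrated together: with binary tile-coded features only a handful of components are active per step, so eligibility traces can be stored sparsely, and each active tile can be down-weighted by the number of tilings so the total feature magnitude stays constant. This is a generic TD(lambda) critic-update sketch under our own naming, not the paper's code.

```python
def td_lambda_update(w, z, active_tiles, delta, alpha, gamma, lam):
    """One TD(lambda) critic update with binary tile-coded features.

    w: sparse weight dict, z: sparse eligibility-trace dict,
    active_tiles: indices of the tiles active in the current state,
    delta: TD error, alpha: step size, gamma: discount, lam: trace decay.
    """
    # Decay only the nonzero traces; a sparse dict keeps this cheap
    # compared with decaying a dense vector over every tile.
    for i in list(z):
        z[i] *= gamma * lam
        if z[i] < 1e-8:
            del z[i]
    # Accumulate traces for the active tiles, weighting each tile by
    # 1 / (number of active tiles) so the feature vector's total
    # magnitude does not depend on how many tilings the coder uses.
    for i in active_tiles:
        z[i] = z.get(i, 0.0) + 1.0 / len(active_tiles)
    # Apply the TD error only where traces are nonzero.
    for i, zi in z.items():
        w[i] = w.get(i, 0.0) + alpha * delta * zi
    return w, z
```

    The sparse-dict representation is one common way to make trace updates O(number of active features) rather than O(feature-vector length).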

Thesis

  • Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM. Roshan Shariff. M.Sc. Thesis. University of Alberta, Edmonton, Canada. April 2015.
    Abstract

    Sampling from a given probability distribution is a key problem in many different disciplines. Markov chain Monte Carlo (MCMC) algorithms approach this problem by constructing a random walk governed by a specially constructed transition probability distribution. As the random walk progresses, the distribution of its states converges to the required target distribution. The Metropolis-Hastings (MH) algorithm is a generally applicable MCMC method which, given a proposal distribution, modifies it by adding an accept/reject step: it proposes a new state based on the proposal distribution and the existing state of the random walk, then either accepts or rejects it with a certain probability; if it is rejected, the old state is retained. The MH algorithm is most effective when the proposal distribution closely matches the target distribution: otherwise most proposals will be rejected and convergence to the target distribution will be slow. The proposal distribution should therefore be designed to take advantage of any known structure in the target distribution.

    A particular kind of structure that arises in some probabilistic inference problems is that of symmetry: the problem (or its sub-problems) remains unchanged under certain transformations. A simple kind of symmetry is the choice of a coordinate system in a geometric problem; translating and rotating the origin of a plane does not affect the relative positions of any points on it. The field of group theory has a rich and fertile history of being used to characterize such symmetries; in particular, topological group theory has been applied to the study of both continuous and discrete symmetries. Symmetries are described by a group that acts on the state space of a problem, transforming it in such a way that the problem remains unchanged. We consider problems in which the target distribution has factors, each of which has a symmetry group; each factor's value does not change when the space is transformed by an element of its corresponding symmetry group.

    This thesis proposes a variation of the MH algorithm where each step first chooses a random transformation of the state space and then applies it to the current state; these transformations are elements of suitable symmetry groups. The main result of this thesis extends the acceptance probability formula of the textbook MH algorithm to this case under mild conditions, adding much-needed flexibility to the MH algorithm. The new algorithm is also demonstrated in the Simultaneous Localization and Mapping (SLAM) problem in robotics, in which a robot traverses an unknown environment, and its trajectory and a map of the environment must be recovered from sensor observations and known control signals. Here the group moves are chosen to exploit the SLAM problem's natural geometric symmetries, obtaining the first fully rigorous justification of a previous MCMC-based SLAM method. New experimental results comparing this method to existing state-of-the-art specialized methods on a standard range-only SLAM benchmark problem validate the strength of the approach.
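    The geometric symmetry exploited in the SLAM application can be checked concretely: in a toy range-only observation model (our own illustrative code, not the thesis implementation), applying one global rigid motion to the robot and all landmarks leaves every range measurement, and hence the likelihood factor, unchanged.

```python
import math

def ranges(robot, landmarks):
    """Range-only observation model: distances from a robot position
    to each landmark."""
    return [math.hypot(lx - robot[0], ly - robot[1]) for lx, ly in landmarks]

def rigid(p, theta, t):
    """Apply a rigid motion of the plane (rotation by theta about the
    origin, then translation by t) to the point p."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])

# Transforming robot and landmarks by the same group element leaves the
# observations invariant: this is the symmetry a group move can exploit.
robot = (1.0, 2.0)
landmarks = [(4.0, 6.0), (-2.0, 0.5)]
theta, t = 0.7, (3.0, -1.0)
moved_robot = rigid(robot, theta, t)
moved_landmarks = [rigid(l, theta, t) for l in landmarks]
r1 = ranges(robot, landmarks)
r2 = ranges(moved_robot, moved_landmarks)
```

    Because the likelihood cannot distinguish states related by such a motion, proposals that move along the symmetry group explore these directions without changing the likelihood factor.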

Workshops

  • Conservative Bandits. Yifan Wu, Roshan Shariff, Tor Lattimore, and Csaba Szepesvári. In Workshop on Machine Learning for eCommerce, Neural Information Processing Systems (NIPS), Montréal, QC, Canada. December 11, 2015.
    Abstract

    We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies whilst simultaneously maintaining its revenue above a fixed baseline. We develop a new algorithm that satisfies the revenue constraint and also enjoys problem-dependent regret guarantees with respect to the optimal action. Lower bounds are given showing that the new algorithm is almost optimal.

  • A Markov Chain Monte Carlo Approach To Simultaneous Localization and Mapping. Roshan Shariff, Péter Torma, András György, and Csaba Szepesvári. In Workshop on Monte Carlo Methods for Bayesian Inference in Modern Day Applications, Neural Information Processing Systems (NIPS), Vancouver, BC, Canada. December 10, 2010.
    Abstract

    We provide a Markov chain Monte Carlo (MCMC) based algorithm that addresses the Simultaneous Localization and Mapping (SLAM) problem with general dynamics and observation models, under open-loop control and assuming the map representation is finite-dimensional. To our knowledge this is the first provably consistent yet (close-to) practical solution to this problem. The algorithm achieves results comparable to the current state of the art, as demonstrated in difficult loop-closing situations.