About Me

I am a PhD Candidate at the University of Alberta in the Department of Computing Science. My current research interests are in artificial intelligence, specifically in the possibilities for imbuing machines with traits of creativity and curiosity. I use the pronouns she/her or they/them.


Doctor of Philosophy in Computing Science (Expected)
Department of Computing Science, Faculty of Science, University of Alberta
Supervised by Dr. Patrick Pilarski
Committee Members: Dr. Martha White and Dr. Craig Chapman

Bachelor of Science in Honors Mathematics, 2010-2014
Department of Mathematical and Statistical Sciences, Faculty of Science, University of Alberta
Graduated with First-Class Honors
Dean's Honor Roll (First Class Standing) 2011-2012, 2012-2013, 2013-2014
Research Project: "Characterization of k-polygon graphs"
Supervised by Dr. Lorna Stewart


Submitted or In Preparation

  1. C. Linke, N. M. Ady, M. White, T. Degris, and A. White, "Adapting Behaviour via Intrinsic Reward: A Survey and Empirical Study," under submission. (arXiv Preprint)
    Learning about many things can provide numerous benefits to a reinforcement learning system. For example, learning many auxiliary value functions, in addition to optimizing the environmental reward, appears to improve both exploration and representation learning. The question we tackle in this paper is how to sculpt the stream of experience---how to adapt the system's behaviour---to optimize the learning of a collection of value functions. A simple answer is to compute an intrinsic reward based on the statistics of each auxiliary learner, and use reinforcement learning to maximize that intrinsic reward. Unfortunately, implementing this simple idea has proven difficult, and thus has been the focus of decades of study. It remains unclear which of the many possible measures of learning would work well in a parallel learning setting where environmental reward is extremely sparse or absent. In this paper, we investigate and compare different intrinsic reward mechanisms in a new bandit-like parallel-learning testbed. We discuss the interaction between reward and prediction learners and highlight the importance of introspective prediction learners: those that increase their rate of learning when progress is possible, and decrease when it is not. We provide a comprehensive empirical comparison of 15 different rewards, including well-known ideas from reinforcement learning and active learning. Our results highlight a simple but seemingly powerful principle: intrinsic rewards based on the amount of learning can generate useful behaviour, if each individual learner is introspective.
  2. N. M. Ady and P. M. Pilarski, "Four properties of specific curiosity that we should want curious machines to have," in preparation for submission to Frontiers in Artificial Intelligence.
    In recent years, the idea of curiosity has become a particularly powerful inspiration to researchers designing machine intelligence. The name curiosity has been applied to numerous methods that have helped systems better achieve their designers' goals. Yet despite roots in our knowledge of human curiosity, the bulk of existing machine curiosity methods forgo some of the properties that we most appreciate about human curiosity: directedness towards inostensible referents, transience, voluntary exposure, and increased positive attitudes towards objects of curiosity. These systems, together, have helped to explain the role of learning in motivation. To motivate our hypothesis, we here contribute an analysis of some common structural deficiencies in existing methods for computational curiosity. We further offer a proposal for a way to blend methods that learn both models and value functions to demonstrate the possibilities that our hypothesis offers for improving machine curiosity. As such, this work opens a new line of research for machine curiosity and how we might better understand curiosity in other domains.

Refereed Journal Publications

  1. J. Günther, N. M. Ady, A. Kearney, M. R. Dawson, P. M. Pilarski, "Examining the Use of Temporal-Difference Incremental Delta-Bar-Delta for Real-World Predictive Knowledge Architectures," accepted February 26, 2020 to Frontiers in Robotics and AI - Computational Intelligence in Robotics. (Open-access article) (arXiv Preprint)
    Predictions and predictive knowledge have seen recent success in improving not only robot control but also other applications ranging from industrial process control to rehabilitation. A property that makes these predictive approaches well suited for robotics is that they can be learned online and incrementally through interaction with the environment. However, a remaining challenge for many prediction-learning approaches is an appropriate choice of prediction-learning parameters, especially parameters that control the magnitude of a learning machine's updates to its predictions (the learning rate or step size). To begin to address this challenge, we examine the use of online step-size adaptation using a sensor-rich robotic arm. Our method of choice, Temporal-Difference Incremental Delta-Bar-Delta (TIDBD), learns and adapts step sizes on a feature level; importantly, TIDBD allows step-size tuning and representation learning to occur at the same time. We show that TIDBD is a practical alternative for classic Temporal-Difference (TD) learning via an extensive parameter search. Both approaches perform comparably in terms of predicting future aspects of a robotic data stream. Furthermore, the use of a step-size adaptation method like TIDBD appears to allow a system to automatically detect and characterize common sensor failures in a robotic application. Together, these results promise to improve the ability of robotic devices to learn from interactions with their environments in a robust way, providing key capabilities for autonomous agents and robots.

Refereed Conference Presentations

  1. C. Linke, N. M. Ady, T. M. Degris, M. White, A. White, "Investigating Curiosity for Multi-Prediction Learning," 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), July 7-10, 2019, McGill University, Montréal, Québec, Canada. (Poster and abstract.)
    This paper investigates a computational analog of curiosity to drive behavior adaption in learning systems with multiple prediction objectives. The primary goal is to learn multiple independent predictions in parallel from data produced by some decision making policy: learning for the sake of learning. We can frame this as a reinforcement learning problem, where a decision maker’s objective is to provide training data for each of the prediction learners, with reward based on each learner’s progress. Despite the variety of potential rewards, mainly from the literature on curiosity and intrinsic motivation, there has been little systematic investigation into suitable curiosity rewards in a pure exploration setting. In this paper, we formalize this pure exploration problem as a multi-arm bandit, enabling different learning scenarios to be simulated by different types of targets for each arm and enabling careful study of the large suite of potential curiosity rewards. We test 15 different analogs of well-known curiosity reward schemes, and compare their performance across a wide array of prediction problems. This investigation elucidates issues with several curiosity rewards for this pure exploration setting, and highlights a promising direction using a simple curiosity reward based on the use of step-size adapted learners.
  2. J. Günther, A. Kearney, N. M. Ady, C. Sherstan, M. R. Dawson, and P. M. Pilarski, "GVFs: General Value Freebies," 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), July 7-10, 2019, McGill University, Montréal, Québec, Canada. (Poster and abstract.)
    Machine learning offers the ability for machines to learn from data and improve their performance on a given task. The data used in learning is usually provided either in terms of a predesigned data set or as sampled through interaction with the environment. However, there is another oft-forgotten source of data available for machines to learn from: the learning process itself. As algorithms learn from data and interact with their environment, learning mechanisms produce a continuous stream of data in terms of errors, parameter changes, updates and estimates. These signals have great potential for use in learning and decision making. In this paper, we investigate the utility of such “freebie” signals that are produced either as the output of learning or due to the act of learning, i.e., updates to weights and learning rates. Specifically, we implement a prediction learner that models its environment via multiple General Value Functions (GVFs) and deploy it within a robotic setting. The first signal of interest that we study is one known as the Unexpected Demon Error (UDE), which is closely related to the Temporal-Difference (TD) error and can be tied to the notion of surprise. Detecting surprise reveals important information not only about the learning process but also about the environment and the functioning of the agent within its environment. The second type of signal that we investigate is the agent's learning step size. For this purpose, a vectorized step-size adaptation algorithm is used to update the step sizes over the course of learning. Observing the step-size distribution over time appears to allow a system to automatically detect and characterise common sensor failures in the physical system. We suggest that by adding introspective signals such as UDE and step-size analysis to the available data, autonomous and long-lived agents can become better aware of their interactions with the environment, resulting in a superior ability to make decisions.
  3. J. Günther, A. Kearney, N. M. Ady, M. R. Dawson, P. M. Pilarski, “Meta-learning for Predictive Knowledge Architectures: A Case Study Using TIDBD on a Sensor-rich Robotic Arm,” Proc. of the 18th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2019), Montreal, Canada, May 13–17, 2019, IFAAMAS, pp. 1967–1969. (Extended abstract and poster.) (PDF)
  4. N. M. Ady and P. M. Pilarski, “Comparing Reinforcement Learning Methods for Computational Curiosity through Behavioural Analysis,” 3rd Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM), University of Michigan, Ann Arbor, Michigan, USA, June 11-14, 2017. (Poster and abstract.)
  5. N. M. Ady and P. M. Pilarski, “Unifying Curious Reinforcement Learners,” Designing for Curiosity: An Interdisciplinary Workshop, ACM CHI Conference on Human Factors in Computing Systems (CHI 2017), Denver, Colorado, USA, May 6-11, 2017. (Poster and extended abstract.)
  6. N. M. Ady and P. M. Pilarski, “Domains for Investigating Curious Behaviour in Reinforcement Learning Agents,” 11th Women in Machine Learning (WiML) Workshop, Barcelona, Spain, December 5, 2016. (Poster and abstract.)
  7. N. Ady and F. Rice, “A Disciplinary Divide: Does Discipline-Specific Coaching Make a Difference?” Canadian Writing Centres Association / L’Association Canadienne des Centres de Rédaction Conference: Writing without borders, Brockville, Ontario, Canada, May 23, 2014. (Abstract and oral presentation.)

Multidisciplinary Symposia Presentations

  1. J. Ventura, N. M. Ady, and P. M. Pilarski, “An Exploration of Artificial Curiosity and Reinforcement Learning in a Simple Robot,” WISEST Poster Session, University of Alberta, 2017. https://doi.org/10.7939/R36W96Q00 (Poster.)
  2. N. M. Ady and P. M. Pilarski, “Behaviour of Curious Reinforcement Learners Faced with Varying Reward,” 2017 CRA-W Grad Cohort Workshop Poster Session, Washington D.C., USA, April 7, 2017. (Poster and abstract.)
  3. N. M. Ady, “Undergraduate Preparation for Writing in the Discourse Community of Mathematics,” Canadian Undergraduate Mathematics Conference, Carleton University, Ottawa, Ontario, Canada, July 2-5, 2014. (Oral presentation and abstract.)
    Upon completion of their undergraduate degree, are aspiring mathematicians prepared for the writing needed in mathematical academia? Mathematicians in academic positions must communicate a variety of ideas to a variety of audiences, and so must learn the skills needed to do so effectively. This talk considers what those writing skills are and their development in an undergraduate program. The talk looks at current undergraduate mathematics education in writing. A discussion of discourse style and writing culture in mathematics allows an understanding of the gap facing students when they graduate. Given the importance of good writing and communication to the math community, the conclusions include improvements that might be made both for the sake of undergraduates and to improve the strength of the mathematical discourse community.
  4. N. M. Ady, A. St. Arnaud, and L. K. Stewart, “Recognition of k-polygon graphs,” 3rd Annual University of Alberta Undergraduate Research Symposium, University of Alberta, Edmonton, Alberta, Canada, November 22, 2013. (Poster.)

Technical Reports

  1. N. M. Ady, "Curious Actor-Critic Reinforcement Learning with the Dynamixel-bot," University of Alberta, 2017. https://doi.org/10.7939/R3B853Z7S (Technical report, 7 pages.)
    Curiosity is a crucial, but not yet well-understood, component of intelligence, and a better understanding of existing models may lead to a better understanding of curiosity as a whole. In this work, we present a physical robot implementation of the basic curiosity loop introduced by Gordon and Ahissar in 2012. In the same way that Gordon and Ahissar produced a rough simulation of a rat’s whiskers, this work presents a physical model using two servos to create the whisking actions.
  2. N. M. Ady, "Parameter Screening for Curious Reinforcement Learner Motivated by Unexpected Error," University of Alberta, 2017. https://doi.org/10.7939/R3G15TS0P (Technical report, 8 pages + appendix.)
    Curiosity is a critical component of intelligence. One method of motivating curious behaviour in computational systems is to use reinforcement learning to learn which decisions maximize the amount of unexpected error observed by a predictive component. However, reinforcement learning algorithms for prediction and control require the system designer to set multiple parameters, and it is unknown how such a curious system’s behaviour might vary depending on parameter settings. Eight parameters were tested in an inscribed central composite experimental design: a learning rate, continuation probability, and trace decay parameter for each of prediction and control; epsilon (the probability of a random action for epsilon-greedy control); and the beta-naught parameter for computation of White’s (2015) unexpected error. The response variable was the return. We found that the linear effects on return for epsilon, the learning rate for control, the continuation probability for prediction, and the beta-naught parameter for unexpected error were significant, along with the quadratic interactions between epsilon and beta-naught, epsilon and the continuation probability for prediction, beta-naught and the continuation probability for prediction, and the learning rate and continuation probability for prediction.

Invited Presentations

  1. “Curiosity in Machine Intelligence”
    Amii’s AI Meet-up (for Alberta Machine Intelligence Institute), Startup Edmonton, Edmonton, Alberta, Canada, January 22, 2019. (Half-length seminar.)
  2. “Overview of Curiosity in Computational Reinforcement Learning”
    Princeton Neuroscience Institute and Psychology Department, Princeton University, Princeton, New Jersey, June 28, 2018. (Half-length seminar.)
  3. “Writing Centres: Becoming a better writer from the comfort of your own campus”
    Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, October 2016.
  4. “Curious About Computational Curiosity? Introduction and Experiments”
    Reasoning and Learning Lab, McGill University, Montréal, Québec, Canada, August 9, 2016. (Full-length seminar.)
  5. “On Being a Graduate Student in Computing Science”
    Microsoft Store, West Edmonton Mall, Edmonton, Alberta, Canada, April 23, 2016.
  6. “License Plate Recognition” co-presented with T. Griffith and A. Neufeld
    Edmonton Office of Traffic Safety, Edmonton, Alberta, Canada, April 17, 2014.

Selected Employment History

Research Positions

PhD Candidate and Graduate Research Assistant
Supervisor: Dr. Patrick Pilarski
Department of Computing Science, University of Alberta, Edmonton
Reinforcement Learning and Artificial Intelligence Lab; Alberta Machine Intelligence Institute; Bionic Limbs for Improved Natural Control
Summer 2015
Systems Administrator - Student Co-op
Government of Canada, Ottawa
Fall 2014
Scientist - Student Co-op
Government of Canada, Ottawa
Summer 2014
Research Assistant (Wireless Networks)
Supervisor: Dr. Mike MacGregor
Department of Computing Science, University of Alberta, Edmonton
Summer 2013
Research Assistant (Graph Algorithms)
Supervisor: Dr. Lorna Stewart
Department of Computing Science, University of Alberta, Edmonton

Academic Teaching Positions

Winter 2013 to Winter 2016
Writing Tutor and Workshop Instructor
Centre for Writers, University of Alberta, Edmonton
Winter 2016, Fall 2015, Winter 2015
Teaching Assistant for CMPUT 174
Introduction to the Foundations of Computation I
Department of Computing Science, University of Alberta, Edmonton
Winter 2014
Teaching Assistant for CMPUT 474
Formal Languages, Automata and Computability
Department of Computing Science, University of Alberta, Edmonton