Showing 1–30 of 30 results for author: Devlin, S

  1. arXiv:2406.13376  [pdf, other]

    cs.LG

    Efficient Offline Reinforcement Learning: The Critic is Critical

    Authors: Adam Jelley, Trevor McInroe, Sam Devlin, Amos Storkey

    Abstract: Recent work has demonstrated both benefits and limitations from using supervised approaches (without temporal-difference learning) for offline reinforcement learning. While off-policy reinforcement learning provides a promising approach for improving performance beyond supervised approaches, we observe that training is often inefficient and unstable due to temporal difference bootstrapping. In thi…

    Submitted 19 June, 2024; originally announced June 2024.

  2. arXiv:2406.04208  [pdf, other]

    cs.LG cs.AI

    Aligning Agents like Large Language Models

    Authors: Adam Jelley, Yuhan Cao, Dave Bignell, Sam Devlin, Tabish Rashid

    Abstract: Training agents to behave as desired in complex 3D environments from high-dimensional sensory information is challenging. Imitation learning from diverse human behavior provides a scalable approach for training an agent with a sensible behavioral prior, but such an agent may not perform the specific behaviors of interest when deployed. To address this issue, we draw an analogy between the undesira…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2312.02312  [pdf, other]

    cs.LG cs.AI cs.CV

    Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games

    Authors: Lukas Schäfer, Logan Jones, Anssi Kanervisto, Yuhan Cao, Tabish Rashid, Raluca Georgescu, Dave Bignell, Siddhartha Sen, Andrea Treviño Gavito, Sam Devlin

    Abstract: Video games have served as useful benchmarks for the decision making community, but going beyond Atari games towards training agents in modern games has been prohibitively expensive for the vast majority of the research community. Recent progress in the research, development and open release of large vision models has the potential to amortize some of these costs across the community. However, it…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: Preprint

  4. arXiv:2303.16359  [pdf, ps, other]

    cs.AI cs.CY cs.PL

    Adaptive Scaffolding in Block-Based Programming via Synthesizing New Tasks as Pop Quizzes

    Authors: Ahana Ghosh, Sebastian Tschiatschek, Sam Devlin, Adish Singla

    Abstract: Block-based programming environments are increasingly used to introduce computing concepts to beginners. However, novice students often struggle in these environments, given the conceptual and open-ended nature of programming tasks. To effectively support a student struggling to solve a given task, it is important to provide adaptive scaffolding that guides the student towards a solution. We intro…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Preprint. Accepted as a paper at the AIED'22 conference

  5. arXiv:2303.02160  [pdf, other]

    cs.HC cs.LG cs.RO

    Navigates Like Me: Understanding How People Evaluate Human-Like AI in Video Games

    Authors: Stephanie Milani, Arthur Juliani, Ida Momennejad, Raluca Georgescu, Jaroslaw Rzepecki, Alison Shaw, Gavin Costello, Fei Fang, Sam Devlin, Katja Hofmann

    Abstract: We aim to understand how people assess human likeness in navigation produced by people and artificially intelligent (AI) agents in a video game. To this end, we propose a novel AI agent with the goal of generating more human-like behavior. We collect hundreds of crowd-sourced assessments comparing the human-likeness of navigation behavior generated by our agent and baseline AI agents with human-ge…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: 18 pages; accepted at CHI 2023

  6. arXiv:2302.07985  [pdf, other]

    cs.LG cs.AI

    Trust-Region-Free Policy Optimization for Stochastic Policies

    Authors: Mingfei Sun, Benjamin Ellis, Anuj Mahajan, Sam Devlin, Katja Hofmann, Shimon Whiteson

    Abstract: Trust Region Policy Optimization (TRPO) is an iterative method that simultaneously maximizes a surrogate objective and enforces a trust region constraint over consecutive policies in each iteration. The combination of the surrogate objective maximization and the trust region enforcement has been shown to be crucial to guarantee a monotonic policy improvement. However, solving a trust-region-constr…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: RLDM 2022

  7. arXiv:2301.13136  [pdf, other]

    cs.LG

    Contrastive Meta-Learning for Partially Observable Few-Shot Learning

    Authors: Adam Jelley, Amos Storkey, Antreas Antoniou, Sam Devlin

    Abstract: Many contrastive and meta-learning approaches learn representations by identifying common features in multiple views. However, the formalism for these approaches generally assumes features to be shared across views to be captured coherently. We consider the problem of learning a unified representation from partial observations, where useful features may be present in only some of the views. We app…

    Submitted 30 January, 2023; originally announced January 2023.

    Comments: Accepted for publication at ICLR 2023. Code is available at https://github.com/AdamJelley/POEM

  8. arXiv:2301.10677  [pdf, other]

    cs.AI cs.LG stat.ML

    Imitating Human Behaviour with Diffusion Models

    Authors: Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, Sam Devlin

    Abstract: Diffusion models have emerged as powerful generative models in the text-to-image domain. This paper studies their application as observation-to-action models for imitating human behaviour in sequential environments. Human behaviour is stochastic and multimodal, with structured correlations between action dimensions. Meanwhile, standard modelling choices in behaviour cloning are limited in their ex…

    Submitted 3 March, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Published in ICLR 2023

    Journal ref: ICLR 2023

  9. arXiv:2211.10869  [pdf, other]

    cs.LG

    UniMASK: Unified Inference in Sequential Decision Problems

    Authors: Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

    Abstract: Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision-making, where many well-studied tasks like behavior cloning, offline reinforcement learning, inverse dynamics, and waypoint conditioning correspond to different sequenc…

    Submitted 19 November, 2022; originally announced November 2022.

    Comments: NeurIPS 2022 (Oral). A prior version was published at an ICML Workshop, available at arXiv:2204.13326

  10. arXiv:2204.13326  [pdf, other]

    cs.LG

    Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

    Authors: Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

    Abstract: Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a se…

    Submitted 9 December, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

    Comments: Superseded by arXiv:2211.10869

  11. arXiv:2202.00082  [pdf, other]

    cs.LG

    Trust Region Bounds for Decentralized PPO Under Non-stationarity

    Authors: Mingfei Sun, Sam Devlin, Jacob Beck, Katja Hofmann, Shimon Whiteson

    Abstract: We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which holds even when the transition dynamics are non-stationary. This new analysis provides a theoretical understanding of the strong performance of two recent actor-critic methods for MARL, which both rely on independent ratios, i.e., computing probability ratios separat…

    Submitted 15 February, 2023; v1 submitted 31 January, 2022; originally announced February 2022.

    Comments: AAMAS 2023

  12. arXiv:2202.00079  [pdf, other]

    cs.LG cs.AI

    You May Not Need Ratio Clipping in PPO

    Authors: Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson

    Abstract: Proximal Policy Optimization (PPO) methods learn a policy by iteratively performing multiple mini-batch optimization epochs of a surrogate objective with one set of sampled data. Ratio clipping PPO is a popular variant that clips the probability ratios between the target policy and the policy used to collect samples. Ratio clipping yields a pessimistic estimate of the original surrogate objective,…

    Submitted 31 January, 2022; originally announced February 2022.

  13. arXiv:2112.06054  [pdf, other]

    cs.LG

    Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

    Authors: Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson

    Abstract: Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy regardless of the fact that these off-policy extensions could either change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an of…

    Submitted 13 April, 2022; v1 submitted 11 December, 2021; originally announced December 2021.

    Comments: AAAI 2022

  14. arXiv:2107.14698  [pdf, other]

    cs.LG cs.AI cs.MA

    Strategically Efficient Exploration in Competitive Multi-agent Reinforcement Learning

    Authors: Robert Loftin, Aadirupa Saha, Sam Devlin, Katja Hofmann

    Abstract: High sample complexity remains a barrier to the application of reinforcement learning (RL), particularly in multi-agent systems. A large body of work has demonstrated that exploration mechanisms based on the principle of optimism under uncertainty can significantly improve the sample efficiency of RL in single agent tasks. This work seeks to understand the role of optimistic exploration in non-coo…

    Submitted 30 July, 2021; originally announced July 2021.

    Comments: To Appear in Uncertainty in Artificial Intelligence (UAI) 2021. 10 figures, 14 pages

    MSC Class: 68T05

    ACM Class: I.2.6

  15. arXiv:2105.09637  [pdf, other]

    cs.AI cs.LG

    Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation

    Authors: Sam Devlin, Raluca Georgescu, Ida Momennejad, Jaroslaw Rzepecki, Evelyn Zuniga, Gavin Costello, Guy Leroy, Ali Shaw, Katja Hofmann

    Abstract: A key challenge on the path to developing agents that learn complex human-like behavior is the need to quickly and accurately quantify human-likeness. While human assessments of such behavior can be highly accurate, speed and scalability are limited. We address these limitations through a novel automated Navigation Turing Test (ANTT) that learns to predict human judgments of human-likeness. We dem…

    Submitted 28 July, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: All data collected throughout this study, plus the code to reproduce our analysis and ANTT, are available at https://github.com/microsoft/NTT

    Journal ref: Proceedings of the 38th International Conference on Machine Learning (ICML), 139:2644-2653, 2021

  16. arXiv:2101.11071  [pdf, other]

    cs.LG cs.AI stat.ML

    The MineRL 2020 Competition on Sample Efficient Reinforcement Learning using Human Priors

    Authors: William H. Guss, Mario Ynocente Castro, Sam Devlin, Brandon Houghton, Noboru Sean Kuno, Crissman Loomis, Stephanie Milani, Sharada Mohanty, Keisuke Nakata, Ruslan Salakhutdinov, John Schulman, Shinya Shiroshita, Nicholay Topin, Avinash Ummadisingu, Oriol Vinyals

    Abstract: Although deep reinforcement learning has led to breakthroughs in many difficult domains, these successes have required an ever-increasing number of samples, affording only a shrinking segment of the AI community access to their development. Resolution of these limitations requires new, sample-efficient methods. To facilitate research in this direction, we propose this second iteration of the MineR…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: 37 pages, initial submission, accepted at NeurIPS. arXiv admin note: substantial text overlap with arXiv:1904.10079

  17. arXiv:2101.05507  [pdf, other]

    cs.LG cs.AI cs.HC cs.MA

    Evaluating the Robustness of Collaborative Agents

    Authors: Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

    Abstract: In order for agents trained by deep reinforcement learning to work alongside humans in realistic settings, we will need to ensure that the agents are robust. Since the real world is very diverse, and human behavior often changes in response to agent deployment, the agent will likely encounter novel situations that have never been seen during training. This results in an evaluation challenge…

    Submitted 14 January, 2021; originally announced January 2021.

  18. arXiv:2101.03864  [pdf, other]

    cs.LG cs.MA

    Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

    Authors: Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann

    Abstract: Agents that interact with other agents often do not know a priori what the other agents' strategies are, but have to maximise their own online return while interacting with and learning about others. The optimal adaptive behaviour under uncertainty over the other agents' strategies w.r.t. some prior can in principle be computed using the Interactive Bayesian Reinforcement Learning framework. Unfor…

    Submitted 15 April, 2022; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: Published as an extended abstract at AAMAS 2021

  19. Difference Rewards Policy Gradients

    Authors: Jacopo Castellini, Sam Devlin, Frans A. Oliehoek, Rahul Savani

    Abstract: Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent's contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly…

    Submitted 9 November, 2023; v1 submitted 21 December, 2020; originally announced December 2020.

    Comments: This work has been accepted as an Extended Abstract in Proc. of the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), U. Endriss, A. Nowé, F. Dignum, A. Lomuscio (eds.), May 3-7 2021, Online

    ACM Class: I.2.6; I.2.11

    Journal ref: Neural Comput & Applic (2022)

  20. arXiv:2009.00541  [pdf, other]

    cs.AI

    "It's Unwieldy and It Takes a Lot of Time." Challenges and Opportunities for Creating Agents in Commercial Games

    Authors: Mikhail Jacob, Sam Devlin, Katja Hofmann

    Abstract: Game agents such as opponents, non-player characters, and teammates are central to player experiences in many modern games. As the landscape of AI techniques used in the games industry evolves to adopt machine learning (ML) more widely, it is vital that the research community learn from the best practices cultivated within the industry over decades creating agents. However, although commercial gam…

    Submitted 1 September, 2020; originally announced September 2020.

    Comments: 7 pages, 3 figures, to be published in the 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE-20)

  21. arXiv:2007.02912  [pdf, other]

    cs.LG stat.ML

    Meta-Learning Divergences of Variational Inference

    Authors: Ruqi Zhang, Yingzhen Li, Christopher De Sa, Sam Devlin, Cheng Zhang

    Abstract: Variational inference (VI) plays an essential role in approximate Bayesian inference due to its computational efficiency and broad applicability. Crucial to the performance of VI is the selection of the associated divergence measure, as VI approximates the intractable distribution by minimizing this divergence. In this paper we propose a meta-learning algorithm to learn the divergence metric suite…

    Submitted 22 June, 2021; v1 submitted 6 July, 2020; originally announced July 2020.

    Comments: Published at AISTATS 2021

  22. arXiv:2006.08718  [pdf, other]

    cs.LG cs.RO stat.ML

    Analytic Manifold Learning: Unifying and Evaluating Representations for Continuous Control

    Authors: Rika Antonova, Maksim Maydanskiy, Danica Kragic, Sam Devlin, Katja Hofmann

    Abstract: We address the problem of learning reusable state representations from streaming high-dimensional observations. This is important for areas like Reinforcement Learning (RL), which yields non-stationary data distributions during training. We make two key contributions. First, we propose an evaluation suite that measures alignment between latent and true low-dimensional states. We benchmark several…

    Submitted 6 October, 2020; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Added Section 4: "Imposing AML Relations During Transfer"; expanded description of experiments in Section 5: "Evaluating AML and Latent Space Transfer"

  23. arXiv:2006.04471  [pdf, ps, other]

    cs.AI cs.GT

    A Comparison of Self-Play Algorithms Under a Generalized Framework

    Authors: Daniel Hernandez, Kevin Denamganai, Sam Devlin, Spyridon Samothrakis, James Alfred Walker

    Abstract: Throughout scientific history, overarching theoretical frameworks have allowed researchers to grow beyond personal intuitions and culturally biased theories. They make it possible to verify and replicate existing findings, and to link disconnected results. The notion of self-play, albeit often cited in multiagent Reinforcement Learning, has never been grounded in a formal model. We present a formalized frame…

    Submitted 8 June, 2020; originally announced June 2020.

  24. arXiv:2003.12331  [pdf, other]

    cs.AI cs.NE

    Rolling Horizon Evolutionary Algorithms for General Video Game Playing

    Authors: Raluca D. Gaina, Sam Devlin, Simon M. Lucas, Diego Perez-Liebana

    Abstract: Game-playing Evolutionary Algorithms, specifically Rolling Horizon Evolutionary Algorithms, have recently managed to beat the state of the art in win rate across many video games. However, the best results in a game are highly dependent on the specific configuration of modifications and hybrids introduced over several papers, each adding additional parameters to the core algorithm. Further, the be…

    Submitted 24 August, 2020; v1 submitted 27 March, 2020; originally announced March 2020.

  25. arXiv:1910.12911  [pdf, other]

    cs.LG cs.AI stat.ML

    Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck

    Authors: Maximilian Igl, Kamil Ciosek, Yingzhen Li, Sebastian Tschiatschek, Cheng Zhang, Sam Devlin, Katja Hofmann

    Abstract: The ability for policies to generalize to new environments is key to the broad application of RL agents. A promising approach to prevent an agent's policy from overfitting to a limited set of training environments is to apply regularization techniques originally developed for supervised learning. However, there are stark differences between supervised learning and RL. We discuss those differences…

    Submitted 28 October, 2019; originally announced October 2019.

    Comments: Published at NeurIPS 2019

  26. arXiv:1905.07631  [pdf, other]

    stat.ML cs.LG stat.ME

    Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees

    Authors: Summer Devlin, Chandan Singh, W. James Murdoch, Bin Yu

    Abstract: Tree ensembles, such as random forests and AdaBoost, are ubiquitous machine learning models known for achieving strong predictive performance across a wide variety of domains. However, this strong performance comes at the cost of interpretability (i.e. users are unable to understand the relationships a trained random forest has learned and why it is making its predictions). In particular, it is ch…

    Submitted 18 May, 2019; originally announced May 2019.

    Comments: Under review

  27. arXiv:1903.05431  [pdf, other]

    cs.MA cs.LG

    Resource Abstraction for Reinforcement Learning in Multiagent Congestion Problems

    Authors: Kleanthis Malialis, Sam Devlin, Daniel Kudenko

    Abstract: Real-world congestion problems (e.g. traffic congestion) are typically very complex and large-scale. Multiagent reinforcement learning (MARL) is a promising candidate for dealing with this emerging complexity by providing an autonomous and distributed solution to these problems. However, there are three limiting factors that affect the deployability of MARL approaches to congestion problems. These…

    Submitted 13 March, 2019; originally announced March 2019.

    Comments: Keywords: congestion problems, resource management, multiagent reinforcement learning, multiagent systems, multiagent learning, resource abstraction. In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems (AAMAS '16)

    Journal ref: Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2016

  28. arXiv:1901.08129  [pdf, ps, other]

    cs.AI

    The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) Competition

    Authors: Diego Perez-Liebana, Katja Hofmann, Sharada Prasanna Mohanty, Noburu Kuno, Andre Kramer, Sam Devlin, Raluca D. Gaina, Daniel Ionita

    Abstract: Learning in multi-agent scenarios is a fruitful research direction, but current approaches still show scalability problems in multiple games with general reward settings and different opponent types. The Multi-Agent Reinforcement Learning in MalmÖ (MARLÖ) competition is a new challenge that proposes research in this domain using multiple 3D games. The goal of this contest is to foster research in…

    Submitted 23 January, 2019; originally announced January 2019.

    Comments: 2 pages plus references

    Journal ref: Challenges in Machine Learning (NIPS Workshop), 2018

  29. arXiv:1808.01262  [pdf, ps, other]

    cs.AI

    The Text-Based Adventure AI Competition

    Authors: Timothy Atkinson, Hendrik Baier, Tara Copplestone, Sam Devlin, Jerry Swan

    Abstract: In 2016, 2017, and 2018 at the IEEE Conference on Computational Intelligence in Games, the authors of this paper ran a competition for agents that can play classic text-based adventure games. This competition fills a gap in existing game AI competitions that have typically focussed on traditional card/board games or modern video games with graphical interfaces. By providing a platform for evaluati…

    Submitted 24 January, 2019; v1 submitted 3 August, 2018; originally announced August 2018.

    Comments: updated to journal version

    MSC Class: 68T50

  30. arXiv:1711.06498  [pdf, other]

    cs.AI

    Win Prediction in Esports: Mixed-Rank Match Prediction in Multi-player Online Battle Arena Games

    Authors: Victoria Hodge, Sam Devlin, Nick Sephton, Florian Block, Anders Drachen, Peter Cowling

    Abstract: Esports has emerged as a popular genre for players as well as spectators, supporting a global entertainment industry. Esports analytics has evolved to address the requirement for data-driven feedback, and is focused on cyber-athlete evaluation, strategy and prediction. Towards the latter, previous work has used match data from a variety of player ranks from hobbyist to professional players. Howeve…

    Submitted 17 November, 2017; originally announced November 2017.