Skip to main content

Showing 1–20 of 20 results for author: Burch, N

Searching in archive cs. Search in all archives.
.
  1. arXiv:2312.15220  [pdf, other

    cs.GT

    Look-ahead Search on Top of Policy Networks in Imperfect Information Games

    Authors: Ondrej Kubicek, Neil Burch, Viliam Lisy

    Abstract: Search in test time is often used to improve the performance of reinforcement learning algorithms. Performing theoretically sound search in fully adversarial two-player games with imperfect information is notoriously difficult and requires a complicated training process. We present a method for adding test-time search to an arbitrary policy-gradient algorithm that learns from sampled trajectories.… ▽ More

    Submitted 6 February, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

  2. arXiv:2303.03196  [pdf, other

    cs.GT cs.AI cs.LG cs.MA

    Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning

    Authors: Marc Lanctot, John Schultz, Neil Burch, Max Olan Smith, Daniel Hennes, Thomas Anthony, Julien Perolat

    Abstract: Progress in fields of machine learning and adversarial planning has benefited significantly from benchmark domains, from checkers and the classic UCI data sets to Go and Diplomacy. In sequential decision-making, agent evaluation has largely been restricted to few interactions against experts, with the aim to reach some desired level of performance (e.g. beating a human professional player). We pro… ▽ More

    Submitted 31 October, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: 25 pages, 8 figures, Accepted at TMLR October 2023

  3. arXiv:2206.15378  [pdf, other

    cs.AI cs.GT cs.MA

    Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

    Authors: Julien Perolat, Bart de Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot , et al. (9 additional authors not shown)

    Abstract: We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additiona… ▽ More

    Submitted 30 June, 2022; originally announced June 2022.

  4. arXiv:2112.03178  [pdf, other

    cs.AI cs.GT cs.LG

    Student of Games: A unified learning algorithm for both perfect and imperfect information games

    Authors: Martin Schmid, Matej Moravcik, Neil Burch, Rudolf Kadlec, Josh Davidson, Kevin Waugh, Nolan Bard, Finbarr Timbers, Marc Lanctot, G. Zacharias Holland, Elnaz Davoodi, Alden Christianson, Michael Bowling

    Abstract: Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information games, and approaches using game-theoretic reasoning and learning demonstrated strong performance for specific imperfect information poker variants. We introduce Student of Games, a general-purpose algorithm that unifies p… ▽ More

    Submitted 15 November, 2023; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: Published in Science Advances

    Journal ref: Science Advances 9, eadg3256 (2023)

  5. arXiv:2101.04237  [pdf, other

    cs.AI cs.LG

    Solving Common-Payoff Games with Approximate Policy Iteration

    Authors: Samuel Sokota, Edward Lockhart, Finbarr Timbers, Elnaz Davoodi, Ryan D'Orazio, Neil Burch, Martin Schmid, Michael Bowling, Marc Lanctot

    Abstract: For artificially intelligent learning systems to have widespread applicability in real-world settings, it is important that they be able to operate decentrally. Unfortunately, decentralized control is difficult -- computing even an epsilon-optimal joint policy is a NEXP complete problem. Nevertheless, a recently rediscovered insight -- that a team of agents can coordinate via common knowledge -- h… ▽ More

    Submitted 11 January, 2021; originally announced January 2021.

    Comments: AAAI 2021

  6. arXiv:2011.14124  [pdf, ps, other

    cs.AI

    Human-Agent Cooperation in Bridge Bidding

    Authors: Edward Lockhart, Neil Burch, Nolan Bard, Sebastian Borgeaud, Tom Eccles, Lucas Smaira, Ray Smith

    Abstract: We introduce a human-compatible reinforcement-learning approach to a cooperative game, making use of a third-party hand-coded human-compatible bot to generate initial training data and to perform initial evaluation. Our learning approach consists of imitation learning, search, and policy iteration. Our trained agents achieve a new state-of-the-art for bridge bidding in three settings: an agent pla… ▽ More

    Submitted 28 November, 2020; originally announced November 2020.

  7. arXiv:2006.08740  [pdf, other

    cs.GT

    Sound Algorithms in Imperfect Information Games

    Authors: Michal Šustr, Martin Schmid, Matej Moravčík, Neil Burch, Marc Lanctot, Michael Bowling

    Abstract: Search has played a fundamental role in computer game research since the very beginning. And while online search has been commonly used in perfect information games such as Chess and Go, online search methods for imperfect information games have only been introduced relatively recently. This paper addresses the question of what is a sound online algorithm in an imperfect information setting of two… ▽ More

    Submitted 2 March, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted to AAMAS2021 as extended abstract (Ref. numbers not available yet)

  8. arXiv:2004.09677  [pdf, other

    cs.LG stat.ML

    Approximate exploitability: Learning a best response in large games

    Authors: Finbarr Timbers, Nolan Bard, Edward Lockhart, Marc Lanctot, Martin Schmid, Neil Burch, Julian Schrittwieser, Thomas Hubert, Michael Bowling

    Abstract: Researchers have demonstrated that neural networks are vulnerable to adversarial examples and subtle environment changes, both of which one can view as a form of distribution shift. To humans, the resulting errors can look like blunders, eroding trust in these agents. In prior games research, agent evaluation often focused on the in-practice game outcomes. While valuable, such evaluation typically… ▽ More

    Submitted 3 November, 2022; v1 submitted 20 April, 2020; originally announced April 2020.

  9. arXiv:2002.08456  [pdf, other

    cs.GT cs.LG stat.ML

    From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization

    Authors: Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls

    Abstract: In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG). We generalize existing results of Poincaré recurrence from normal-form games to zero-sum two-player imperfect information games and other sequential game settings. We then investigate how adapting the reward (by adding a regularization term) of the game can give strong convergen… ▽ More

    Submitted 19 February, 2020; originally announced February 2020.

    Comments: 43 pages

  10. arXiv:1906.11110  [pdf, other

    cs.AI cs.GT

    Rethinking Formal Models of Partially Observable Multiagent Decision Making

    Authors: Vojtěch Kovařík, Martin Schmid, Neil Burch, Michael Bowling, Viliam Lisý

    Abstract: Multiagent decision-making in partially observable environments is usually modelled as either an extensive-form game (EFG) in game theory or a partially observable stochastic game (POSG) in multiagent reinforcement learning (MARL). One issue with the current situation is that while most practical problems can be modelled in both formalisms, the relationship of the two models is unclear, which hind… ▽ More

    Submitted 28 September, 2021; v1 submitted 26 June, 2019; originally announced June 2019.

    Comments: A 2020 update of the original 2019 version of the paper. (Rewrote the main text and clarified the relationship between FOSGs/POSGs and EFGs. Some of the technical results are now presented in the appendix.)

  11. The Hanabi Challenge: A New Frontier for AI Research

    Authors: Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling

    Abstract: From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making. In recent years, machine learning has made dramatic advances with artificial agents reaching superhuman performance in challenge domains like Go, Atari, and some variants of poker. As with their predecessors of chess, checkers, and backgammon, these game domains… ▽ More

    Submitted 6 December, 2019; v1 submitted 1 February, 2019; originally announced February 2019.

    Comments: 32 pages, 5 figures, In Press (Artificial Intelligence)

  12. arXiv:1811.01458  [pdf, other

    cs.MA cs.AI cs.LG

    Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

    Authors: Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling

    Abstract: When observing the actions of others, humans make inferences about why they acted as they did, and what this implies about the world; humans also use the fact that their actions will be interpreted in this manner, allowing them to act informatively and thereby communicate efficiently with others. Although learning algorithms have recently achieved superhuman performance in a number of two-player,… ▽ More

    Submitted 10 September, 2019; v1 submitted 4 November, 2018; originally announced November 2018.

  13. arXiv:1810.11542  [pdf, ps, other

    cs.GT

    Revisiting CFR+ and Alternating Updates

    Authors: Neil Burch, Matej Moravcik, Martin Schmid

    Abstract: The CFR+ algorithm for solving imperfect information games is a variant of the popular CFR algorithm, with faster empirical performance on a range of problems. It was introduced with a theoretical upper bound on solution error, but subsequent work showed an error in one step of the proof. We provide updated proofs to recover the original bound.

    Submitted 19 February, 2019; v1 submitted 26 October, 2018; originally announced October 2018.

    Comments: 15 pages, 0 figures. Published in JAIR, Feb. 2019

    Journal ref: Journal of Artificial Intelligence Research 64 (2019) 429-443

  14. arXiv:1809.03057  [pdf, other

    cs.GT cs.AI

    Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines

    Authors: Martin Schmid, Neil Burch, Marc Lanctot, Matej Moravcik, Rudolf Kadlec, Michael Bowling

    Abstract: Learning strategies for imperfect information games from samples of interaction is a challenging problem. A common method for this setting, Monte Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term convergence rates due to high variance. In this paper, we introduce a variance reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR. Using this technique, p… ▽ More

    Submitted 9 September, 2018; originally announced September 2018.

  15. DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker

    Authors: Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling

    Abstract: Artificial intelligence has seen several breakthroughs in recent years, with games often serving as milestones. A common feature of these games is that players have perfect information. Poker is the quintessential game of imperfect information, and a longstanding challenge problem in artificial intelligence. We introduce DeepStack, an algorithm for imperfect information settings. It combines recur… ▽ More

    Submitted 3 March, 2017; v1 submitted 6 January, 2017; originally announced January 2017.

  16. arXiv:1612.06915  [pdf, other

    cs.AI

    AIVAT: A New Variance Reduction Technique for Agent Evaluation in Imperfect Information Games

    Authors: Neil Burch, Martin Schmid, Matej Moravčík, Michael Bowling

    Abstract: Evaluating agent performance when outcomes are stochastic and agents use randomized strategies can be challenging when there is limited data available. The variance of sampled outcomes may make the simple approach of Monte Carlo sampling inadequate. This is the case for agents playing heads-up no-limit Texas hold'em poker, where man-machine competitions have involved multiple days of consistent pl… ▽ More

    Submitted 19 January, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

    Comments: To appear at AAAI-17 Workshop on Computer Poker and Imperfect Information Games

  17. Predicting the Performance of IDA* using Conditional Distributions

    Authors: Uzi Zahavi, Ariel Felner, Neil Burch, Robert C. Holte

    Abstract: Korf, Reid, and Edelkamp introduced a formula to predict the number of nodes IDA* will expand on a single iteration for a given consistent heuristic, and experimentally demonstrated that it could make very accurate predictions. In this paper we show that, in addition to requiring the heuristic to be consistent, their formulas predictions are accurate only at levels of the brute-force search tree w… ▽ More

    Submitted 15 January, 2014; originally announced January 2014.

    Journal ref: Journal Of Artificial Intelligence Research, Volume 37, pages 41-83, 2010

  18. arXiv:1303.4441  [pdf, other

    cs.GT

    Solving Imperfect Information Games Using Decomposition

    Authors: Neil Burch, Michael Johanson, Michael Bowling

    Abstract: Decomposition, i.e. independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first te… ▽ More

    Submitted 21 April, 2014; v1 submitted 18 March, 2013; originally announced March 2013.

    Comments: 7 pages by 2 columns, 5 figures; April 21 2014 - expand explanations and theory

  19. arXiv:1207.1411  [pdf

    cs.GT cs.AI

    Bayes' Bluff: Opponent Modelling in Poker

    Authors: Finnegan Southey, Michael P. Bowling, Bryce Larson, Carmelo Piccione, Neil Burch, Darse Billings, Chris Rayner

    Abstract: Poker is a challenging problem for artificial intelligence, with non-deterministic dynamics, partial observability, and the added difficulty of unknown adversaries. Modelling all of the uncertainties in this domain is not an easy task. In this paper we present a Bayesian probabilistic model for a broad class of poker games, separating the uncertainty in the game dynamics from the uncertainty of th… ▽ More

    Submitted 4 July, 2012; originally announced July 2012.

    Comments: Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)

    Report number: UAI-P-2005-PG-550-558

  20. arXiv:1205.0622  [pdf, other

    cs.GT cs.AI

    No-Regret Learning in Extensive-Form Games with Imperfect Recall

    Authors: Marc Lanctot, Richard Gibson, Neil Burch, Martin Zinkevich, Michael Bowling

    Abstract: Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR's guarantees do not apply. In this paper, we presen… ▽ More

    Submitted 3 May, 2012; originally announced May 2012.

    Comments: 21 pages, 4 figures, expanded version of article to appear in Proceedings of the Twenty-Ninth International Conference on Machine Learning