Thinking fast and slow with deep learning and tree search TW Anthony, Z Tian, D Barber Advances in Neural Information Processing Systems, 5360-5370, 2017 | 429 | 2017 |
OpenSpiel: A framework for reinforcement learning in games M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ... arXiv preprint arXiv:1908.09453, 2019 | 280 | 2019 |
Mastering the game of Stratego with model-free multiagent reinforcement learning J Perolat, B De Vylder, D Hennes, E Tarassov, F Strub, V de Boer, ... Science 378 (6623), 990-996, 2022 | 220 | 2022 |
From Poincaré recurrence to convergence in imperfect information games: Finding equilibrium via regularization J Perolat, R Munos, JB Lespiau, S Omidshafiei, M Rowland, P Ortega, ... International Conference on Machine Learning, 8525-8535, 2021 | 92 | 2021 |
On the role of planning in model-based deep reinforcement learning JB Hamrick, AL Friesen, F Behbahani, A Guez, F Viola, S Witherspoon, ... arXiv preprint arXiv:2011.04021, 2020 | 88 | 2020 |
Learning to Play No-Press Diplomacy with Best Response Policy Iteration T Anthony, T Eccles, A Tacchetti, J Kramár, I Gemp, TC Hudson, N Porcel, ... arXiv preprint arXiv:2006.04635, 2020 | 57 | 2020 |
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees TW Anthony, R Nishihara, P Moritz, T Salimans, J Schulman arXiv preprint arXiv:1904.03646, 2019 | 32 | 2019 |
OpenSpiel: A Framework for Reinforcement Learning in Games M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ... arXiv preprint cs.LG/1908.09453, 2019 | 28 | 2019 |
Learning to Resolve Alliance Dilemmas in Many-Player Zero-Sum Games E Hughes, TW Anthony, T Eccles, JZ Leibo, D Balduzzi, Y Bachrach arXiv preprint arXiv:2003.00799, 2020 | 25 | 2020 |
Iterative Empirical Game Solving via Single Policy Best Response MO Smith, T Anthony, MP Wellman | 20* | |
Sample-based Approximation of Nash in Large Many-Player Games via Gradient Descent I Gemp, R Savani, M Lanctot, Y Bachrach, T Anthony, R Everett, ... arXiv preprint arXiv:2106.01285, 2021 | 19 | 2021 |
Smooth markets: A basic mechanism for organizing gradient-based learners D Balduzzi, WM Czarnecki, TW Anthony, IM Gemp, E Hughes, JZ Leibo, ... arXiv preprint arXiv:2001.04678, 2020 | 17 | 2020 |
Learning to play against any mixture of opponents MO Smith, T Anthony, MP Wellman Frontiers in Artificial Intelligence 6, 2023 | 15 | 2023 |
Turbocharging solution concepts: Solving NEs, CEs and CCEs with neural equilibrium solvers L Marris, I Gemp, T Anthony, A Tacchetti, S Liu, K Tuyls Advances in Neural Information Processing Systems 35, 5586-5600, 2022 | 15 | 2022 |
Expert iteration TW Anthony UCL (University College London), 2021 | 8 | 2021 |
Heterogeneous Social Value Orientation Leads to Meaningful Diversity in Sequential Social Dilemmas U Madhushani, KR McKee, JP Agapiou, JZ Leibo, R Everett, T Anthony, ... arXiv preprint arXiv:2305.00768, 2023 | 6 | 2023 |
Designing all-pay auctions using deep learning and multi-agent simulation I Gemp, T Anthony, J Kramar, T Eccles, A Tacchetti, Y Bachrach Scientific Reports 12 (1), 16937, 2022 | 6 | 2022 |
Developing, evaluating and scaling learning agents in multi-agent environments I Gemp, T Anthony, Y Bachrach, A Bhoopchand, K Bullard, J Connor, ... AI Communications 35 (4), 271-284, 2022 | 5 | 2022 |
Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning M Lanctot, J Schultz, N Burch, MO Smith, D Hennes, T Anthony, J Perolat arXiv preprint arXiv:2303.03196, 2023 | 4 | 2023 |
Strategic Knowledge Transfer MO Smith, T Anthony, MP Wellman Journal of Machine Learning Research 24 (233), 1-96, 2023 | 3 | 2023 |