Title, authors, venue | Cited by | Year |
AI safety gridworlds J Leike, M Martic, V Krakovna, PA Ortega, T Everitt, A Lefrancq, L Orseau, ... arXiv preprint arXiv:1711.09883, 2017 | 166 | 2017 |
Reinforcement Learning with a Corrupted Reward Channel T Everitt, V Krakovna, L Orseau, M Hutter, S Legg IJCAI AI & Autonomy, 2017 | 57 | 2017 |
Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models V Krakovna, F Doshi-Velez ICML Workshop on Human Interpretability (WHI 2016), arXiv preprint arXiv …, 2016 | 45 | 2016 |
Penalizing side effects using stepwise relative reachability V Krakovna, L Orseau, R Kumar, M Martic, S Legg arXiv preprint arXiv:1806.01186, 2018 | 19 | 2018 |
Measuring and avoiding side effects using relative reachability V Krakovna, L Orseau, M Martic, S Legg arXiv preprint arXiv:1806.01186, 2018 | 14 | 2018 |
Specification gaming examples in AI V Krakovna tinyurl.com/specification-gaming, 2018 | 14 | 2018 |
Modeling AGI safety frameworks with causal influence diagrams T Everitt, R Kumar, V Krakovna, S Legg arXiv preprint arXiv:1906.08663, 2019 | 11 | 2019 |
Specification gaming: the flip side of AI ingenuity V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ... https://deepmind.com/blog/article/Specification-gaming-the-flip-side-of-AI …, 2020 | 8 | 2020 |
Interpretable selection and visualization of features and interactions using Bayesian forests V Krakovna, J Du, JS Liu Statistics and its Interface 2018 (Volume 11 Number 3), arXiv preprint arXiv …, 2015 | 6* | 2015 |
A generalized-zero-preserving method for compact encoding of concept lattices M Skala, V Krakovna, J Kramár, G Penn Proceedings of the 48th annual meeting of the Association for Computational …, 2010 | 5 | 2010 |
Building interpretable models: From Bayesian networks to neural networks V Krakovna | 3 | 2016 |
A Minimalistic Approach to Sum-Product Network Learning for Real Applications V Krakovna, M Looks ICLR 2016 workshop, arXiv preprint arXiv:1602.04259, 2016 | 3 | 2016 |
REALab: An Embedded Perspective on Tampering R Kumar, J Uesato, R Ngo, T Everitt, V Krakovna, S Legg arXiv preprint arXiv:2011.08820, 2020 | 1 | 2020 |
Avoiding Side Effects By Considering Future Tasks V Krakovna, L Orseau, R Ngo, M Martic, S Legg NeurIPS 2020, arXiv preprint arXiv:2010.07877, 2020 | 1 | 2020 |
Avoiding Tampering Incentives in Deep RL via Decoupled Approval J Uesato, R Kumar, V Krakovna, T Everitt, R Ngo, S Legg arXiv preprint arXiv:2011.08827, 2020 | | 2020 |