Follow
Catherine Olsson
Catherine Olsson
Anthropic
Verified email at mit.edu
Title
Cited by
Cited by
Year
Estimating the reproducibility of psychological science
Open Science Collaboration
Science 349 (6251), aac4716, 2015
91802015
Dota 2 with large scale deep reinforcement learning
C Berner, G Brockman, B Chan, V Cheung, P Dębiak, C Dennison, ...
arXiv preprint arXiv:1912.06680, 2019
16722019
An open, large-scale, collaborative effort to estimate the reproducibility of psychological science
Open Science Collaboration
Perspectives on Psychological Science 7, 657-660, 2012
7272012
Training a helpful and harmless assistant with reinforcement learning from human feedback
Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ...
arXiv preprint arXiv:2204.05862, 2022
6562022
Constitutional ai: Harmlessness from ai feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
5612022
Tensorfuzz: Debugging neural networks with coverage-guided fuzzing
A Odena, C Olsson, D Andersen, I Goodfellow
International Conference on Machine Learning, 4901-4911, 2019
3382019
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
2182022
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
2092022
A general language assistant as a laboratory for alignment
A Askell, Y Bai, A Chen, D Drain, D Ganguli, T Henighan, A Jones, ...
arXiv preprint arXiv:2112.00861, 2021
2072021
In-context learning and induction heads
C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ...
arXiv preprint arXiv:2209.11895, 2022
1832022
Predictability and surprise in large generative models
D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
1672022
Discriminator rejection sampling
S Azadi, C Olsson, T Darrell, I Goodfellow, A Odena
arXiv preprint arXiv:1810.06758, 2018
1472018
A mathematical framework for transformer circuits
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Transformer Circuits Thread 1, 1, 2021
1452021
Toy models of superposition
N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...
arXiv preprint arXiv:2209.10652, 2022
1362022
Is generator conditioning causally related to GAN performance?
A Odena, J Buckman, C Olsson, T Brown, C Olah, C Raffel, I Goodfellow
International conference on machine learning, 3849-3858, 2018
1352018
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...
arXiv preprint arXiv:2212.09251, 2022
1202022
Dawn Drain
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson …, 2021
1142021
Dota 2 with large scale deep reinforcement learning
CB OpenAI, G Brockman, B Chan, V Cheung, P Debiak, C Dennison, ...
arXiv preprint arXiv:1912.06680 2, 2019
1042019
Dawn Drain
C Olsson, N Elhage, NJ Neel Nanda, N DasSarma, T Henighan, B Mann, ...
Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy …, 2022
1012022
Unrestricted adversarial examples
TB Brown, N Carlini, C Zhang, C Olsson, P Christiano, I Goodfellow
arXiv preprint arXiv:1809.08352, 2018
942018
The system can't perform the operation now. Try again later.
Articles 1–20