Dawn Drain
Verified email at microsoft.com
Title
Cited by
Year
Training a helpful and harmless assistant with reinforcement learning from human feedback
Y Bai, A Jones, K Ndousse, A Askell, A Chen, N DasSarma, D Drain, ...
arXiv preprint arXiv:2204.05862, 2022
Cited by: 1453, Year: 2022
Constitutional AI: Harmlessness from AI feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by: 1139, Year: 2022
GraphCodeBERT: Pre-training code representations with data flow
D Guo, S Ren, S Lu, Z Feng, D Tang, S Liu, L Zhou, N Duan, ...
arXiv preprint arXiv:2009.08366, 2020
Cited by: 997, Year: 2020
CodeXGLUE: A machine learning benchmark dataset for code understanding and generation
S Lu, D Guo, S Ren, J Huang, A Svyatkovskiy, A Blanco, C Clement, ...
arXiv preprint arXiv:2102.04664, 2021
Cited by: 823, Year: 2021
Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned
D Ganguli, L Lovitt, J Kernion, A Askell, Y Bai, S Kadavath, B Mann, ...
arXiv preprint arXiv:2209.07858, 2022
Cited by: 416, Year: 2022
A general language assistant as a laboratory for alignment
A Askell, Y Bai, A Chen, D Drain, D Ganguli, T Henighan, A Jones, ...
arXiv preprint arXiv:2112.00861, 2021
Cited by: 359, Year: 2021
In-context learning and induction heads
C Olsson, N Elhage, N Nanda, N Joseph, N DasSarma, T Henighan, ...
arXiv preprint arXiv:2209.11895, 2022
Cited by: 332, Year: 2022
A mathematical framework for transformer circuits
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Transformer Circuits Thread 1 (1), 12, 2021
Cited by: 280, Year: 2021
Predictability and surprise in large generative models
D Ganguli, D Hernandez, L Lovitt, A Askell, Y Bai, A Chen, T Conerly, ...
Proceedings of the 2022 ACM Conference on Fairness, Accountability, and …, 2022
Cited by: 278, Year: 2022
Toy models of superposition
N Elhage, T Hume, C Olsson, N Schiefer, T Henighan, S Kravec, ...
arXiv preprint arXiv:2209.10652, 2022
Cited by: 269, Year: 2022
Discovering language model behaviors with model-written evaluations
E Perez, S Ringer, K Lukošiūtė, K Nguyen, E Chen, S Heiner, C Pettit, ...
arXiv preprint arXiv:2212.09251, 2022
Cited by: 219, Year: 2022
Language models (mostly) know what they know
S Kadavath, T Conerly, A Askell, T Henighan, D Drain, E Perez, ...
arXiv preprint arXiv:2207.05221, 2022
Cited by: 143, Year: 2022
The capacity for moral self-correction in large language models
D Ganguli, A Askell, N Schiefer, TI Liao, K Lukošiūtė, A Chen, A Goldie, ...
arXiv preprint arXiv:2302.07459, 2023
Cited by: 132, Year: 2023
Measuring progress on scalable oversight for large language models
SR Bowman, J Hyun, E Perez, E Chen, C Pettit, S Heiner, K Lukošiūtė, ...
arXiv preprint arXiv:2211.03540, 2022
Cited by: 86, Year: 2022
PyMT5: multi-mode translation of natural language and Python code with transformers
CB Clement, D Drain, J Timcheck, A Svyatkovskiy, N Sundaresan
arXiv preprint arXiv:2010.03150, 2020
Cited by: 84, Year: 2020
A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021
N Elhage, N Nanda, C Olsson, T Henighan, N Joseph, B Mann, A Askell, ...
Cited by: 73
Unit test case generation with transformers and focal context
M Tufano, D Drain, A Svyatkovskiy, SK Deng, N Sundaresan
arXiv preprint arXiv:2009.05617, 2020
Cited by: 64, Year: 2020
Scaling laws and interpretability of learning from repeated data
D Hernandez, T Brown, T Conerly, N DasSarma, D Drain, S El-Showk, ...
arXiv preprint arXiv:2205.10487, 2022
Cited by: 63, Year: 2022
Generating bug-fixes using pretrained transformers
D Drain, C Wu, A Svyatkovskiy, N Sundaresan
Proceedings of the 5th ACM SIGPLAN International Symposium on Machine …, 2021
Cited by: 58, Year: 2021
Generating accurate assert statements for unit test cases using pretrained transformers
M Tufano, D Drain, A Svyatkovskiy, N Sundaresan
Proceedings of the 3rd ACM/IEEE International Conference on Automation of …, 2022
Cited by: 56, Year: 2022