Training compute-optimal large language models J Hoffmann, S Borgeaud, A Mensch, E Buchatskaya, T Cai, E Rutherford, ... arXiv preprint arXiv:2203.15556, 2022 | 936 | 2022 |
Scaling language models: Methods, analysis & insights from training Gopher JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... arXiv preprint arXiv:2112.11446, 2021 | 756 | 2021 |
Improving language models by retrieving from trillions of tokens S Borgeaud, A Mensch, J Hoffmann, T Cai, E Rutherford, K Millican, ... International conference on machine learning, 2206-2240, 2022 | 654 | 2022 |
Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 481 | 2023 |
Haiku: Sonnet for JAX, 2020 T Hennigan, T Cai, T Norman, I Babuschkin URL http://github.com/deepmind/dm-haiku 10, 2021 | 163* | 2021 |
The DeepMind JAX Ecosystem, 2020 I Babuschkin, K Baumli, A Bell, S Bhupatiraju, J Bruce, P Buchlovsky, ... URL http://github.com/deepmind 5, 2020 | 92 | 2020 |
Cyprien de Masson d’Autume JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, F Song, J Aslanides, ... | 80 | 2021 |
An empirical analysis of compute-optimal large language model training J Hoffmann, S Borgeaud, A Mensch, E Buchatskaya, T Cai, E Rutherford, ... Advances in Neural Information Processing Systems 35, 30016-30030, 2022 | 64 | 2022 |
The DeepMind JAX Ecosystem I Babuschkin, K Baumli, A Bell, S Bhupatiraju, J Bruce, P Buchlovsky, ... URL http://github.com/deepmind 24, 25, 2020 | 55 | 2020 |
Cyprien de Masson d’Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew J Johnson, Blake A. Hechtman, Laura Weidinger, Iason Gabriel, William S. Isaac … JW Rae, S Borgeaud, T Cai, K Millican, J Hoffmann, HF Song, J Aslanides, ..., 2021 | 47 | 2021 |
Gemma: Open models based on Gemini research and technology G Team, T Mesnard, C Hardin, R Dadashi, S Bhupatiraju, S Pathak, ... arXiv preprint arXiv:2403.08295, 2024 | 35 | 2024 |
Unified scaling laws for routed language models A Clark, D de Las Casas, A Guy, A Mensch, M Paganini, J Hoffmann, ... International conference on machine learning, 4057-4086, 2022 | 33 | 2022 |
Optax: composable gradient transformation and optimisation, in JAX!, 2020 M Hessel, D Budden, F Viola, M Rosca, E Sezener, T Hennigan URL http://github.com/deepmind/optax 16, 2020 | 28 | 2020 |
J Hoffmann, S Borgeaud, A Mensch, E Buchatskaya, T Cai, E Rutherford, ... G. vd Driessche, Damoc, B., Guy, A., Osindero, S., Simonyan, K., Elsen, E., Rae, JW …, 2022 | 16 | 2022 |
Optax: composable gradient transformation and optimisation, in JAX M Hessel, D Budden, F Viola, M Rosca, E Sezener, T Hennigan Github. http://github.com/google/jax, 2020 | 16 | 2020 |
Device-based filtering of content items associated with mobile applications K Kannan, C Guo, CK Harris, X Mao, TMJ Hennigan, SLI Hsiao, R Govoni, ... US Patent App. 15/236,968, 2017 | 15 | 2017 |
Place heat geometries FE Herring, TMJ Hennigan, G Exton, MPT Wilson, A Eland, S Fortune US Patent App. 13/741,235, 2013 | 13 | 2013 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context M Reid, N Savinov, D Teplyashin, D Lepikhin, T Lillicrap, J Alayrac, ... arXiv preprint arXiv:2403.05530, 2024 | 11 | 2024 |