Tensor programs IIb: Architectural universality of neural tangent kernel training dynamics
G Yang, E Littwin
International Conference on Machine Learning, 11762-11772, 2021
Cited by 62, year 2021

Biometric authentication techniques
DS Prakash, LE Ballard, JV Hauck, F Tang, E Littwin, PKA Vasu, G Littwin, ...
US Patent 10,929,515, 2021
Cited by 35, year 2021

The multiverse loss for robust transfer learning
E Littwin, L Wolf
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2016
Cited by 32, year 2016

What algorithms can transformers learn? A study in length generalization
H Zhou, A Bradley, E Littwin, N Razin, O Saremi, J Susskind, S Bengio, ...
arXiv preprint arXiv:2310.16028, 2023
Cited by 26, year 2023

Stabilizing transformer training by preventing attention entropy collapse
S Zhai, T Likhomanenko, E Littwin, D Busbridge, J Ramapuram, Y Zhang, ...
International Conference on Machine Learning, 40770-40803, 2023
Cited by 26, year 2023

The slingshot mechanism: An empirical study of adaptive optimizers and the grokking phenomenon
V Thilak, E Littwin, S Zhai, O Saremi, R Paiss, J Susskind
arXiv preprint arXiv:2206.04817, 2022
Cited by 25, year 2022

On infinite-width hypernetworks
E Littwin, T Galanti, L Wolf, G Yang
Advances in Neural Information Processing Systems 33, 13226-13237, 2020
Cited by 25*, year 2020

The loss surface of residual networks: Ensembles and the role of batch normalization
E Littwin, L Wolf
arXiv preprint arXiv:1611.02525, 2016
Cited by 15, year 2016

Regularizing by the variance of the activations' sample-variances
E Littwin, L Wolf
Advances in Neural Information Processing Systems 31, 2018
Cited by 11, year 2018

Transformers learn through gradual rank increase
E Boix-Adsera, E Littwin, E Abbe, S Bengio, J Susskind
arXiv preprint arXiv:2306.07042, 2023
Cited by 9, year 2023

Tensor programs IVb: Adaptive optimization in the infinite-width limit
G Yang, E Littwin
arXiv preprint arXiv:2308.01814, 2023
Cited by 8, year 2023

Collegial ensembles
E Littwin, B Myara, S Sabah, J Susskind, S Zhai, O Golan
Advances in Neural Information Processing Systems 33, 18738-18748, 2020
Cited by 8, year 2020

Spherical embedding of inlier silhouette dissimilarities
E Littwin, H Averbuch-Elor, D Cohen-Or
Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2015
Cited by 7, year 2015

Adaptive Optimization in the Infinite-Width Limit
E Littwin, G Yang
The Eleventh International Conference on Learning Representations, 2022
Cited by 6, year 2022

On the convex behavior of deep neural networks in relation to the layers' width
E Littwin, L Wolf
arXiv preprint arXiv:2001.04878, 2020
Cited by 6, year 2020

Learning representation from neural Fisher kernel with low-rank approximation
R Zhang, S Zhai, E Littwin, J Susskind
arXiv preprint arXiv:2202.01944, 2022
Cited by 5, year 2022

On random kernels of residual architectures
E Littwin, T Galanti, L Wolf
Uncertainty in Artificial Intelligence, 897-907, 2021
Cited by 4, year 2021

When can transformers reason with abstract symbols?
E Boix-Adsera, O Saremi, E Abbe, S Bengio, E Littwin, J Susskind
arXiv preprint arXiv:2310.09753, 2023
Cited by 3, year 2023

Biometric authentication techniques
DS Prakash, LE Ballard, JV Hauck, F Tang, E Littwin, PKA Vasu, G Littwin, ...
US Patent 11,151,235, 2021
Cited by 3, year 2021

The slingshot mechanism: An empirical study of adaptive optimizers and the Grokking Phenomenon
V Thilak, E Littwin, S Zhai, O Saremi, R Paiss, JM Susskind
Has it Trained Yet? NeurIPS 2022 Workshop, 2022
Cited by 2, year 2022