Ammar Ahmad Awan

Cited by

	All	Since 2019
Citations	1621	1446
h-index	21	19
i10-index	28	25

Citations per year
Year	Citations
2013	5
2014	13
2015	19
2016	20
2017	26
2018	84
2019	105
2020	135
2021	189
2022	195
2023	366
2024	453

Public access (based on funding mandates)
Available	18 articles
Not available	5 articles

Co-authors

Dhabaleswar K. Panda, Professor of Computer Science, The Ohio State University (verified email at cse.ohio-state.edu)
Hari Subramoni, The Ohio State University (verified email at cse.ohio-state.edu)
He Yuxiong, Microsoft Research (verified email at microsoft.com)
Ching-Hsiang Chu, Research Scientist, Meta/Facebook (verified email at meta.com)
Khaled Hamidouche, AMD Research (verified email at amd.com)
Jeff Rasley, Microsoft (verified email at microsoft.com)
Reza Yazdani Aminabadi, Microsoft Research (verified email at microsoft.com)
Minjia Zhang, University of Illinois at Urbana-Champaign (verified email at illinois.edu)
Arpan Jain, The Ohio State University (verified email at osu.edu)
Olatunji Ruwase, Microsoft Research (verified email at microsoft.com)
Conglong Li, Senior Researcher at Microsoft, CMU Ph.D. (verified email at microsoft.com)
Akshay Venkatesh, NVIDIA; Ohio State University (verified email at nvidia.com)
Quentin Anthony, PhD Student, Ohio State University (verified email at osu.edu)
Jahanzeb Hashmi, Senior Architect, NVIDIA (verified email at nvidia.com)
Zhewei Yao, Snowflake (verified email at snowflake.com)
Xiaoyi Lu, Associate Professor, University of California, Merced (verified email at ucmerced.edu)
Kawthar Shafie Khorassani, AMD (verified email at amd.com)
(Altamont) Bracy Hamilton Elton, Penguin Computing (verified email at bracyelton.com)
Raghu Machiraju, Professor of Computer Science and Engineering, Bioinformatics and Pathology (verified email at osu.edu)
Anil Parwani, Professor of Pathology and Biomedical Informatics (verified email at osumc.edu)

Ammar Ahmad Awan

Microsoft

Verified email at osu.edu

Research interests: Deep Learning, HPC, Parallel I/O, MPI, Cloud Computing


Title	Authors	Venue	Cited by	Year
Deepspeed-inference: enabling efficient inference of transformer models at unprecedented scale	RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ...	SC22: International Conference for High Performance Computing, Networking …, 2022	189	2022
S-caffe: Co-designing mpi runtimes and caffe for scalable deep learning on modern gpu clusters	AA Awan, K Hamidouche, JM Hashmi, DK Panda	ACM PPoPP '17 52 (8), 193-205, 2017	178	2017
Deepspeed-moe: Advancing mixture-of-experts inference and training to power next-generation ai scale	S Rajbhandari, C Li, Z Yao, M Zhang, RY Aminabadi, AA Awan, J Rasley, ...	International conference on machine learning, 18332-18346, 2022	175	2022
Phi-3 technical report: A highly capable language model locally on your phone	M Abdin, SA Jacobs, AA Awan, J Aneja, A Awadallah, H Awadalla, ...	arXiv preprint arXiv:2404.14219, 2024	121	2024
An in-depth performance characterization of CPU- and GPU-based DNN training on modern architectures	AA Awan, H Subramoni, DK Panda	Proceedings of the Machine Learning on HPC Environments, 1-8, 2017	82	2017
1-bit adam: Communication efficient large-scale training with adam’s convergence speed	H Tang, S Gan, AA Awan, S Rajbhandari, C Li, X Lian, J Liu, C Zhang, ...	International Conference on Machine Learning, 10118-10129, 2021	75	2021
Scalable and efficient moe training for multitask multilingual models	YJ Kim, AA Awan, A Muzio, AFC Salinas, L Lu, A Hendy, S Rajbhandari, ...	arXiv preprint arXiv:2109.10465, 2021	66	2021
Scalable distributed dnn training using tensorflow and cuda-aware mpi: Characterization, designs, and performance evaluation	AA Awan, J Bédorf, CH Chu, H Subramoni, DK Panda	2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2019	57	2019
Efficient large message broadcast using NCCL and CUDA-aware MPI for deep learning	AA Awan, K Hamidouche, A Venkatesh, DK Panda	Proceedings of the 23rd European MPI Users' Group Meeting, 15-22, 2016	56	2016
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?	AA Awan, CH Chu, H Subramoni, DK Panda	Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018	54	2018
Gems: Gpu-enabled memory-aware model-parallelism system for distributed dnn training	A Jain, AA Awan, AM Aljuhani, JM Hashmi, QG Anthony, H Subramoni, ...	SC20: International Conference for High Performance Computing, Networking …, 2020	49	2020
Privacy-aware searching with oblivious term matching for cloud storage	Z Pervez, AA Awan, AM Khattak, S Lee, EN Huh	The Journal of Supercomputing 63, 538-560, 2013	47	2013
Nv-group: link-efficient reduction for distributed deep learning on modern dense gpu systems	CH Chu, P Kousha, AA Awan, KS Khorassani, H Subramoni, DK Panda	Proceedings of the 34th ACM International Conference on Supercomputing, 1-12, 2020	42	2020
Performance characterization of dnn training using tensorflow and pytorch on modern clusters	A Jain, AA Awan, Q Anthony, H Subramoni, DK Panda	2019 IEEE International Conference on Cluster Computing (CLUSTER), 1-11, 2019	41	2019
Deepspeed-chat: Easy, fast and affordable rlhf training of chatgpt-like models at all scales	Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ...	arXiv preprint arXiv:2308.01320, 2023	39	2023
Oc-dnn: Exploiting advanced unified memory capabilities in cuda 9 and volta gpus for out-of-core dnn training	AA Awan, CH Chu, H Subramoni, X Lu, DK Panda	2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018	37	2018
1-bit LAMB: communication efficient large-scale large-batch training with LAMB’s convergence speed	C Li, AA Awan, H Tang, S Rajbhandari, Y He	2022 IEEE 29th International Conference on High Performance Computing, Data …, 2022	29	2022
Scaling tensorflow, pytorch, and mxnet using mvapich2 for high-performance deep learning on frontera	A Jain, AA Awan, H Subramoni, DK Panda	2019 IEEE/ACM Third Workshop on Deep Learning on Supercomputers (DLS), 76-83, 2019	28	2019
Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters	K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...	2015 IEEE International Conference on Cluster Computing, 78-87, 2015	28	2015
Cuda kernel based collective reduction operations on large-scale gpu clusters	CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda	2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016	27	2016

Articles 1–20
