Optimization principles and application performance evaluation of a multithreaded GPU using CUDA S Ryoo, CI Rodrigues, SS Baghsorkhi, SS Stone, DB Kirk, WW Hwu Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of …, 2008 | 1293 | 2008 |
An adaptive performance modeling tool for GPU architectures SS Baghsorkhi, M Delahaye, SJ Patel, WD Gropp, WW Hwu Proceedings of the 15th ACM SIGPLAN symposium on Principles and practice of …, 2010 | 418 | 2010 |
Program optimization space pruning for a multithreaded GPU S Ryoo, CI Rodrigues, SS Stone, SS Baghsorkhi, SZ Ueng, JA Stratton, ... Proceedings of the 6th annual IEEE/ACM international symposium on Code …, 2008 | 373 | 2008 |
CUDA-lite: Reducing GPU programming complexity SZ Ueng, M Lathara, SS Baghsorkhi, WMW Hwu Languages and Compilers for Parallel Computing: 21th International Workshop …, 2008 | 325 | 2008 |
Program optimization carving for GPU computing S Ryoo, CI Rodrigues, SS Stone, JA Stratton, SZ Ueng, SS Baghsorkhi, ... Journal of Parallel and Distributed Computing 68 (10), 1389-1401, 2008 | 162 | 2008 |
Auto-tuning of fast Fourier transform on graphics processors Y Dotsenko, SS Baghsorkhi, B Lloyd, NK Govindaraju ACM SIGPLAN Notices 46 (8), 257-266, 2011 | 95 | 2011 |
Implicitly parallel programming models for thousand-core microprocessors W Hwu, S Ryoo, SZ Ueng, JH Kelm, I Gelado, SS Stone, RE Kidd, ... Proceedings of the 44th annual Design Automation Conference, 754-759, 2007 | 94 | 2007 |
Programmable coarse grained and sparse matrix compute hardware with advanced scheduling E Nurvitadhi, B Vembu, NCG Von Borries, R Barik, TH Lin, K Sinha, ... US Patent 10,186,011, 2019 | 84 | 2019 |
Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors SS Baghsorkhi, I Gelado, M Delahaye, WW Hwu ACM SIGPLAN Notices 47 (8), 23-34, 2012 | 66 | 2012 |
Performance analysis and tuning for general purpose graphics processing units (GPGPU) H Kim, R Vuduc, S Baghsorkhi, J Choi, W Hwu Synthesis Lectures on Computer Architecture 7 (2), 1-96, 2012 | 63 | 2012 |
Compute optimizations for neural networks K Nealis, A Yao, X Chen, E Ould-Ahmed-Vall, SS Baghsorkhi, ... US Patent 10,410,098, 2019 | 56 | 2019 |
Program optimization study on a 128-core GPU S Ryoo, C Rodrigues, S Stone, S Baghsorkhi, SZ Ueng, WW Hwu The First Workshop on General Purpose Processing on Graphics Processing Units 23, 2007 | 47 | 2007 |
Specialized fixed function hardware for efficient convolution R Barik, E Ould-Ahmed-Vall, X Chen, D Srivastava, A Yao, K Nealis, ... US Patent 10,824,938, 2020 | 44 | 2020 |
Coordination and increased utilization of graphics processors during inference AR Appu, A Koker, JC Weast, MB MacPherson, LL Hurd, SS Baghsorkhi, ... US Patent 10,304,154, 2019 | 42 | 2019 |
Function as a service (FaaS) system enhancements MR Haghighat, K Doshi, AJ Herdrich, A Mohan, RR Iyer, M Sun, K Bhuyan, ... US Patent App. 17/255,588, 2021 | 36 | 2021 |
Methods and apparatus to detect anomalies of a monitored system M Agerstam, B Sadeghi, J Martin, J Ota, J Gottschlich, M Carranza, ... US Patent 10,802,942, 2020 | 30 | 2020 |
SAVE: Sparsity-aware vector engine for accelerating DNN training and inference on CPUs Z Gong, H Ji, CW Fletcher, CJ Hughes, S Baghsorkhi, J Torrellas 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture …, 2020 | 29 | 2020 |
FlexVec: Auto-vectorization for irregular loops SS Baghsorkhi, N Vasudevan, Y Wu Proceedings of the 37th ACM SIGPLAN Conference on Programming Language …, 2016 | 28 | 2016 |
Lightweight restricted transactional memory for speculative compiler optimization C Wang, Y Wu, SS Baghsorkhi, A Hartono, R Valentine US Patent 10,324,768, 2019 | 26 | 2019 |
Automating efficient variable-grained resiliency for low-power IoT systems SS Baghsorkhi, C Margiolas Proceedings of the 2018 International Symposium on Code Generation and …, 2018 | 25 | 2018 |