Saurabh Gupta
AMD Server Performance
Verified email at ncsu.edu
Title
Cited by
Year
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation
D Tiwari, S Gupta, J Rogers, D Maxwell, P Rech, S Vazhkudai, D Oliveira, ...
2015 IEEE 21st International Symposium on High Performance Computer …, 2015
Cited by: 130, Year: 2015
Failures in large scale systems: long-term measurement, analysis, and implications
S Gupta, T Patel, C Engelmann, D Tiwari
Proceedings of the International Conference for High Performance Computing …, 2017
Cited by: 87, Year: 2017
Lazy checkpointing: Exploiting temporal locality in failures to mitigate checkpointing overheads on extreme-scale systems
D Tiwari, S Gupta, SS Vazhkudai
Dependable Systems and Networks (DSN), 2014 44th Annual IEEE/IFIP …, 2014
Cited by: 83, Year: 2014
A large-scale study of soft-errors on GPUs in the field
B Nie, D Tiwari, S Gupta, E Smirni, JH Rogers
2016 IEEE International Symposium on High Performance Computer Architecture …, 2016
Cited by: 71, Year: 2016
Reliability lessons learned from GPU experience with the Titan supercomputer at Oak Ridge leadership computing facility
D Tiwari, S Gupta, G Gallarno, J Rogers, D Maxwell
Proceedings of the international conference for high performance computing …, 2015
Cited by: 69, Year: 2015
Understanding and exploiting spatial properties of system failures on extreme-scale HPC systems
S Gupta, D Tiwari, C Jantzi, J Rogers, D Maxwell
2015 45th Annual IEEE/IFIP International Conference on Dependable Systems …, 2015
Cited by: 61, Year: 2015
Locality Principle Revisited: A Probability-Based Quantitative Approach
S Gupta, P Xiang, Y Yang, H Zhou
Parallel & Distributed Processing Symposium (IPDPS), 2012 IEEE 26th …, 2012
Cited by: 54, Year: 2012
Adaptive cache bypassing for inclusive last level caches
S Gupta, H Gao, H Zhou
Parallel & Distributed Processing (IPDPS), 2013 IEEE 27th International …, 2013
Cited by: 49, Year: 2013
Best practices and lessons learned from deploying and operating large-scale data-centric parallel file systems
S Oral, J Simmons, J Hill, D Leverman, F Wang, M Ezell, R Miller, D Fuller, ...
SC'14: Proceedings of the International Conference for High Performance …, 2014
Cited by: 48, Year: 2014
Reducing waste in extreme scale systems through introspective analysis
L Bautista-Gomez, A Gainaru, S Perarnau, D Tiwari, S Gupta, ...
2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2016
Cited by: 39, Year: 2016
Machine learning models for GPU error prediction in a large scale HPC system
B Nie, J Xue, S Gupta, T Patel, C Engelmann, E Smirni, D Tiwari
2018 48th Annual IEEE/IFIP International Conference on Dependable Systems …, 2018
Cited by: 34, Year: 2018
Characterizing temperature, power, and soft-error behaviors in data center systems: Insights, challenges, and opportunities
B Nie, J Xue, S Gupta, C Engelmann, E Smirni, D Tiwari
2017 IEEE 25th International Symposium on Modeling, Analysis, and Simulation …, 2017
Cited by: 34, Year: 2017
Power-capping aware checkpointing: On the interplay among power-capping, temperature, reliability, performance, and energy
K Tang, D Tiwari, S Gupta, P Huang, Q Lu, C Engelmann, X He
2016 46th Annual IEEE/IFIP International Conference on Dependable Systems …, 2016
Cited by: 22, Year: 2016
A model-driven approach to warp/thread-block level GPU cache bypassing
H Dai, C Li, H Zhou, S Gupta, C Kartsaklis, M Mantor
2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC), 1-6, 2016
Cited by: 17, Year: 2016
Improving large-scale storage system performance via topology-aware and balanced data placement
F Wang, S Oral, S Gupta, D Tiwari, SS Vazhkudai
2014 20th IEEE International Conference on Parallel and Distributed Systems …, 2014
Cited by: 15, Year: 2014
A Multi-faceted Approach to Job Placement for Improved Performance on Extreme-Scale Systems
C Zimmer, S Gupta, S Atchley, SS Vazhkudai, C Albing
29th International Conference on High Performance Computing, Networking …, 2016
Cited by: 14, Year: 2016
Spatial Locality-Aware Cache Partitioning for Effective Cache Sharing
S Gupta, H Zhou
ICPP, 2015
Cited by: 13, Year: 2015
Adaptive power profiling for many-core HPC architectures
J Kelley, C Stewart, D Tiwari, S Gupta
2016 IEEE International Conference on Autonomic Computing (ICAC), 179-188, 2016
Cited by: 12, Year: 2016
Understanding and analyzing interconnect errors and network congestion on a large scale HPC system
M Kumar, S Gupta, T Patel, M Wilder, W Shi, S Fu, C Engelmann, D Tiwari
2018 48th Annual IEEE/IFIP International Conference on Dependable Systems …, 2018
Cited by: 11, Year: 2018
Analyzing locality of memory references in GPU architectures
S Gupta, P Xiang, H Zhou
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and …, 2013
Cited by: 9, Year: 2013