Scott Levy
Title
Cited by
Year
Understanding the effects of communication and coordination on checkpointing at scale
KB Ferreira, P Widener, S Levy, D Arnold, T Hoefler
Proceedings of the international conference for high performance computing …, 2014
Cited by: 30; Year: 2014
Using simulation to evaluate the performance of resilience strategies at scale
S Levy, B Topp, KB Ferreira, D Arnold, T Hoefler, P Widener
International Workshop on Performance Modeling, Benchmarking and Simulation …, 2013
Cited by: 30; Year: 2013
Understanding performance interference in next-generation HPC systems
OH Mondragon, PG Bridges, S Levy, KB Ferreira, P Widener
SC'16: Proceedings of the International Conference for High Performance …, 2016
Cited by: 12; Year: 2016
Characterizing MPI matching via trace-based simulation
KB Ferreira, S Levy, K Pedretti, RE Grant
Parallel Computing 77, 57-83, 2018
Cited by: 11; Year: 2018
Using unreliable virtual hardware to inject errors in extreme-scale systems
S Levy, MGF Dosanjh, PG Bridges, KB Ferreira
Proceedings of the 3rd Workshop on Fault-tolerance for HPC at extreme scale …, 2013
Cited by: 11; Year: 2013
Exploring the effect of noise on the performance benefit of nonblocking allreduce
P Widener, KB Ferreira, S Levy, T Hoefler
Proceedings of the 21st European MPI Users' Group Meeting, 77, 2014
Cited by: 10; Year: 2014
Lessons learned from memory errors observed over the lifetime of Cielo
S Levy, KB Ferreira, N DeBardeleben, T Siddiqua, V Sridharan, ...
SC18: International Conference for High Performance Computing, Networking …, 2018
Cited by: 8; Year: 2018
EMPRESS—Extensible Metadata PRovider for Extreme-scale Scientific Simulations
M Lawson, C Ulmer, S Mukherjee, G Templet, J Lofstead, S Levy, ...
Proceedings of the 2nd Joint International Workshop on Parallel Data Storage …, 2017
Cited by: 8; Year: 2017
Improving DRAM fault characterization through machine learning
E Baseman, N DeBardeleben, K Ferreira, S Levy, S Raasch, V Sridharan, ...
2016 46th Annual IEEE/IFIP International Conference on Dependable Systems …, 2016
Cited by: 8; Year: 2016
Scheduling in-situ analytics in next-generation applications
OH Mondragon, PG Bridges, S Levy, KB Ferreira, P Widener
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
Cited by: 8; Year: 2016
Evaluating the feasibility of using memory content similarity to improve system resilience
S Levy, PG Bridges, KB Ferreira, AP Thompson, C Trott
Proceedings of the 3rd International Workshop on Runtime and Operating …, 2013
Cited by: 8; Year: 2013
Faodel: Data Management for Next-Generation Application Workflows
C Ulmer, S Mukherjee, G Templet, S Levy, J Lofstead, P Widener, ...
Proceedings of the 9th Workshop on Scientific Cloud Computing, 8, 2018
Cited by: 7; Year: 2018
Lifetime memory reliability data from the field
T Siddiqua, V Sridharan, SE Raasch, N DeBardeleben, KB Ferreira, ...
2017 IEEE International Symposium on Defect and Fault Tolerance in VLSI and …, 2017
Cited by: 6; Year: 2017
On noise and the performance benefit of nonblocking collectives
PM Widener, S Levy, KB Ferreira, T Hoefler
The International Journal of High Performance Computing Applications 30 (1 …, 2016
Cited by: 6; Year: 2016
Using simulation to evaluate the performance of resilience strategies and process failures
S Levy, B Topp, KB Ferreira, D Arnold, P Widener, T Hoefler
Sandia National Laboratories, Technical Report SAND2014-0688, 2014
Cited by: 6; Year: 2014
Improving application resilience to memory errors with lightweight compression
S Levy, KB Ferreira, PG Bridges
High Performance Computing, Networking, Storage and Analysis, SC16 …, 2016
Cited by: 5; Year: 2016
Asking the right questions: benchmarking fault-tolerant extreme-scale systems
PM Widener, KB Ferreira, S Levy, PG Bridges, D Arnold, R Brightwell
European Conference on Parallel Processing, 717-726, 2013
Cited by: 5; Year: 2013
An examination of content similarity within the memory of HPC applications
S Levy, KB Ferreira, PG Bridges, AP Thompson, C Trott
Sandia National Laboratories, Tech. Rep. SAND2013-0055, 2013
Cited by: 5; Year: 2013
How I Learned to Stop Worrying and Love In Situ Analytics: Leveraging Latent Synchronization in MPI Collective Algorithms
S Levy, KB Ferreira, P Widener, PG Bridges, OH Mondragon
Proceedings of the 23rd European MPI Users' Group Meeting, 140-153, 2016
Cited by: 4; Year: 2016
An Examination of the Impact of Failure Distribution on Coordinated Checkpoint/Restart
S Levy, KB Ferreira
Proceedings of the ACM Workshop on Fault-Tolerance for HPC at Extreme Scale …, 2016
Cited by: 4; Year: 2016