Cheng Li
Title · Cited by · Year
Sirius: An open end-to-end voice and vision personal assistant and its implications for future warehouse scale computers
J Hauswald, MA Laurenzano, Y Zhang, C Li, A Rovinski, A Khurana, ...
Proceedings of the Twentieth International Conference on Architectural …, 2015
331 · 2015
Stochastic circuits for real-time image-processing applications
A Alaghi, C Li, JP Hayes
Proceedings of the 50th annual design automation conference, 1-6, 2013
313 · 2013
DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers
J Hauswald, Y Kang, MA Laurenzano, Q Chen, C Li, T Mudge, ...
ACM SIGARCH Computer Architecture News 43 (3S), 27-40, 2015
197 · 2015
DeepSpeed-Inference: Enabling efficient inference of transformer models at unprecedented scale
RY Aminabadi, S Rajbhandari, AA Awan, C Li, D Li, E Zheng, O Ruwase, ...
SC22: International Conference for High Performance Computing, Networking …, 2022
111 · 2022
Accelerating reduction and scan using tensor core units
A Dakkak, C Li, J Xiong, I Gelado, W Hwu
Proceedings of the ACM International Conference on Supercomputing, 46-57, 2019
89 · 2019
KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism
I El Hajj, J Gómez-Luna, C Li, LW Chang, D Milojicic, W Hwu
2016 49th Annual IEEE/ACM International Symposium on Microarchitecture …, 2016
44 · 2016
Evaluating characteristics of CUDA communication primitives on high-bandwidth interconnects
C Pearson, A Dakkak, S Hashash, C Li, IH Chung, J Xiong, WM Hwu
Proceedings of the 2019 ACM/SPEC International Conference on Performance …, 2019
32 · 2019
Designing future warehouse-scale computers for Sirius, an end-to-end voice and vision personal assistant
J Hauswald, MA Laurenzano, Y Zhang, H Yang, Y Kang, C Li, A Rovinski, ...
ACM Transactions on Computer Systems (TOCS) 34 (1), 1-32, 2016
31 · 2016
XSP: Across-stack profiling and analysis of machine learning models on GPUs
C Li, A Dakkak, J Xiong, W Wei, L Xu, W Hwu
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020
30* · 2020
TrIMS: Transparent and isolated model sharing for low latency deep learning inference in function-as-a-service
A Dakkak, C Li, SG De Gonzalo, J Xiong, W Hwu
2019 IEEE 12th International Conference on Cloud Computing (CLOUD), 372-382, 2019
30 · 2019
ZeroQuant-V2: Exploring post-training quantization in LLMs from comprehensive study to low rank compensation
Z Yao, X Wu, C Li, S Youn, Y He
arXiv preprint arXiv:2303.08302, 2023
23 · 2023
AI Matrix: A deep learning benchmark for Alibaba data centers
W Zhang, W Wei, L Xu, L Jin, C Li
arXiv preprint arXiv:1909.10562, 2019
18 · 2019
Understanding INT4 quantization for transformer models: Latency speedup, composability, and failure cases
X Wu, C Li, RY Aminabadi, Z Yao, Y He
arXiv preprint arXiv:2301.12017, 2023
16 · 2023
Frustrated with replicating claims of a shared model? A solution
A Dakkak, C Li, J Xiong, WM Hwu
arXiv preprint arXiv:1811.09737, 2018
16* · 2018
Matrix factorization on GPUs with memory optimization and approximate computing
W Tan, S Chang, L Fong, C Li, Z Wang, L Cao
Proceedings of the 47th International Conference on Parallel Processing, 1-10, 2018
16 · 2018
ACM
Y Wang, C Li, X Shao, Y Chen, F Yan, Y Xu
URL: http://doi.acm.org/10.1145/3209978.3210061, doi 10 (3209978.3210061 …, 2018
13 · 2018
A comprehensive study on post-training quantization for large language models
Z Yao, C Li, X Wu, S Youn, Y He
arXiv preprint arXiv:2303.08302, 2023
12 · 2023
The design and implementation of a scalable deep learning benchmarking platform
C Li, A Dakkak, J Xiong, W Hwu
2020 IEEE 13th International Conference on Cloud Computing (CLOUD), 414-425, 2020
11 · 2020
Random-LTD: Random and layerwise token dropping brings efficient training for large-scale transformers
Z Yao, X Wu, C Li, C Holmes, M Zhang, C Li, Y He
arXiv preprint arXiv:2211.11586, 2022
10 · 2022
Benanza: Automatic μBenchmark Generation to Compute "Lower-bound" Latency and Inform Optimizations of Deep Learning Models on GPUs
C Li, A Dakkak, J Xiong, W Hwu
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2020
9 · 2020