Zili Zhang
Title · Cited by · Year
Fast distributed inference serving for large language models
B Wu, Y Zhong, Z Zhang, G Huang, X Liu, X Jin
arXiv preprint arXiv:2305.05920, 2023
29 · 2023
Transparent GPU sharing in container clouds for deep learning workloads
B Wu, Z Zhang, Z Bai, X Liu, X Jin
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
15 · 2023
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation
C Jin, Z Zhang, X Jiang, F Liu, X Liu, X Liu, X Jin
arXiv preprint arXiv:2404.12457, 2024
4 · 2024
Ditto: Efficient serverless analytics with elastic parallelism
C Jin, Z Zhang, X Xiang, S Zou, G Huang, X Liu, X Jin
Proceedings of the ACM SIGCOMM 2023 Conference, 406-419, 2023
4 · 2023
Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining
Z Zhang, F Liu, G Huang, X Liu, X Jin
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
2 · 2024
Rise of Distributed Deep Learning Training in the Big Model Era: From a Software Engineering Perspective
X Liu, D Gu, Z Chen, J Wen, Z Zhang, Y Ma, H Wang, X Jin
ACM Transactions on Software Engineering and Methodology 32 (6), 1-26, 2023
2 · 2023
Fast, Approximate Vector Queries on Very Large Unstructured Datasets
Z Zhang, C Jin, L Tang, X Liu, X Jin
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
2 · 2023
Optimizing half precision Winograd convolution on ARM many-core processors
D Xie, Z Jia, Z Zhang, X Jin
Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems, 53-60, 2022
2 · 2022
Jolteon: Unleashing the Promise of Serverless for Serverless Workflows
Z Zhang, C Jin, X Jin
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
2024