Bingyang Wu
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction
S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang
Proceedings of the 49th Annual International Symposium on Computer …, 2022
Fast distributed inference serving for large language models
B Wu, Y Zhong, Z Zhang, G Huang, X Liu, X Jin
arXiv preprint arXiv:2305.05920, 2023
A survey of resource-efficient LLM and multimodal foundation models
M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu, Y Zhao, C Yang, S Wang, ...
arXiv preprint arXiv:2401.08092, 2024
Transparent GPU sharing in container clouds for deep learning workloads
B Wu, Z Zhang, Z Bai, X Liu, X Jin
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
Neoflow: A flexible framework for enabling efficient compilation for high performance DNN training
S Zheng, R Chen, Y Jin, A Wei, B Wu, X Li, S Yan, Y Liang
IEEE Transactions on Parallel and Distributed Systems 33 (11), 3220-3232, 2021
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin
arXiv preprint arXiv:2404.09526, 2024
Xron: A hybrid elastic cloud overlay network for video conferencing at planetary scale
B Wu, K Qian, B Li, Y Ma, Q Zhang, Z Jiang, J Zhao, D Cai, E Zhai, X Liu, ...
Proceedings of the ACM SIGCOMM 2023 Conference, 696-709, 2023
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024