Accelerating low bit-width convolutional neural networks with embedded FPGA L Jiao, C Luo, W Cao, X Zhou, L Wang 2017 27th international conference on field programmable logic and …, 2017 | 84 | 2017 |
Towards efficient deep neural network training by FPGA-based batch-level parallelism C Luo, MK Sit, H Fan, S Liu, W Luk, C Guo Journal of Semiconductors 41 (2), 022403, 2020 | 58 | 2020 |
F-E3D: FPGA-based acceleration of an efficient 3D convolutional neural network for human action recognition H Fan, C Luo, C Zeng, M Ferianc, Z Que, S Liu, X Niu, W Luk 2019 IEEE 30th international conference on Application-specific Systems …, 2019 | 44 | 2019 |
RNA: An accurate residual network accelerator for quantized and reconstructed deep neural networks C Luo, W Cao, L Wang, PHW Leong IEICE Transactions on Information and Systems 102 (5), 1037-1045, 2019 | 23 | 2019 |
Moneo: Non-intrusive Fine-grained Monitor for AI Infrastructure Y Jiang, Y Xiong, L Qu, C Luo, C Tian, P Cheng, Y Xiong ICC 2022-IEEE International Conference on Communications, 2586-2591, 2022 | 4 | 2022 |
Moneo: Monitoring fine-grained metrics nonintrusively in AI infrastructure Y Jiang, Y Xiong, L Qu, CL Luo, C Tian, P Cheng, Y Xiong ACM SIGOPS Operating Systems Review 56 (1), 18-25, 2022 | 3 | 2022 |
CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner C Luo, L Qu, Y Miao, P Cheng, Y Xiong arXiv preprint arXiv:2103.07974, 2021 | 2 | 2021 |
T3P: Demystifying Low-Earth Orbit Satellite Broadband S Tiwari, S Bhushan, A Taneja, M Kassem, C Luo, C Zhou, Z He, ... arXiv preprint arXiv:2310.11835, 2023 | 1 | 2023 |
RTP: Rethinking Tensor Parallelism with Memory Deduplication C Luo, T Zhong, G Fox arXiv preprint arXiv:2311.01635, 2023 | | 2023 |
ASAP 2019 H Fan, C Luo, W Luk | | |