WenLan: Bridging vision and language by large-scale multi-modal pre-training Y Huo, M Zhang, G Liu, H Lu, Y Gao, G Yang, J Wen, H Zhang, B Xu, ... arXiv preprint arXiv:2103.06561, 2021 | 117 | 2021 |
Advancing high-resolution video-language representation with large-scale video transcriptions H Xue, T Hang, Y Zeng, Y Sun, B Liu, H Yang, J Fu, B Guo Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 102 | 2022 |
CLIP-ViP: Adapting pre-trained image-text model to video-language alignment H Xue, Y Sun, B Liu, J Fu, R Song, H Li, J Luo The Eleventh International Conference on Learning Representations, 2022 | 92* | 2022 |
Long-form video-language pre-training with multimodal temporal contrastive learning Y Sun, H Xue, R Song, B Liu, H Yang, J Fu Advances in neural information processing systems 35, 38032-38045, 2022 | 48 | 2022 |
Parrot: Enhancing Multi-Turn Chat Models by Learning to Ask Questions Y Sun, C Liu, J Huang, R Song, F Zhang, D Zhang, Z Wang, K Gai arXiv preprint arXiv:2310.07301, 2023 | 2 | 2023 |
Going Beyond Closed Sets: A Multimodal Perspective for Video Emotion Analysis H Pu, Y Sun, R Song, X Chen, H Jiang, Y Liu, Z Cao Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 233-244, 2023 | 1 | 2023 |
Joint Semantic and Strategy Matching for Persuasive Dialogue C Jin, Y Zhu, L Kong, S Li, X Zhang, R Song, X Chen, H Chen, Y Sun, ... Findings of the Association for Computational Linguistics: EMNLP 2023, 4187-4197, 2023 | | 2023 |
TeViS: Translating Text Synopses to Video Storyboards X Gu, Y Sun, F Ni, S Chen, X Wang, R Song, B Li, X Cao Proceedings of the 31st ACM International Conference on Multimedia, 4968-4979, 2023 | | 2023 |
Expanding the Horizons: Exploring Further Steps in Open-Vocabulary Segmentation X Wang, L Ji, K Yan, Y Sun, R Song Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 407-419, 2023 | | 2023 |
ViCo: Engaging Video Comment Generation with Human Preference Rewards Y Sun, B Liu, X Chen, R Song, J Fu arXiv preprint arXiv:2308.11171, 2023 | | 2023 |
Difference between Multi-modal vs. Text Pre-trained Models in Embedding Text Y Sun, X Cheng, R Song, W Che, Z Lu, J Wen Beijing Da Xue Xue Bao 59 (1), 48-56, 2023 | | 2023 |
Supplementary Material: Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning Y Sun, H Xue, R Song, B Liu, H Yang, J Fu | | |
Supplementary Material: Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions H Xue, T Hang, Y Zeng, Y Sun, B Liu, H Yang, J Fu, B Guo | | |