Zijia Zhao

Cited by

	All	Since 2019
Citations	155	155
h-index	5	5
i10-index	4	4

Citations per year: 2021: 1, 2022: 10, 2023: 55, 2024: 89

Public access (based on funding mandates): 2 articles available, 0 articles not available



Institute of Automation, Chinese Academy of Sciences (CASIA)

Verified email at ia.ac.cn

Multimodal learning


Title	Authors	Venue	Cited by	Year
VAST: A vision-audio-subtitle-text omni-modality foundation model and dataset	S Chen, H Li, Q Wang, Z Zhao, M Sun, X Zhu, J Liu	Advances in Neural Information Processing Systems 36, 2024	52	2024
OPT: Omni-perception pre-trainer for cross-modal understanding and generation	J Liu, X Zhu, F Liu, L Guo, Z Zhao, M Sun, W Wang, H Lu, S Zhou, J Zhang, ...	arXiv preprint arXiv:2107.00249, 2021	39	2021
ChatBridge: Bridging modalities with large language model as a language catalyst	Z Zhao, L Guo, T Yue, S Chen, S Shao, X Zhu, Z Yuan, J Liu	arXiv preprint arXiv:2305.16103, 2023	30	2023
VL-Mamba: Exploring state space models for multimodal learning	Y Qiao, Z Yu, L Guo, S Chen, Z Zhao, M Sun, Q Wu, J Liu	arXiv preprint arXiv:2403.13600, 2024	17	2024
MM21 pre-training for video understanding challenge: Video captioning with pretraining techniques	S Chen, X Zhu, D Hao, W Liu, J Liu, Z Zhao, L Guo, J Liu	Proceedings of the 29th ACM International Conference on Multimedia, 4853-4857, 2021	6	2021
MAMO: Fine-grained vision-language representations learning with masked multimodal modeling	Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu	Proceedings of the 46th International ACM SIGIR Conference on Research and …, 2023	5	2023
MAMO: Masked multimodal modeling for fine-grained vision-language representation learning	Z Zhao, L Guo, X He, S Shao, Z Yuan, J Liu	arXiv preprint arXiv:2210.04183, 2022	4	2022
Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with Human Intentions	W Wang, Y Zhang, X He, Y Yan, Z Zhao, X Wang, J Liu	arXiv preprint arXiv:2402.11265, 2024	1	2024
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models	T Yue, J Cheng, L Guo, X Dai, Z Zhao, X He, G Xiong, Y Lv, J Liu	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2024	1	2024
OneDiff: A Generalist Model for Image Difference	E Hu, L Guo, T Yue, Z Zhao, S Xue, J Liu	arXiv preprint arXiv:2407.05645, 2024		2024
Towards Event-oriented Long Video Understanding	Y Du, K Zhou, Y Huo, Y Li, WX Zhao, H Lu, Z Zhao, B Wang, W Chen, ...	arXiv preprint arXiv:2406.14129, 2024		2024
Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs	Z Zhao, H Lu, Y Huo, Y Du, T Yue, L Guo, B Wang, W Chen, J Liu	arXiv preprint arXiv:2406.09367, 2024		2024
