Few-shot action recognition with hierarchical matching and contrastive learning S Zheng, S Chen, Q Jin European Conference on Computer Vision, 297-313, 2022 | 31 | 2022 |
Visual relation detection with multi-level attention S Zheng, S Chen, Q Jin Proceedings of the 27th ACM International Conference on Multimedia, 121-129, 2019 | 23 | 2019 |
Skeleton-based interactive graph network for human object interaction detection S Zheng, S Chen, Q Jin 2020 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2020 | 16 | 2020 |
Relation understanding in videos S Zheng, X Chen, S Chen, Q Jin Proceedings of the 27th ACM International Conference on Multimedia, 2662-2666, 2019 | 15 | 2019 |
VRDFormer: End-to-end video visual relation detection with transformers S Zheng, S Chen, Q Jin Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022 | 12 | 2022 |
Open-category human-object interaction pre-training via language modeling framework S Zheng, B Xu, Q Jin Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023 | 8 | 2023 |
Towards general computer control: A multimodal agent for Red Dead Redemption II as a case study W Tan, Z Ding, W Zhang, B Li, B Zhou, J Yue, H Xia, J Jiang, L Zheng, ... arXiv preprint arXiv:2403.03186, 2024 | 5 | 2024 |
LLaMA Rider: Spurring large language models to explore the open world Y Feng, Y Wang, J Liu, S Zheng, Z Lu arXiv preprint arXiv:2310.08922, 2023 | 5 | 2023 |
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds S Zheng, Y Feng, Z Lu The Twelfth International Conference on Learning Representations, 2023 | 4 | 2023 |
Accommodating audio modality in CLIP for multimodal processing L Ruan, A Hu, Y Song, L Zhang, S Zheng, Q Jin Proceedings of the AAAI Conference on Artificial Intelligence 37 (8), 9641-9649, 2023 | 4 | 2023 |
Exploring anchor-based detection for Ego4D natural language query S Zheng, Q Zhang, B Liu, Q Jin, J Fu arXiv preprint arXiv:2208.05375, 2022 | 4 | 2022 |
UniCode: Learning a Unified Codebook for Multimodal Large Language Models S Zheng, B Zhou, Y Feng, Y Wang, Z Lu arXiv preprint arXiv:2403.09072, 2024 | 1 | 2024 |
Anchor-based detection for natural language localization in ego-centric videos B Liu, S Zheng, J Fu, WH Cheng 2023 IEEE International Conference on Consumer Electronics (ICCE), 01-04, 2023 | 1 | 2023 |
SPAFormer: Sequential 3D Part Assembly with Transformers B Xu, S Zheng, Q Jin arXiv preprint arXiv:2403.05874, 2024 | | 2024 |
POV: Prompt-Oriented View-Agnostic Learning for Egocentric Hand-Object Interaction in the Multi-view World B Xu, S Zheng, Q Jin Proceedings of the 31st ACM International Conference on Multimedia, 2807-2816, 2023 | | 2023 |
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection Q Zhang, S Zheng, Q Jin arXiv preprint arXiv:2307.10567, 2023 | | 2023 |
Supplementary Material for Open-Category Human-Object Interaction Pre-training via Language Modeling Framework S Zheng, B Xu, Q Jin | | |
Supplementary Material for VRDFormer: End-to-End Video Visual Relation Detection with Transformers S Zheng, S Chen, Q Jin | | |