Xing SUN, Ph.D.

Principal Researcher

Tencent

Email: winfred.sun at gmail dot com

Github: https://github.com/X-ing

Short Bio

Xing Sun (孙星) is currently a Principal Researcher and Team Manager at Tencent YoutuLab. He received his Ph.D. degree from The University of Hong Kong in 2016, under the supervision of Prof. Edmund Y. Lam in the Imaging Systems Laboratory and Dr. Nelson Yung in the Laboratory for Intelligent Transportation Systems Research. He received his B.S. degree from Nanjing University of Science and Technology in June 2012. He completed his bachelor dissertation at the Lehrstuhl für Hochfrequenztechnik, Technische Universität München, in spring 2012.

Academic Activities

Recent Research Topics

Large Language Model (LLM)
including Supervised Fine-Tuning (SFT), Reinforcement Learning from Human Feedback (RLHF), Retrieval-Augmented Generation (RAG) for LLMs, LLM Agents, etc.

Multimodal Large Language Model (MLLM)
including Multimodal Instruction Tuning (M-IT), Multimodal In-Context Learning (M-ICL), Multimodal Chain of Thought (M-CoT), LLM-Aided Visual Reasoning (LAVR), etc.

Recent Publications (Google Scholar)

(* Corresponding author, + Equal contribution)

Preprint

58. A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen
Arxiv Tech Report, 2023.

57. MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
Chaoyou Fu, Peixian Chen, Yunhang Shen, Yulei Qin, Mengdan Zhang, Xu Lin, Jinrui Yang, Xiawu Zheng, Ke Li, Xing Sun, Yunsheng Wu, Rongrong Ji
Arxiv Tech Report, 2023.
Benchmark Website

56. Towards Robust Text Retrieval with Progressive Learning
Tong Wu, Yulei Qin, Enwei Zhang, Zihan Xu, Yuting Gao, Ke Li and Xing Sun*
Arxiv Tech Report, 2023.
Model is available!

55. A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise
Chaoyou Fu, Renrui Zhang, Haojia Lin, Zihan Wang, Timin Gao, Yongdong Luo, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Xiawu Zheng, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Xing Sun, Rongrong Ji
Arxiv Tech Report, 2023.
Project

54. MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation
Junru Lu, Siyu An, Mingbao Lin, Gabriele Pergola, Yulan He, Di Yin, Xing Sun and Yunsheng Wu
Arxiv Tech Report, 2023.

53. FIPO: Free-form Instruction-oriented Prompt Optimization with Preference Dataset and Modular Fine-tuning Schema
Junru Lu, Siyu An, Min Zhang, Yulan He, Di Yin and Xing Sun
Arxiv Tech Report, 2024.

52. Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen
Arxiv Tech Report, 2023.
Code is available!

51. RMNet: Equivalently Removing Residual Connection from Networks
Fanxu Meng, Hao Cheng, Jiaxin Zhuang, Ke Li, and Xing Sun*
Arxiv Tech Report, 2021.
Code is available!

2024

50. Multi-dataset Detection with Transformers
Bo Ke, Ruizhi Qiao, Xing Sun
International Journal of Computer Vision (IJCV), 2024.

49. Turning a CLIP Model into a Scene Text Spotter
Wenwen Yu, Yuliang Liu, Xingkui Zhu, Haoyu Cao, Xing Sun, Xiang Bai
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024.

48. HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

47. Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

46. Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen, Mengdan Zhang, Ke Li, Xing Sun, Yunsheng Wu, Shaohui Lin, Rongrong Ji
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

45. A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang, Yunhang Shen, Jiao Xie, Baochang Zhang, Gaoqi He, Ke Li, Xing Sun, Shaohui Lin
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

44. SPD-DDPM: Denoising Diffusion Probabilistic Models in the Symmetric Positive Definite Space
Yunchen Li, Zhou Yu, Gaoqi He, Yunhang Shen, Ke Li, Xing Sun, Shaohui Lin
The AAAI Conference on Artificial Intelligence (AAAI), 2024.

43. SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger
Yuting Gao, Jinfeng Liu, Zihan Xu, Tong Wu, Wei Liu, Jie Yang, Ke Li, Xing Sun
The AAAI Conference on Artificial Intelligence (AAAI), 2024.

42. Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation
Hao Liu, Xin Li, Mingming Gong, Bing Liu, Yunfei Wu, Deqiang Jiang, Yinsong Liu, Xing Sun
The AAAI Conference on Artificial Intelligence (AAAI), 2024.

41. Visual Hallucination Elevates Speech Recognition
Fang Zhang, Yongxin Zhu, Xiangxiang Wang, Huang Chen, Xing Sun, Linli Xu
The AAAI Conference on Artificial Intelligence (AAAI), 2024.

40. Sinkhorn Distance Minimization for Knowledge Distillation
Xiao Cui, Yulei Qin, Yuting Gao, Enwei Zhang, Zihan Xu, Tong Wu, Ke Li, Xing Sun, Wengang Zhou, Houqiang Li
International Conference on Computational Linguistics (COLING), 2024.

2023

39. Co-Salient Object Detection with Co-Representation Purification
Ziyue Zhu, Zhao Zhang, Zheng Lin, Xing Sun, Ming-Ming Cheng
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
Code is available!

38. CAPro: Webly Supervised Learning with Cross-modality Aligned Prototypes
Yulei Qin, Xingyu Chen, Yunhang Shen, Chaoyou Fu, Yun Gu, Ke Li, Xing Sun, Rongrong Ji
Conference on Neural Information Processing Systems (NeurIPS), 2023.
Code is available!

37. Span-level Aspect-based Sentiment Analysis via Table Filling
Mao Zhang, Yongxin Zhu, Zhen Liu, Zhimin Bao, Yunfei Wu, Xing Sun, Linli Xu
Annual Meeting of the Association for Computational Linguistics (ACL), 2023.

36. Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
Haoyu Cao, Changcun Bao, Chaohu Liu, Huang Chen, Kun Yin, Hao Liu, Yinsong Liu, Deqiang Jiang, and Xing Sun*
International Conference on Computer Vision (ICCV), 2023.

35. D3G: Exploring Gaussian Prior for Temporal Sentence Grounding with Glance Annotation
Hanjun Li, Xiujun Shu, Sunan He, Ruizhi Qiao, Wei Wen, Taian Guo, Bei Gan, and Xing Sun*
International Conference on Computer Vision (ICCV), 2023.

34. Coarse-to-Fine: Learning Compact Discriminative Representation for Single-Stage Image Retrieval
Yunquan Zhu, Xinkai Gao, Bo Ke, Ruizhi Qiao and Xing Sun*
International Conference on Computer Vision (ICCV), 2023.

33. Reciprocal Normalization for Domain Adaptation
Zhiyong Huang, Kekai Sheng, Ke Li, Jian Liang, Taiping Yao, Weiming Dong, Dengwen Zhou, and Xing Sun*
Pattern Recognition (PR), 2023.

32. Mitigating Memorization of Noisy Labels via Regularization between Representations
Hao Cheng, Zhaowei Zhu, Xing Sun and Yang Liu
International Conference on Learning Representations (ICLR), 2023.

2022

31. Contextual Non-Local Alignment over Full-Scale Representation for Text-Based Person Search
Chenyang Gao, Guanyu Cai, Xinyang Jiang, Feng Zheng, Jun Zhang, Yifei Gong and Xing Sun*
IEEE Transactions on Image Processing (TIP), 2022.
Ranked #1 on Text based Person Retrieval on CUHK-PEDES
Code is available!

30. Devil's in the Detail: Aligning Visual Clues for Conditional Embedding in Person Re-Identification
Fufu Yu, Xinyang Jiang, Yifei Gong, Shizhen Zhao, Wei-Shi Zheng, Feng Zheng and Xing Sun*
IEEE Transactions on Image Processing (TIP), 2022.
Code is available!

29. PAC-Net: Highlight Your Video via History Preference Modeling
Hang Wang, Penghao Zhou, Chong Zhou, Zhao Zhang, Xing Sun*
European Conference on Computer Vision (ECCV), 2022.

28. Efficient Decoder-free Object Detection with Transformers
Peixian Chen, Mengdan Zhang, Yunhang Shen, Kekai Sheng, Yuting Gao, Xing Sun, Ke Li, Chunhua Shen
European Conference on Computer Vision (ECCV), 2022.
Code is available!

27. DisCo: Remedy Self-supervised Learning on Lightweight Models with Distilled Contrastive Learning
Yuting Gao, Jia-Xin Zhuang, Shaohui Lin, Hao Cheng, Xing Sun, Ke Li, Chunhua Shen
European Conference on Computer Vision (ECCV), 2022. (Oral)
Code is available!

26. Self-supervised Models are Good Teaching Assistants for Vision Transformers
Haiyan Wu, Yuting Gao, Ke Li, Yinqi Zhang, Shaohui Lin, Yuan Xie, Xing Sun
International Conference on Machine Learning (ICML), 2022.

25. DIFNet: Boosting Visual Information Flow for Image Captioning
Mingrui Wu, Xuying Zhang, Xiaoshuai Sun, Yiyi Zhou, Chao Chen, Jiaxin Gu, Xing Sun, Rongrong Ji
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

24. Training-free Transformer Architecture Search
Qinqin Zhou, Kekai Sheng, Xiawu Zheng, Ke Li, Xing Sun, Yonghong Tian, Jie Chen, Rongrong Ji
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2022.

23. AS-MLP: An Axial Shifted MLP Architecture for Vision
Dongze Lian, Zehao Yu, Xing Sun, Shenghua Gao
International Conference on Learning Representations (ICLR), 2022.
Code is available!

22. Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer
Yifan Xu, Zhijie Zhang, Mengdan Zhang, Kekai Sheng, Ke Li, Weiming Dong, Liqing Zhang, Changsheng Xu, Xing Sun*
The AAAI Conference on Artificial Intelligence (AAAI), 2022.

2021

21. Learning Spatially-Aware Canonical View Representation for 3D Shape Recognition with Arbitrary Views
Xin Wei, Yifei Gong, Fudong Wang, Xing Sun*, Jian Sun
International Conference on Computer Vision (ICCV), 2021.

20. Learning to Know Where to See: A Visibility-Aware Approach for Occluded Person Re-identification
Jinrui Yang, Jiawei Zhang, Fufu Yu, Xinyang Jiang, Mengdan Zhang, Xing Sun*, Yingcong Chen, Weishi Zheng
International Conference on Computer Vision (ICCV), 2021.

19. PR-Net: Preference Reasoning for Personalized Video Highlight Detection
Runnan Chen, Penghao Zhou, Wenzhe Wang, Nenglun Chen, Pai Peng, Xing Sun* and Wenping Wang
International Conference on Computer Vision (ICCV), 2021.

18. Ask&Confirm: Active Detail Enriching for Cross-Modal Retrieval with Partial Query
Guanyu Cai, Xinyang Jiang, Jun Zhang, Yifei Gong, Lianghua He, Pai Peng, Xiaowei Guo and Xing Sun*
International Conference on Computer Vision (ICCV), 2021.
Code is available!

17. Discriminator-free Generative Adversarial Attack
Shaohao Lu, Yuqiao Xian, Ke Yan, Yi Hu, Xing Sun, Xiaowei Guo, Feiyue Huang, Weishi Zheng
ACM Multimedia (MM), 2021.
Code is available!

16. Dig into Multi-modal Cues for Video Retrieval with Hierarchical Alignment
Wenzhe Wang, Penghao Zhou, Runnan Chen, Mengdan Zhang, Guanyu Cai, Pai Peng, Xiaowei Guo, Jian Wu, Xing Sun*
International Joint Conference on Artificial Intelligence (IJCAI), 2021.

15. Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning
Jinpeng Wang, Yuting Gao, Ke Li, Yiqi Lin, Andy J Ma and Xing Sun*
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Code is available!

14. Learning 3D Shape Feature for Texture-insensitive Person Re-identification
Jiaxing Chen, Xinyang Jiang, Fudong Wang, Jun Zhang, Feng Zheng, Xing Sun, Weishi Zheng
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Code is coming soon!

13. Temporal Modulation Network for Controllable Space-Time Video Super-Resolution
Gang Xu, Jun Xu, Zhen Li, Liang Wang, Xing Sun, Ming-Ming Cheng
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2021.
Code is available!

12. Learning with Instance-Dependent Label Noise: A Sample Sieve Approach
Hao Cheng, Zhaowei Zhu, Xingyu Li, Yifei Gong, Xing Sun, Yang Liu
International Conference on Learning Representations (ICLR), 2021.
Code is available!

11. One for More: Selecting Generalizable Samples for Generalizable ReID Model
Enwei Zhang, Xinyang Jiang, Hao Cheng, Ancong Wu, Ke Li, Xiaowei Guo, Feng Zheng, Weishi Zheng and Xing Sun*
The AAAI Conference on Artificial Intelligence (AAAI), 2021.
Code is coming soon!

10. Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion
Jinpeng Wang, Yuting Gao, Ke Li, Xinyang Jiang, Xiaowei Guo, Rongrong Ji and Xing Sun*
The AAAI Conference on Artificial Intelligence (AAAI), 2021.
Code is available!

9. High-Dimensional Dense Residual Convolutional Neural Network for Light Field Reconstruction
Nan Meng, Hayden Kwok-Hay So, Xing Sun and Edmund Lam
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021.
Code is available!

2020

8. Pruning Filter in Filter
Fanxu Meng, Hao Cheng, Ke Li, Huixiang Luo, Xiaowei Guo, Guangming Lu and Xing Sun*
Conference on Neural Information Processing Systems (NeurIPS), 2020.
Code is available!

7. Do Not Disturb Me: Person Re-identification Under the Interference of Other Pedestrians
Shizhen Zhao, Changxin Gao, Jun Zhang, Hao Cheng, Chuchu Han, Xinyang Jiang, XW Guo, WS Zheng, Nong Sang, Xing Sun
European Conference on Computer Vision (ECCV), 2020.
Code is available!

6. NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination
Penghao Zhou, Chong Zhou, Pai Peng, Junlong Du, Xing Sun, Xiaowei Guo, Feiyue Huang
ACM Multimedia (MM), 2020.
Code is available!

5. Filter Grafting for Deep Neural Networks
Fanxu Meng+, Hao Cheng+, Ke Li, Zhixin Xu, Rongrong Ji, Xing Sun and Guangming Lu
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
Code is available!

4. Asymmetric Co-Teaching for Unsupervised Cross-Domain Person Re-Identification
Fengxiang Yang, Ke Li, Zhun Zhong, Zhiming Luo, Xing Sun*, Hao Cheng, Xiaowei Guo, Feiyue Huang, Rongrong Ji, Shaozi Li
The AAAI Conference on Artificial Intelligence (AAAI), 2020.
Code is available!

3. Viewpoint-Aware Loss with Angular Regularization for Person Re-Identification
Zhihui Zhu, Xinyang Jiang, Feng Zheng, Xiaowei Guo, Feiyue Huang, Xing Sun*, Weishi Zheng
The AAAI Conference on Artificial Intelligence (AAAI), 2020.
Code is coming soon!

2. Rethinking Temporal Fusion for Video-Based Person Re-Identification on Semantic and Time Aspect
Xinyang Jiang, Yifei Gong, Xiaowei Guo, Qize Yang, Feiyue Huang, Weishi Zheng, Feng Zheng and Xing Sun*
The AAAI Conference on Artificial Intelligence (AAAI), 2020. (Oral)
Code is available!

2019

1. Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training
Feng Zheng, Cheng Deng, Xing Sun*, Xinyang Jiang, Zongqiao Yu, Feiyue Huang, Xiaowei Guo and Rongrong Ji
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Code is available!