Xing SUN (孙星)

Ph.D., Principal Researcher & Team Manager
Tencent Youtu Lab

About Me

I am a Principal Researcher and Team Manager at Tencent Youtu Lab. I received my Ph.D. from The University of Hong Kong in 2016.

My work focuses on three areas: Multimodal Large Language Models (MLLM), Agents, and Retrieval-Augmented Generation (RAG). My team and I aim to bridge the gap between foundation models and real-world applications through robust, open-source tools and benchmarks.

Open Source Projects

We are actively building the TencentCloudADP ecosystem.

MLLM

Video-MME

The first-ever comprehensive evaluation benchmark of multi-modal LLMs in video analysis.

Benchmark, Video

Youtu-VL

Open-source vision-language models (e.g., Youtu-VL-4B), released with training recipes and inference code.

MLLM, Training

VITA

The first-ever open-source interactive omni-multimodal LLM.

MLLM, Training

Agent

Youtu-LLM

Lightweight, high-performance Large Language Models (2B parameters) for edge deployment.

LLM, HuggingFace

Youtu-Agent

A flexible framework for building autonomous LLM agents, supporting complex tool calling, planning, and memory management.

Framework, Python

Youtu-Tip

A desktop productivity assistant powered by local LLMs (served via Ollama) and Youtu-Agent to automate daily workflows.

Application, Local LLM

RAG

Youtu-GraphRAG

An advanced RAG system that uses knowledge graphs to improve retrieval accuracy and structured reasoning.

RAG, Graph

Youtu-Parsing

High-performance document parsing tools that convert raw files (PDF, DOCX) into clean, RAG-ready data.

Data Processing

Youtu-Embedding

Optimized embedding models tailored for semantic search and dense retrieval tasks.

Model, Retrieval

Selected Publications

Full list available on Google Scholar.
