I am a Principal Researcher and Team Manager at Tencent Youtu Lab. I received my Ph.D. from The University of Hong Kong in 2016.
My work centers on three core pillars: Multimodal Large Language Models (MLLMs), Agents, and Retrieval-Augmented Generation (RAG). My team and I aim to bridge the gap between foundation models and real-world applications through robust, open-source tools and benchmarks.
We are actively building the TencentCloudADP ecosystem.
The first-ever comprehensive evaluation benchmark for multimodal LLMs on video analysis.
Open-source Vision-Language models including training recipes and inference code (e.g., Youtu-VL-4B).
The first-ever open-source interactive omni-multimodal LLM.
Lightweight, high-performance Large Language Models (2B parameters) for edge deployment.
A flexible framework for building autonomous LLM agents, supporting complex tool calling, planning, and memory management.
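The core loop of such a framework can be sketched in a few lines. The tool registry and plan format below are hypothetical stand-ins for illustration, not Youtu-Agent's actual API:

```python
# Minimal sketch of an agent tool-calling loop (hypothetical API, not Youtu-Agent's).
from typing import Callable, Dict, List


class ToolRegistry:
    """Maps tool names to plain Python callables the agent may invoke."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self._tools[name] = fn

    def call(self, name: str, **kwargs) -> str:
        return self._tools[name](**kwargs)


def run_agent(plan: List[dict], tools: ToolRegistry, memory: List[dict]) -> List[dict]:
    """Execute a pre-computed plan step by step, appending each result to memory."""
    for step in plan:
        result = tools.call(step["tool"], **step["args"])
        memory.append({"step": step, "result": result})
    return memory


# Usage: register a toy tool and run a one-step plan.
registry = ToolRegistry()
registry.register("add", lambda a, b: str(a + b))
memory = run_agent([{"tool": "add", "args": {"a": 2, "b": 3}}], registry, [])
```

In a real agent the plan would come from an LLM rather than being hard-coded, and memory would feed back into the next planning round.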
A desktop efficiency assistant powered by local LLMs (Ollama) and Youtu-Agent to automate daily workflows.
Advanced RAG system leveraging Knowledge Graphs to enhance retrieval accuracy and structured reasoning.
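As a toy illustration of the idea (not the project's actual interface), a knowledge graph can back retrieval by matching a query's entities against stored relation triples:

```python
# Toy knowledge-graph retrieval: surface stored triples whose entities appear
# in the query. Illustrative only; real systems use entity linking and a graph DB.
triples = [
    ("HKU", "located_in", "Hong Kong"),
    ("Youtu Lab", "part_of", "Tencent"),
    ("RAG", "uses", "retrieval"),
]


def retrieve_facts(query: str):
    """Return every triple whose subject or object string appears in the query."""
    hits = []
    for subj, rel, obj in triples:
        if subj.lower() in query.lower() or obj.lower() in query.lower():
            hits.append((subj, rel, obj))
    return hits


facts = retrieve_facts("What is Youtu Lab?")
```

The retrieved triples can then be passed to the LLM as structured context, which is what gives KG-backed RAG its edge on multi-hop questions.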
High-performance document parsing tools that convert raw files (PDF, DOCX) into clean, RAG-ready data.
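Once a document is parsed to plain text, a typical next step before indexing is splitting it into overlapping chunks. Below is a minimal sketch of that step; the sizes are arbitrary and this is not the tool's actual API:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50):
    """Split text into fixed-size character chunks with overlap, so content
    cut at a chunk boundary still appears intact in at least one chunk."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars of context
    return chunks


# Usage on a 500-character input: yields chunks at offsets 0, 150, 300, 450.
parts = chunk_text("a" * 500, size=200, overlap=50)
```

Production parsers usually chunk on sentence or layout boundaries rather than raw character counts, but the overlap idea is the same.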
Optimized embedding models tailored for semantic search and dense retrieval tasks.
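At inference time, dense retrieval with such models reduces to nearest-neighbor search over embedding vectors. The sketch below uses hand-set toy vectors in place of a learned encoder:

```python
# Dense retrieval sketch: rank documents by cosine similarity to a query vector.
# The "embeddings" here are toy values; in practice they come from a trained encoder.
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


corpus = {
    "doc_cat": [1.0, 0.1, 0.0],
    "doc_dog": [0.9, 0.2, 0.1],
    "doc_car": [0.0, 0.1, 1.0],
}


def search(query_vec, k=2):
    """Return the top-k document ids ranked by cosine similarity to the query."""
    ranked = sorted(corpus, key=lambda d: cosine(query_vec, corpus[d]), reverse=True)
    return ranked[:k]


top = search([1.0, 0.0, 0.0])
```

Real deployments replace the linear scan with an approximate nearest-neighbor index once the corpus grows beyond a few thousand documents.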
A full publication list is available on Google Scholar.