张皓烨

Haoye Zhang

Algorithm Engineer, China Telecom

Working on Large Language Models / Multimodal LLMs / LLM Agents

Large Language Models / Multimodal LLMs / LLM Agents / LLM Applications

About Me

I am an AI Research & Development Engineer at Tianyi Cloud, China Telecom, specializing in multimodal large language models. My research focuses on trustworthy multimodal learning, particularly hallucination mitigation and alignment techniques such as RLHF for large vision-language models. I am currently working on agent memory systems and AI agent products.

During my academic studies, I contributed to the development of the MiniCPM-V/o series and of streaming interactive multimodal models, advancing efficient and interactive multimodal systems. My research has been published at conferences including CVPR and ICLR, and covers multimodal alignment, efficient model design, and open-source multimodal learning.

My long-term research goal is to build reliable, efficient and practical multimodal systems, addressing real-world challenges in industrial cloud scenarios and promoting the application of responsible multimodal artificial intelligence.

Selected Publications

RLAIF-V

RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness

Tianyu Yu, Haoye Zhang, Qiming Li, Qixin Xu, Yuan Yao, Da Chen, Xiaoman Lu, Ganqu Cui, Yunkai Dang, Taiwen He, Xiaocheng Feng, Jun Song, Bo Zheng, Zhiyuan Liu, Tat-Seng Chua, Maosong Sun, CVPR 2025, Highlight Paper

Paper | Code

RLHF-V

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-Grained Correctional Human Feedback

Tianyu Yu, Yuan Yao, Haoye Zhang, Taiwen He, Yifeng Han, Ganqu Cui, Jinyi Hu, Zhiyuan Liu, Hai-Tao Zheng, Maosong Sun, CVPR 2024

Paper | Code

Open-source Projects

MiniCPM-V/o

Contributed to the development of MiniCPM-V 2.5 and MiniCPM-o 2.6

Large Language Models / Multimodal LLMs

Repository

RLAIF-V dataset

Collected a large-scale RLHF dataset for mitigating hallucination in Multimodal LLMs

Multimodal LLMs / RLHF / Hallucination

Hugging Face

Experience

  • [2025 - Present] Algorithm Engineer, China Telecom
  • [2022 - 2025] Master of Computer Science, Tsinghua University - RLHF in Multimodal LLMs
  • [2017 - 2022] Bachelor of Automation (primary), Industrial Design (secondary), Tsinghua University

Contact

Email: hy_zhang1028@163.com

Location: Beijing, China