About Me

I am currently a Researcher at Kuaishou Technology, focusing on cutting-edge research in Computer Vision and Natural Language Processing. My current research interests include Multimodal Large Language Models (MLLMs), Formal Theorem Proving, and AI Agents.

I received my Ph.D. in 2022 from the Multimedia and Human Understanding Group (MHUG) at the Department of Information Engineering and Computer Science, University of Trento, Italy. I was supervised by Prof. Nicu Sebe and Dr. Bruno Lepri, and my thesis defense committee included Vittorio Murino, Zhengyou Zhang, and Elisa Ricci.

Before my doctoral studies, I earned my B.Eng. degree in Photogrammetry and Remote Sensing (2015) and M.Eng. degree in Pattern Recognition and Intelligent System (2018) from Wuhan University, China.

We are actively recruiting research interns for long-term positions. If you are interested, please send your resume to my email.

Research Experience

Researcher
Kuaishou Technology, Beijing, China
01/2025 - Present

Research focus: MLLMs, Formal Theorem Proving, and AI Agents

Researcher
Huawei, Shenzhen, China
08/2022 - 01/2025

Research focus: Image Generation and Enhancement (GANs and Diffusion Models)

Research Intern
Tencent AI Lab, Shenzhen, China
2021 - 06/2022

Mentors: Dr. Linchao Bao and Dr. Wei Bi.

Research focus: GANs, Image Domain Translation

PhD Student
FBK and MHUG, Trento, Italy
12/2018 - 06/2022

Mentors: Prof. Nicu Sebe and Dr. Bruno Lepri.

Research focus: Deep learning, GANs, Cross-modal Representations, Image Domain Translation

Research Intern
Tencent AI Lab, Shenzhen, China
11/2017 - 09/2018

Mentors: Dr. Wei Bi and Dr. Xiaojiang Liu.

Research focus: Deep Learning, Neural Dialogue Generation

Master's Student
Computer Vision and Remote Sensing (CVRS) Lab, Wuhan, China
03/2015 - 06/2018

Mentor: Prof. Jian Yao.

Research focus: Deep Learning, Remote Sensing

Selected Publications

Conference Papers

Evaluating Multimodal Large Language Models on Video Captioning via Monte Carlo Tree Search
Linhao Yu, Xinguang Ji, Yahui Liu, Fanheng Kong, Chenxi Sun, Jingyuan Zhang, Hongzhi Zhang, V. W., Fuzheng Zhang, Deyi Xiong
Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers
Bin Ren*, Yahui Liu*, Yue Song, Wei Bi, Rita Cucchiara, Nicu Sebe, Wei Wang (*equal contribution)
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023

Efficient Training of Visual Transformers with Small Datasets
Yahui Liu, Enver Sangineto, Wei Bi, Nicu Sebe, Bruno Lepri, Marco De Nadai
Advances in Neural Information Processing Systems (NeurIPS), 2021

Smoothing the Disentangled Latent Style Space for Unsupervised Image-to-Image Translation
Yahui Liu, Enver Sangineto, Yajing Chen, Linchao Bao, Haoxian Zhang, Nicu Sebe, Bruno Lepri, Wei Wang, Marco De Nadai
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021

Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach
Yahui Liu, Marco De Nadai, Deng Cai, Huayang Li, Xavier Alameda-Pineda, Nicu Sebe, Bruno Lepri
ACM International Conference on Multimedia (ACM MM), 2020

Journal Articles

Spatial Entropy as an Inductive Bias for Vision Transformers
Elia Peruzzo, Enver Sangineto, Yahui Liu, Marco De Nadai, Wei Bi, Bruno Lepri, Nicu Sebe
Machine Learning, 2024 (Impact Factor: 5.8)

ISF-GAN: An Implicit Style Function for High-Resolution Image-to-Image Translation
Yahui Liu, Yajing Chen, Linchao Bao, Nicu Sebe, Bruno Lepri, Marco De Nadai
IEEE Transactions on Multimedia (TMM), 2022 (Impact Factor: 8.4)

DeepCrack: A Deep Hierarchical Feature Learning Architecture for Crack Segmentation
Yahui Liu, Jian Yao, Rengping Xie, Li Li
Neurocomputing, 2019 (Impact Factor: 5.5)

RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes From High-Resolution Remotely Sensed Images
Yahui Liu, Jian Yao, Xiaohu Lu, Menghan Xia, Xingbo Wang, Yuan Liu
IEEE Transactions on Geoscience and Remote Sensing (TGRS), 2019 (Impact Factor: 7.5)

Recent News

July 2025

We released Leanabell-Prover-V2 for verifier-integrated reasoning via RL.

June 2025

We released SeqPE for universal positional encoding.

May 2025

We released LCoT2Tree for uncovering structural patterns in Long CoT.

May 2025

We released CrEval for evaluating text creativity across diverse domains.

May 2025

We released the UNITE framework for Multimodal Information Retrieval.

May 2025

One paper (MCTS-VCB) was accepted to the ACL 2025 main conference.

April 2025

We released Capybara-VL and Capybara-Omni, our efficient MLLMs, at the ICLR 2025 SCI-FM workshop.

April 2025

We released Leanabell-Prover, which achieves a state-of-the-art 59.8% pass@32 on MiniF2F-test.

Academic Services

Conference Reviews

  • ICML 2025
  • AAAI 2025
  • CVPR 2024, 2023, 2022, 2021
  • NeurIPS 2025, 2024, 2023, 2022
  • ACM MM 2025, 2024, 2023, 2022, 2021, 2020
  • ACL/EMNLP 2025, 2024
  • ECCV 2024, 2022
  • ICCV 2023, 2021
  • IJCAI 2022, 2021

Journal Reviews

  • IEEE TPAMI
  • International Journal of Computer Vision (IJCV)
  • IEEE Transactions on Industrial Informatics (TII)
  • IEEE GRSL
  • IEEE J-STARS
  • IEEE TNNLS
  • Machine Vision and Applications (MVAP)
  • IEEE TMM
  • Pattern Recognition Letters (PRL)
  • Information Fusion

Selected Awards

Pengcheng Excellent Talents
Shenzhen, China, 2024

Top Minds (天才少年)
Huawei, China, 2022

Technical Expert (技术大咖)
Tencent AI Lab, China, 2021