What is it about?

Our work aims to make AI systems better at thinking and correcting themselves. We focus on vision–language models — AI that can see an image and explain or reason about it in natural language. While these models are powerful, they often make small mistakes that lead to wrong answers, such as misreading a number or making a simple calculation error. To fix this, we introduce a feedback-based framework where one model acts as the "actor" to solve a problem, and another model, the "critic," reviews its reasoning and gives natural-language feedback. The actor then improves its answer step by step based on these critiques. This process helps the AI learn to reason more reliably, similar to how humans reflect on their own thinking.
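The actor-critic refinement loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: `actor`, `critic`, and the toy arithmetic answers are hypothetical stand-ins for the two vision-language models, and the real system's MCTS-based critique is abstracted away behind the `critic` call.

```python
# Hypothetical sketch of the actor-critic refinement loop.
# `actor` and `critic` stand in for two vision-language models;
# simple stubs are used here so the loop itself is runnable.

def actor(question, image, feedback=None):
    """Stand-in actor: returns an initial or revised answer."""
    if feedback is None:
        return "7 x 8 = 54"  # initial attempt with a small arithmetic slip
    return "7 x 8 = 56"      # revised answer after the critique

def critic(question, image, answer):
    """Stand-in critic: returns natural-language feedback, or None if satisfied."""
    if "54" in answer:
        return "Check the multiplication: 7 x 8 is not 54."
    return None

def refine(question, image, max_rounds=3):
    """Iteratively improve the actor's answer using the critic's feedback."""
    answer = actor(question, image)
    for _ in range(max_rounds):
        feedback = critic(question, image, answer)
        if feedback is None:  # critic accepts the reasoning; stop refining
            break
        answer = actor(question, image, feedback)
    return answer

print(refine("What is 7 x 8?", image=None))  # -> 7 x 8 = 56
```

The loop terminates either when the critic has no further objections or after a fixed number of rounds, mirroring the step-by-step self-correction described in the summary.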

Read the Original

This page is a summary of: MMC: Iterative Refinement of VLM Reasoning via MCTS-based Multimodal Critique, October 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3728422.3762145.
