A harder multimodal benchmark for frontier models that combines text with images, diagrams, charts, and academic visual reasoning tasks.
According to BenchLM.ai, GPT-5.2 Pro leads the MMMU-Pro benchmark with a score of 96, followed by GPT-5.4 (95) and GPT-5.2 (95). The top models are clustered within one point, suggesting this benchmark is nearing saturation for frontier models.
121 models have been evaluated on MMMU-Pro. The benchmark falls in the multimodalGrounded category, which carries a 15% weight in BenchLM.ai's overall scoring system. Strong performance here directly impacts a model's overall ranking.
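To illustrate how a 15% category weight can feed into an overall ranking, here is a minimal sketch of a category-weighted average. BenchLM.ai's actual methodology is not published here; the second category name and both score values are hypothetical, with only the multimodalGrounded weight of 0.15 taken from the text above.

```python
# Hypothetical sketch of category-weighted overall scoring.
# Only the 0.15 weight for multimodalGrounded comes from the source;
# the other category, its weight, and all scores are illustrative.
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-category benchmark scores."""
    total_weight = sum(weights[c] for c in scores)
    return sum(scores[c] * weights[c] for c in scores) / total_weight

weights = {"multimodalGrounded": 0.15, "otherCategories": 0.85}  # assumed split
scores = {"multimodalGrounded": 96.0, "otherCategories": 90.0}   # assumed scores
print(round(overall_score(scores, weights), 2))  # 0.15*96 + 0.85*90 = 90.9
```

Because the weighted average divides by the total weight, a one-point gain on a 15%-weight benchmark moves the overall score by 0.15 points under these assumptions.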
Year: 2025
Tasks: Multimodal academic reasoning
Format: Image + text question answering
Difficulty: Frontier multimodal
MMMU-Pro extends the original MMMU setup with more difficult multimodal questions, aiming for stronger separation among top-performing models.