A harder multimodal benchmark for frontier models that combines text with images, diagrams, charts, and academic visual reasoning tasks.
As of April 29, 2026, GPT-5.4 Pro leads the MMMU-Pro leaderboard with 94%, followed by Claude Mythos Preview (92.7%) and Gemini 3.1 Pro (83.9%).
1. GPT-5.4 Pro (OpenAI): 94%
2. Claude Mythos Preview (Anthropic): 92.7%
3. Gemini 3.1 Pro (Google): 83.9%
According to BenchLM.ai, GPT-5.4 Pro leads the MMMU-Pro benchmark with a score of 94%, followed by Claude Mythos Preview (92.7%) and Gemini 3.1 Pro (83.9%). The top two models are separated by only 1.3 points, while Gemini 3.1 Pro trails the leader by roughly 10 points, a meaningful gap between the top tier and the mid-tier models.
23 models have been evaluated on MMMU-Pro. The benchmark falls in the Multimodal & Grounded category. This category carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, MMMU-Pro contributes 45% of the category score, so strong performance here directly affects a model's overall ranking.
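To make that weighting concrete, the sketch below shows how a two-level weighting like this could roll up into an overall score. The 12% category weight and 45% within-category weight come from this page; the simple multiplicative aggregation is an assumption for illustration, not BenchLM's published formula.

```python
# Sketch of how a two-level benchmark weighting could roll up into an
# overall score. The 12% category weight and 45% within-category weight
# come from this page; the multiplicative aggregation is an assumption,
# not BenchLM's published formula.

CATEGORY_WEIGHT = 0.12   # Multimodal & Grounded share of the overall score
BENCHMARK_WEIGHT = 0.45  # MMMU-Pro share within that category


def effective_weight(category_weight: float, benchmark_weight: float) -> float:
    """Fraction of the overall score driven by a single benchmark."""
    return category_weight * benchmark_weight


def overall_delta(mmmu_pro_gain: float) -> float:
    """Change in overall score from a gain on MMMU-Pro, in points."""
    return mmmu_pro_gain * effective_weight(CATEGORY_WEIGHT, BENCHMARK_WEIGHT)


if __name__ == "__main__":
    # 0.12 * 0.45 = 0.054, i.e. MMMU-Pro alone drives about 5.4% of the overall score.
    print(f"effective weight: {effective_weight(CATEGORY_WEIGHT, BENCHMARK_WEIGHT):.3f}")
    # Under this assumed scheme, a 5-point MMMU-Pro improvement moves the
    # overall score by about 0.27 points.
    print(f"overall delta from +5 MMMU-Pro points: {overall_delta(5):.2f}")
```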
Year: 2024
Tasks: Multimodal academic reasoning
Format: Image + text question answering
Difficulty: Frontier multimodal
MMMU-Pro extends the original MMMU setup with more difficult multimodal questions and stronger separation at the top end of the model market.
Version: MMMU-Pro 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
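As a rough illustration of what such a decision could look like, the sketch below maps freshness metadata to the three tiers named above. The tier names come from this page; the field names, thresholds, and rules are illustrative assumptions rather than BenchLM's actual policy, which is documented on the methodology page.

```python
# Illustrative sketch of mapping freshness metadata to a scoring tier.
# The three tier names appear on this page; the fields, thresholds, and
# rules below are assumptions for illustration, not BenchLM's real policy.

from dataclasses import dataclass


@dataclass
class BenchmarkFreshness:
    refresh_cadence_months: int   # e.g. 12 for an annual refresh
    months_since_refresh: int
    questions_public: bool        # public sets are more exposed to contamination


def scoring_tier(meta: BenchmarkFreshness) -> str:
    """Map freshness metadata to a scoring tier (assumed rules)."""
    overdue_by = meta.months_since_refresh - meta.refresh_cadence_months
    if overdue_by <= 0:
        return "strong differentiator"
    if meta.questions_public and overdue_by > meta.refresh_cadence_months:
        # More than a full cadence overdue with public questions:
        # treat as a display-only reference.
        return "display-only reference"
    return "benchmark to watch"


# Example using MMMU-Pro 2024's metadata from this page (annual cadence,
# public set), with months_since_refresh assumed for illustration.
print(scoring_tier(BenchmarkFreshness(refresh_cadence_months=12,
                                      months_since_refresh=8,
                                      questions_public=True)))
# -> "strong differentiator"
```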