Massive Multi-discipline Multimodal Understanding Pro (MMMU-Pro)

A harder multimodal benchmark for frontier models that combines text with images, diagrams, charts, and academic visual reasoning tasks.

Top models on MMMU-Pro — June 13, 2026

As of June 13, 2026, GPT-5.4 Pro leads the MMMU-Pro leaderboard with 94%, followed by Claude Mythos 5 (92.7%) and Claude Fable 5 (92.7%).

31 models · Multimodal & Grounded · 45% of category score · Refreshing · Updated June 13, 2026

According to BenchLM.ai, GPT-5.4 Pro leads the MMMU-Pro benchmark with a score of 94%, followed by Claude Mythos 5 (92.7%) and Claude Fable 5 (92.7%). The top three models are clustered within 1.3 points, which suggests the benchmark is nearing saturation at the very top of the frontier, although fourth place sits a further 8.8 points back at 83.9%.

31 models have been evaluated on MMMU-Pro. The benchmark falls in the Multimodal & Grounded category. This category carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, MMMU-Pro contributes 45% of the category score, so strong performance here directly affects a model's overall ranking.
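The arithmetic behind that weighting can be sketched in a few lines. This is only an illustration: it assumes the overall score is a plain weighted sum of category scores, which this page does not confirm, and the function names `overall_share` and `overall_points` are hypothetical.

```python
# Weights quoted on this page.
CATEGORY_WEIGHT = 0.12   # Multimodal & Grounded share of the overall score
BENCHMARK_SHARE = 0.45   # MMMU-Pro share of that category


def overall_share(category_weight: float, benchmark_share: float) -> float:
    """Fraction of the overall score attributable to a single benchmark.

    Assumption: weights compose multiplicatively (a plain weighted sum).
    """
    return category_weight * benchmark_share


def overall_points(score_gap: float) -> float:
    """Overall-score points implied by a gap of `score_gap` points on MMMU-Pro.

    Assumption: same linear weighting as above.
    """
    return score_gap * overall_share(CATEGORY_WEIGHT, BENCHMARK_SHARE)


share = overall_share(CATEGORY_WEIGHT, BENCHMARK_SHARE)
print(f"MMMU-Pro share of overall score: {share:.1%}")
print(f"Overall points from a 1.3-point lead: {overall_points(1.3):.4f}")
```

Under those assumptions MMMU-Pro accounts for about 5.4% of the overall score, so the 1.3-point gap between first and second place would move the overall score by well under a tenth of a point.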

About MMMU-Pro

Year: 2024
Tasks: Multimodal academic reasoning
Format: Image + text question answering
Difficulty: Frontier multimodal

MMMU-Pro extends the original MMMU setup with more difficult multimodal questions and stronger separation at the top end of the model market.

BenchLM freshness & provenance

Version: MMMU-Pro 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (31 models)

1. 94% (GPT-5.4 Pro)
2. 92.7% (Claude Mythos 5)
3. 92.7% (Claude Fable 5)
4. 83.9%
5. 83.6%
6. 81.2%
7. 81.2%
8. 81%
9. 80.4%
10. 79.5%
11. 79.4%
12. 79%
13. 79%
14. 78.8%
15. 78.5%
16. 78.5%
17. 78.1%
18. 78.1%
19. 77.9%
20. 77.3%
21. 76.9%
22. 76.6%
23. 75.8%
24. 75.3%
25. 75.2%
26. 73.8%
27. 71.1%
28. 70.6%
29. 69.1%
30. 66.1%
31. 63%

FAQ

What does MMMU-Pro measure?

MMMU-Pro measures multimodal academic reasoning. It is a harder benchmark aimed at frontier models, with questions that combine text with images, diagrams, and charts.

Which model scores highest on MMMU-Pro?

GPT-5.4 Pro by OpenAI currently leads with a score of 94% on MMMU-Pro.

How many models are evaluated on MMMU-Pro?

31 AI models have been evaluated on MMMU-Pro on BenchLM.

Last updated: June 13, 2026 · BenchLM version MMMU-Pro 2024
