
Massive Multi-discipline Multimodal Understanding Pro (MMMU-Pro)

A harder multimodal benchmark for frontier models that combines text with images, diagrams, charts, and academic visual reasoning tasks.

Top models on MMMU-Pro — April 29, 2026

As of April 29, 2026, GPT-5.4 Pro leads the MMMU-Pro leaderboard with 94%, followed by Claude Mythos Preview (92.7%) and Gemini 3.1 Pro (83.9%).


The scores show moderate spread, with meaningful differences between the top tier and mid-tier models.

23 models have been evaluated on MMMU-Pro. The benchmark falls in the Multimodal & Grounded category. This category carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, MMMU-Pro contributes 45% of the category score, so strong performance here directly affects a model's overall ranking.
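Taken together, the two weights pin down how much MMMU-Pro can move a model's overall ranking. A minimal sketch of the arithmetic, assuming the weights simply multiply (BenchLM's exact aggregation formula is not specified on this page):

```python
# Effective weight of MMMU-Pro in the overall BenchLM score,
# assuming category and within-category weights multiply.
category_weight = 0.12   # Multimodal & Grounded share of the overall score
benchmark_share = 0.45   # MMMU-Pro share within that category

effective_weight = category_weight * benchmark_share
print(f"{effective_weight:.3f}")  # 0.054, i.e. about 5.4% of the overall score
```

So under this assumption, a 10-point swing on MMMU-Pro shifts a model's overall score by roughly half a point.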

About MMMU-Pro

Year: 2024
Tasks: Multimodal academic reasoning
Format: Image + text question answering
Difficulty: Frontier multimodal

MMMU-Pro extends the original MMMU setup with more difficult multimodal questions and stronger separation at the top end of the model market.

BenchLM freshness & provenance

Version: MMMU-Pro 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (23 models)

1. GPT-5.4 Pro: 94%
2. Claude Mythos Preview: 92.7%
3. Gemini 3.1 Pro: 83.9%
4. 81.2%
5. 81.2%
6. 81%
7. 80.4%
8. 79.5%
9. 79.4%
10. 79%
11. 78.8%
12. 78.5%
13. 78.5%
14. 77.9%
15. 77.3%
16. 76.9%
17. 76.6%
18. 75.8%
19. 75.3%
20. 75.2%
21. 73.8%
22. 70.6%
23. 66.1%

FAQ

What does MMMU-Pro measure?

MMMU-Pro measures a model's ability to answer difficult academic questions that combine text with images, diagrams, and charts. It is a harder variant of MMMU, designed to separate frontier multimodal models.

Which model scores highest on MMMU-Pro?

GPT-5.4 Pro by OpenAI currently leads with a score of 94% on MMMU-Pro.

How many models are evaluated on MMMU-Pro?

23 AI models have been evaluated on MMMU-Pro on BenchLM.

Last updated: April 29, 2026 · BenchLM version MMMU-Pro 2024
