Massive Multitask Language Understanding Professional (MMLU-Pro)

An enhanced version of MMLU with 10 answer choices instead of 4, featuring more reasoning-focused questions that better differentiate frontier models.

Top models on MMLU-Pro — June 27, 2026

As of June 27, 2026, Qwen3.7 Max leads the MMLU-Pro leaderboard with 89.6%, followed by Claude Opus 4.5 (89.5%) and Qwen3.7 Plus (88.5%).

40 models · Knowledge · 22% of category score · Refreshing · Updated June 27, 2026

According to BenchLM.ai, Qwen3.7 Max leads the MMLU-Pro benchmark with a score of 89.6%, followed by Claude Opus 4.5 (89.5%) and Qwen3.7 Plus (88.5%). The top models are clustered within 1.1 points, suggesting this benchmark is nearing saturation for frontier models.
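The 1.1-point figure is the gap between the first and third scores quoted above. A minimal sketch of that arithmetic, where the dictionary and the `spread` helper are illustrative names and not part of any BenchLM tooling:

```python
# Top-three scores as quoted in the text (percent).
top_scores = {
    "Qwen3.7 Max": 89.6,
    "Claude Opus 4.5": 89.5,
    "Qwen3.7 Plus": 88.5,
}


def spread(scores):
    """Gap between the highest and lowest score, rounded to one decimal."""
    values = list(scores.values())
    return round(max(values) - min(values), 1)


print(spread(top_scores))  # 1.1
```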

40 models have been evaluated on MMLU-Pro. The benchmark falls in the Knowledge category. This category carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, MMLU-Pro contributes 22% of the category score, so strong performance here directly affects a model's overall ranking.
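To get a feel for how much MMLU-Pro can move an overall score, the two weights can be combined. The sketch below assumes the overall score is a plain nested weighted average, so the weights multiply. BenchLM's methodology page is the authority on the actual formula, and the function names here are hypothetical.

```python
CATEGORY_WEIGHT = 0.12   # Knowledge category's share of the overall score (from the text)
BENCHMARK_SHARE = 0.22   # MMLU-Pro's share of the Knowledge category (from the text)


def effective_weight(category_weight, benchmark_share):
    """Share of the overall score attributable to one benchmark.

    Assumes a nested weighted average; this is an illustration,
    not a confirmed description of BenchLM's scoring.
    """
    return category_weight * benchmark_share


def overall_shift(score_delta_points, weight):
    """Change in the overall score caused by a change in the benchmark score."""
    return score_delta_points * weight


weight = effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)
print(f"{weight:.2%}")                        # 2.64%
print(round(overall_shift(1.0, weight), 4))   # 0.0264
```

Under that assumption, a one-point gain on MMLU-Pro moves the overall score by roughly 0.026 points, so the benchmark matters at the margins but does not dominate the ranking on its own.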

About MMLU-Pro

Year: 2024
Tasks: Multiple subjects
Format: 10-way multiple choice
Difficulty: Professional level

MMLU-Pro increases the number of choices from 4 to 10 and integrates more reasoning-focused problems, reducing the chance of correct guessing and better evaluating true understanding. It serves as a more robust discriminator of model capabilities.
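The effect on guessing is easy to quantify: with uniformly random answers, expected accuracy is one over the number of choices. The sketch below shows that baseline for both formats, plus a common chance-corrected rescaling. The rescaling is a general statistical adjustment included for illustration. It is not something the source says BenchLM applies, and the function names are hypothetical.

```python
def chance_accuracy(num_choices):
    """Expected accuracy from uniformly random guessing."""
    return 1 / num_choices


def chance_corrected(accuracy, num_choices):
    """Rescale accuracy so random guessing maps to 0 and a perfect score to 1.

    Illustrative adjustment only; not stated as part of BenchLM's scoring.
    """
    chance = chance_accuracy(num_choices)
    return (accuracy - chance) / (1 - chance)


print(chance_accuracy(4))    # 0.25 (original MMLU, 4 choices)
print(chance_accuracy(10))   # 0.1  (MMLU-Pro, 10 choices)
print(round(chance_corrected(0.896, 10), 3))
```

Moving from 4 to 10 choices lowers the guessing floor from 25% to 10%, which leaves more of the score range to reflect actual knowledge and reasoning.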

BenchLM freshness & provenance

Version: MMLU-Pro
Refresh cadence: Static
Staleness state: Refreshing
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (40 models)

Rank  Score
1     89.6%
2     89.5%
3     88.5%
4     88.5%
5     87.8%
6     87.5%
7     87.1%
8     87.1%
9     87.1%
10    86.8%
11    86.7%
12    86.4%
13    86.2%
14    86.2%
15    86.1%
16    85.7%
17    85.3%
18    85.2%
19    85.2%
20    85%
21    84.9%
22    84.3%
23    83%
24    83%
25    82.9%
26    82.6%
27    82%
28    81.8%
29    79.2%
31    77.2%
32    75.9%
33    74.2%
34    73.5%
35    69.4%
36    68.3%
37    68.1%
38    60%
39    48.9%
40    19.3%

FAQ

What does MMLU-Pro measure?

An enhanced version of MMLU with 10 answer choices instead of 4, featuring more reasoning-focused questions that better differentiate frontier models.

Which model scores highest on MMLU-Pro?

Qwen3.7 Max by Alibaba currently leads with a score of 89.6% on MMLU-Pro.

How many models are evaluated on MMLU-Pro?

40 AI models have been evaluated on MMLU-Pro on BenchLM.

Last updated: June 27, 2026 · BenchLM version MMLU-Pro
