An enhanced version of MMLU with 10 answer choices instead of 4, featuring more reasoning-focused questions that better differentiate frontier models.
As of April 21, 2026, Claude Opus 4.5 leads the MMLU-Pro leaderboard with 89.5%, followed by Qwen3.6 Plus (88.5%) and Qwen3.5 397B (87.8%).
1. Claude Opus 4.5 (Anthropic): 89.5%
2. Qwen3.6 Plus (Alibaba): 88.5%
3. Qwen3.5 397B (Alibaba): 87.8%
According to BenchLM.ai, the top three models are clustered within 1.7 points of each other (89.5% for the leader versus 87.8% for third place), suggesting that MMLU-Pro is nearing saturation for frontier models.
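To make the clustering claim concrete, the gap between first and third place can be computed directly from the leaderboard scores. This is an illustrative sketch only: the `top_k_spread` helper and the 2-point threshold used to flag a "compressed" top tier are hypothetical choices, not BenchLM.ai's saturation criterion.

```python
# Scores as reported on the MMLU-Pro leaderboard (April 21, 2026).
SCORES = {
    "Claude Opus 4.5": 89.5,
    "Qwen3.6 Plus": 88.5,
    "Qwen3.5 397B": 87.8,
}

def top_k_spread(scores: dict[str, float], k: int = 3) -> float:
    """Return the gap, in percentage points, between the best and the k-th best score."""
    ranked = sorted(scores.values(), reverse=True)[:k]
    # Round to one decimal to hide floating-point noise such as 1.7000000000000028.
    return round(ranked[0] - ranked[-1], 1)

# Hypothetical threshold: a top tier narrower than this is treated as "compressed".
COMPRESSION_THRESHOLD_PP = 2.0

spread = top_k_spread(SCORES)
print(f"Top-3 spread: {spread} points")
print("Top tier compressed:", spread < COMPRESSION_THRESHOLD_PP)
```

A fixed threshold is the simplest possible rule; a real saturation check would more likely also consider measurement noise and how close the scores are to the benchmark's ceiling.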
Twenty-two models have been evaluated on MMLU-Pro. The benchmark belongs to the Knowledge category, which carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, MMLU-Pro contributes 22% of the category score, so results here feed directly into a model's overall ranking.
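As a rough illustration of how the two weights combine, the sketch below assumes the overall score is a linear weighted average, so a benchmark's effective weight is its category weight multiplied by its within-category weight. The function names are hypothetical, and the linear-weighting assumption should be checked against the BenchLM methodology page.

```python
def effective_weight(category_weight: float, within_category_weight: float) -> float:
    """Share of the overall score attributable to one benchmark (assumes linear weighting)."""
    return category_weight * within_category_weight

def overall_impact(score_delta_pp: float, weight: float) -> float:
    """Change in overall score, in points, caused by a score change on one benchmark."""
    return score_delta_pp * weight

# Knowledge category = 12% of the overall score; MMLU-Pro = 22% of that category.
w = effective_weight(0.12, 0.22)
print(f"Effective MMLU-Pro weight: {w:.2%}")  # 2.64%

# Example: the 1.7-point gap between first and third place on MMLU-Pro.
print(f"Overall-score impact of a 1.7-point gap: {overall_impact(1.7, w):.3f} points")
```

Under this assumption, the 1.7-point spread at the top of the MMLU-Pro leaderboard translates into roughly 0.045 points of overall score.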
Year: 2024
Tasks: Multiple subjects
Format: 10-way multiple choice
Difficulty: Professional level
MMLU-Pro increases the number of answer choices from 4 to 10 and adds more reasoning-focused problems. The extra choices lower the accuracy expected from random guessing from 25% to 10%, so scores better reflect genuine understanding. This makes MMLU-Pro a more robust discriminator of model capabilities.
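The effect of extra answer choices on guessing can be quantified. The sketch below computes the expected accuracy of a uniform random guesser for 4-way and 10-way questions, then applies a common chance correction that rescales raw accuracy so that random guessing maps to 0 and a perfect score maps to 1. This normalization is a generic illustration, not how BenchLM.ai or the MMLU-Pro authors report scores, and the function names are hypothetical.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on k-way multiple choice."""
    return 1.0 / num_choices

def chance_corrected(accuracy: float, num_choices: int) -> float:
    """Rescale accuracy so random guessing -> 0.0 and a perfect score -> 1.0."""
    baseline = chance_accuracy(num_choices)
    return (accuracy - baseline) / (1.0 - baseline)

for k in (4, 10):
    print(f"{k}-way: random-guess baseline = {chance_accuracy(k):.0%}")

# The same raw accuracy sits further above chance when there are more options.
raw = 0.60
print(f"60% raw on 4-way  -> {chance_corrected(raw, 4):.3f} chance-corrected")
print(f"60% raw on 10-way -> {chance_corrected(raw, 10):.3f} chance-corrected")
```

With 10 options, a raw 60% corresponds to about 0.556 on the corrected scale versus about 0.467 with 4 options, which is one way to see why fewer points are "free" on MMLU-Pro.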
Version: MMLU-Pro
Refresh cadence: Static
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.