Massive Multitask Language Understanding Professional (MMLU-Pro)

An enhanced version of MMLU with 10 answer choices instead of 4, featuring more reasoning-focused questions that better differentiate frontier models.

About MMLU-Pro

Year: 2024

Tasks: Multiple subjects

Format: 10-way multiple choice

Difficulty: Professional level

MMLU-Pro increases the number of answer choices from 4 to 10 and adds more reasoning-focused problems. On a 10-option question, the chance of a correct uniform random guess is 10%, down from 25% with 4 options, so scores depend less on luck and more on understanding. This makes it a more robust discriminator of model capabilities.
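The effect of the larger answer set on guessing can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names chance_accuracy and chance_corrected are hypothetical, it assumes every question has exactly the stated number of options and that guesses are uniform, and the chance-corrected rescaling is a common convention rather than anything the leaderboard itself reports.

```python
def chance_accuracy(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on a k-way question."""
    if num_choices < 2:
        raise ValueError("a multiple-choice question needs at least 2 options")
    return 1.0 / num_choices


def chance_corrected(accuracy: float, num_choices: int) -> float:
    """Rescale accuracy so random guessing maps to 0.0 and a perfect score to 1.0.

    This is an illustrative normalization (an assumption of this sketch),
    not a score published by the benchmark or the leaderboard.
    """
    baseline = chance_accuracy(num_choices)
    return (accuracy - baseline) / (1.0 - baseline)


if __name__ == "__main__":
    for k in (4, 10):
        print(f"{k} choices: random-guess accuracy = {chance_accuracy(k):.0%}")
```

Under these assumptions, the same raw accuracy sits further above the guessing floor on a 10-way question than on a 4-way one, which is why low scores are easier to interpret on MMLU-Pro.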

MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

Leaderboard (88 models)

#1 Claude Opus 4.6: 92
#2 Gemini 3.1 Pro: 92
#3 GPT-5.4: 91
#4 GPT-5.3 Codex: 90
#5 Grok 4.1: 90
#6 GPT-5.2: 88
#7 Claude Sonnet 4.5: 84
#8 Claude Sonnet 4.6: 83
#9 GPT-5.1: 83
#10 GPT-5 (high): 83
#11 Gemini 3 Pro: 83
#14 Claude Opus 4.5: 81
#15 GLM-5 (Reasoning): 81
#17 GPT-5 (medium): 81
#18 Kimi K2.5 (Reasoning): 81
#20 GPT-5.2-Codex: 80
#21 o1-preview: 80
#22 Grok 4: 77
#23 Gemini 2.5 Pro: 76
#24 o4-mini (high): 76
#25 o3-pro: 75
#26 o3: 75
#27 Qwen2.5-72B: 75
#28 Claude 4.1 Opus: 75
#29 Claude 4 Sonnet: 75
#30 GLM-5: 74
#31 GLM-4.7: 74
#32 Qwen2.5-1M: 74
#33 DeepSeekMath V2: 74
#34 Kimi K2.5: 74
#35 Mistral Large 3: 74
#36 Mistral Large 2: 74
#37 GLM-4.7-Flash: 74
#38 Claude 3.5 Sonnet: 74
#40 GPT-5 mini: 73
#41 DeepSeek Coder 2.0: 73
#42 DeepSeek V3.2: 73
#43 Qwen3.5 397B: 73
#45 MiniMax M2.5: 73
#46 Claude Haiku 4.5: 73
#47 MiMo-V2-Flash: 72
#48 DeepSeek LLM 2.0: 72
#49 Gemini 3 Flash: 72
#53 Mistral 8x7B: 65
#55 GPT-4o: 64
#56 Moonshot v1: 64
#57 Z-1: 64
#58 Gemini 2.5 Flash: 64
#60 Nemotron-4 15B: 63
#61 Claude 3 Haiku: 63
#62 GPT-OSS 120B: 63
#64 Claude 3 Opus: 62
#65 Gemini 1.5 Pro: 57
#66 Llama 3 70B: 55
#67 Gemini 1.0 Pro: 54
#69 Mistral 7B v0.3: 54
#71 Qwen2.5-VL-32B: 53
#73 Nova Pro: 53
#76 DeepSeek V3.1: 53
#77 GPT-OSS 20B: 53
#78 DeepSeek-R1: 52
#79 Mistral 8x7B v0.2: 52
#80 GPT-4 Turbo: 51
#81 Llama 4 Scout: 51
#83 Qwen3 235B 2507: 51
#84 GLM-4.5: 51
#85 MiniMax M1 80k: 51
#86 GLM-4.5-Air: 51
#87 Kimi K2: 51
#88 Gemma 3 27B: 50

FAQ

What does MMLU-Pro measure?

MMLU-Pro measures multi-subject knowledge and reasoning through multiple-choice questions with 10 answer choices instead of MMLU's 4. Its more reasoning-focused questions are designed to better differentiate frontier models.

Which model scores highest on MMLU-Pro?

Claude Opus 4.6 by Anthropic currently ranks #1 with a score of 92 on MMLU-Pro. Gemini 3.1 Pro is listed at #2 with the same displayed score of 92.

How many models are evaluated on MMLU-Pro?

BenchLM lists 88 AI models evaluated on MMLU-Pro.