Massive Multitask Language Understanding (MMLU)

A comprehensive multiple-choice question-answering test covering 57 tasks, including elementary mathematics, US history, computer science, and law. It tests knowledge across diverse academic subjects, from elementary to professional level.

About MMLU

Year: 2020
Tasks: 57 subjects
Format: Multiple choice questions
Difficulty: Elementary to professional level

MMLU evaluates models on 57 subjects spanning humanities, social sciences, STEM, and other areas. Questions range from elementary to advanced professional level, making it a comprehensive test of world knowledge and reasoning ability.
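As a rough illustration of how a multiple-choice benchmark of this kind can be scored, the sketch below computes per-subject accuracy plus two aggregate figures. It assumes each question has a single correct option letter and that a score is plain accuracy; the function name `score_multiple_choice`, the record layout, and the sample subjects and answers are all hypothetical and are not taken from this page or from any official evaluation harness.

```python
from collections import defaultdict


def score_multiple_choice(records):
    """Score (subject, predicted_letter, correct_letter) triples.

    Returns per-subject accuracy, micro accuracy (over all questions),
    and macro accuracy (unweighted mean over subjects).
    Hypothetical helper: the record layout is an assumption.
    """
    counts = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in records:
        counts[subject][1] += 1
        # Compare option letters case-insensitively, ignoring whitespace.
        if predicted.strip().upper() == correct.strip().upper():
            counts[subject][0] += 1

    subject_acc = {s: c / t for s, (c, t) in counts.items()}
    total_correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    micro = total_correct / total if total else 0.0
    macro = sum(subject_acc.values()) / len(subject_acc) if subject_acc else 0.0
    return subject_acc, micro, macro


# Made-up sample data, for illustration only.
sample = [
    ("us_history", "A", "A"),
    ("us_history", "b", "B"),
    ("us_history", "C", "D"),
    ("law", "D", "D"),
]
subject_acc, micro, macro = score_multiple_choice(sample)
print(subject_acc, micro, macro)
```

Micro and macro averages can differ when subjects have different numbers of questions, so it is worth checking which one a given leaderboard reports before comparing scores.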

Paper: Measuring Massive Multitask Language Understanding

Leaderboard (88 models)

#1 GPT-5.4: 99
#2 Gemini 3.1 Pro: 99
#3 Claude Opus 4.6: 99
#4 GPT-5.3 Codex: 99
#5 Grok 4.1: 99
#6 GPT-5.2: 99
#7 GPT-5.2-Codex: 99
#9 Claude Sonnet 4.6: 99
#10 Claude Opus 4.5: 99
#11 Gemini 3 Pro: 99
#13 GPT-5.1: 97
#14 GLM-5 (Reasoning): 96
#15 Claude Sonnet 4.5: 95
#17 GPT-5 (high): 93
#18 o1-preview: 92
#19 Kimi K2.5 (Reasoning): 92
#20 GPT-5 (medium): 91
#22 GPT-5 mini: 88
#23 o3-pro: 88
#24 GLM-5: 88
#25 Grok 4: 87
#27 o3: 86
#28 GLM-4.7: 86
#29 Qwen2.5-1M: 84
#30 DeepSeek V3.2: 84
#31 Qwen2.5-72B: 83
#32 Gemini 2.5 Pro: 83
#33 Qwen3.5 397B: 83
#34 o4-mini (high): 82
#35 DeepSeek Coder 2.0: 80
#36 DeepSeekMath V2: 80
#37 DeepSeek LLM 2.0: 79
#38 MiMo-V2-Flash: 79
#39 Kimi K2.5: 77
#40 Claude 4.1 Opus: 76
#41 Mistral Large 3: 76
#43 Claude 4 Sonnet: 73
#44 MiniMax M2.5: 73
#46 Gemini 3 Flash: 70
#47 Mistral Large 2: 68
#48 Claude Haiku 4.5: 68
#49 GPT-4o: 66
#50 GLM-4.7-Flash: 66
#51 Mistral 8x7B: 65
#52 Claude 3.5 Sonnet: 65
#54 Gemini 1.5 Pro: 64
#57 Gemini 1.0 Pro: 62
#58 Claude 3 Opus: 61
#59 GPT-4 Turbo: 60
#60 Llama 3 70B: 58
#62 Claude 3 Haiku: 56
#63 Nemotron-4 15B: 54
#64 Moonshot v1: 53
#65 Z-1: 52
#66 GPT-OSS 120B: 51
#67 Gemini 2.5 Flash: 50
#70 Llama 4 Scout: 47
#72 Gemma 3 27B: 45
#73 DeepSeek-R1: 44
#74 Qwen2.5-VL-32B: 43
#76 Nova Pro: 41
#78 Qwen3 235B 2507: 39
#80 GLM-4.5: 37
#81 MiniMax M1 80k: 36
#82 GLM-4.5-Air: 35
#84 DeepSeek V3.1: 33
#85 Kimi K2: 32
#86 GPT-OSS 20B: 31
#87 Mistral 7B v0.3: 30
#88 Mistral 8x7B v0.2: 29

FAQ

What does MMLU measure?

MMLU is a multiple-choice question-answering test covering 57 tasks, including elementary mathematics, US history, computer science, and law. It measures knowledge across diverse academic subjects, from elementary to professional level.

Which model scores highest on MMLU?

GPT-5.4 by OpenAI is currently ranked #1 on MMLU with a score of 99. Several other models on the leaderboard also score 99.

How many models are evaluated on MMLU?

BenchLM lists 88 AI models evaluated on MMLU.