An enhanced version of MMLU with 10 answer choices instead of 4, featuring more reasoning-focused questions that better differentiate frontier models.
Year: 2024
Tasks: Multiple subjects
Format: 10-way multiple choice
Difficulty: Professional level
MMLU-Pro increases the number of answer choices from 4 to 10 and adds more reasoning-focused problems. With 10 options, uniform random guessing yields an expected accuracy of 10% rather than 25%, so a score depends less on luck and more on actual understanding. This makes the benchmark a more robust discriminator of model capabilities.
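The effect of the larger answer set on guessing can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names `random_guess_accuracy` and `chance_corrected` are hypothetical, and the chance-corrected rescaling is a generic normalization, not something the MMLU-Pro authors or BenchLM are known to apply to reported scores.

```python
from fractions import Fraction


def random_guess_accuracy(num_choices: int) -> Fraction:
    """Expected accuracy when picking uniformly at random among num_choices options."""
    if num_choices < 1:
        raise ValueError("num_choices must be at least 1")
    return Fraction(1, num_choices)


def chance_corrected(accuracy: float, num_choices: int) -> float:
    """Rescale accuracy so random guessing maps to 0 and a perfect score maps to 1.

    Assumption: a generic normalization for illustration, not part of the
    benchmark's official scoring.
    """
    if num_choices < 2:
        raise ValueError("num_choices must be at least 2")
    baseline = float(random_guess_accuracy(num_choices))
    return (accuracy - baseline) / (1.0 - baseline)


if __name__ == "__main__":
    # Compare the guessing floor for the 4-way and 10-way formats.
    for k in (4, 10):
        floor = float(random_guess_accuracy(k))
        print(f"{k} choices: random-guess accuracy = {floor:.0%}")
```

Under this normalization, the same raw accuracy sits further above the guessing floor in the 10-way format than in the 4-way format, which is one way to read the claim that guessing contributes less to the score.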
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Claude Opus 4.6 by Anthropic currently leads with a score of 92 on MMLU-Pro.
88 AI models have been evaluated on MMLU-Pro on BenchLM.