A multilingual extension of professional-level academic evaluation across many languages.
According to BenchLM.ai, GPT-5.4 Pro leads the MMLU-ProX benchmark with a score of 95, followed by GPT-5.4 (94) and Claude Opus 4.6 (94). The top models are clustered within one point, suggesting this benchmark is nearing saturation for frontier models.
121 models have been evaluated on MMLU-ProX. The benchmark falls in the multilingual category, which carries a 7% weight in BenchLM.ai's overall scoring system. Strong performance here directly impacts a model's overall ranking.
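To illustrate how a category weight like this can feed into an overall ranking, here is a minimal sketch of weighted-category scoring. Only the 7% multilingual weight comes from the text; the other category names and weights are illustrative placeholders, not BenchLM.ai's actual scheme.

```python
# Hypothetical category weights: "multilingual" (0.07) is stated in the
# text; the remaining categories and weights are placeholders.
CATEGORY_WEIGHTS = {
    "multilingual": 0.07,
    "reasoning": 0.30,   # placeholder
    "coding": 0.30,      # placeholder
    "knowledge": 0.33,   # placeholder
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category benchmark scores (0-100)."""
    total_weight = sum(CATEGORY_WEIGHTS[c] for c in category_scores)
    return sum(CATEGORY_WEIGHTS[c] * s
               for c, s in category_scores.items()) / total_weight

print(overall_score(
    {"multilingual": 95, "reasoning": 90, "coding": 88, "knowledge": 92}
))
```

Under this sketch, even a strong multilingual score moves the overall number only modestly, since the category contributes just 7% of the weighted total.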
Year: 2025
Tasks: Multilingual professional QA
Format: Multilingual multiple choice
Difficulty: Professional multilingual
MMLU-ProX expands multilingual evaluation beyond translated arithmetic, making it a better signal for broad cross-lingual reasoning and knowledge.
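Since the benchmark is scored as multilingual multiple choice, per-language accuracy is the natural unit of evaluation. The sketch below shows one way to compute it; the record fields (language, predicted choice, gold choice) are an assumed shape for illustration, not the actual MMLU-ProX dataset schema.

```python
from collections import defaultdict

def per_language_accuracy(records):
    """Compute accuracy per language.

    records: iterable of (language, predicted_choice, gold_choice)
    tuples -- an assumed record shape, not the real dataset schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, pred, gold in records:
        total[lang] += 1
        correct[lang] += int(pred == gold)
    return {lang: correct[lang] / total[lang] for lang in total}

# Toy example: two English items (one right), two German items (both right).
records = [
    ("en", "B", "B"), ("en", "C", "A"),
    ("de", "D", "D"), ("de", "A", "A"),
]
print(per_language_accuracy(records))  # {'en': 0.5, 'de': 1.0}
```

Reporting accuracy broken down by language, rather than a single pooled number, is what lets a benchmark like this expose cross-lingual gaps.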