Evaluates Korean expert-level knowledge across 45 subjects. 20% of questions require Korean cultural context.
As of March 2026, Claude Sonnet 4.6 leads the KMMLU leaderboard with 85%, followed by GPT-5.4 (83.7%) and Solar Pro 2 (80.1%).
Top three models:
1. Claude Sonnet 4.6 (Anthropic): 85%
2. GPT-5.4 (OpenAI): 83.7%
3. Solar Pro 2 (Upstage): 80.1%
According to BenchLM.ai, the top three scores span 4.9 percentage points: Claude Sonnet 4.6 leads GPT-5.4 by 1.3 points, and GPT-5.4 leads Solar Pro 2 by 3.6 points. Across the full leaderboard, scores show a moderate spread, with meaningful differences between the top-tier and mid-tier models.
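The percentage-point gaps between the leaders can be recomputed from the scores listed above. This is a minimal sketch; the `adjacent_gaps` helper is hypothetical and only covers the three models whose scores appear on this page.

```python
# Top three KMMLU scores on BenchLM as of March 2026 (percent).
scores = {
    "Claude Sonnet 4.6": 85.0,
    "GPT-5.4": 83.7,
    "Solar Pro 2": 80.1,
}


def adjacent_gaps(scores: dict[str, float]) -> list[tuple[str, str, float]]:
    """Return percentage-point gaps between consecutive models, highest score first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Round to one decimal place to match the leaderboard's precision
    # and hide floating-point noise (e.g. 85.0 - 83.7 = 1.2999...).
    return [
        (upper_name, lower_name, round(upper - lower, 1))
        for (upper_name, upper), (lower_name, lower) in zip(ranked, ranked[1:])
    ]


for leader, follower, gap in adjacent_gaps(scores):
    print(f"{leader} leads {follower} by {gap} points")
# Claude Sonnet 4.6 leads GPT-5.4 by 1.3 points
# GPT-5.4 leads Solar Pro 2 by 3.6 points
```

These are absolute differences in percentage points, not relative percentages.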
Sixteen models have been evaluated on KMMLU, which BenchLM places in its Korean Benchmarks category. BenchLM tracks this category separately from its weighted global scoring system, so these results are best compared on the dedicated Korean benchmark views. KMMLU is displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.
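To illustrate what "excluded from the scoring formula" means in practice, here is a minimal sketch of a weighted average in which KMMLU simply carries no weight. The benchmark names `benchmark_a` and `benchmark_b`, their weights, and the averaging rule are all hypothetical; BenchLM's actual formula is not described on this page.

```python
# Hypothetical relative weights. KMMLU is deliberately absent,
# mirroring a display-only benchmark that does not feed the global score.
WEIGHTS = {"benchmark_a": 3, "benchmark_b": 2}


def global_score(scores: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted average over weighted benchmarks; unweighted scores are ignored."""
    used = {name: w for name, w in weights.items() if name in scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        raise ValueError("model has no scores on weighted benchmarks")
    return sum(scores[name] * w for name, w in used.items()) / total_weight


with_kmmlu = {"benchmark_a": 70.0, "benchmark_b": 80.0, "KMMLU": 85.0}
without_kmmlu = {"benchmark_a": 70.0, "benchmark_b": 80.0}
print(global_score(with_kmmlu, WEIGHTS))     # 74.0
print(global_score(without_kmmlu, WEIGHTS))  # 74.0
```

Because KMMLU has no entry in the weight table, changing a model's KMMLU score leaves its global score unchanged.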
Year: 2024
Tasks: 35,030 questions
Format: Multiple-choice questions
Difficulty: Elementary to professional level, in Korean
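Since KMMLU is a multiple-choice benchmark, a leaderboard score is most naturally read as accuracy: the share of questions where the model picks the gold option. The sketch below assumes exactly that; BenchLM's actual protocol (prompt format, few-shot setup, answer extraction) is not described on this page, and the `Item` type, the helper names, and the subject labels are hypothetical. It also shows a per-subject breakdown, because results across the 45 subjects can be averaged per question or per subject, and the two conventions can give different numbers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Item:
    subject: str     # subject label (illustrative names only)
    answer: str      # gold option letter, e.g. "A" to "D"
    prediction: str  # option letter the model chose


def accuracy(items: list[Item]) -> float:
    """Overall (micro-averaged) accuracy: correct answers / total questions."""
    if not items:
        raise ValueError("no items to score")
    correct = sum(item.prediction == item.answer for item in items)
    return correct / len(items)


def per_subject_accuracy(items: list[Item]) -> dict[str, float]:
    """Accuracy computed separately for each subject."""
    totals: dict[str, int] = defaultdict(int)
    correct: dict[str, int] = defaultdict(int)
    for item in items:
        totals[item.subject] += 1
        correct[item.subject] += item.prediction == item.answer
    return {subject: correct[subject] / totals[subject] for subject in totals}


# Toy data, not real KMMLU questions.
items = [
    Item("Law", "B", "B"),
    Item("Law", "C", "A"),
    Item("Korean History", "D", "D"),
    Item("Korean History", "A", "A"),
]
print(f"{accuracy(items):.0%}")     # 75%
print(per_subject_accuracy(items))  # {'Law': 0.5, 'Korean History': 1.0}
```

With equal question counts per subject, as here, the micro average (0.75) and the mean of per-subject accuracies coincide; with unbalanced subjects they diverge.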
Tests human-level understanding and reasoning in the Korean language across diverse subjects.
KMMLU: Measuring Massive Multitask Language Understanding in Korean