Korean Massive Multitask Language Understanding (KMMLU)

Evaluates Korean expert-level knowledge across 45 subjects. 20% of questions require Korean cultural context.

Top Models on KMMLU — March 2026

As of March 2026, Claude Sonnet 4.6 leads the KMMLU leaderboard with a score of 85%, followed by GPT-5.4 (83.7%) and Solar Pro 2 (80.1%).


According to BenchLM.ai, Claude Sonnet 4.6 leads the KMMLU benchmark with a score of 85%, followed by GPT-5.4 (83.7%) and Solar Pro 2 (80.1%). Scores span a wide range, from 85% at the top down to 48.6% for GPT-4.1 nano, with a clear gap between the top tier and mid-tier models.

16 models have been evaluated on KMMLU. The benchmark falls in the Korean Benchmarks category, which BenchLM tracks separately from its weighted global scoring system, so these results are best compared on the dedicated Korean benchmark views. KMMLU is displayed for reference but excluded from the scoring formula, and therefore does not affect overall rankings.
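The reference-only treatment described above can be sketched as a weighted average that simply skips unweighted benchmarks. This is a minimal illustration, not BenchLM's actual formula; the benchmark names and weights below are assumptions chosen for the example.

```python
# Hypothetical sketch: a weighted global score in which benchmarks
# with no assigned weight (e.g. KMMLU, shown for reference only)
# cannot influence the result. Weights here are illustrative.

def global_score(results: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over benchmarks that carry a nonzero weight.

    results: {benchmark: score}; weights: {benchmark: weight}.
    Benchmarks absent from `weights` are ignored entirely.
    """
    scored = {b: s for b, s in results.items() if weights.get(b, 0) > 0}
    total = sum(weights[b] for b in scored)
    return sum(s * weights[b] for b, s in scored.items()) / total

results = {"MMLU": 88.0, "GPQA": 70.0, "KMMLU": 85.0}
weights = {"MMLU": 0.6, "GPQA": 0.4}  # KMMLU deliberately unweighted

print(round(global_score(results, weights), 1))  # 80.8, unchanged by KMMLU
```

Because KMMLU never enters the weighted sum, changing its score leaves the global figure untouched, which matches how reference-only benchmarks behave on the site.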

About KMMLU

Year: 2024
Questions: 35,030
Format: Multiple choice
Difficulty: Elementary to professional level in Korean

Tests human-level understanding and reasoning in the Korean language across diverse subjects.

KMMLU: Measuring Massive Multitask Language Understanding in Korean

Leaderboard (16 models)

#1 Claude Sonnet 4.6 (85%)
#2 GPT-5.4 (83.7%)
#3 Solar Pro 2 (80.1%)
#4 o1 (79.5%)
#5 HyperClova X Think 32B (78.4%)
#6 GPT-5 mini (76.5%)
#7 Exaone 4.0 32B (75.2%)
#8 GPT-5.2 (71.5%)
#9 GPT-5 nano (69.3%)
#10 GPT-5.1 (65.9%)
#11 GPT-4.1 (65.5%)
#12 GPT-4o (64.3%)
#13 GPT-4.1 mini (59.3%)
#14 GPT-4 Turbo (58.8%)
#15 GPT-4o mini (52.6%)
#16 GPT-4.1 nano (48.6%)

FAQ

What does KMMLU measure?

KMMLU evaluates expert-level Korean knowledge across 45 subjects; roughly 20% of questions require Korean cultural context.

Which model scores highest on KMMLU?

Claude Sonnet 4.6 by Anthropic currently leads with a score of 85% on KMMLU.

How many models are evaluated on KMMLU?

16 AI models have been evaluated on KMMLU on BenchLM.

Last updated: March 18, 2026
