MIXED GLOBAL + REGIONAL

Korean Benchmarks Leaderboard

How do global frontier models stack up against regional Korean models on domestic tasks? This leaderboard ranks all models exclusively by their performance on Korean benchmarks such as KMMLU, KMMLU-Hard, CLIcK, and KoBALT.

Claude Sonnet 4.6 currently leads the cross-market Korean view with an average score of 85.0.

Use this page to judge whether Korean-market specialists actually outperform global frontier models on Korean-native evaluation, rather than only within a regional-only pool.

Rank  Model              Developer       Type      KMMLU / KMMLU-Hard  Avg Score
#1    Claude Sonnet 4.6  Anthropic       GLOBAL    85%                 85.0
#2    Solar 🇰🇷           Upstage         REGIONAL  80.1%               80.1
#3    o1                 OpenAI          GLOBAL    79.55%              79.5
#4    HyperClova X 🇰🇷    Naver Cloud     REGIONAL  78.4%               78.4
#5    GPT-5.4            OpenAI          GLOBAL    83.65% / 72.76%     78.2
#6    K-Exaone 🇰🇷        LG AI Research  REGIONAL  -                   76.0
#7    Exaone 4.0 🇰🇷      LG AI Research  REGIONAL  75.2%               75.2
#8    GPT-5              OpenAI          GLOBAL    76.47% / 60.61%     68.5
#9    GPT-5.2            OpenAI          GLOBAL    71.54% / 51.1%      61.3
#10   GPT-5              OpenAI          GLOBAL    69.28% / 51.72%     60.5
#11   GPT-5.1            OpenAI          GLOBAL    65.9% / 43.9%       54.9
#12   GPT-4.1            OpenAI          GLOBAL    65.49% / 42.79%     54.1
#13   GPT-4o             OpenAI          GLOBAL    64.26% / 39.62%     51.9
#14   GPT-4.1            OpenAI          GLOBAL    59.26% / 35.6%      47.4
#15   GPT-4 Turbo        OpenAI          GLOBAL    58.75% / 30.56%     44.7
#16   GPT-4o             OpenAI          GLOBAL    52.63% / 24.56%     38.6
#17   GPT-4.1            OpenAI          GLOBAL    48.57% / 24.34%     36.5
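
The page does not say how Avg Score is derived. In the rows above that list both a KMMLU and a KMMLU-Hard score, the figure matches a plain arithmetic mean rounded to one decimal place, so the sketch below assumes that rule. The function name `avg_score` and the use of `None` for an unreported benchmark are illustrative choices, not part of the leaderboard.

```python
from statistics import mean


def avg_score(scores):
    """Mean of the reported benchmark scores, rounded to one decimal.

    Assumption: unreported benchmarks (None) are skipped rather than
    counted as zero. The leaderboard does not document this.
    """
    present = [s for s in scores if s is not None]
    if not present:
        return None
    return round(mean(present), 1)


# (KMMLU, KMMLU-Hard) pairs copied from two rows of the leaderboard.
print(avg_score([83.65, 72.76]))  # GPT-5.4 row -> 78.2
print(avg_score([71.54, 51.1]))   # GPT-5.2 row -> 61.3
```

Under this assumption, a model with only one reported score is ranked on that score alone. Keep this in mind when comparing it with models averaged over both benchmarks.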

What these rows mean

KMMLU: Measures massive multitask language understanding on 45 Korean expert-level subjects.

KMMLU-Hard: A harder subset of KMMLU, built from questions that strong baseline models answered incorrectly. Scores here run well below full KMMLU for every model that reports both.

How to interpret the crossover

While global frontier models like GPT-5 and Claude lead in general reasoning, models like HyperClova X and Exaone are explicitly trained on high-quality Korean corpora. This leaderboard tracks the crossover points between sheer model scale and regional specialization.
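
One way to make "crossover points" concrete is to walk the ranking from top to bottom and record each place where the model type flips between GLOBAL and REGIONAL. The sketch below does this for the top eight rows of the leaderboard. The `crossovers` helper and the tuple layout are hypothetical, and this reading of "crossover" is an assumption rather than the page's stated definition.

```python
# (model, type, avg score) for the top eight rows of the leaderboard.
ROWS = [
    ("Claude Sonnet 4.6", "GLOBAL", 85.0),
    ("Solar", "REGIONAL", 80.1),
    ("o1", "GLOBAL", 79.5),
    ("HyperClova X", "REGIONAL", 78.4),
    ("GPT-5.4", "GLOBAL", 78.2),
    ("K-Exaone", "REGIONAL", 76.0),
    ("Exaone 4.0", "REGIONAL", 75.2),
    ("GPT-5", "GLOBAL", 68.5),
]


def crossovers(rows):
    """Return adjacent (higher, lower) model pairs whose types differ."""
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    return [
        (upper[0], lower[0])
        for upper, lower in zip(ranked, ranked[1:])
        if upper[1] != lower[1]
    ]


for upper, lower in crossovers(ROWS):
    print(f"{upper} ranks directly above {lower}")
```

Among these eight rows, the only adjacent pair that does not flip type is K-Exaone and Exaone 4.0, which shows how tightly interleaved the two groups are at the top of the table.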

Recommended next step

If the mixed leaderboard shows a Korean-market model winning on your target rows, open its model page next and inspect the full score breakdown before choosing it over a global default.