Korean Benchmarks Leaderboard
How do global frontier models stack up against regional Korean models on domestic tasks? This leaderboard ranks models exclusively on Korean benchmarks such as KMMLU, KMMLU-Hard, CLIcK, and KoBALT.
Claude Sonnet 4.6 currently leads the cross-market Korean view with an average score of 85.0.
Use this page to decide whether Korean-market specialists actually outperform global frontier models on Korean-native evaluations, rather than merely ranking well inside a regional-only pool.
| Rank | Model | Provider | Type | KMMLU | KMMLU-Hard | Avg Score |
|---|---|---|---|---|---|---|
| #1 | Claude Sonnet 4.6 | Anthropic | GLOBAL | 85% | — | 85.0 |
| #2 | Solar | Upstage 🇰🇷 | REGIONAL | 80.1% | — | 80.1 |
| #3 | o1 | OpenAI | GLOBAL | 79.55% | — | 79.5 |
| #4 | HyperClova X | Naver Cloud 🇰🇷 | REGIONAL | 78.4% | — | 78.4 |
| #5 | GPT-5.4 | OpenAI | GLOBAL | 83.65% | 72.76% | 78.2 |
| #6 | K-Exaone | LG AI Research 🇰🇷 | REGIONAL | — | — | 76.0 |
| #7 | Exaone 4.0 | LG AI Research 🇰🇷 | REGIONAL | 75.2% | — | 75.2 |
| #8 | GPT-5 | OpenAI | GLOBAL | 76.47% | 60.61% | 68.5 |
| #9 | GPT-5.2 | OpenAI | GLOBAL | 71.54% | 51.1% | 61.3 |
| #10 | GPT-5 | OpenAI | GLOBAL | 69.28% | 51.72% | 60.5 |
| #11 | GPT-5.1 | OpenAI | GLOBAL | 65.9% | 43.9% | 54.9 |
| #12 | GPT-4.1 | OpenAI | GLOBAL | 65.49% | 42.79% | 54.1 |
| #13 | GPT-4o | OpenAI | GLOBAL | 64.26% | 39.62% | 51.9 |
| #14 | GPT-4.1 | OpenAI | GLOBAL | 59.26% | 35.6% | 47.4 |
| #15 | GPT-4 Turbo | OpenAI | GLOBAL | 58.75% | 30.56% | 44.7 |
| #16 | GPT-4o | OpenAI | GLOBAL | 52.63% | 24.56% | 38.6 |
| #17 | GPT-4.1 | OpenAI | GLOBAL | 48.57% | 24.34% | 36.5 |
What these rows mean
KMMLU: Measures massive multitask language understanding on 45 Korean expert-level subjects.
KMMLU-Hard: A harder subset of KMMLU, built from the questions models most often get wrong, targeting the complex Korean reasoning where models struggle most.
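The Avg Score column is consistent with a simple mean over whichever benchmark scores a row actually reports (a single-benchmark row's average equals that benchmark's score). A minimal sketch of that assumed computation, using values from the table:

```python
def avg_score(scores):
    """Mean of the benchmark scores that are actually reported.

    `scores` maps benchmark name -> percentage, with None standing in
    for the em-dash (missing) entries in the table. Assumes the
    leaderboard averages only the available columns; this is an
    inference from the rows, not the site's published formula.
    """
    reported = [v for v in scores.values() if v is not None]
    return round(sum(reported) / len(reported), 1)

# GPT-5.4 row: both benchmarks reported -> (83.65 + 72.76) / 2
print(avg_score({"KMMLU": 83.65, "KMMLU-Hard": 72.76}))  # 78.2
# Solar row: KMMLU only -> average equals the single score
print(avg_score({"KMMLU": 80.1, "KMMLU-Hard": None}))    # 80.1
```

The same rule reproduces every two-benchmark row above (e.g. GPT-5 at #8: (76.47 + 60.61) / 2 = 68.5).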
How to interpret the crossover
While global frontier models like GPT-5 and Claude lead in general reasoning, models like HyperClova X and Exaone are explicitly trained on high-quality Korean corpora. This leaderboard tracks the crossover points between sheer model scale and regional specialization.
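One concrete way to read the crossover is to compare the best model of each type on a shared benchmark. The sketch below uses KMMLU scores taken directly from the top table rows; the grouping logic is an illustration, not the leaderboard's own code:

```python
# Top-of-table KMMLU scores, copied from the rows above.
rows = [
    ("Claude Sonnet 4.6", "GLOBAL",   85.0),
    ("Solar",             "REGIONAL", 80.1),
    ("o1",                "GLOBAL",   79.55),
    ("HyperClova X",      "REGIONAL", 78.4),
    ("Exaone 4.0",        "REGIONAL", 75.2),
]

# Keep the highest-scoring model per market type.
best = {}
for name, market, kmmlu in rows:
    if market not in best or kmmlu > best[market][1]:
        best[market] = (name, kmmlu)

# Gap between the best global and best regional model on KMMLU.
gap = round(best["GLOBAL"][1] - best["REGIONAL"][1], 2)
print(best["GLOBAL"], best["REGIONAL"], gap)  # gap of 4.9 points
```

On these rows the best regional model (Solar) sits within about five KMMLU points of the best global model, which is the margin a buyer would weigh against regional-specific strengths.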
Recommended next step
If the mixed leaderboard shows a Korean-market model winning on your target rows, open its model page next and inspect the full score breakdown before choosing it over a global default.