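| Rank, Model, Provider | Access | Type | Context | Score | Benchmarks by category, in order: Agentic, Coding, Reasoning, MM/Grounded, Knowledge, Multilingual, IF, Math | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arena |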
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 GPT-5.4 Pro OpenAI | Closed | Reasoning | 1.05M | 91 | 90% | 88% | 84% | 95% | 86% | 86% | 89% | 97% | 95% | 98% | 95% | 97% | 94% | 96% | 99% | 99% | 97% | 94% | 94% | 50% | 92% | 97% | 95% | 97% | 99% | 99% | 99% | 96% | 98% | 97% | 97% | 99% | 1472 |
2 GPT-5.2 Pro OpenAI | Closed | Reasoning | 400K | 90 | 88% | 88% | 82% | 93% | 83% | 81% | 89% | 97% | 95% | 98% | 93% | 95% | 96% | 96% | 99% | 99% | 97% | 95% | 90% | 44% | 93% | 96% | 92% | 95% | 99% | 99% | 99% | 96% | 98% | 97% | 97% | 99% | 1442 |
3 GPT-5.4 OpenAI | Closed | Reasoning | 1.05M | 90 | 90% | 88% | 85% | 95% | 84% | 84% | 85% | 97% | 94% | 97% | 95% | 97% | 95% | 96% | 99% | 98% | 96% | 94% | 93% | 48% | 91% | 96% | 94% | 96% | 99% | 99% | 99% | 96% | 98% | 97% | 97% | 99% | 1454 |
4 GPT-5.3 Codex OpenAI | Closed | Reasoning | 400K | 89 | 90% | 88% | 86% | 95% | 85% | 85% | 90% | 95% | 93% | 98% | 92% | 93% | 89% | 94% | 99% | 97% | 95% | 93% | 90% | 44% | 90% | 96% | 91% | 93% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 99% | 1416 |
5 GPT-5.2 OpenAI | Closed | Reasoning | 400K | 88 | 90% | 84% | 81% | 91% | 80% | 79% | 85% | 95% | 93% | 96% | 91% | 93% | 95% | 95% | 99% | 97% | 95% | 93% | 88% | 42% | 91% | 95% | 91% | 94% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% | 1426 |
6 GPT-5.3 Instant OpenAI | Closed | Reasoning | 128K | 87 | 86% | 82% | 80% | 88% | 76% | 75% | 83% | 96% | 94% | 97% | 92% | 94% | 95% | 95% | 99% | 98% | 96% | 94% | 89% | 44% | 92% | 96% | 92% | 96% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% | 1438 |
7 GPT-5.3-Codex-Spark OpenAI | Closed | Reasoning | 256K | 87 | 90% | 82% | 83% | 91% | 80% | 80% | 85% | 94% | 92% | 97% | 91% | 92% | 86% | 91% | 97% | 95% | 93% | 91% | 88% | 42% | 88% | 94% | 89% | 92% | 98% | 98% | 97% | 94% | 96% | 95% | 95% | 98% | 1398 |
8 Claude Opus 4.6 Anthropic | Closed | Standard | 1M | 85 | 80% | 85% | 74% | 91% | 80% | 75% | 74% | 95% | 93% | 94% | 92% | 92% | 95% | 94% | 99% | 97% | 95% | 93% | 92% | 38% | 88% | 96% | 94% | 95% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% | 1422 |
9 GPT-5.2 Instant OpenAI | Closed | Reasoning | 128K | 85 | 83% | 82% | 74% | 87% | 75% | 74% | 77% | 95% | 93% | 96% | 89% | 84% | 94% | 92% | 98% | 97% | 95% | 93% | 88% | 43% | 91% | 95% | 94% | 95% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% | 1428 |
10 GPT-5.2-Codex OpenAI | Closed | Reasoning | 400K | 85 | 90% | 85% | 85% | 95% | 76% | 66% | 86% | 95% | 93% | 90% | 90% | 91% | 84% | 92% | 99% | 97% | 95% | 93% | 80% | 26% | 86% | 91% | 87% | 92% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% | 1331 |
11 Gemini 3.1 Pro Google | Closed | Standard | 1M | 84 | 77% | 86% | 68% | 91% | 75% | 71% | 72% | 95% | 93% | 92% | 93% | 90% | 95% | 95% | 99% | 97% | 95% | 93% | 92% | 40% | 88% | 96% | 93% | 95% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% | 1423 |
12 GPT-5.1-Codex-Max OpenAI | Closed | Reasoning | 400K | 84 | 90% | 85% | 82% | 94% | 75% | 67% | 84% | 94% | 92% | 92% | 90% | 93% | 85% | 92% | 98% | 96% | 94% | 92% | 82% | 27% | 84% | 89% | 87% | 91% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 93% | 1349 |
13 Grok 4.1 xAI | Closed | Standard | 1M | 84 | 79% | 79% | 73% | 91% | 77% | 73% | 73% | 95% | 93% | 93% | 90% | 89% | 95% | 91% | 99% | 97% | 95% | 93% | 90% | 40% | 91% | 96% | 91% | 93% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% | 1435 |
14 Gemini 3 Pro Deep Think Google | Closed | Reasoning | 2M | 81 | 77% | 87% | 73% | 91% | 58% | 58% | 63% | 95% | 93% | 95% | 94% | 96% | 95% | 95% | 99% | 97% | 95% | 93% | 81% | 32% | 88% | 92% | 85% | 89% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 92% | 1349 |
15 GPT-5.1 OpenAI | Closed | Reasoning | 200K | 80 | 78% | 79% | 71% | 89% | 68% | 61% | 71% | 93% | 91% | 92% | 84% | 84% | 94% | 89% | 97% | 95% | 93% | 91% | 83% | 27% | 84% | 89% | 87% | 89% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% | 1334 |
16 GPT-5 (high) OpenAI | Closed | Reasoning | 128K | 79 | 78% | 75% | 72% | 85% | 67% | 62% | 70% | 89% | 87% | 94% | 83% | 80% | 93% | 85% | 93% | 91% | 89% | 87% | 83% | 27% | 83% | 89% | 85% | 91% | 95% | 97% | 96% | 91% | 93% | 92% | 94% | 94% | 1337 |
17 Claude Sonnet 4.6 Anthropic | Closed | Standard | 200K | 78 | 70% | 77% | 68% | 93% | 69% | 54% | 64% | 95% | 93% | 88% | 83% | 79% | 95% | 88% | 99% | 97% | 95% | 93% | 83% | 21% | 85% | 91% | 89% | 91% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% | 1339 |
18 GLM-5 (Reasoning) Zhipu AI | Open | Reasoning | 200K | 78 | 81% | 80% | 74% | 88% | 62% | 58% | 67% | 92% | 90% | 91% | 86% | 87% | 74% | 84% | 96% | 94% | 92% | 90% | 81% | 29% | 83% | 89% | 85% | 92% | 98% | 99% | 98% | 94% | 96% | 95% | 96% | 92% | 1340 |
19 GPT-5 (medium) OpenAI | Closed | Reasoning | 128K | 78 | 77% | 78% | 72% | 83% | 67% | 60% | 72% | 87% | 85% | 92% | 81% | 81% | 89% | 87% | 91% | 89% | 87% | 85% | 81% | 27% | 82% | 90% | 87% | 88% | 93% | 95% | 94% | 89% | 91% | 90% | 92% | 92% | 1328 |
20 Claude Opus 4.5 Anthropic | Closed | Standard | 200K | 77 | 71% | 73% | 68% | 91% | 68% | 57% | 62% | 95% | 93% | 87% | 82% | 81% | 94% | 87% | 99% | 97% | 95% | 93% | 81% | 20% | 84% | 90% | 84% | 90% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 89% | 1349 |
21 Gemini 3 Pro Google | Closed | Standard | 2M | 77 | 68% | 83% | 66% | 91% | 59% | 49% | 58% | 95% | 93% | 90% | 90% | 87% | 94% | 92% | 99% | 97% | 95% | 93% | 83% | 20% | 86% | 89% | 85% | 88% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% | 1328 |
22 o1-preview OpenAI | Closed | Reasoning | 200K | 77 | 77% | 79% | 71% | 86% | 65% | 60% | 69% | 88% | 86% | 93% | 87% | 83% | 72% | 80% | 92% | 90% | 88% | 86% | 80% | 32% | 83% | 90% | 86% | 88% | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 94% | 1328 |
23 Claude Sonnet 4.5 Anthropic | Closed | Standard | 200K | 76 | 69% | 74% | 69% | 87% | 66% | 53% | 60% | 91% | 89% | 88% | 82% | 81% | 95% | 87% | 95% | 93% | 91% | 89% | 84% | 21% | 84% | 91% | 87% | 90% | 97% | 99% | 98% | 93% | 95% | 94% | 96% | 88% | 1346 |
24 Grok 4.1 Fast xAI | Closed | Standard | 1M | 76 | 74% | 73% | 66% | 86% | 68% | 54% | 63% | 90% | 88% | 87% | 87% | 89% | 91% | 83% | 94% | 92% | 90% | 88% | 81% | 20% | 83% | 88% | 83% | 90% | 96% | 98% | 97% | 92% | 94% | 93% | 95% | 89% | 1342 |
25 Kimi K2.5 (Reasoning) Moonshot AI | Closed | Reasoning | 128K | 76 | 75% | 77% | 68% | 84% | 65% | 58% | 70% | 88% | 86% | 91% | 82% | 81% | 72% | 77% | 92% | 90% | 88% | 86% | 81% | 27% | 80% | 88% | 86% | 91% | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 92% | 1325 |