| # | Model | Provider | Access | Type | Context | Score | Knowledge, Coding, Math, Reasoning, IF, Multi (%) | | | | | | | | | | | | | | | | | | | | | | Arena |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.3 Codex | OpenAI | Closed | Reasoning | 400K | 92 | 99% | 97% | 95% | 93% | 90% | 44% | 95% | 85% | 85% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 99% | 95% | 93% | 98% | 93% | 96% | 1416 |
| 2 | GPT-5.4 | OpenAI | Closed | Reasoning | 1M | 91 | 99% | 97% | 95% | 93% | 91% | 46% | 91% | 81% | 75% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 99% | 95% | 93% | 95% | 95% | 95% | 1442 |
| 3 | GPT-5.2 | OpenAI | Closed | Reasoning | 400K | 91 | 99% | 97% | 95% | 93% | 88% | 42% | 91% | 80% | 79% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% | 95% | 93% | 96% | 94% | 95% | 1426 |
| 4 | Claude Opus 4.6 | Anthropic | Closed | Standard | 1M | 90 | 99% | 97% | 95% | 93% | 92% | 38% | 91% | 80% | 75% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% | 95% | 93% | 94% | 95% | 96% | 1422 |
| 5 | Gemini 3.1 Pro | Google | Closed | Standard | 1M | 89 | 99% | 97% | 95% | 93% | 92% | 40% | 91% | 75% | 71% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% | 95% | 93% | 92% | 95% | 96% | 1423 |
| 6 | Grok 4.1 | xAI | Closed | Standard | 128K | 89 | 99% | 97% | 95% | 93% | 90% | 40% | 91% | 77% | 73% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% | 95% | 93% | 93% | 93% | 96% | 1435 |
| 7 | GPT-5.2-Codex | OpenAI | Closed | Reasoning | 400K | 88 | 99% | 97% | 95% | 93% | 80% | 26% | 95% | 76% | 66% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% | 95% | 93% | 90% | 92% | 91% | 1331 |
| 8 | GPT-5.1-Codex-Max | OpenAI | Closed | Reasoning | 400K | 87 | 98% | 96% | 94% | 92% | 82% | 27% | 94% | 75% | 67% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 93% | 94% | 92% | 92% | 91% | 89% | 1349 |
| 9 | Claude Sonnet 4.6 | Anthropic | Closed | Standard | 1M | 86 | 99% | 97% | 95% | 93% | 83% | 21% | 93% | 69% | 54% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% | 95% | 93% | 88% | 91% | 91% | 1339 |
| 10 | Gemini 3 Pro Deep Think | Google | Closed | Reasoning | 2M | 85 | 99% | 97% | 95% | 93% | 81% | 32% | 91% | 58% | 58% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 92% | 95% | 93% | 95% | 89% | 92% | 1349 |
| 11 | Claude Opus 4.5 | Anthropic | Closed | Standard | 200K | 85 | 99% | 97% | 95% | 93% | 81% | 20% | 91% | 68% | 57% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 89% | 95% | 93% | 87% | 90% | 90% | 1349 |
| 12 | GPT-5.1 | OpenAI | Closed | Reasoning | 400K | 85 | 97% | 95% | 93% | 91% | 83% | 27% | 89% | 68% | 61% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% | 93% | 91% | 92% | 89% | 89% | 1334 |
| 13 | GPT-5 (high) | OpenAI | Closed | Reasoning | 128K | 84 | 93% | 91% | 89% | 87% | 83% | 27% | 85% | 67% | 62% | 95% | 97% | 96% | 91% | 93% | 92% | 94% | 94% | 89% | 87% | 94% | 91% | 89% | 1337 |
| 14 | Gemini 3 Pro | Google | Closed | Standard | 2M | 84 | 99% | 97% | 95% | 93% | 83% | 20% | 91% | 59% | 49% | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% | 95% | 93% | 90% | 88% | 89% | 1328 |
| 15 | GLM-5 (Reasoning) | Zhipu AI | Open | Reasoning | 200K | 84 | 96% | 94% | 92% | 90% | 81% | 29% | 88% | 62% | 58% | 98% | 99% | 98% | 94% | 96% | 95% | 96% | 92% | 92% | 90% | 91% | 92% | 89% | 1340 |
| 16 | o1-preview | OpenAI | Closed | Reasoning | 200K | 83 | 92% | 90% | 88% | 86% | 80% | 32% | 86% | 65% | 60% | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 94% | 88% | 86% | 93% | 88% | 90% | 1328 |
| 17 | Claude Sonnet 4.5 | Anthropic | Closed | Standard | 1M | 83 | 95% | 93% | 91% | 89% | 84% | 21% | 87% | 66% | 53% | 97% | 99% | 98% | 93% | 95% | 94% | 96% | 88% | 91% | 89% | 88% | 90% | 91% | 1346 |
| 18 | Grok 4.1 Fast | xAI | Closed | Standard | 2M | 83 | 94% | 92% | 90% | 88% | 81% | 20% | 86% | 68% | 54% | 96% | 98% | 97% | 92% | 94% | 93% | 95% | 89% | 90% | 88% | 87% | 90% | 88% | 1342 |
| 19 | GPT-5 (medium) | OpenAI | Closed | Reasoning | 128K | 82 | 91% | 89% | 87% | 85% | 81% | 27% | 83% | 67% | 60% | 93% | 95% | 94% | 89% | 91% | 90% | 92% | 92% | 87% | 85% | 92% | 88% | 90% | 1328 |
| 20 | Kimi K2.5 (Reasoning) | Moonshot AI | Open | Reasoning | 128K | 82 | 92% | 90% | 88% | 86% | 81% | 27% | 84% | 65% | 58% | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 92% | 88% | 86% | 91% | 91% | 88% | 1325 |
| 21 | Qwen3.5 397B (Reasoning) | Alibaba | Open | Reasoning | 128K | 82 | 91% | 89% | 87% | 85% | 81% | 29% | 83% | 60% | 60% | 93% | 95% | 94% | 89% | 91% | 90% | 92% | 93% | 87% | 85% | 91% | 89% | 91% | 1326 |
| 22 | o3-pro | OpenAI | Closed | Reasoning | 200K | 77 | 88% | 89% | 87% | 85% | 75% | 26% | 80% | 46% | 44% | 90% | 92% | 91% | 86% | 88% | 87% | 89% | 89% | 86% | 84% | 89% | 82% | 83% | 1242 |
| 23 | o3 | OpenAI | Closed | Reasoning | 200K | 76 | 86% | 87% | 85% | 83% | 75% | 24% | 78% | 50% | 40% | 88% | 90% | 89% | 84% | 86% | 85% | 87% | 88% | 84% | 82% | 86% | 85% | 83% | 1258 |
| 24 | DeepSeek V3.2 (Thinking) | DeepSeek | Open | Reasoning | 128K | 75 | 87% | 85% | 83% | 81% | 73% | 22% | 79% | 48% | 45% | 87% | 89% | 88% | 83% | 85% | 84% | 86% | 84% | 83% | 81% | 86% | 85% | 84% | 1260 |
| 25 | GPT-5 mini | OpenAI | Closed | Reasoning | 128K | 74 | 88% | 86% | 84% | 82% | 73% | 16% | 80% | 41% | 37% | 90% | 92% | 91% | 86% | 88% | 87% | 89% | 85% | 84% | 82% | 87% | 82% | 82% | 1243 |