Math Benchmarks
Mathematical reasoning and problem solving
AIME 2023 · AIME 2024 · AIME 2025 · HMMT Feb 2023 · HMMT Feb 2024 · HMMT Feb 2025 · BRUMO 2025 · MATH-500
88 models
| Rank | Model | Provider | Access | Type | Context window | Score | AIME 2023 | AIME 2024 | AIME 2025 | HMMT Feb 2023 | HMMT Feb 2024 | HMMT Feb 2025 | BRUMO 2025 | MATH-500 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.3 Codex | OpenAI | Closed | Reasoning | 400K | 92 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 99% |
| 2 | GPT-5.4 | OpenAI | Closed | Reasoning | 1M | 91 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 99% |
| 3 | GPT-5.2 | OpenAI | Closed | Reasoning | 400K | 91 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% |
| 4 | Claude Opus 4.6 | Anthropic | Closed | Standard | 1M | 90 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% |
| 5 | Gemini 3.1 Pro | Google | Closed | Standard | 1M | 89 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% |
| 6 | Grok 4.1 | xAI | Closed | Standard | 128K | 89 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% |
| 7 | GPT-5.2-Codex | OpenAI | Closed | Reasoning | 400K | 88 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% |
| 8 | GPT-5.1-Codex-Max | OpenAI | Closed | Reasoning | 400K | 87 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 93% |
| 9 | Claude Sonnet 4.6 | Anthropic | Closed | Standard | 1M | 86 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% |
| 10 | Gemini 3 Pro Deep Think | Google | Closed | Reasoning | 2M | 85 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 92% |
| 11 | Claude Opus 4.5 | Anthropic | Closed | Standard | 200K | 85 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 89% |
| 12 | GPT-5.1 | OpenAI | Closed | Reasoning | 400K | 85 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% |
| 13 | GPT-5 (high) | OpenAI | Closed | Reasoning | 128K | 84 | 95% | 97% | 96% | 91% | 93% | 92% | 94% | 94% |
| 14 | Gemini 3 Pro | Google | Closed | Standard | 2M | 84 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% |
| 15 | GLM-5 (Reasoning) | Zhipu AI | Open | Reasoning | 200K | 84 | 98% | 99% | 98% | 94% | 96% | 95% | 96% | 92% |
| 16 | o1-preview | OpenAI | Closed | Reasoning | 200K | 83 | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 94% |
| 17 | Claude Sonnet 4.5 | Anthropic | Closed | Standard | 1M | 83 | 97% | 99% | 98% | 93% | 95% | 94% | 96% | 88% |
| 18 | Grok 4.1 Fast | xAI | Closed | Standard | 2M | 83 | 96% | 98% | 97% | 92% | 94% | 93% | 95% | 89% |
| 19 | GPT-5 (medium) | OpenAI | Closed | Reasoning | 128K | 82 | 93% | 95% | 94% | 89% | 91% | 90% | 92% | 92% |
| 20 | Kimi K2.5 (Reasoning) | Moonshot AI | Open | Reasoning | 128K | 82 | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 92% |
| 21 | Qwen3.5 397B (Reasoning) | Alibaba | Open | Reasoning | 128K | 82 | 93% | 95% | 94% | 89% | 91% | 90% | 92% | 93% |
| 22 | o3-pro | OpenAI | Closed | Reasoning | 200K | 77 | 90% | 92% | 91% | 86% | 88% | 87% | 89% | 89% |
| 23 | o3 | OpenAI | Closed | Reasoning | 200K | 76 | 88% | 90% | 89% | 84% | 86% | 85% | 87% | 88% |
| 24 | DeepSeek V3.2 (Thinking) | DeepSeek | Open | Reasoning | 128K | 75 | 87% | 89% | 88% | 83% | 85% | 84% | 86% | 84% |
| 25 | GPT-5 mini | OpenAI | Closed | Reasoning | 128K | 74 | 90% | 92% | 91% | 86% | 88% | 87% | 89% | 85% |
Top 25 of 88 models shown.
About Math Benchmarks
High school mathematics competition