Math Benchmarks

Mathematical reasoning and problem solving

AIME 2023 · AIME 2024 · AIME 2025 · HMMT Feb 2023 · HMMT Feb 2024 · HMMT Feb 2025 · BRUMO 2025 · MATH-500

88 models
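| Rank | Model | Provider | Access | Type | Context | Score | AIME 2023 | AIME 2024 | AIME 2025 | HMMT Feb 2023 | HMMT Feb 2024 | HMMT Feb 2025 | BRUMO 2025 | MATH-500 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.3 Codex | OpenAI | Closed | Reasoning | 400K | 92 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 99% |
| 2 | GPT-5.4 | OpenAI | Closed | Reasoning | 1M | 91 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 99% |
| 3 | GPT-5.2 | OpenAI | Closed | Reasoning | 400K | 91 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% |
| 4 | Claude Opus 4.6 | Anthropic | Closed | Standard | 1M | 90 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 98% |
| 5 | Gemini 3.1 Pro | Google | Closed | Standard | 1M | 89 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% |
| 6 | Grok 4.1 | xAI | Closed | Standard | 128K | 89 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 97% |
| 7 | GPT-5.2-Codex | OpenAI | Closed | Reasoning | 400K | 88 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% |
| 8 | GPT-5.1-Codex-Max | OpenAI | Closed | Reasoning | 400K | 87 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 93% |
| 9 | Claude Sonnet 4.6 | Anthropic | Closed | Standard | 1M | 86 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% |
| 10 | Gemini 3 Pro Deep Think | Google | Closed | Reasoning | 2M | 85 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 92% |
| 11 | Claude Opus 4.5 | Anthropic | Closed | Standard | 200K | 85 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 89% |
| 12 | GPT-5.1 | OpenAI | Closed | Reasoning | 400K | 85 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 94% |
| 13 | GPT-5 (high) | OpenAI | Closed | Reasoning | 128K | 84 | 95% | 97% | 96% | 91% | 93% | 92% | 94% | 94% |
| 14 | Gemini 3 Pro | Google | Closed | Standard | 2M | 84 | 99% | 99% | 98% | 95% | 97% | 96% | 96% | 91% |
| 15 | GLM-5 (Reasoning) | Zhipu AI | Open | Reasoning | 200K | 84 | 98% | 99% | 98% | 94% | 96% | 95% | 96% | 92% |
| 16 | o1-preview | OpenAI | Closed | Reasoning | 200K | 83 | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 94% |
| 17 | Claude Sonnet 4.5 | Anthropic | Closed | Standard | 1M | 83 | 97% | 99% | 98% | 93% | 95% | 94% | 96% | 88% |
| 18 | Grok 4.1 Fast | xAI | Closed | Standard | 2M | 83 | 96% | 98% | 97% | 92% | 94% | 93% | 95% | 89% |
| 19 | GPT-5 (medium) | OpenAI | Closed | Reasoning | 128K | 82 | 93% | 95% | 94% | 89% | 91% | 90% | 92% | 92% |
| 20 | Kimi K2.5 (Reasoning) | Moonshot AI | Open | Reasoning | 128K | 82 | 94% | 96% | 95% | 90% | 92% | 91% | 93% | 92% |
| 21 | Qwen3.5 397B (Reasoning) | Alibaba | Open | Reasoning | 128K | 82 | 93% | 95% | 94% | 89% | 91% | 90% | 92% | 93% |
| 22 | o3-pro | OpenAI | Closed | Reasoning | 200K | 77 | 90% | 92% | 91% | 86% | 88% | 87% | 89% | 89% |
| 23 | o3 | OpenAI | Closed | Reasoning | 200K | 76 | 88% | 90% | 89% | 84% | 86% | 85% | 87% | 88% |
| 24 | DeepSeek V3.2 (Thinking) | DeepSeek | Open | Reasoning | 128K | 75 | 87% | 89% | 88% | 83% | 85% | 84% | 86% | 84% |
| 25 | GPT-5 mini | OpenAI | Closed | Reasoning | 128K | 74 | 90% | 92% | 91% | 86% | 88% | 87% | 89% | 85% |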
Showing 25 of 88

About Math Benchmarks

High school mathematics competition