Math Benchmarks

Mathematical reasoning and problem solving. Compare AI models across 7 mathematical benchmarks drawn from the AIME, HMMT, and BRUMO competitions.

Math Benchmark Results

Showing 25 of 52 models

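| Rank | Model | Creator | Type | Reasoning | Context | Score | AIME 2023 | AIME 2024 | AIME 2025 | HMMT Feb 2023 | HMMT Feb 2024 | HMMT Feb 2025 | BRUMO 2025 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5 (high) | OpenAI | Proprietary | Reasoning | 128K | 72 | 95% | 97% | 96% | 91% | 93% | 92% | 94% |
| 2 | o1-preview | OpenAI | Proprietary | Reasoning | 200K | 71 | 94% | 96% | 95% | 90% | 92% | 91% | 93% |
| 3 | GPT-5 (medium) | OpenAI | Proprietary | Reasoning | 128K | 70 | 93% | 95% | 94% | 89% | 91% | 90% | 92% |
| 4 | Grok 4 | xAI | Proprietary | Non-Reasoning | 128K | 69 | 87% | 89% | 88% | 84% | 86% | 85% | 87% |
| 5 | GPT-5 mini | OpenAI | Proprietary | Reasoning | 128K | 68 | 90% | 92% | 91% | 86% | 88% | 87% | 89% |
| 6 | o3-pro | OpenAI | Proprietary | Reasoning | 200K | 68 | 90% | 92% | 91% | 86% | 88% | 87% | 89% |
| 7 | o3 | OpenAI | Proprietary | Reasoning | 200K | 67 | 88% | 90% | 89% | 84% | 86% | 85% | 87% |
| 8 | Qwen2.5-1M | Alibaba | Open Weight | Non-Reasoning | 1M | 66 | 85% | 87% | 86% | 81% | 83% | 82% | 84% |
| 9 | Qwen2.5-72B | Alibaba | Open Weight | Non-Reasoning | 128K | 65 | 84% | 86% | 85% | 80% | 82% | 81% | 83% |
| 10 | o4-mini (high) | OpenAI | Proprietary | Non-Reasoning | 200K | 65 | 83% | 85% | 84% | 79% | 81% | 80% | 82% |
| 11 | Gemini 2.5 Pro | Google | Proprietary | Non-Reasoning | 2M | 65 | 84% | 86% | 85% | 80% | 82% | 81% | 83% |
| 12 | DeepSeek Coder 2.0 | DeepSeek | Open Weight | Non-Reasoning | 128K | 64 | 81% | 83% | 82% | 77% | 79% | 78% | 80% |
| 13 | DeepSeek LLM 2.0 | DeepSeek | Open Weight | Non-Reasoning | 128K | 63 | 80% | 82% | 81% | 76% | 78% | 77% | 79% |
| 14 | Claude 4.1 Opus | Anthropic | Proprietary | Non-Reasoning | 200K | 61 | 76% | 78% | 77% | 72% | 74% | 73% | 75% |
| 15 | Claude 4 Sonnet | Anthropic | Proprietary | Non-Reasoning | 200K | 59 | 73% | 75% | 74% | 69% | 71% | 70% | 72% |
| 16 | Llama 3.1 405B | Meta | Open Weight | Non-Reasoning | 128K | 58 | 70% | 72% | 71% | 66% | 68% | 67% | 69% |
| 17 | Mistral Large 2 | Mistral | Proprietary | Non-Reasoning | 128K | 57 | 68% | 70% | 69% | 64% | 66% | 65% | 67% |
| 18 | GPT-4o | OpenAI | Proprietary | Non-Reasoning | 128K | 56 | 66% | 68% | 67% | 62% | 64% | 63% | 65% |
| 19 | Claude 3.5 Sonnet | Anthropic | Proprietary | Non-Reasoning | 200K | 55 | 65% | 67% | 66% | 61% | 63% | 62% | 64% |
| 20 | Gemini 1.5 Pro | Google | Proprietary | Non-Reasoning | 2M | 54 | 64% | 66% | 65% | 60% | 62% | 61% | 63% |
| 21 | Mistral 8x7B | Mistral | Open Weight | Non-Reasoning | 32K | 52 | 65% | 67% | 66% | 61% | 63% | 62% | 64% |
| 22 | Gemini 1.0 Pro | Google | Proprietary | Non-Reasoning | 32K | 52 | 62% | 64% | 63% | 58% | 60% | 59% | 61% |
| 23 | Claude 3 Opus | Anthropic | Proprietary | Non-Reasoning | 200K | 51 | 61% | 63% | 62% | 57% | 59% | 58% | 60% |
| 24 | GPT-4 Turbo | OpenAI | Proprietary | Non-Reasoning | 128K | 50 | 60% | 62% | 61% | 56% | 58% | 57% | 59% |
| 25 | Llama 3 70B | Meta | Open Weight | Non-Reasoning | 128K | 48 | 58% | 60% | 59% | 54% | 56% | 55% | 57% |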
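
For readers who want to slice these results offline, the sketch below shows one possible way to filter rows by creator, license type, or reasoning flag and to re-rank them by the mean of the seven benchmark percentages. The `Row` class, the `filter_rows` and `sort_by_mean` helpers, and the field names are illustrative assumptions, not part of the page; the four sample rows are copied from the results above, and `score` is the unlabeled number that follows the context length in each row.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Row:
    """One leaderboard row (hypothetical structure, not the site's schema)."""
    model: str
    creator: str
    license: str     # "Proprietary" or "Open Weight"
    reasoning: bool  # True for "Reasoning", False for "Non-Reasoning"
    score: int       # unlabeled number shown after the context length
    results: tuple   # the seven percentages, in the order they appear


# Sample rows copied from the results above.
ROWS = [
    Row("GPT-5 (high)", "OpenAI", "Proprietary", True, 72,
        (95, 97, 96, 91, 93, 92, 94)),
    Row("Grok 4", "xAI", "Proprietary", False, 69,
        (87, 89, 88, 84, 86, 85, 87)),
    Row("Qwen2.5-1M", "Alibaba", "Open Weight", False, 66,
        (85, 87, 86, 81, 83, 82, 84)),
    Row("Llama 3 70B", "Meta", "Open Weight", False, 48,
        (58, 60, 59, 54, 56, 55, 57)),
]


def filter_rows(rows, *, creator=None, license=None, reasoning=None):
    """Keep rows matching every criterion that is not None."""
    kept = []
    for row in rows:
        if creator is not None and row.creator != creator:
            continue
        if license is not None and row.license != license:
            continue
        if reasoning is not None and row.reasoning != reasoning:
            continue
        kept.append(row)
    return kept


def sort_by_mean(rows):
    """Order rows by the mean of their seven percentages, highest first."""
    return sorted(rows, key=lambda r: mean(r.results), reverse=True)


if __name__ == "__main__":
    for row in sort_by_mean(filter_rows(ROWS, license="Open Weight")):
        print(f"{row.model}: mean {mean(row.results):.1f}%")
```

Keyword-only filter arguments keep call sites readable (`filter_rows(ROWS, reasoning=True)`), and leaving an argument as `None` means that criterion is not applied.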

About Math Benchmarks

AIME 2023

High school mathematics competition

AIME 2024

High school mathematics competition

AIME 2025

High school mathematics competition

HMMT Feb 2023

High school mathematics competition organized by Harvard and MIT students

HMMT Feb 2024

High school mathematics competition organized by Harvard and MIT students

HMMT Feb 2025

High school mathematics competition organized by Harvard and MIT students

BRUMO 2025

High school mathematics olympiad hosted at Brown University