
Math Benchmarks — AIME, HMMT & MATH-500 Leaderboard

Mathematical reasoning and problem solving

Bottom line: Competition math is largely solved by frontier models — AIME and HMMT are saturated. BRUMO and MATH-500 still show meaningful separation.

AIME 2023 · AIME 2024 · AIME 2025 · AIME25 (Arcee) · HMMT Feb 2023 · HMMT Feb 2024 · HMMT Feb 2025 · BRUMO 2025 · MATH-500

Best Math picks

BenchLM's math summaries, alongside the practical tradeoffs users check next: open weights, price, speed, latency, and context.

Top AI Models for Math (April 2026)

As of April 2026, GPT-5.3 Codex leads the provisional math leaderboard with a weighted score of 100.0%, followed by GPT-5.2-Codex (97.7%) and GPT-5.1-Codex-Max (97.2%). BenchLM is currently showing 86 provisional-ranked models and 0 verified-ranked models in this category.

What changed

Claude Mythos Preview leads math with top BRUMO and MATH-500 scores.

GPT-5.4 close second, with near-perfect AIME scores.

Gemini 3.1 Pro strong third — best value option for math-heavy workloads.

How to choose

Top models by benchmark

High school mathematics competition (25% of category score)

Math Leaderboard

Updated April 21, 2026

Sorted by math weighted score. Switch between provisional-ranked and verified-ranked modes to see the broader public dataset versus sourced-only ranking. Click column headers to re-sort by overall score or any benchmark.

86 ranked models
Provisional-ranked mode includes source-unverified, non-generated benchmark evidence. P = provisional benchmark row.
Rows recovered from the leaderboard table (most model names were lost when the page was extracted; the top three models are named in the summary above):

Rank  Model              Org     Math weighted score
1     GPT-5.3 Codex      OpenAI  100.0%
2     GPT-5.2-Codex      OpenAI  97.7%
3     GPT-5.1-Codex-Max  OpenAI  97.2%
8     GPT-5.4            OpenAI  92.8%
15    GLM-5              Z.AI    87.7%
17    o3-pro             OpenAI  86.4%
18    GPT-5.2            OpenAI  83.7%
19    o3                 OpenAI  83.4%

The remaining rows of the top 25 (weighted scores from 96% down to 79.8%) could not be matched to model names. Showing 25 of 86.

These rankings update weekly


Score in Context

What these scores mean

Math carries a 5% weight in overall scoring — relatively low because frontier models have saturated the main competition benchmarks, with AIME and HMMT scores at 95-99% across top models. The weighted score now relies mainly on BRUMO and MATH-500, which still show meaningful separation.

Known limitations

Older AIME editions and HMMT are effectively solved by AI — they are displayed for reference but no longer factor into the weighted score. If math reasoning is critical for your use case, look at BRUMO scores specifically, and consider models with explicit reasoning capabilities (chain-of-thought). See the AIME & HMMT explainer.

How we weight

Mathematics carries a 5% weight in BenchLM.ai's overall scoring. Frontier models score 95-99% on AIME and HMMT — competition math is effectively solved by AI.

The saturated AIME and HMMT editions are still displayed for reference but no longer factor into the weighted score. BRUMO and MATH-500 show more meaningful separation. If mathematical reasoning is critical, prioritize models with explicit reasoning capabilities. See the math leaderboard or read the AIME & HMMT explainer.

Leaderboards exclude benchmark rows that BenchLM generated from other scores or cloned from reference models. When a weighted benchmark is missing after that filter, the category falls back to the remaining trustworthy public rows instead of filling the gap with synthetic values.

The full scoring rules, freshness handling, and runtime/pricing caveats live on the BenchLM methodology page.

Benchmark        Weight  Status        Description
AIME 2023        -       Display only  High school mathematics competition
AIME 2024        -       Display only  High school mathematics competition
AIME 2025        25%     Weighted      High school mathematics competition
AIME25 (Arcee)   -       Display only  Display-only AIME25 reference from Arcee AI's Trinity-Large-Thinking launch chart
HMMT Feb 2023    -       Display only  High school mathematics competition (Harvard–MIT Mathematics Tournament)
HMMT Feb 2024    -       Display only  High school mathematics competition (Harvard–MIT Mathematics Tournament)
HMMT Feb 2025    -       Display only  High school mathematics competition (Harvard–MIT Mathematics Tournament)
BRUMO 2025       25%     Weighted      University-level mathematics olympiad
MATH-500         15%     Weighted      Curated 500-problem subset of the MATH dataset covering algebra, geometry, number theory, and more
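As a worked example of the weights above, using hypothetical benchmark scores (not taken from the leaderboard):

```python
# Hypothetical scores for illustration only.
scores = {"AIME 2025": 95.0, "BRUMO 2025": 80.0, "MATH-500": 90.0}
weights = {"AIME 2025": 0.25, "BRUMO 2025": 0.25, "MATH-500": 0.15}

# Display-only benchmarks (AIME 2023/2024, HMMT, AIME25 Arcee) carry no
# weight, so only the three rows above enter the category score.
math_score = sum(scores[b] * w for b, w in weights.items()) / sum(weights.values())
# (95*0.25 + 80*0.25 + 90*0.15) / 0.65 ≈ 88.08

# The math category then contributes 5% toward the overall BenchLM score.
overall_contribution = 0.05 * math_score
print(round(math_score, 2))
```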

