FrontierMath

An expert-level mathematical reasoning benchmark by Epoch AI featuring original, research-level problems created by mathematicians including IMO gold medalists and Fields Medal recipients. Problems require deep creativity and multi-step reasoning.

Top models on FrontierMath — May 22, 2026

As of May 22, 2026, GPT-5.5 Pro leads the FrontierMath leaderboard with 52.4%, followed by GPT-5.5 (51.7%) and GPT-5.4 Pro (50%).

4 models · Math · 35% of category score · Refreshing · Updated May 22, 2026

According to BenchLM.ai, GPT-5.5 Pro leads the FrontierMath benchmark with a score of 52.4%, followed by GPT-5.5 (51.7%) and GPT-5.4 Pro (50%). The top three models are clustered within 2.4 points, so the benchmark currently separates them only narrowly, although scores near 50% still leave substantial headroom before saturation.
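The 2.4-point figure can be checked directly from the three scores quoted above. The following is a minimal sketch; the helper names `spread` and `headroom` are illustrative and not part of any BenchLM tooling, and the 100-point ceiling is an assumption about the score scale.

```python
# Scores for the top three models, as quoted on this page (percent).
top_scores = {"GPT-5.5 Pro": 52.4, "GPT-5.5": 51.7, "GPT-5.4 Pro": 50.0}


def spread(scores: dict[str, float]) -> float:
    """Gap in points between the best and worst score in the group."""
    values = list(scores.values())
    return round(max(values) - min(values), 1)


def headroom(scores: dict[str, float], ceiling: float = 100.0) -> float:
    """Points remaining between the best score and the assumed ceiling."""
    return round(ceiling - max(scores.values()), 1)


print(spread(top_scores))    # gap between first and third place
print(headroom(top_scores))  # distance from the leader to a perfect score
```

Reading the two numbers together is the point of the sketch: a small spread says the leaders are hard to tell apart, while a large headroom says the benchmark itself is far from solved.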

4 models have been evaluated on FrontierMath. The benchmark falls in the Math category. This category carries a 5% weight in BenchLM.ai's overall scoring system. Within that category, FrontierMath contributes 35% of the category score, so strong performance here directly affects a model's overall ranking.
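To make the weighting concrete, the sketch below multiplies the two percentages stated above. It assumes that BenchLM's overall score is a plain weighted sum on a 0-100 scale and that the category weight and the within-category share simply multiply; the page does not confirm either detail, and the function names are hypothetical.

```python
# Weights as stated on this page; how they combine is an assumption.
MATH_CATEGORY_WEIGHT = 0.05  # Math category share of the overall score
FRONTIERMATH_SHARE = 0.35    # FrontierMath share of the Math category


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Fraction of the overall score attributable to one benchmark."""
    return category_weight * benchmark_share


def overall_contribution(score_pct: float) -> float:
    """Overall-score points contributed by a FrontierMath score (0-100)."""
    return score_pct * effective_weight(MATH_CATEGORY_WEIGHT, FRONTIERMATH_SHARE)


print(f"{effective_weight(MATH_CATEGORY_WEIGHT, FRONTIERMATH_SHARE):.4f}")
print(f"{overall_contribution(52.4):.3f}")
```

Under these assumptions FrontierMath accounts for about 1.75% of the overall score, so the leading 52.4% result would contribute a little under one overall point.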

About FrontierMath

Year: 2024

Tasks: 350 original research-level math problems

Format: Open-ended mathematical reasoning with tool access

Difficulty: Research-level mathematics

FrontierMath is one of the hardest math benchmarks available. It consists of 300 Tier 1-3 problems and 50 Tier 4 problems, all original and unpublished. Models are evaluated with access to Python and computational tools. Even the top model on this leaderboard scores only 52.4%, making the benchmark a critical discriminator for frontier mathematical reasoning.

BenchLM freshness & provenance

Version: FrontierMath 2024

Refresh cadence: Annual

Staleness state: Refreshing

Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (4 models)

1. GPT-5.5 Pro: 52.4%
2. GPT-5.5: 51.7%
3. GPT-5.4 Pro: 50%
4. [not shown]: 43.8%

FAQ

What does FrontierMath measure?

An expert-level mathematical reasoning benchmark by Epoch AI featuring original, research-level problems created by mathematicians including IMO gold medalists and Fields Medal recipients. Problems require deep creativity and multi-step reasoning.

Which model scores highest on FrontierMath?

GPT-5.5 Pro by OpenAI currently leads with a score of 52.4% on FrontierMath.

How many models are evaluated on FrontierMath?

4 AI models have been evaluated on FrontierMath on BenchLM.

Last updated: May 22, 2026 · BenchLM version FrontierMath 2024
