
FrontierMath

An expert-level mathematical reasoning benchmark by Epoch AI featuring original, research-level problems created by mathematicians including IMO gold medalists and Fields Medal recipients. Problems require deep creativity and multi-step reasoning.

Top models on FrontierMath — April 16, 2026

As of April 16, 2026, GPT-5.4 Pro leads the FrontierMath leaderboard with 50%.

1 model · Math · 35% of category score · Refreshing · Updated April 16, 2026

About FrontierMath

Year: 2024
Tasks: 350 original research-level math problems
Format: Open-ended mathematical reasoning with tool access
Difficulty: Research-level mathematics

FrontierMath is one of the hardest public math benchmarks. It consists of 300 Tier 1-3 problems and 50 Tier 4 problems, all original and unpublished. Models are evaluated with access to Python and other computational tools. The best score tracked here is 50%, so the benchmark remains a critical discriminator of frontier mathematical reasoning.

BenchLM freshness & provenance

Version: FrontierMath 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
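FrontierMath is listed as 35% of the Math category score. As a minimal sketch, assuming BenchLM combines benchmarks into a category score by a plain weighted average (the actual formula is defined on the BenchLM methodology page and may differ, for example by applying freshness adjustments), the arithmetic looks like this. The function name `weighted_category_score` and the second benchmark `OtherMathBench`, with its 80% score and 0.65 weight, are hypothetical and exist only for illustration.

```python
# Hypothetical sketch: assumes a category score is a weighted average of
# benchmark scores. This is NOT BenchLM's published formula.

def weighted_category_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of benchmark scores (in percent).

    Only benchmarks present in `scores` count toward the total weight, so a
    model missing a benchmark is averaged over the benchmarks it has.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("no positively weighted benchmarks to average")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# FrontierMath's 50% score and 35% weight come from this page; the other entry is invented.
scores = {"FrontierMath": 50.0, "OtherMathBench": 80.0}
weights = {"FrontierMath": 0.35, "OtherMathBench": 0.65}

print(round(0.35 * 50.0, 2))                               # FrontierMath's share: 17.5 points
print(round(weighted_category_score(scores, weights), 2))  # 69.5
```

Under this assumption, with weights summing to 1, each percentage point on FrontierMath moves the Math category score by 0.35 points.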

Leaderboard (1 model)

1. GPT-5.4 Pro (OpenAI): 50%

FAQ

What does FrontierMath measure?

An expert-level mathematical reasoning benchmark by Epoch AI featuring original, research-level problems created by mathematicians including IMO gold medalists and Fields Medal recipients. Problems require deep creativity and multi-step reasoning.

Which model scores highest on FrontierMath?

GPT-5.4 Pro by OpenAI currently leads with a score of 50% on FrontierMath.

How many models are evaluated on FrontierMath?

One AI model has been evaluated on FrontierMath on BenchLM.

Last updated: April 16, 2026 · BenchLM version FrontierMath 2024
