An expert-level mathematical reasoning benchmark by Epoch AI featuring original, research-level problems created by mathematicians including IMO gold medalists and Fields Medal recipients. Problems require deep creativity and multi-step reasoning.
As of May 22, 2026, GPT-5.5 Pro leads the FrontierMath leaderboard with 52.4%, followed by GPT-5.5 (51.7%) and GPT-5.4 Pro (50%).
GPT-5.5 Pro (OpenAI)
GPT-5.5 (OpenAI)
GPT-5.4 Pro (OpenAI)
According to BenchLM.ai, GPT-5.5 Pro leads the FrontierMath benchmark with a score of 52.4%, followed by GPT-5.5 (51.7%) and GPT-5.4 Pro (50%). The top three models are clustered within 2.4 points, so the benchmark currently separates the leading models only narrowly, even though scores near 50% leave substantial headroom before saturation.
Four models have been evaluated on FrontierMath. The benchmark falls in the Math category, which carries a 5% weight in BenchLM.ai's overall scoring system. Within that category, FrontierMath contributes 35% of the category score, which works out to roughly 1.75% of the overall score if the two weights multiply. Performance here therefore feeds directly into a model's overall ranking, though with a small weight.
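The weighting described above can be sketched as a short calculation. This is only an illustration: the 5% and 35% figures come from the text, while the assumption that the two weights simply multiply, and the function names `effective_weight` and `overall_points`, are hypothetical and not taken from the BenchLM methodology.

```python
# Weights as stated in the text.
MATH_CATEGORY_WEIGHT = 0.05        # Math category share of the overall score
FRONTIERMATH_SHARE_OF_MATH = 0.35  # FrontierMath share of the Math category


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Share of the overall score carried by one benchmark.

    Assumption: category weight and in-category share combine by
    multiplication. The actual scoring policy may differ.
    """
    return category_weight * benchmark_share


def overall_points(score_pct: float) -> float:
    """Overall-score points contributed by a FrontierMath score (0-100 scale)."""
    weight = effective_weight(MATH_CATEGORY_WEIGHT, FRONTIERMATH_SHARE_OF_MATH)
    return score_pct * weight


if __name__ == "__main__":
    weight = effective_weight(MATH_CATEGORY_WEIGHT, FRONTIERMATH_SHARE_OF_MATH)
    print(f"Effective weight: {weight:.2%}")

    # Scores quoted in the text for the top three models.
    for model, score in [("GPT-5.5 Pro", 52.4), ("GPT-5.5", 51.7), ("GPT-5.4 Pro", 50.0)]:
        print(f"{model}: {overall_points(score):.3f} overall points")
```

Under this assumption, the 2.4-point gap between first and third place on FrontierMath translates into only a few hundredths of a point in the overall score, so the benchmark matters for ranking mainly when models are otherwise closely matched.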
Year: 2024
Tasks: 350 original research-level math problems
Format: Open-ended mathematical reasoning with tool access
Difficulty: Research-level mathematics
FrontierMath is one of the hardest public math benchmarks. It consists of 300 Tier 1-3 problems and 50 Tier 4 problems (350 in total), all original and unpublished. Models are evaluated with access to Python and computational tools. Even the top models score only around 50% (the current leader reaches 52.4%), making it a critical discriminator for frontier mathematical reasoning.
Version: FrontierMath 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.