A curated subset of 500 problems from the MATH dataset, covering algebra, counting and probability, geometry, intermediate algebra, number theory, prealgebra, and precalculus.
Year: 2021
Tasks: 500 problems
Format: Free-form mathematical answers
Difficulty: High school to undergraduate
MATH-500 is one of the most widely cited math benchmarks. It is nearing saturation, with top reasoning models scoring 96-99%, so it is less useful for differentiating frontier models but remains a standard baseline.
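Because answers are free-form, scoring usually means extracting the final answer from a model's solution and comparing it with the reference answer. The following is a minimal sketch that assumes the MATH convention of marking the final answer with `\boxed{...}`. The helper names (`extract_boxed`, `normalize`, `accuracy`) and the string normalization rules are hypothetical and deliberately simple. They are not taken from any official grader, and real harnesses are often more lenient, for example by checking symbolic equivalence.

```python
from typing import List, Optional


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, matching nested braces."""
    start = solution.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(solution):
        ch = solution[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)  # reached the brace that closes \boxed{
        out.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def normalize(ans: str) -> str:
    """Hypothetical light normalization: unify fraction macros, drop sizing commands, $, spaces."""
    s = ans.strip()
    for old, new in [("\\dfrac", "\\frac"), ("\\tfrac", "\\frac"),
                     ("\\left", ""), ("\\right", ""), ("$", ""), (" ", "")]:
        s = s.replace(old, new)
    return s.rstrip(".")


def accuracy(predictions: List[str], references: List[str]) -> float:
    """Percentage of predictions whose boxed answer matches the reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    for pred, ref in zip(predictions, references):
        answer = extract_boxed(pred)
        if answer is not None and normalize(answer) == normalize(ref):
            correct += 1
    return 100.0 * correct / len(references)
```

For example, `accuracy(["so the answer is $\\boxed{\\dfrac{1}{2}}$.", "\\boxed{7}"], ["\\frac{1}{2}", "8"])` counts only the first prediction as correct. On the full 500-problem set, each problem is worth 0.2 percentage points, so the 96-99% range cited above corresponds to roughly 480 to 495 correct answers.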
Source paper: Measuring Mathematical Problem Solving With the MATH Dataset
GPT-5.3 Codex by OpenAI currently leads MATH-500 with a score of 99%.
On BenchLM, 88 AI models have been evaluated on MATH-500.