MATH-500 Problem Set

A curated subset of 500 problems from the MATH dataset, covering algebra, counting and probability, geometry, intermediate algebra, number theory, prealgebra, and precalculus.

How BenchLM shows MATH-500 right now

BenchLM is tracking MATH-500 in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.

These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.

118 tracked models · Local tracked rows · Awaiting exact-source attachments · Display only

Tracked score on MATH-500 — April 21, 2026

BenchLM mirrors the published tracked score view for MATH-500. GPT-5.3 Codex leads the public snapshot at 99%, followed by GPT-5.4 (99%) and GPT-5.2 Pro (99%). BenchLM does not use these results to rank models overall.

118 models · Math · 15% of category score · Stale · Updated April 21, 2026

The published MATH-500 snapshot is tightly clustered at the top: GPT-5.3 Codex leads at 99%, and the third-ranked model ties at the same 99%. The top-10 spread is just 1.2 points, so the leading published scores sit in a narrow band.
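
The "top-10 spread" above can be reproduced directly from the scores in the tracked table below. A minimal sketch (the score list is copied from the snapshot table on this page):

```python
# Top-10 tracked MATH-500 scores from the snapshot table (percent).
top10 = [99.0, 99.0, 99.0, 98.6, 98.0, 98.0, 98.0, 98.0, 98.0, 97.8]

# Spread = best score minus the 10th-ranked score.
spread = max(top10) - min(top10)
print(round(spread, 1))  # 1.2
```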

118 models have been evaluated on MATH-500. The benchmark falls in the Math category. This category carries a 5% weight in BenchLM.ai's overall scoring system. Within that category, MATH-500 contributes 15% of the category score, so strong performance here directly affects a model's overall ranking.
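
Assuming the two stated weights combine multiplicatively (the page does not spell out the formula, so this is an illustration, not BenchLM's documented method), MATH-500's effective share of the overall score works out to 0.75%:

```python
# Stated BenchLM weights (assumption: simple multiplicative weighting).
category_weight = 0.05   # Math category's share of the overall score
benchmark_share = 0.15   # MATH-500's share within the Math category

effective_weight = category_weight * benchmark_share
print(f"{effective_weight:.2%}")  # 0.75%
```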

About MATH-500

Year

2021

Tasks

500 problems

Format

Free-form mathematical answers

Difficulty

High school to undergraduate

MATH-500 is one of the most widely cited math benchmarks. It is nearing saturation, with top reasoning models scoring 96-99%, which makes it less useful for differentiating frontier models but still a standard baseline.

BenchLM freshness & provenance

Version

MATH-500 2021

Refresh cadence

Static

Staleness state

Stale

Question availability

Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Tracked score table (118 models)

1. GPT-5.3 Codex (gpt-5-3-codex): 99%
2. GPT-5.4 (gpt-5-4): 99%
3. GPT-5.2 Pro (gpt-5-2-pro): 99%
4. Sarvam 105B (sarvam-105b): 98.6%
5. Claude Opus 4.6 (claude-opus-4-6): 98%
6. GPT-5.2 (gpt-5-2): 98%
7. GPT-5.3 Instant (gpt-5-3-instant): 98%
8. GPT-5.2 Instant (gpt-5-2-instant): 98%
9. GPT-5.3-Codex-Spark (gpt-5-3-codex-spark): 98%
10. Claude Sonnet 4.6 (claude-sonnet-4-6): 97.8%
11. GLM-5.1 (glm-5-1): 97.4%
12. GLM-5 (glm-5): 97.4%
13. GPT-5.4 mini (gpt-5-4-mini): 97.4%
14. Kimi K2 (kimi-k2): 97.4%
15. DeepSeek-R1 (deepseek-r1): 97.3%
16. Grok 4.1 (grok-4-1): 97%
17. Gemini 3.1 Pro (gemini-3-1-pro): 97%
18. Sarvam 30B (sarvam-30b): 97%
19. MiniMax M1 80k (minimax-m1-80k): 96.8%
20. Phi-4 (phi-4): 94.6%
21. GPT-5.2-Codex (gpt-5-2-codex): 94%
22. GPT-5 (high) (gpt-5-high): 94%
23. 94%
24. GPT-5.1 (gpt-5-1): 94%
25. Mistral Large 3 (mistral-large-3): 93.6%
26. GPT-5.1-Codex-Max (gpt-5-1-codex-max): 93%
27. Qwen3.5 397B (Reasoning) (qwen3-5-397b-reasoning): 93%
28. Gemini 3 Pro Deep Think (gemini-3-pro-deep-think): 92%
29. GPT-5 (medium) (gpt-5-medium): 92%
30. GLM-5 (Reasoning) (glm-5-reasoning): 92%
31. Kimi K2.5 (Reasoning) (kimi-k2-5-reasoning): 92%
32. Gemini 3 Pro (gemini-3-pro): 91%
33. Mistral Medium 3 (mistral-medium-3): 91%
34. DeepSeek V3 (deepseek-v3): 90.2%
35. MiMo-V2-Flash (mimo-v2-flash): 90%
36. DeepSeekMath V2 (deepseekmath-v2): 90%
37. Grok 4.1 Fast (grok-4-1-fast): 89%
38. Claude Opus 4.5 (claude-opus-4-5): 89%
39. 89%
40. Claude Sonnet 4.5 (claude-sonnet-4-5): 88%
41. 88%
42. Claude 4.1 Opus Thinking (claude-4-1-opus-thinking): 87%
43. GLM-4.7 (glm-4-7): 85%
44. Step 3.5 Flash (step-3-5-flash): 85%
45. GPT-5 mini (gpt-5-mini): 85%
46. GLM-4.7-Flash (glm-4-7-flash): 85%
47. Nemotron 3 Super 120B A12B (nemotron-3-super-120b-a12b): 85%
48. Nemotron 3 Ultra 500B (nemotron-3-ultra-500b): 84%
49. Gemini 2.5 Pro (gemini-2-5-pro): 84%
50. DeepSeek V3.2 (Thinking) (deepseek-v3-2-thinking): 84%
51. Qwen2.5-72B (qwen2-5-72b): 84%
52. o4-mini (high) (o4-mini-high): 84%
53. Qwen2.5-1M (qwen2-5-1m): 83%
54. Grok 4 (grok-4): 83%
55. DeepSeek LLM 2.0 (deepseek-llm-2-0): 83%
56. Nemotron 3 Super 100B (nemotron-3-super-100b): 83%
57. Ministral 3 14B (Reasoning) (ministral-3-14b-reasoning): 82.7%
58. Kimi K2.5 (kimi-k2-5): 82%
59. Llama 3.1 405B (llama-3-1-405b): 82%
60. Mistral Large 2 (mistral-large-2): 82%
61. Seed 1.6 (seed-1-6): 82%
62. Mercury 2 (mercury-2): 82%
63. Qwen3.5 397B (qwen3-5-397b): 81%
64. DeepSeek Coder 2.0 (deepseek-coder-2-0): 81%
65. Claude 4 Sonnet (claude-4-sonnet): 81%
66. Claude 4.1 Opus (claude-4-1-opus): 81%
67. Claude Haiku 4.5 (claude-haiku-4-5): 81%
68. DeepSeek V3.2 (deepseek-v3-2): 81%
69. Seed-2.0-Lite (seed-2-0-lite): 81%
70. MiniMax M2.5 (minimax-m2-5): 81%
71. Gemini 3 Flash (gemini-3-flash): 80%
72. Claude 3.5 Sonnet (claude-3-5-sonnet): 80%
73. GPT-4o (gpt-4o): 80%
74. Nemotron Ultra 253B (nemotron-ultra-253b): 74%
75. Grok Code Fast 1 (grok-code-fast-1): 73%
76. Gemini 1.5 Pro (gemini-1-5-pro): 73%
77. Claude 3 Opus (claude-3-opus): 73%
78. Mistral 8x7B (mistral-8x7b): 73%
79. Z-1 (z-1): 73%
80. Nemotron 3 Nano 30B (nemotron-3-nano-30b): 73%
81. Gemini 2.5 Flash (gemini-2-5-flash): 72%
82. Moonshot v1 (moonshot-v1): 72%
83. Gemini 1.0 Pro (gemini-1-0-pro): 72%
84. Seed 1.6 Flash (seed-1-6-flash): 72%
85. Ministral 3 14B (ministral-3-14b): 72%
86. Gemini 3.1 Flash-Lite (gemini-3-1-flash-lite): 71%
87. GPT-OSS 120B (gpt-oss-120b): 71%
88. Claude 3 Haiku (claude-3-haiku): 71%
89. GPT-4 Turbo (gpt-4-turbo): 71%
90. Nemotron-4 15B (nemotron-4-15b): 71%
91. Llama 3 70B (llama-3-70b): 71%
92. Aion-2.0 (aion-2-0): 71%
93. Seed-2.0-Mini (seed-2-0-mini): 70%
94. Ministral 3 8B (Reasoning) (ministral-3-8b-reasoning): 67%
95. 66%
96. 65.8%
97. DeepSeek V3.1 (Reasoning) (deepseek-v3-1-reasoning): 62%
98. LFM2.5-1.2B-Thinking (lfm2-5-1-2b-thinking): 61%
99. Qwen3 235B 2507 (Reasoning) (qwen3-235b-2507-reasoning): 60%
100. Llama 4 Behemoth (llama-4-behemoth): 60%
101. Mistral 7B v0.3 (mistral-7b-v0-3): 60%
102. Ministral 3 8B (ministral-3-8b): 60%
103. Grok 3 [Beta] (grok-3-beta): 59%
104. Llama 4 Maverick (llama-4-maverick): 59%
105. Nova Pro (nova-pro): 59%
106. DeepSeek V3.1 (deepseek-v3-1): 59%
107. GPT-OSS 20B (gpt-oss-20b): 59%
108. Mistral 8x7B v0.2 (mistral-8x7b-v0-2): 59%
109. Ministral 3 3B (Reasoning) (ministral-3-3b-reasoning): 59%
110. Qwen3 235B 2507 (qwen3-235b-2507): 57%
111. Llama 4 Scout (llama-4-scout): 57%
112. GLM-4.5-Air (glm-4-5-air): 57%
113. GLM-4.5 (glm-4-5): 57%
114. LFM2-24B-A2B (lfm2-24b-a2b): 57%
115. Gemma 3 27B (gemma-3-27b): 56%
116. LFM2.5-1.2B-Instruct (lfm2-5-1-2b-instruct): 54%
117. Ministral 3 3B (ministral-3-3b): 53%
118. 34.4%

FAQ

What does MATH-500 measure?

MATH-500 measures performance on a curated subset of 500 problems from the MATH dataset, covering algebra, counting and probability, geometry, intermediate algebra, number theory, prealgebra, and precalculus.

Which model leads the published MATH-500 snapshot?

GPT-5.3 Codex currently leads the published MATH-500 snapshot with a tracked score of 99%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on MATH-500?

118 AI models are included in BenchLM's mirrored MATH-500 snapshot, based on the public leaderboard captured on April 21, 2026.

Last updated: April 21, 2026 · mirrored from the public benchmark leaderboard
