A 15-question, 3-hour examination where each answer is an integer from 000 to 999. Serves as the intermediate step between AMC 10/12 and the USA Mathematical Olympiad (USAMO).
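Because every AIME answer is a three-digit integer from 000 to 999, grading reduces to exact integer matching. A minimal sketch of such a checker, in Python; the function names are illustrative, not BenchLM's actual grading code:

def is_valid_aime_answer(answer: str) -> bool:
    """An AIME answer is exactly three digits, i.e. an integer in [0, 999]."""
    return len(answer) == 3 and answer.isdigit()


def score_aime(predictions: list[str], answers: list[str]) -> float:
    """Fraction of problems answered with an exact integer match."""
    correct = sum(
        is_valid_aime_answer(p) and int(p) == int(a)
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


print(score_aime(["042", "999", "abc"], ["042", "998", "100"]))  # 1 of 3 correct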
BenchLM is tracking AIME 2023 in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.
These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.
BenchLM mirrors the published tracked score view for AIME 2023. GPT-5.1-Codex-Max leads the public snapshot at 99%, with GPT-5.2-Codex (99%) and GPT-5.3 Codex (99%) tied at the same tracked score. BenchLM does not use these results to rank models overall.
Model | Vendor | Model ID
GPT-5.1-Codex-Max | OpenAI | gpt-5-1-codex-max
GPT-5.2-Codex | OpenAI | gpt-5-2-codex
GPT-5.3 Codex | OpenAI | gpt-5-3-codex
The published AIME 2023 snapshot is saturated at the top: GPT-5.1-Codex-Max sits at 99%, and the third row is 0.0 points behind. The broader top-10 spread is also 0.0 points, so the leading published scores are effectively tied rather than merely clustered.
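For reference, the spread here is just the gap between the highest and lowest of the top-10 tracked scores. A throwaway sketch, using placeholder values that mirror the tied rows above:

# Placeholder scores mirroring the tied 99% rows in the snapshot above.
scores = [0.99, 0.99, 0.99]

top10 = sorted(scores, reverse=True)[:10]
spread_points = (max(top10) - min(top10)) * 100
print(f"top-10 spread: {spread_points:.1f} points")  # 0.0 points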
106 models have been evaluated on AIME 2023. The benchmark falls in the Math category, which carries a 5% weight in BenchLM.ai's overall scoring system; however, AIME 2023 is currently displayed for reference only and excluded from the scoring formula, so it does not directly affect overall rankings.
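To make the weighting concrete, here is a minimal sketch of how a category-weighted overall score might skip display-only benchmarks. The 5% Math weight and the display-only flag come from the description above, but the structure and field names are assumptions, not BenchLM's actual formula:

from dataclasses import dataclass

@dataclass
class BenchmarkResult:
    name: str
    category: str
    score: float        # 0.0 to 1.0
    display_only: bool  # excluded from the scoring formula

CATEGORY_WEIGHTS = {"Math": 0.05}  # Math carries a 5% weight

def overall_score(results: list[BenchmarkResult]) -> float:
    """Weighted average over scored (non-display-only) benchmarks."""
    scored = [r for r in results if not r.display_only]
    total_weight = sum(CATEGORY_WEIGHTS.get(r.category, 0.0) for r in scored)
    if total_weight == 0:
        return 0.0
    weighted = sum(CATEGORY_WEIGHTS.get(r.category, 0.0) * r.score for r in scored)
    return weighted / total_weight

# AIME 2023 is display-only, so it contributes nothing to the overall score:
results = [BenchmarkResult("AIME 2023", "Math", 0.99, display_only=True)]
print(overall_score(results))  # 0.0, since no scored benchmarks remain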
Year: 2023
Tasks: 15 problems
Format: Integer answers 000-999
Difficulty: High school olympiad level
AIME is designed for students who score well on the AMC 10/12. Problems require creative problem-solving and mathematical insight beyond the standard high school curriculum. Only the top scorers qualify for USAMO.
Version: AIME 2023
Refresh cadence: Static
Staleness state: Stale
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
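A hedged sketch of what such a freshness-based tiering rule could look like. The three tier names come from the text above; the decision rule, parameters, and thresholds are invented for illustration and are not the published policy:

def benchmark_tier(staleness: str, questions_public: bool) -> str:
    """Map freshness metadata to one of BenchLM's three display tiers.

    The rule below is a guess: fresh, non-public benchmarks differentiate
    best; fresh but public ones bear watching; stale ones are display-only.
    """
    if staleness == "fresh" and not questions_public:
        return "strong differentiator"
    if staleness == "fresh":
        return "benchmark to watch"
    return "display-only reference"

# AIME 2023 is static, stale, and fully public:
print(benchmark_tier(staleness="stale", questions_public=True))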
GPT-5.1-Codex-Max currently leads the published AIME 2023 snapshot with a tracked score of 99%. BenchLM shows this benchmark for display only and does not use it in overall rankings.
106 AI models are included in BenchLM's mirrored AIME 2023 snapshot, based on the public leaderboard captured on April 20, 2026.