
Harvard-MIT Mathematics Tournament February 2023 (HMMT Feb 2023)

A prestigious high school mathematics competition hosted jointly by Harvard and MIT, featuring challenging problems across various mathematical disciplines.

How BenchLM shows HMMT Feb 2023 right now

BenchLM is tracking HMMT Feb 2023 in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.

These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.

106 tracked models · Local tracked rows · Awaiting exact-source attachments · Display only

Tracked score on HMMT Feb 2023 — April 20, 2026

BenchLM mirrors the published tracked score view for HMMT Feb 2023. GPT-5.4 leads the public snapshot at 96%, followed by GPT-5.2 Pro (96%) and GPT-5.1-Codex-Max (95%). BenchLM does not use these results to rank models overall.

106 models · Math · Stale · Display only · Updated April 20, 2026

The published HMMT Feb 2023 snapshot is tightly clustered at the top: GPT-5.4 sits at 96%, while the third row is only 1.0 percentage point behind. The top-10 spread is also 1.0 percentage point, so the leading scores sit in a narrow band.
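
The gap and spread quoted above can be reproduced from the tracked rows. The following is a minimal sketch in Python, assuming the ten highest tracked scores are entered by hand from the table on this page. The variable names are illustrative and are not part of BenchLM.

```python
# Ten highest tracked scores (percent), entered by hand from the
# tracked score table on this page. Assumption: rows are already
# sorted from highest to lowest, as they are displayed.
top10 = [96, 96, 95, 95, 95, 95, 95, 95, 95, 95]

# Gap between the leading row and the third row, in percentage points.
gap_to_third = top10[0] - top10[2]

# Spread across the top ten: highest score minus lowest score.
top10_spread = max(top10) - min(top10)

print(f"Leader vs. third row: {gap_to_third:.1f} points")
print(f"Top-10 spread: {top10_spread:.1f} points")
```

Both values come out to 1.0 because the tracked scores are shown as whole percentages. Finer differences between models in this band cannot be recovered from the displayed table.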

106 models have been evaluated on HMMT Feb 2023. The benchmark falls in the Math category. This category carries a 5% weight in BenchLM.ai's overall scoring system. HMMT Feb 2023 is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.

About HMMT Feb 2023

Year: 2023
Tasks: Tournament problems
Format: Competition mathematics
Difficulty: High school olympiad level

HMMT is one of the most competitive high school mathematics tournaments in the US. Problems span algebra, geometry, combinatorics, and number theory, requiring deep mathematical insight.

BenchLM freshness & provenance

Version: HMMT Feb 2023
Refresh cadence: Static
Staleness state: Stale
Question availability: Public benchmark set

Stale · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Tracked score table (106 models)

Rank | Model | Model ID | Score
1 | GPT-5.4 | gpt-5-4 | 96%
2 | GPT-5.2 Pro | gpt-5-2-pro | 96%
3 | GPT-5.1-Codex-Max | gpt-5-1-codex-max | 95%
4 | GPT-5.2-Codex | gpt-5-2-codex | 95%
5 | GPT-5.3 Codex | gpt-5-3-codex | 95%
6 | Grok 4.1 | grok-4-1 | 95%
7 | Gemini 3 Pro Deep Think | gemini-3-pro-deep-think | 95%
8 | Claude Opus 4.6 | claude-opus-4-6 | 95%
9 | GPT-5.1 | gpt-5-1 | 95%
10 | GPT-5.2 | gpt-5-2 | 95%
11 | Claude Sonnet 4.6 | claude-sonnet-4-6 | 95%
12 | Gemini 3 Pro | gemini-3-pro | 95%
13 | Claude Opus 4.5 | claude-opus-4-5 | 95%
14 | GPT-5.3 Instant | gpt-5-3-instant | 95%
15 | GPT-5.2 Instant | gpt-5-2-instant | 95%
16 | GLM-5 (Reasoning) | glm-5-reasoning | 94%
17 | GPT-5.3-Codex-Spark | gpt-5-3-codex-spark | 94%
18 | Claude Sonnet 4.5 | claude-sonnet-4-5 | 93%
19 | Grok 4.1 Fast | grok-4-1-fast | 92%
20 | GPT-5 (high) | gpt-5-high | 91%
21 | (not shown) | (not shown) | 90%
22 | Kimi K2.5 (Reasoning) | kimi-k2-5-reasoning | 90%
23 | GPT-5 (medium) | gpt-5-medium | 89%
24 | Qwen3.5 397B (Reasoning) | qwen3-5-397b-reasoning | 89%
25 | (not shown) | (not shown) | 86%
26 | GPT-5 mini | gpt-5-mini | 86%
27 | (not shown) | (not shown) | 84%
28 | GLM-5 | glm-5 | 84%
29 | Grok 4 | grok-4 | 84%
30 | DeepSeek V3.2 (Thinking) | deepseek-v3-2-thinking | 83%
31 | GLM-4.7 | glm-4-7 | 82%
32 | Qwen2.5-1M | qwen2-5-1m | 81%
33 | Step 3.5 Flash | step-3-5-flash | 81%
34 | Gemini 2.5 Pro | gemini-2-5-pro | 80%
35 | Qwen2.5-72B | qwen2-5-72b | 80%
36 | DeepSeek V3.2 | deepseek-v3-2 | 80%
37 | Qwen3.5 397B | qwen3-5-397b | 79%
38 | o4-mini (high) | o4-mini-high | 79%
39 | DeepSeek Coder 2.0 | deepseek-coder-2-0 | 77%
40 | Mercury 2 | mercury-2 | 77%
41 | DeepSeekMath V2 | deepseekmath-v2 | 76%
42 | DeepSeek LLM 2.0 | deepseek-llm-2-0 | 76%
43 | MiMo-V2-Flash | mimo-v2-flash | 75%
44 | Kimi K2.5 | kimi-k2-5 | 73%
45 | Claude 4.1 Opus | claude-4-1-opus | 72%
46 | Mistral Large 3 | mistral-large-3 | 72%
47 | Nemotron 3 Ultra 500B | nemotron-3-ultra-500b | 70%
48 | Aion-2.0 | aion-2-0 | 70%
49 | Claude 4 Sonnet | claude-4-sonnet | 69%
50 | Ministral 3 14B (Reasoning) | ministral-3-14b-reasoning | 69%
51 | MiniMax M2.5 | minimax-m2-5 | 69%
52 | Seed 1.6 | seed-1-6 | 68%
53 | Seed-2.0-Lite | seed-2-0-lite | 67%
54 | Gemini 3 Flash | gemini-3-flash | 66%
55 | Llama 3.1 405B | llama-3-1-405b | 66%
56 | Claude Haiku 4.5 | claude-haiku-4-5 | 64%
57 | Mistral Large 2 | mistral-large-2 | 64%
58 | Ministral 3 14B | ministral-3-14b | 64%
59 | Nemotron 3 Super 120B A12B | nemotron-3-super-120b-a12b | 63%
60 | GPT-4o | gpt-4o | 62%
61 | GLM-4.7-Flash | glm-4-7-flash | 62%
62 | Nemotron 3 Super 100B | nemotron-3-super-100b | 61%
63 | Claude 3.5 Sonnet | claude-3-5-sonnet | 61%
64 | Mistral 8x7B | mistral-8x7b | 61%
65 | Grok Code Fast 1 | grok-code-fast-1 | 60%
66 | Gemini 1.5 Pro | gemini-1-5-pro | 60%
67 | Seed 1.6 Flash | seed-1-6-flash | 60%
68 | Gemini 3.1 Flash-Lite | gemini-3-1-flash-lite | 59%
69 | Gemini 1.0 Pro | gemini-1-0-pro | 58%
70 | Seed-2.0-Mini | seed-2-0-mini | 58%
71 | Claude 3 Opus | claude-3-opus | 57%
72 | GPT-4 Turbo | gpt-4-turbo | 56%
73 | Llama 3 70B | llama-3-70b | 54%
74 | Nemotron 3 Nano 30B | nemotron-3-nano-30b | 53%
75 | Claude 3 Haiku | claude-3-haiku | 52%
76 | Nemotron-4 15B | nemotron-4-15b | 50%
77 | Moonshot v1 | moonshot-v1 | 49%
78 | Z-1 | z-1 | 48%
79 | GPT-OSS 120B | gpt-oss-120b | 47%
80 | Gemini 2.5 Flash | gemini-2-5-flash | 46%
81 | Nemotron Ultra 253B | nemotron-ultra-253b | 45%
82 | Llama 4 Behemoth | llama-4-behemoth | 44%
83 | Llama 4 Scout | llama-4-scout | 43%
84 | Llama 4 Maverick | llama-4-maverick | 42%
85 | LFM2-24B-A2B | lfm2-24b-a2b | 42%
86 | Gemma 3 27B | gemma-3-27b | 41%
87 | DeepSeek-R1 | deepseek-r1 | 40%
88 | Grok 3 [Beta] | grok-3-beta | 38%
89 | Nova Pro | nova-pro | 37%
90 | Qwen3 235B 2507 (Reasoning) | qwen3-235b-2507-reasoning | 36%
91 | Qwen3 235B 2507 | qwen3-235b-2507 | 35%
92 | Claude 4.1 Opus Thinking | claude-4-1-opus-thinking | 34%
93 | GLM-4.5 | glm-4-5 | 33%
94 | MiniMax M1 80k | minimax-m1-80k | 32%
95 | GLM-4.5-Air | glm-4-5-air | 31%
96 | DeepSeek V3.1 (Reasoning) | deepseek-v3-1-reasoning | 30%
97 | DeepSeek V3.1 | deepseek-v3-1 | 29%
98 | Ministral 3 8B (Reasoning) | ministral-3-8b-reasoning | 29%
99 | GPT-OSS 20B | gpt-oss-20b | 27%
100 | Mistral 7B v0.3 | mistral-7b-v0-3 | 26%
101 | Ministral 3 8B | ministral-3-8b | 26%
102 | Mistral 8x7B v0.2 | mistral-8x7b-v0-2 | 25%
103 | LFM2.5-1.2B-Thinking | lfm2-5-1-2b-thinking | 24%
104 | Ministral 3 3B (Reasoning) | ministral-3-3b-reasoning | 23%
105 | LFM2.5-1.2B-Instruct | lfm2-5-1-2b-instruct | 20%
106 | Ministral 3 3B | ministral-3-3b | 19%

FAQ

What does HMMT Feb 2023 measure?

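HMMT Feb 2023 measures how well a model solves problems from the February 2023 Harvard-MIT Mathematics Tournament, a high school competition hosted jointly by Harvard and MIT. The problems span algebra, geometry, combinatorics, and number theory at high school olympiad level.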

Which model leads the published HMMT Feb 2023 snapshot?

GPT-5.4 currently leads the published HMMT Feb 2023 snapshot with a tracked score of 96%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on HMMT Feb 2023?

106 AI models are included in BenchLM's mirrored HMMT Feb 2023 snapshot, based on the public leaderboard captured on April 20, 2026.

Last updated: April 20, 2026 · mirrored from the public benchmark leaderboard
