Harvard-MIT Mathematics Tournament February 2024 (HMMT Feb 2024)

The February 2024 edition of the Harvard-MIT Mathematics Tournament, continuing the tournament's tradition of challenging competition mathematics for high school students.

How BenchLM shows HMMT Feb 2024 right now

BenchLM is tracking HMMT Feb 2024 in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.

These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.

106 tracked models · Local tracked rows · Awaiting exact-source attachments · Display only

Tracked score on HMMT Feb 2024 — April 20, 2026

BenchLM mirrors the published tracked score view for HMMT Feb 2024. GPT-5.4 leads the public snapshot at 98%, followed by GPT-5.2 Pro (98%) and GPT-5.1-Codex-Max (97%). BenchLM does not use these results to rank models overall.

106 models · Math · Refreshing · Display only · Updated April 20, 2026

The published HMMT Feb 2024 snapshot is tightly clustered at the top: GPT-5.4 sits at 98%, and the third-ranked model is only one point behind. The entire top-10 spread is also one point, so the leading published scores sit in a very narrow band.

106 models have been evaluated on HMMT Feb 2024. The benchmark falls in the Math category. This category carries a 5% weight in BenchLM.ai's overall scoring system. HMMT Feb 2024 is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
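To make the exclusion concrete, here is a minimal sketch of category-weighted overall scoring in which display-only benchmarks are simply skipped. This is an illustrative assumption, not BenchLM's actual formula; the benchmark names, weights, and scores below (other than HMMT Feb 2024's 5% Math weight) are hypothetical.

```python
# Hypothetical weighted-average scoring sketch (NOT BenchLM's real formula).
# Display-only benchmarks like HMMT Feb 2024 contribute nothing: they are
# filtered out before the weighted average is taken.
benchmarks = [
    {"name": "HMMT Feb 2024",    "score": 0.98, "weight": 0.05, "display_only": True},
    {"name": "Other Math Bench", "score": 0.90, "weight": 0.05, "display_only": False},
    {"name": "Coding Bench",     "score": 0.80, "weight": 0.10, "display_only": False},
]

# Keep only benchmarks that participate in the scoring formula.
scored = [b for b in benchmarks if not b["display_only"]]

# Weighted average over the remaining benchmarks, renormalizing the weights.
total_weight = sum(b["weight"] for b in scored)
overall = sum(b["score"] * b["weight"] for b in scored) / total_weight

print(round(overall, 4))  # 0.8333 under these made-up inputs
```

Changing HMMT Feb 2024's score in this sketch leaves `overall` unchanged, which is exactly what "displayed for reference but excluded from the scoring formula" means.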

About HMMT Feb 2024

Year

2024

Tasks

Tournament problems

Format

Competition mathematics

Difficulty

High school olympiad level

HMMT Feb 2024 maintains the mathematical rigor and creativity expected of the tournament: its problems demand advanced reasoning across algebra, combinatorics, geometry, and number theory rather than routine computation.

BenchLM freshness & provenance

Version

HMMT Feb 2024

Refresh cadence

Annual

Staleness state

Refreshing

Question availability

Public benchmark set

Refreshing · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
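The three treatments named above can be sketched as a simple decision function. This is a hypothetical illustration of the policy as described on this page, not BenchLM's actual implementation; the function name and state strings are assumptions.

```python
# Hypothetical sketch (not BenchLM's real code) of mapping freshness
# metadata to one of the three treatments described above.
def benchmark_treatment(staleness_state: str, exact_source_verified: bool) -> str:
    # Rows without exact-source attachments are shown for reference only.
    if not exact_source_verified:
        return "display-only reference"
    if staleness_state == "fresh":
        return "strong differentiator"
    if staleness_state == "refreshing":
        return "benchmark to watch"
    # Stale or unknown states fall back to reference display.
    return "display-only reference"

# HMMT Feb 2024 right now: state "refreshing", attachments still pending.
print(benchmark_treatment("refreshing", False))  # display-only reference
```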

Tracked score table (106 models)

1. GPT-5.4 (gpt-5-4): 98%
2. GPT-5.2 Pro (gpt-5-2-pro): 98%
3. GPT-5.1-Codex-Max (gpt-5-1-codex-max): 97%
4. GPT-5.2-Codex (gpt-5-2-codex): 97%
5. GPT-5.3 Codex (gpt-5-3-codex): 97%
6. Grok 4.1 (grok-4-1): 97%
7. Gemini 3 Pro Deep Think (gemini-3-pro-deep-think): 97%
8. Claude Opus 4.6 (claude-opus-4-6): 97%
9. GPT-5.1 (gpt-5-1): 97%
10. GPT-5.2 (gpt-5-2): 97%
11. Claude Sonnet 4.6 (claude-sonnet-4-6): 97%
12. Gemini 3 Pro (gemini-3-pro): 97%
13. Claude Opus 4.5 (claude-opus-4-5): 97%
14. GPT-5.3 Instant (gpt-5-3-instant): 97%
15. GPT-5.2 Instant (gpt-5-2-instant): 97%
16. GLM-5 (Reasoning) (glm-5-reasoning): 96%
17. GPT-5.3-Codex-Spark (gpt-5-3-codex-spark): 96%
18. Claude Sonnet 4.5 (claude-sonnet-4-5): 95%
19. Grok 4.1 Fast (grok-4-1-fast): 94%
20. GPT-5 (high) (gpt-5-high): 93%
21. (model name missing in source): 92%
22. Kimi K2.5 (Reasoning) (kimi-k2-5-reasoning): 92%
23. GPT-5 (medium) (gpt-5-medium): 91%
24. Qwen3.5 397B (Reasoning) (qwen3-5-397b-reasoning): 91%
25. (model name missing in source): 88%
26. GPT-5 mini (gpt-5-mini): 88%
27. (model name missing in source): 86%
28. GLM-5 (glm-5): 86%
29. Grok 4 (grok-4): 86%
30. DeepSeek V3.2 (Thinking) (deepseek-v3-2-thinking): 85%
31. GLM-4.7 (glm-4-7): 84%
32. Qwen2.5-1M (qwen2-5-1m): 83%
33. Step 3.5 Flash (step-3-5-flash): 83%
34. Gemini 2.5 Pro (gemini-2-5-pro): 82%
35. Qwen2.5-72B (qwen2-5-72b): 82%
36. DeepSeek V3.2 (deepseek-v3-2): 82%
37. Qwen3.5 397B (qwen3-5-397b): 81%
38. o4-mini (high) (o4-mini-high): 81%
39. DeepSeek Coder 2.0 (deepseek-coder-2-0): 79%
40. Mercury 2 (mercury-2): 79%
41. DeepSeekMath V2 (deepseekmath-v2): 78%
42. DeepSeek LLM 2.0 (deepseek-llm-2-0): 78%
43. MiMo-V2-Flash (mimo-v2-flash): 77%
44. Kimi K2.5 (kimi-k2-5): 75%
45. Claude 4.1 Opus (claude-4-1-opus): 74%
46. Mistral Large 3 (mistral-large-3): 74%
47. Nemotron 3 Ultra 500B (nemotron-3-ultra-500b): 72%
48. Aion-2.0 (aion-2-0): 72%
49. Claude 4 Sonnet (claude-4-sonnet): 71%
50. Ministral 3 14B (Reasoning) (ministral-3-14b-reasoning): 71%
51. MiniMax M2.5 (minimax-m2-5): 71%
52. Seed 1.6 (seed-1-6): 70%
53. Seed-2.0-Lite (seed-2-0-lite): 69%
54. Gemini 3 Flash (gemini-3-flash): 68%
55. Llama 3.1 405B (llama-3-1-405b): 68%
56. Claude Haiku 4.5 (claude-haiku-4-5): 66%
57. Mistral Large 2 (mistral-large-2): 66%
58. Ministral 3 14B (ministral-3-14b): 66%
59. Nemotron 3 Super 120B A12B (nemotron-3-super-120b-a12b): 65%
60. GPT-4o (gpt-4o): 64%
61. GLM-4.7-Flash (glm-4-7-flash): 64%
62. Nemotron 3 Super 100B (nemotron-3-super-100b): 63%
63. Claude 3.5 Sonnet (claude-3-5-sonnet): 63%
64. Mistral 8x7B (mistral-8x7b): 63%
65. Grok Code Fast 1 (grok-code-fast-1): 62%
66. Gemini 1.5 Pro (gemini-1-5-pro): 62%
67. Seed 1.6 Flash (seed-1-6-flash): 62%
68. Gemini 3.1 Flash-Lite (gemini-3-1-flash-lite): 61%
69. Gemini 1.0 Pro (gemini-1-0-pro): 60%
70. Seed-2.0-Mini (seed-2-0-mini): 60%
71. Claude 3 Opus (claude-3-opus): 59%
72. GPT-4 Turbo (gpt-4-turbo): 58%
73. Llama 3 70B (llama-3-70b): 56%
74. Nemotron 3 Nano 30B (nemotron-3-nano-30b): 55%
75. Claude 3 Haiku (claude-3-haiku): 54%
76. Nemotron-4 15B (nemotron-4-15b): 52%
77. Moonshot v1 (moonshot-v1): 51%
78. Z-1 (z-1): 50%
79. GPT-OSS 120B (gpt-oss-120b): 49%
80. Gemini 2.5 Flash (gemini-2-5-flash): 48%
81. Nemotron Ultra 253B (nemotron-ultra-253b): 47%
82. Llama 4 Behemoth (llama-4-behemoth): 46%
83. Llama 4 Scout (llama-4-scout): 45%
84. Llama 4 Maverick (llama-4-maverick): 44%
85. LFM2-24B-A2B (lfm2-24b-a2b): 44%
86. Gemma 3 27B (gemma-3-27b): 43%
87. DeepSeek-R1 (deepseek-r1): 42%
88. Grok 3 [Beta] (grok-3-beta): 40%
89. Nova Pro (nova-pro): 39%
90. Qwen3 235B 2507 (Reasoning) (qwen3-235b-2507-reasoning): 38%
91. Qwen3 235B 2507 (qwen3-235b-2507): 37%
92. Claude 4.1 Opus Thinking (claude-4-1-opus-thinking): 36%
93. GLM-4.5 (glm-4-5): 35%
94. MiniMax M1 80k (minimax-m1-80k): 34%
95. GLM-4.5-Air (glm-4-5-air): 33%
96. DeepSeek V3.1 (Reasoning) (deepseek-v3-1-reasoning): 32%
97. DeepSeek V3.1 (deepseek-v3-1): 31%
98. Ministral 3 8B (Reasoning) (ministral-3-8b-reasoning): 31%
99. GPT-OSS 20B (gpt-oss-20b): 29%
100. Mistral 7B v0.3 (mistral-7b-v0-3): 28%
101. Ministral 3 8B (ministral-3-8b): 28%
102. Mistral 8x7B v0.2 (mistral-8x7b-v0-2): 27%
103. LFM2.5-1.2B-Thinking (lfm2-5-1-2b-thinking): 26%
104. Ministral 3 3B (Reasoning) (ministral-3-3b-reasoning): 25%
105. LFM2.5-1.2B-Instruct (lfm2-5-1-2b-instruct): 22%
106. Ministral 3 3B (ministral-3-3b): 21%

FAQ

What does HMMT Feb 2024 measure?

HMMT Feb 2024 measures performance on problems from the February 2024 edition of the Harvard-MIT Mathematics Tournament, a challenging high school mathematics competition.

Which model leads the published HMMT Feb 2024 snapshot?

GPT-5.4 currently leads the published HMMT Feb 2024 snapshot with a tracked score of 98%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on HMMT Feb 2024?

106 AI models are included in BenchLM's mirrored HMMT Feb 2024 snapshot, based on the public leaderboard captured on April 20, 2026.

Last updated: April 20, 2026 · mirrored from the public benchmark leaderboard