Harvard-MIT Mathematics Tournament February 2025 (HMMT Feb 2025)

The most recent February edition of the Harvard-MIT Mathematics Tournament, featuring the latest challenging problems in competitive mathematics.

How BenchLM shows HMMT Feb 2025 right now

BenchLM is tracking HMMT Feb 2025 in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.

These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.

107 tracked models · Local tracked rows · Awaiting exact-source attachments · Display only

Tracked score on HMMT Feb 2025 — April 20, 2026

BenchLM mirrors the published tracked score view for HMMT Feb 2025. GLM-4.7 leads the public snapshot at 97.1%, followed by GPT-5.4 (97%) and GPT-5.2 Pro (97%). BenchLM does not use these results to rank models overall.

107 models · Math · Current · Display only · Updated April 20, 2026

The published HMMT Feb 2025 snapshot is tightly clustered at the top: GLM-4.7 sits at 97.1%, with the second- and third-ranked models only 0.1 points behind at 97%. The full top-10 spread is just 1.1 points, so most of the leading published scores sit in a very narrow band.

107 models have been evaluated on HMMT Feb 2025. The benchmark falls in the Math category, which carries a 5% weight in BenchLM.ai's overall scoring system. However, HMMT Feb 2025 is currently displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.
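To illustrate what "display-only, excluded from the scoring formula" means in practice, here is a minimal sketch of a category-weighted overall score that skips display-only benchmarks. The field names, weights, and scores are illustrative assumptions, not BenchLM's actual formula or data:

```python
# Hypothetical sketch: weighted overall score that excludes display-only rows.
# Weights and scores below are made up for illustration.
def overall_score(results):
    """results: list of dicts with 'weight', 'score', and 'display_only' keys."""
    scored = [r for r in results if not r["display_only"]]
    total_weight = sum(r["weight"] for r in scored)
    if total_weight == 0:
        return None  # nothing contributes to the overall score
    return sum(r["weight"] * r["score"] for r in scored) / total_weight

results = [
    {"weight": 0.05, "score": 97.1, "display_only": True},   # e.g. HMMT Feb 2025: shown, not scored
    {"weight": 0.20, "score": 88.0, "display_only": False},
    {"weight": 0.10, "score": 92.0, "display_only": False},
]
print(round(overall_score(results), 2))
```

Because the display-only row is filtered out before the weighted mean is taken, changing its score would not move the overall ranking at all.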

About HMMT Feb 2025

Year

2025

Tasks

Tournament problems

Format

Competition mathematics

Difficulty

High school olympiad level

HMMT Feb 2025 represents the current pinnacle of high school mathematics competition, with problems designed to challenge the brightest mathematical minds.

BenchLM freshness & provenance

Version

HMMT Feb 2025 (2025 edition)

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Tracked score table (107 models)

1. GLM-4.7 (glm-4-7): 97.1%
2. GPT-5.4 (gpt-5-4): 97%
3. GPT-5.2 Pro (gpt-5-2-pro): 97%
4. GPT-5.1-Codex-Max (gpt-5-1-codex-max): 96%
5. GPT-5.2-Codex (gpt-5-2-codex): 96%
6. GPT-5.3 Codex (gpt-5-3-codex): 96%
7. Grok 4.1 (grok-4-1): 96%
8. Gemini 3 Pro Deep Think (gemini-3-pro-deep-think): 96%
9. Claude Opus 4.6 (claude-opus-4-6): 96%
10. GPT-5.1 (gpt-5-1): 96%
11. GPT-5.2 (gpt-5-2): 96%
12. Claude Sonnet 4.6 (claude-sonnet-4-6): 96%
13. Gemini 3 Pro (gemini-3-pro): 96%
14. Claude Opus 4.5 (claude-opus-4-5): 96%
15. GPT-5.3 Instant (gpt-5-3-instant): 96%
16. GPT-5.2 Instant (gpt-5-2-instant): 96%
17. Kimi K2.5 (Reasoning) (kimi-k2-5-reasoning): 95.4%
18. GLM-5 (Reasoning) (glm-5-reasoning): 95%
19. GPT-5.3-Codex-Spark (gpt-5-3-codex-spark): 95%
20. Claude Sonnet 4.5 (claude-sonnet-4-5): 94%
21. Grok 4.1 Fast (grok-4-1-fast): 93%
22. GPT-5 (high) (gpt-5-high): 92%
23. (unnamed in source): 91%
24. GPT-5 (medium) (gpt-5-medium): 90%
25. Qwen3.5 397B (Reasoning) (qwen3-5-397b-reasoning): 90%
26. (unnamed in source): 87%
27. GPT-5 mini (gpt-5-mini): 87%
28. (unnamed in source): 85%
29. GLM-5 (glm-5): 85%
30. Grok 4 (grok-4): 85%
31. DeepSeek V3.2 (Thinking) (deepseek-v3-2-thinking): 84%
32. Qwen2.5-1M (qwen2-5-1m): 82%
33. Step 3.5 Flash (step-3-5-flash): 82%
34. Gemini 2.5 Pro (gemini-2-5-pro): 81%
35. Qwen2.5-72B (qwen2-5-72b): 81%
36. DeepSeek V3.2 (deepseek-v3-2): 81%
37. Qwen3.5 397B (qwen3-5-397b): 80%
38. o4-mini (high) (o4-mini-high): 80%
39. DeepSeek Coder 2.0 (deepseek-coder-2-0): 78%
40. Mercury 2 (mercury-2): 78%
41. DeepSeekMath V2 (deepseekmath-v2): 77%
42. DeepSeek LLM 2.0 (deepseek-llm-2-0): 77%
43. MiMo-V2-Flash (mimo-v2-flash): 76%
44. Kimi K2.5 (kimi-k2-5): 74%
45. Claude 4.1 Opus (claude-4-1-opus): 73%
46. Mistral Large 3 (mistral-large-3): 73%
47. Nemotron 3 Ultra 500B (nemotron-3-ultra-500b): 71%
48. Aion-2.0 (aion-2-0): 71%
49. Claude 4 Sonnet (claude-4-sonnet): 70%
50. MiniMax M2.5 (minimax-m2-5): 70%
51. Seed 1.6 (seed-1-6): 69%
52. Seed-2.0-Lite (seed-2-0-lite): 68%
53. Ministral 3 14B (Reasoning) (ministral-3-14b-reasoning): 67.5%
54. Gemini 3 Flash (gemini-3-flash): 67%
55. Llama 3.1 405B (llama-3-1-405b): 67%
56. Claude Haiku 4.5 (claude-haiku-4-5): 65%
57. Mistral Large 2 (mistral-large-2): 65%
58. Ministral 3 14B (ministral-3-14b): 65%
59. Nemotron 3 Super 120B A12B (nemotron-3-super-120b-a12b): 64%
60. GPT-4o (gpt-4o): 63%
61. GLM-4.7-Flash (glm-4-7-flash): 63%
62. Nemotron 3 Super 100B (nemotron-3-super-100b): 62%
63. Claude 3.5 Sonnet (claude-3-5-sonnet): 62%
64. Mistral 8x7B (mistral-8x7b): 62%
65. Grok Code Fast 1 (grok-code-fast-1): 61%
66. Gemini 1.5 Pro (gemini-1-5-pro): 61%
67. Seed 1.6 Flash (seed-1-6-flash): 61%
68. Gemini 3.1 Flash-Lite (gemini-3-1-flash-lite): 60%
69. Gemini 1.0 Pro (gemini-1-0-pro): 59%
70. Seed-2.0-Mini (seed-2-0-mini): 59%
71. Claude 3 Opus (claude-3-opus): 58%
72. GPT-4 Turbo (gpt-4-turbo): 57%
73. Llama 3 70B (llama-3-70b): 55%
74. Nemotron 3 Nano 30B (nemotron-3-nano-30b): 54%
75. Claude 3 Haiku (claude-3-haiku): 53%
76. Nemotron-4 15B (nemotron-4-15b): 51%
77. Moonshot v1 (moonshot-v1): 50%
78. Z-1 (z-1): 49%
79. GPT-OSS 120B (gpt-oss-120b): 48%
80. Gemini 2.5 Flash (gemini-2-5-flash): 47%
81. Nemotron Ultra 253B (nemotron-ultra-253b): 46%
82. Llama 4 Behemoth (llama-4-behemoth): 45%
83. Llama 4 Scout (llama-4-scout): 44%
84. Llama 4 Maverick (llama-4-maverick): 43%
85. LFM2-24B-A2B (lfm2-24b-a2b): 43%
86. Gemma 3 27B (gemma-3-27b): 42%
87. DeepSeek-R1 (deepseek-r1): 41%
88. Grok 3 [Beta] (grok-3-beta): 39%
89. Kimi K2 (kimi-k2): 38.8%
90. Nova Pro (nova-pro): 38%
91. Qwen3 235B 2507 (Reasoning) (qwen3-235b-2507-reasoning): 37%
92. Qwen3 235B 2507 (qwen3-235b-2507): 36%
93. Claude 4.1 Opus Thinking (claude-4-1-opus-thinking): 35%
94. GLM-4.5 (glm-4-5): 34%
95. MiniMax M1 80k (minimax-m1-80k): 33%
96. GLM-4.5-Air (glm-4-5-air): 32%
97. DeepSeek V3.1 (Reasoning) (deepseek-v3-1-reasoning): 31%
98. DeepSeek V3.1 (deepseek-v3-1): 30%
99. Ministral 3 8B (Reasoning) (ministral-3-8b-reasoning): 30%
100. GPT-OSS 20B (gpt-oss-20b): 28%
101. Mistral 7B v0.3 (mistral-7b-v0-3): 27%
102. Ministral 3 8B (ministral-3-8b): 27%
103. Mistral 8x7B v0.2 (mistral-8x7b-v0-2): 26%
104. LFM2.5-1.2B-Thinking (lfm2-5-1-2b-thinking): 25%
105. Ministral 3 3B (Reasoning) (ministral-3-3b-reasoning): 24%
106. LFM2.5-1.2B-Instruct (lfm2-5-1-2b-instruct): 21%
107. Ministral 3 3B (ministral-3-3b): 20%

FAQ

What does HMMT Feb 2025 measure?

HMMT Feb 2025 measures a model's ability to solve problems from the February 2025 edition of the Harvard-MIT Mathematics Tournament, a high school olympiad-level mathematics competition featuring some of the most challenging problems in competitive mathematics.

Which model leads the published HMMT Feb 2025 snapshot?

GLM-4.7 currently leads the published HMMT Feb 2025 snapshot with a tracked score of 97.1%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on HMMT Feb 2025?

107 AI models are included in BenchLM's mirrored HMMT Feb 2025 snapshot, based on the public leaderboard captured on April 20, 2026.

Last updated: April 20, 2026 · mirrored from the public benchmark leaderboard
