
Brown University Math Olympiad 2025 (BRUMO 2025)

A challenging competition benchmark built from mathematical-olympiad problems that test advanced reasoning and creative problem-solving.

How BenchLM shows BRUMO 2025 right now

BenchLM is tracking BRUMO 2025 in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.

These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.

107 tracked models · Local tracked rows · Awaiting exact-source attachments · Display only

Tracked score on BRUMO 2025 — April 20, 2026

BenchLM mirrors the published tracked score view for BRUMO 2025. GPT-5.4 leads the public snapshot at 97%, followed by GPT-5.2 Pro (97%) and GPT-5.1-Codex-Max (96%). BenchLM does not use these results to rank models overall.

107 models · Math · 25% of category score · Current · Updated April 20, 2026

The published BRUMO 2025 snapshot is tightly clustered at the top: GPT-5.4 sits at 97%, and the third row is only 1.0 point behind. The full top-10 spread is likewise 1.0 point, so the leading published scores sit in a very narrow band.
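The clustering figures quoted above can be reproduced directly from the score column. A minimal sketch (the variable names are illustrative; the scores are the top-10 values from this page):

```python
# Top-10 tracked scores from the published BRUMO 2025 snapshot, in percent.
top10 = [97, 97, 96, 96, 96, 96, 96, 96, 96, 96]

gap_to_third = top10[0] - top10[2]      # 97 - 96 = 1.0 point
top10_spread = max(top10) - min(top10)  # 97 - 96 = 1.0 point
print(gap_to_third, top10_spread)       # prints: 1 1
```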

107 models have been evaluated on BRUMO 2025. The benchmark falls in the Math category, which carries a 5% weight in BenchLM's overall scoring system. Within that category, BRUMO 2025 contributes 25% of the category score, so once exact-source verification completes, strong performance here would directly affect a model's overall ranking.
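The weighting arithmetic above composes multiplicatively: a benchmark's effective share of the overall score is its share of the category times the category's share of the total. A minimal sketch using the weights stated on this page (the names are illustrative, not BenchLM's actual code):

```python
# Effective weight of BRUMO 2025 in the overall BenchLM score.
category_weight = 0.05   # Math category: 5% of the overall score
benchmark_share = 0.25   # BRUMO 2025: 25% of the Math category score

effective_weight = category_weight * benchmark_share
print(f"{effective_weight:.4f}")  # prints: 0.0125  (1.25% of the overall score)
```

So a 10-point swing on BRUMO 2025 would move a model's overall score by about 0.125 points under these weights.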

About BRUMO 2025

Year

2025

Tasks

Olympiad problems

Format

Mathematical olympiad

Difficulty

Mathematical olympiad level

BRUMO carries forward Brown University's tradition of mathematical excellence, featuring problems that require deep mathematical insight and creative problem-solving approaches.

BenchLM freshness & provenance

Version

BRUMO 2025

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Tracked score table (107 models)

1. GPT-5.4 (gpt-5-4): 97%
2. GPT-5.2 Pro (gpt-5-2-pro): 97%
3. GPT-5.1-Codex-Max (gpt-5-1-codex-max): 96%
4. GPT-5.2-Codex (gpt-5-2-codex): 96%
5. GPT-5.3 Codex (gpt-5-3-codex): 96%
6. Grok 4.1 (grok-4-1): 96%
7. Gemini 3 Pro Deep Think (gemini-3-pro-deep-think): 96%
8. Claude Opus 4.6 (claude-opus-4-6): 96%
9. GLM-5 (Reasoning) (glm-5-reasoning): 96%
10. GPT-5.1 (gpt-5-1): 96%
11. GPT-5.2 (gpt-5-2): 96%
12. Claude Sonnet 4.6 (claude-sonnet-4-6): 96%
13. Claude Sonnet 4.5 (claude-sonnet-4-5): 96%
14. Gemini 3 Pro (gemini-3-pro): 96%
15. Claude Opus 4.5 (claude-opus-4-5): 96%
16. GPT-5.3 Instant (gpt-5-3-instant): 96%
17. GPT-5.2 Instant (gpt-5-2-instant): 96%
18. Grok 4.1 Fast (grok-4-1-fast): 95%
19. GPT-5.3-Codex-Spark (gpt-5-3-codex-spark): 95%
20. GPT-5 (high) (gpt-5-high): 94%
21. 93%
22. Kimi K2.5 (Reasoning) (kimi-k2-5-reasoning): 93%
23. GPT-5 (medium) (gpt-5-medium): 92%
24. Qwen3.5 397B (Reasoning) (qwen3-5-397b-reasoning): 92%
25. 89%
26. GPT-5 mini (gpt-5-mini): 89%
27. GLM-5.1 (glm-5-1): 87%
28. 87%
29. GLM-5 (glm-5): 87%
30. Grok 4 (grok-4): 87%
31. DeepSeek V3.2 (Thinking) (deepseek-v3-2-thinking): 86%
32. GLM-4.7 (glm-4-7): 85%
33. Qwen2.5-1M (qwen2-5-1m): 84%
34. Step 3.5 Flash (step-3-5-flash): 84%
35. Gemini 2.5 Pro (gemini-2-5-pro): 83%
36. Qwen2.5-72B (qwen2-5-72b): 83%
37. DeepSeek V3.2 (deepseek-v3-2): 83%
38. Qwen3.5 397B (qwen3-5-397b): 82%
39. o4-mini (high) (o4-mini-high): 82%
40. DeepSeek Coder 2.0 (deepseek-coder-2-0): 80%
41. Mercury 2 (mercury-2): 80%
42. DeepSeekMath V2 (deepseekmath-v2): 79%
43. DeepSeek LLM 2.0 (deepseek-llm-2-0): 79%
44. MiMo-V2-Flash (mimo-v2-flash): 78%
45. Kimi K2.5 (kimi-k2-5): 76%
46. Claude 4.1 Opus (claude-4-1-opus): 75%
47. Mistral Large 3 (mistral-large-3): 75%
48. Nemotron 3 Ultra 500B (nemotron-3-ultra-500b): 73%
49. Aion-2.0 (aion-2-0): 73%
50. Claude 4 Sonnet (claude-4-sonnet): 72%
51. Ministral 3 14B (Reasoning) (ministral-3-14b-reasoning): 72%
52. MiniMax M2.5 (minimax-m2-5): 72%
53. Seed 1.6 (seed-1-6): 71%
54. Seed-2.0-Lite (seed-2-0-lite): 70%
55. Gemini 3 Flash (gemini-3-flash): 69%
56. Llama 3.1 405B (llama-3-1-405b): 69%
57. Claude Haiku 4.5 (claude-haiku-4-5): 67%
58. Mistral Large 2 (mistral-large-2): 67%
59. Ministral 3 14B (ministral-3-14b): 67%
60. Nemotron 3 Super 120B A12B (nemotron-3-super-120b-a12b): 66%
61. GPT-4o (gpt-4o): 65%
62. GLM-4.7-Flash (glm-4-7-flash): 65%
63. Nemotron 3 Super 100B (nemotron-3-super-100b): 64%
64. Claude 3.5 Sonnet (claude-3-5-sonnet): 64%
65. Mistral 8x7B (mistral-8x7b): 64%
66. Grok Code Fast 1 (grok-code-fast-1): 63%
67. Gemini 1.5 Pro (gemini-1-5-pro): 63%
68. Seed 1.6 Flash (seed-1-6-flash): 63%
69. Gemini 3.1 Flash-Lite (gemini-3-1-flash-lite): 62%
70. Gemini 1.0 Pro (gemini-1-0-pro): 61%
71. Seed-2.0-Mini (seed-2-0-mini): 61%
72. Claude 3 Opus (claude-3-opus): 60%
73. GPT-4 Turbo (gpt-4-turbo): 59%
74. Llama 3 70B (llama-3-70b): 57%
75. Nemotron 3 Nano 30B (nemotron-3-nano-30b): 56%
76. Claude 3 Haiku (claude-3-haiku): 55%
77. Nemotron-4 15B (nemotron-4-15b): 53%
78. Moonshot v1 (moonshot-v1): 52%
79. Z-1 (z-1): 51%
80. GPT-OSS 120B (gpt-oss-120b): 50%
81. Gemini 2.5 Flash (gemini-2-5-flash): 49%
82. Nemotron Ultra 253B (nemotron-ultra-253b): 48%
83. Llama 4 Behemoth (llama-4-behemoth): 47%
84. Llama 4 Scout (llama-4-scout): 46%
85. Llama 4 Maverick (llama-4-maverick): 45%
86. LFM2-24B-A2B (lfm2-24b-a2b): 45%
87. Gemma 3 27B (gemma-3-27b): 44%
88. DeepSeek-R1 (deepseek-r1): 43%
89. Grok 3 [Beta] (grok-3-beta): 41%
90. Nova Pro (nova-pro): 40%
91. Qwen3 235B 2507 (Reasoning) (qwen3-235b-2507-reasoning): 39%
92. Qwen3 235B 2507 (qwen3-235b-2507): 38%
93. Claude 4.1 Opus Thinking (claude-4-1-opus-thinking): 37%
94. GLM-4.5 (glm-4-5): 36%
95. MiniMax M1 80k (minimax-m1-80k): 35%
96. GLM-4.5-Air (glm-4-5-air): 34%
97. DeepSeek V3.1 (Reasoning) (deepseek-v3-1-reasoning): 33%
98. DeepSeek V3.1 (deepseek-v3-1): 32%
99. Ministral 3 8B (Reasoning) (ministral-3-8b-reasoning): 32%
100. GPT-OSS 20B (gpt-oss-20b): 30%
101. Mistral 7B v0.3 (mistral-7b-v0-3): 29%
102. Ministral 3 8B (ministral-3-8b): 29%
103. Mistral 8x7B v0.2 (mistral-8x7b-v0-2): 28%
104. LFM2.5-1.2B-Thinking (lfm2-5-1-2b-thinking): 27%
105. Ministral 3 3B (Reasoning) (ministral-3-3b-reasoning): 26%
106. LFM2.5-1.2B-Instruct (lfm2-5-1-2b-instruct): 23%
107. Ministral 3 3B (ministral-3-3b): 22%

FAQ

What does BRUMO 2025 measure?

A challenging competition benchmark built from mathematical-olympiad problems that test advanced reasoning and creative problem-solving.

Which model leads the published BRUMO 2025 snapshot?

GPT-5.4 currently leads the published BRUMO 2025 snapshot with a tracked score of 97%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on BRUMO 2025?

107 AI models are included in BenchLM's mirrored BRUMO 2025 snapshot, based on the public leaderboard captured on April 20, 2026.

Last updated: April 20, 2026 · mirrored from the public benchmark leaderboard
