
Brown University Math Olympiad 2025 (BRUMO 2025)

A challenging competition benchmark built from mathematical-olympiad problems that test advanced reasoning and creative problem-solving.

How BenchLM shows BRUMO 2025 right now

BenchLM is tracking BRUMO 2025 in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.

These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.

107 tracked models · Local tracked rows · Awaiting exact-source attachments · Display only

Tracked score on BRUMO 2025 — April 20, 2026

BenchLM mirrors the published tracked score view for BRUMO 2025. GPT-5.4 leads the public snapshot at 97%, followed by GPT-5.2 Pro (97%) and GPT-5.1-Codex-Max (96%). BenchLM does not use these results to rank models overall.

107 models · Math · 25% of category score · Current · Updated April 20, 2026

The published BRUMO 2025 snapshot is tightly clustered at the top: GPT-5.4 sits at 97%, and the third row is only 1.0 point behind. The full top-10 spread is likewise 1.0 point, so the leading published scores sit in a very narrow band.
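The clustering figures quoted above can be reproduced directly from the score column. A minimal sketch (the variable names are illustrative; the scores are the top-10 values from this page):

```python
# Top-10 tracked scores from the published BRUMO 2025 snapshot, in percent.
top10 = [97, 97, 96, 96, 96, 96, 96, 96, 96, 96]

gap_to_third = top10[0] - top10[2]      # 97 - 96 = 1.0 point
top10_spread = max(top10) - min(top10)  # 97 - 96 = 1.0 point
print(gap_to_third, top10_spread)       # prints: 1 1
```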

107 models have been evaluated on BRUMO 2025. The benchmark falls in the Math category, which carries a 5% weight in BenchLM's overall scoring system. Within that category, BRUMO 2025 contributes 25% of the category score, so once exact-source verification completes, strong performance here would directly affect a model's overall ranking.
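The weighting arithmetic above composes multiplicatively: a benchmark's effective share of the overall score is its share of the category times the category's share of the total. A minimal sketch using the weights stated on this page (the names are illustrative, not BenchLM's actual code):

```python
# Effective weight of BRUMO 2025 in the overall BenchLM score.
category_weight = 0.05   # Math category: 5% of the overall score
benchmark_share = 0.25   # BRUMO 2025: 25% of the Math category score

effective_weight = category_weight * benchmark_share
print(f"{effective_weight:.4f}")  # prints: 0.0125  (1.25% of the overall score)
```

So a 10-point swing on BRUMO 2025 would move a model's overall score by about 0.125 points under these weights.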

About BRUMO 2025

Year

2025

Tasks

Olympiad problems

Format

Mathematical olympiad

Difficulty

Mathematical olympiad level

BRUMO carries forward Brown University's tradition of mathematical excellence, featuring problems that require deep mathematical insight and creative problem-solving approaches.

BenchLM freshness & provenance

Version

BRUMO 2025

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Tracked score table (107 models)

1. GPT-5.4 (gpt-5-4): 97%
2. GPT-5.2 Pro (gpt-5-2-pro): 97%
3. GPT-5.1-Codex-Max (gpt-5-1-codex-max): 96%
4. GPT-5.2-Codex (gpt-5-2-codex): 96%
5. GPT-5.3 Codex (gpt-5-3-codex): 96%
6. Grok 4.1 (grok-4-1): 96%
7. Gemini 3 Pro Deep Think (gemini-3-pro-deep-think): 96%
8. Claude Opus 4.6 (claude-opus-4-6): 96%
9. GLM-5 (Reasoning) (glm-5-reasoning): 96%
10. GPT-5.1 (gpt-5-1): 96%
11. GPT-5.2 (gpt-5-2): 96%
12. Claude Sonnet 4.6 (claude-sonnet-4-6): 96%
13. Claude Sonnet 4.5 (claude-sonnet-4-5): 96%
14. Gemini 3 Pro (gemini-3-pro): 96%
15. Claude Opus 4.5 (claude-opus-4-5): 96%
16. GPT-5.3 Instant (gpt-5-3-instant): 96%
17. GPT-5.2 Instant (gpt-5-2-instant): 96%
18. Grok 4.1 Fast (grok-4-1-fast): 95%
19. GPT-5.3-Codex-Spark (gpt-5-3-codex-spark): 95%
20. GPT-5 (high) (gpt-5-high): 94%
21. 93%
22. Kimi K2.5 (Reasoning) (kimi-k2-5-reasoning): 93%
23. GPT-5 (medium) (gpt-5-medium): 92%
24. Qwen3.5 397B (Reasoning) (qwen3-5-397b-reasoning): 92%
25. 89%
26. GPT-5 mini (gpt-5-mini): 89%
27. GLM-5.1 (glm-5-1): 87%
28. 87%
29. GLM-5 (glm-5): 87%
30. Grok 4 (grok-4): 87%
31. DeepSeek V3.2 (Thinking) (deepseek-v3-2-thinking): 86%
32. GLM-4.7 (glm-4-7): 85%
33. Qwen2.5-1M (qwen2-5-1m): 84%
34. Step 3.5 Flash (step-3-5-flash): 84%
35. Gemini 2.5 Pro (gemini-2-5-pro): 83%
36. Qwen2.5-72B (qwen2-5-72b): 83%
37. DeepSeek V3.2 (deepseek-v3-2): 83%
38. Qwen3.5 397B (qwen3-5-397b): 82%
39. o4-mini (high) (o4-mini-high): 82%
40. DeepSeek Coder 2.0 (deepseek-coder-2-0): 80%
41. Mercury 2 (mercury-2): 80%
42. DeepSeekMath V2 (deepseekmath-v2): 79%
43. DeepSeek LLM 2.0 (deepseek-llm-2-0): 79%
44. MiMo-V2-Flash (mimo-v2-flash): 78%
45. Kimi K2.5 (kimi-k2-5): 76%
46. Claude 4.1 Opus (claude-4-1-opus): 75%
47. Mistral Large 3 (mistral-large-3): 75%
48. Nemotron 3 Ultra 500B (nemotron-3-ultra-500b): 73%
49. Aion-2.0 (aion-2-0): 73%
50. Claude 4 Sonnet (claude-4-sonnet): 72%
51. Ministral 3 14B (Reasoning) (ministral-3-14b-reasoning): 72%
52. MiniMax M2.5 (minimax-m2-5): 72%
53. Seed 1.6 (seed-1-6): 71%
54. Seed-2.0-Lite (seed-2-0-lite): 70%
55. Gemini 3 Flash (gemini-3-flash): 69%
56. Llama 3.1 405B (llama-3-1-405b): 69%
57. Claude Haiku 4.5 (claude-haiku-4-5): 67%
58. Mistral Large 2 (mistral-large-2): 67%
59. Ministral 3 14B (ministral-3-14b): 67%
60. Nemotron 3 Super 120B A12B (nemotron-3-super-120b-a12b): 66%
61. GPT-4o (gpt-4o): 65%
62. GLM-4.7-Flash (glm-4-7-flash): 65%
63. Nemotron 3 Super 100B (nemotron-3-super-100b): 64%
64. Claude 3.5 Sonnet (claude-3-5-sonnet): 64%
65. Mistral 8x7B (mistral-8x7b): 64%
66. Grok Code Fast 1 (grok-code-fast-1): 63%
67. Gemini 1.5 Pro (gemini-1-5-pro): 63%
68. Seed 1.6 Flash (seed-1-6-flash): 63%
69. Gemini 3.1 Flash-Lite (gemini-3-1-flash-lite): 62%
70. Gemini 1.0 Pro (gemini-1-0-pro): 61%
71. Seed-2.0-Mini (seed-2-0-mini): 61%
72. Claude 3 Opus (claude-3-opus): 60%
73. GPT-4 Turbo (gpt-4-turbo): 59%
74. Llama 3 70B (llama-3-70b): 57%
75. Nemotron 3 Nano 30B (nemotron-3-nano-30b): 56%
76. Claude 3 Haiku (claude-3-haiku): 55%
77. Nemotron-4 15B (nemotron-4-15b): 53%
78. Moonshot v1 (moonshot-v1): 52%
79. Z-1 (z-1): 51%
80. GPT-OSS 120B (gpt-oss-120b): 50%
81. Gemini 2.5 Flash (gemini-2-5-flash): 49%
82. Nemotron Ultra 253B (nemotron-ultra-253b): 48%
83. Llama 4 Behemoth (llama-4-behemoth): 47%
84. Llama 4 Scout (llama-4-scout): 46%
85. Llama 4 Maverick (llama-4-maverick): 45%
86. LFM2-24B-A2B (lfm2-24b-a2b): 45%
87. Gemma 3 27B (gemma-3-27b): 44%
88. DeepSeek-R1 (deepseek-r1): 43%
89. Grok 3 [Beta] (grok-3-beta): 41%
90. Nova Pro (nova-pro): 40%
91. Qwen3 235B 2507 (Reasoning) (qwen3-235b-2507-reasoning): 39%
92. Qwen3 235B 2507 (qwen3-235b-2507): 38%
93. Claude 4.1 Opus Thinking (claude-4-1-opus-thinking): 37%
94. GLM-4.5 (glm-4-5): 36%
95. MiniMax M1 80k (minimax-m1-80k): 35%
96. GLM-4.5-Air (glm-4-5-air): 34%
97. DeepSeek V3.1 (Reasoning) (deepseek-v3-1-reasoning): 33%
98. DeepSeek V3.1 (deepseek-v3-1): 32%
99. Ministral 3 8B (Reasoning) (ministral-3-8b-reasoning): 32%
100. GPT-OSS 20B (gpt-oss-20b): 30%
101. Mistral 7B v0.3 (mistral-7b-v0-3): 29%
102. Ministral 3 8B (ministral-3-8b): 29%
103. Mistral 8x7B v0.2 (mistral-8x7b-v0-2): 28%
104. LFM2.5-1.2B-Thinking (lfm2-5-1-2b-thinking): 27%
105. Ministral 3 3B (Reasoning) (ministral-3-3b-reasoning): 26%
106. LFM2.5-1.2B-Instruct (lfm2-5-1-2b-instruct): 23%
107. Ministral 3 3B (ministral-3-3b): 22%

FAQ

What does BRUMO 2025 measure?

A challenging competition benchmark built from mathematical-olympiad problems that test advanced reasoning and creative problem-solving.

Which model leads the published BRUMO 2025 snapshot?

GPT-5.4 currently leads the published BRUMO 2025 snapshot with a tracked score of 97%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on BRUMO 2025?

107 AI models are included in BenchLM's mirrored BRUMO 2025 snapshot, based on the public leaderboard captured on April 20, 2026.

Last updated: April 20, 2026 · mirrored from the public benchmark leaderboard
