Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.
Both models are tied with an overall score of 65.
Category     o4-mini (high)   Qwen2.5-72B
Knowledge    80.5             80.8
Coding       74               75
Math         82               83
Reasoning    79               79
o4-mini (high) and Qwen2.5-72B tie overall, each with a score of 65.
Qwen2.5-72B leads in knowledge tasks with an average score of 80.8 vs. 80.5 for o4-mini (high).
Qwen2.5-72B leads in coding with an average score of 75 vs. 74 for o4-mini (high).
Qwen2.5-72B leads in math with an average score of 83 vs. 82 for o4-mini (high).
o4-mini (high) and Qwen2.5-72B tie in reasoning, each with an average score of 79.
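The per-category statements above can be reproduced with a few lines of Python. This is a minimal sketch: the `scores` dictionary and the `category_leader` helper are illustrative names, not part of any benchmark tooling. The values are the category averages listed above. The sketch does not attempt to derive the overall score of 65, because the page does not say how that score is computed.

```python
# Category averages copied from the comparison above.
scores = {
    "knowledge": {"o4-mini (high)": 80.5, "Qwen2.5-72B": 80.8},
    "coding": {"o4-mini (high)": 74, "Qwen2.5-72B": 75},
    "math": {"o4-mini (high)": 82, "Qwen2.5-72B": 83},
    "reasoning": {"o4-mini (high)": 79, "Qwen2.5-72B": 79},
}


def category_leader(by_model):
    """Return (leader, margin); leader is None when the two top scores are equal."""
    ranked = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
    (top_name, top_score), (_, second_score) = ranked[0], ranked[1]
    # Round to one decimal place to avoid floating-point noise (e.g. 80.8 - 80.5).
    margin = round(top_score - second_score, 1)
    if margin == 0:
        return (None, 0.0)
    return (top_name, margin)


leaders = {category: category_leader(by_model) for category, by_model in scores.items()}

for category, (leader, margin) in leaders.items():
    label = "tie" if leader is None else f"{leader} (+{margin})"
    print(f"{category}: {label}")
```

Rounding the margin to one decimal place matches the precision of the published averages, so a difference such as 80.8 minus 80.5 is reported as 0.3 rather than a longer floating-point value.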