Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.
Both models are tied with an overall score of 65.
Knowledge: o4-mini (high) 80.5, Qwen3.5 397B 80.8
Coding: o4-mini (high) 74, Qwen3.5 397B 75
Math: o4-mini (high) 82, Qwen3.5 397B 82
Reasoning: o4-mini (high) 79, Qwen3.5 397B 79
o4-mini (high) and Qwen3.5 397B are tied overall, each with a score of 65.
Qwen3.5 397B leads in knowledge tasks with an average score of 80.8 vs 80.5.
Qwen3.5 397B leads in coding with an average score of 75 vs 74.
o4-mini (high) and Qwen3.5 397B are tied for math with average scores of 82.
o4-mini (high) and Qwen3.5 397B are tied for reasoning with average scores of 79.
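The per-category leads and ties stated above can be checked with a short Python sketch. The names `scores`, `leader`, and `results` are hypothetical and chosen only for illustration; the numbers are the category averages quoted in this comparison. The sketch does not attempt to reproduce the overall score of 65, because the page does not say how that figure is derived.

```python
# Category averages copied from the comparison above.
scores = {
    "knowledge": {"o4-mini (high)": 80.5, "Qwen3.5 397B": 80.8},
    "coding": {"o4-mini (high)": 74, "Qwen3.5 397B": 75},
    "math": {"o4-mini (high)": 82, "Qwen3.5 397B": 82},
    "reasoning": {"o4-mini (high)": 79, "Qwen3.5 397B": 79},
}


def leader(category_scores):
    """Return the name of the higher-scoring model, or None for a tie."""
    best = max(category_scores.values())
    winners = [name for name, score in category_scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


# Map each category to its leading model (None means the models are tied).
results = {category: leader(vals) for category, vals in scores.items()}

for category, winner in results.items():
    print(f"{category}: {winner or 'tie'}")
```

Returning `None` for a tie keeps the tie case explicit instead of arbitrarily naming one of two equal models as the leader.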