o4-mini (high) vs Qwen3.5 397B

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

Both models are tied with an overall score of 65.

Knowledge

Qwen3.5 397B

o4-mini (high)

80.5

Qwen3.5 397B

80.8

82
MMLU
83
82
GPQA
82
80
SuperGPQA
80
78
OpenBookQA
78

Coding

Qwen3.5 397B

o4-mini (high)

74

Qwen3.5 397B

75

74
HumanEval
75

Mathematics

Tie

o4-mini (high)

82

Qwen3.5 397B

82

83
AIME 2023
83
85
AIME 2024
85
84
AIME 2025
84
79
HMMT Feb 2023
79
81
HMMT Feb 2024
81
80
HMMT Feb 2025
80
82
BRUMO 2025
82

Reasoning

Tie

o4-mini (high)

79

Qwen3.5 397B

79

80
SimpleQA
80
78
MuSR
78

Frequently Asked Questions

Which is better, o4-mini (high) or Qwen3.5 397B?

o4-mini (high) and Qwen3.5 397B are tied with identical overall scores of 65.

Which is better for knowledge tasks, o4-mini (high) or Qwen3.5 397B?

Qwen3.5 397B leads in knowledge tasks with an average score of 80.8 vs 80.5.

Which is better for coding, o4-mini (high) or Qwen3.5 397B?

Qwen3.5 397B leads in coding with an average score of 75 vs 74.

Which is better for math, o4-mini (high) or Qwen3.5 397B?

o4-mini (high) and Qwen3.5 397B are tied for math with average scores of 82.

Which is better for reasoning, o4-mini (high) or Qwen3.5 397B?

o4-mini (high) and Qwen3.5 397B are tied for reasoning with average scores of 79.