GPT-5 (medium) vs Qwen3.5 397B (Reasoning)

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

The two models tie, each with an overall score of 70.

Knowledge

Tie

Benchmark          GPT-5 (medium)   Qwen3.5 397B (Reasoning)
Category average   88               88
MMLU               91               91
GPQA               89               89
SuperGPQA          87               87
OpenBookQA         85               85

Coding

Tie

Benchmark          GPT-5 (medium)   Qwen3.5 397B (Reasoning)
Category average   83               83
HumanEval          83               83

Mathematics

Tie

Benchmark          GPT-5 (medium)   Qwen3.5 397B (Reasoning)
Category average   92               92
AIME 2023          93               93
AIME 2024          95               95
AIME 2025          94               94
HMMT Feb 2023      89               89
HMMT Feb 2024      91               91
HMMT Feb 2025      90               90
BRUMO 2025         92               92

Reasoning

Tie

Benchmark          GPT-5 (medium)   Qwen3.5 397B (Reasoning)
Category average   86               86
SimpleQA           87               87
MuSR               85               85
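
The category scores listed above are consistent with an unweighted mean of each category's benchmarks, rounded to the nearest integer. The page does not state how it aggregates, so the sketch below is only an assumption that happens to reproduce the published numbers; the names SCORES and category_average are illustrative and do not come from the source.

```python
from statistics import mean

# Per-benchmark scores as listed on this page. Both models report the
# same value on every benchmark, so one table covers both.
SCORES = {
    "Knowledge": {"MMLU": 91, "GPQA": 89, "SuperGPQA": 87, "OpenBookQA": 85},
    "Coding": {"HumanEval": 83},
    "Mathematics": {
        "AIME 2023": 93,
        "AIME 2024": 95,
        "AIME 2025": 94,
        "HMMT Feb 2023": 89,
        "HMMT Feb 2024": 91,
        "HMMT Feb 2025": 90,
        "BRUMO 2025": 92,
    },
    "Reasoning": {"SimpleQA": 87, "MuSR": 85},
}


def category_average(benchmarks):
    """Assumed aggregation: unweighted mean, rounded to an integer."""
    return round(mean(benchmarks.values()))


averages = {name: category_average(b) for name, b in SCORES.items()}

for name, value in averages.items():
    print(f"{name}: {value}")
```

Under this assumption the four category averages come out to 88, 83, 92, and 86, matching the page. The overall score of 70 is not the mean of those four values (which is 87.25), so it presumably draws on categories or weights that are not shown here.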

Frequently Asked Questions

Which is better, GPT-5 (medium) or Qwen3.5 397B (Reasoning)?

Neither. GPT-5 (medium) and Qwen3.5 397B (Reasoning) tie, each with an overall score of 70.

Which is better for knowledge tasks, GPT-5 (medium) or Qwen3.5 397B (Reasoning)?

GPT-5 (medium) and Qwen3.5 397B (Reasoning) tie on knowledge tasks, each averaging 88 across MMLU, GPQA, SuperGPQA, and OpenBookQA.

Which is better for coding, GPT-5 (medium) or Qwen3.5 397B (Reasoning)?

GPT-5 (medium) and Qwen3.5 397B (Reasoning) tie on coding, each scoring 83 on HumanEval, the only coding benchmark listed.

Which is better for math, GPT-5 (medium) or Qwen3.5 397B (Reasoning)?

GPT-5 (medium) and Qwen3.5 397B (Reasoning) tie on math, each averaging 92 across the AIME, HMMT, and BRUMO benchmarks.

Which is better for reasoning, GPT-5 (medium) or Qwen3.5 397B (Reasoning)?

GPT-5 (medium) and Qwen3.5 397B (Reasoning) tie on reasoning, each averaging 86 across SimpleQA and MuSR.