DeepSeek V3.1 vs GLM-4.5-Air

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

GLM-4.5-Air wins overall with a score of 26 vs 24 (2 point difference).GLM-4.5-Air wins 4 out of 4 categories.

Knowledge

GLM-4.5-Air

DeepSeek V3.1

30.8

GLM-4.5-Air

32.8

33
MMLU
35
32
GPQA
34
30
SuperGPQA
32
28
OpenBookQA
30

Coding

GLM-4.5-Air

DeepSeek V3.1

25

GLM-4.5-Air

27

25
HumanEval
27

Mathematics

GLM-4.5-Air

DeepSeek V3.1

32

GLM-4.5-Air

34

33
AIME 2023
35
35
AIME 2024
37
34
AIME 2025
36
29
HMMT Feb 2023
31
31
HMMT Feb 2024
33
30
HMMT Feb 2025
32
32
BRUMO 2025
34

Reasoning

GLM-4.5-Air

DeepSeek V3.1

30

GLM-4.5-Air

32

31
SimpleQA
33
29
MuSR
31

Frequently Asked Questions

Which is better, DeepSeek V3.1 or GLM-4.5-Air?

GLM-4.5-Air scores higher overall with 26 vs 24, a difference of 2 points across all benchmarks.

Which is better for knowledge tasks, DeepSeek V3.1 or GLM-4.5-Air?

GLM-4.5-Air leads in knowledge tasks with an average score of 32.8 vs 30.8.

Which is better for coding, DeepSeek V3.1 or GLM-4.5-Air?

GLM-4.5-Air leads in coding with an average score of 27 vs 25.

Which is better for math, DeepSeek V3.1 or GLM-4.5-Air?

GLM-4.5-Air leads in math with an average score of 34 vs 32.

Which is better for reasoning, DeepSeek V3.1 or GLM-4.5-Air?

GLM-4.5-Air leads in reasoning with an average score of 32 vs 30.