GLM-4.5-Air vs GPT-OSS 20B

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

GLM-4.5-Air wins overall with a score of 26 vs 22 (4 point difference).GLM-4.5-Air wins 4 out of 4 categories.

Knowledge

GLM-4.5-Air

GLM-4.5-Air

32.8

GPT-OSS 20B

28.8

35
MMLU
31
34
GPQA
30
32
SuperGPQA
28
30
OpenBookQA
26

Coding

GLM-4.5-Air

GLM-4.5-Air

27

GPT-OSS 20B

23

27
HumanEval
23

Mathematics

GLM-4.5-Air

GLM-4.5-Air

34

GPT-OSS 20B

30

35
AIME 2023
31
37
AIME 2024
33
36
AIME 2025
32
31
HMMT Feb 2023
27
33
HMMT Feb 2024
29
32
HMMT Feb 2025
28
34
BRUMO 2025
30

Reasoning

GLM-4.5-Air

GLM-4.5-Air

32

GPT-OSS 20B

28

33
SimpleQA
29
31
MuSR
27

Frequently Asked Questions

Which is better, GLM-4.5-Air or GPT-OSS 20B?

GLM-4.5-Air scores higher overall with 26 vs 22, a difference of 4 points across all benchmarks.

Which is better for knowledge tasks, GLM-4.5-Air or GPT-OSS 20B?

GLM-4.5-Air leads in knowledge tasks with an average score of 32.8 vs 28.8.

Which is better for coding, GLM-4.5-Air or GPT-OSS 20B?

GLM-4.5-Air leads in coding with an average score of 27 vs 23.

Which is better for math, GLM-4.5-Air or GPT-OSS 20B?

GLM-4.5-Air leads in math with an average score of 34 vs 30.

Which is better for reasoning, GLM-4.5-Air or GPT-OSS 20B?

GLM-4.5-Air leads in reasoning with an average score of 32 vs 28.