Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.
Both models are tied with an overall score of 67.
Category    GLM-4.7   o3
Knowledge   83        85.3
Coding      78        78
Math        85        87
Reasoning   81        83
GLM-4.7 and o3 are tied overall, each with a score of 67.
o3 leads in knowledge tasks with an average score of 85.3 vs 83.
GLM-4.7 and o3 are tied in coding, each with an average score of 78.
o3 leads in math with an average score of 87 vs 85.
o3 leads in reasoning with an average score of 83 vs 81.
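The per-category verdicts follow mechanically from the average scores, which a minimal Python sketch can make explicit. The dictionary layout and the names `scores`, `leader`, `verdicts`, and `margins` are illustrative assumptions, not part of the original comparison; only the numbers come from the text. The overall score of 67 is not recomputed here, because the source does not say how it is derived.

```python
# Category averages copied from the comparison; the structure is an assumption.
scores = {
    "knowledge": {"GLM-4.7": 83.0, "o3": 85.3},
    "coding": {"GLM-4.7": 78.0, "o3": 78.0},
    "math": {"GLM-4.7": 85.0, "o3": 87.0},
    "reasoning": {"GLM-4.7": 81.0, "o3": 83.0},
}


def leader(by_model):
    """Return the name of the top-scoring model, or None when the top score is shared."""
    best = max(by_model.values())
    winners = [model for model, score in by_model.items() if score == best]
    return winners[0] if len(winners) == 1 else None


# Leader per category (None marks a tie).
verdicts = {category: leader(by_model) for category, by_model in scores.items()}

# Margin of o3 over GLM-4.7 in points, rounded to one decimal place.
margins = {
    category: round(by_model["o3"] - by_model["GLM-4.7"], 1)
    for category, by_model in scores.items()
}

for category in scores:
    who = verdicts[category] or "tie"
    print(f"{category}: {who} (o3 minus GLM-4.7 = {margins[category]})")
```

Returning `None` for a shared top score keeps ties distinct from wins, which matches how the coding category is reported.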