Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.
Both models are tied with an overall score of 69.
Category     DeepSeek V3.2 (Thinking)   Grok 4
Knowledge    84                         84.8
Coding       79                         79
Math         86                         86.6
Reasoning    82                         82
DeepSeek V3.2 (Thinking) and Grok 4 are tied overall, each with a score of 69.
Grok 4 leads in knowledge by 0.8 points, with an average score of 84.8 vs. 84.
DeepSeek V3.2 (Thinking) and Grok 4 are tied in coding, each with an average score of 79.
Grok 4 leads in math by 0.6 points, with an average score of 86.6 vs. 86.
DeepSeek V3.2 (Thinking) and Grok 4 are tied in reasoning, each with an average score of 82.
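The per-category verdicts above follow from simple pairwise arithmetic on the listed averages. Below is a minimal sketch under stated assumptions. The `scores` dictionary and the `compare` helper are hypothetical illustrations and are not part of any benchmark tooling. The sketch covers only the four category averages. It does not reproduce the overall score of 69, because the comparison does not state how that figure is aggregated.

```python
# Hypothetical helper: category averages copied from the comparison above.
scores = {
    "knowledge": {"DeepSeek V3.2 (Thinking)": 84.0, "Grok 4": 84.8},
    "coding": {"DeepSeek V3.2 (Thinking)": 79.0, "Grok 4": 79.0},
    "math": {"DeepSeek V3.2 (Thinking)": 86.0, "Grok 4": 86.6},
    "reasoning": {"DeepSeek V3.2 (Thinking)": 82.0, "Grok 4": 82.0},
}


def compare(category_scores, tol=1e-9):
    """Return (leader, margin) for a two-model category, or (None, 0.0) on a tie.

    `tol` absorbs floating-point noise, e.g. 84.8 - 84.0 is not exactly 0.8.
    """
    (name_a, a), (name_b, b) = category_scores.items()
    diff = a - b
    if abs(diff) <= tol:
        return None, 0.0
    return (name_a, diff) if diff > 0 else (name_b, -diff)


for category, category_scores in scores.items():
    leader, margin = compare(category_scores)
    if leader is None:
        print(f"{category}: tied")
    else:
        print(f"{category}: {leader} leads by {margin:.1f}")
```

Running it prints one verdict per category, matching the summary lines: Grok 4 leads knowledge by 0.8 and math by 0.6, and coding and reasoning are tied.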