GPT-5.1-Codex-Max vs Grok 3 [Beta]

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

GPT-5.1-Codex-Max wins overall with a score of 77 vs 33 (a 44-point difference). GPT-5.1-Codex-Max wins 4 out of 4 categories.

Knowledge

Category average: GPT-5.1-Codex-Max 95, Grok 3 [Beta] 39.8

Benchmark        GPT-5.1-Codex-Max    Grok 3 [Beta]
MMLU             98                   42
GPQA             96                   41
SuperGPQA        94                   39
OpenBookQA       92                   37

Coding

Category average: GPT-5.1-Codex-Max 94, Grok 3 [Beta] 34

Benchmark        GPT-5.1-Codex-Max    Grok 3 [Beta]
HumanEval        94                   34

Mathematics

Category average: GPT-5.1-Codex-Max 97.1, Grok 3 [Beta] 41

Benchmark        GPT-5.1-Codex-Max    Grok 3 [Beta]
AIME 2023        99                   42
AIME 2024        99                   44
AIME 2025        98                   43
HMMT Feb 2023    95                   38
HMMT Feb 2024    97                   40
HMMT Feb 2025    96                   39
BRUMO 2025       96                   41

Reasoning

Category average: GPT-5.1-Codex-Max 93, Grok 3 [Beta] 39

Benchmark        GPT-5.1-Codex-Max    Grok 3 [Beta]
SimpleQA         94                   40
MuSR             92                   38
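
The category averages above appear consistent with an unweighted mean of the per-benchmark scores, rounded to one decimal place. The sketch below checks that arithmetic. The averaging and rounding rule is an assumption inferred from the numbers rather than a documented method, and the names `SCORES`, `category_averages`, `AVERAGES`, and `GPT_CATEGORY_WINS` are hypothetical.

```python
# Per-benchmark scores copied from the sections above,
# stored as (GPT-5.1-Codex-Max, Grok 3 [Beta]).
SCORES = {
    "Knowledge": {
        "MMLU": (98, 42),
        "GPQA": (96, 41),
        "SuperGPQA": (94, 39),
        "OpenBookQA": (92, 37),
    },
    "Coding": {
        "HumanEval": (94, 34),
    },
    "Mathematics": {
        "AIME 2023": (99, 42),
        "AIME 2024": (99, 44),
        "AIME 2025": (98, 43),
        "HMMT Feb 2023": (95, 38),
        "HMMT Feb 2024": (97, 40),
        "HMMT Feb 2025": (96, 39),
        "BRUMO 2025": (96, 41),
    },
    "Reasoning": {
        "SimpleQA": (94, 40),
        "MuSR": (92, 38),
    },
}


def category_averages(benchmarks):
    """Return the (GPT, Grok) unweighted means, rounded to one decimal.

    Assumption: the page averages benchmarks with equal weight.
    """
    pairs = list(benchmarks.values())
    gpt = round(sum(p[0] for p in pairs) / len(pairs), 1)
    grok = round(sum(p[1] for p in pairs) / len(pairs), 1)
    return gpt, grok


# Average each category, then count how many GPT-5.1-Codex-Max leads.
AVERAGES = {name: category_averages(b) for name, b in SCORES.items()}
GPT_CATEGORY_WINS = sum(gpt > grok for gpt, grok in AVERAGES.values())

for name, (gpt, grok) in AVERAGES.items():
    print(f"{name}: {gpt} vs {grok}")
print(f"GPT-5.1-Codex-Max leads in {GPT_CATEGORY_WINS} of {len(AVERAGES)} categories")
```

Under these assumptions the computed averages match the category figures shown on this page. The overall scores (77 vs 33) are not a simple mean of the category averages, so the sketch does not attempt to derive them.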

Frequently Asked Questions

Which is better, GPT-5.1-Codex-Max or Grok 3 [Beta]?

GPT-5.1-Codex-Max scores higher overall, 77 vs 33, a 44-point difference, and leads in all four categories.

Which is better for knowledge tasks, GPT-5.1-Codex-Max or Grok 3 [Beta]?

GPT-5.1-Codex-Max leads in knowledge tasks with an average score of 95 vs 39.8.

Which is better for coding, GPT-5.1-Codex-Max or Grok 3 [Beta]?

GPT-5.1-Codex-Max leads in coding with an average score of 94 vs 34.

Which is better for math, GPT-5.1-Codex-Max or Grok 3 [Beta]?

GPT-5.1-Codex-Max leads in math with an average score of 97.1 vs 41.

Which is better for reasoning, GPT-5.1-Codex-Max or Grok 3 [Beta]?

GPT-5.1-Codex-Max leads in reasoning with an average score of 93 vs 39.