GPT-5.3 Codex vs Mistral Large 3

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

GPT-5.3 Codex wins overall with a score of 85 vs 61, a 24-point difference. It also leads in all four categories: knowledge, coding, mathematics, and reasoning.

Knowledge

Winner: GPT-5.3 Codex

Average score: GPT-5.3 Codex 96, Mistral Large 3 73.8

Benchmark       GPT-5.3 Codex   Mistral Large 3
MMLU            99              76
GPQA            97              75
SuperGPQA       95              73
OpenBookQA      93              71
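Each category average appears to be the unweighted mean of that category's benchmark scores. For Knowledge, this assumption reproduces both published figures, including the rounded 73.8:

```latex
\bar{s}_{\text{Codex}} = \frac{99 + 97 + 95 + 93}{4} = \frac{384}{4} = 96,
\qquad
\bar{s}_{\text{Mistral}} = \frac{76 + 75 + 73 + 71}{4} = \frac{295}{4} = 73.75 \approx 73.8
```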

Coding

Winner: GPT-5.3 Codex

Average score: GPT-5.3 Codex 95, Mistral Large 3 68

Benchmark       GPT-5.3 Codex   Mistral Large 3
HumanEval       95              68

Mathematics

Winner: GPT-5.3 Codex

Average score: GPT-5.3 Codex 97.1, Mistral Large 3 75

Benchmark       GPT-5.3 Codex   Mistral Large 3
AIME 2023       99              76
AIME 2024       99              78
AIME 2025       98              77
HMMT Feb 2023   95              72
HMMT Feb 2024   97              74
HMMT Feb 2025   96              73
BRUMO 2025      96              75

Reasoning

Winner: GPT-5.3 Codex

Average score: GPT-5.3 Codex 94, Mistral Large 3 72

Benchmark       GPT-5.3 Codex   Mistral Large 3
SimpleQA        95              73
MuSR            93              71
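The category averages and the "wins 4 out of 4 categories" verdict can be recomputed from the per-benchmark scores listed above. The Python sketch below assumes each category average is an unweighted mean rounded to one decimal place, an assumption that matches every category figure on this page. The names `SCORES`, `category_averages`, and `categories_won` are hypothetical and used only for illustration. The sketch does not attempt the overall 85 vs 61 score, because the page does not say how that composite is computed.

```python
from statistics import mean

# Per-benchmark scores from this comparison: (GPT-5.3 Codex, Mistral Large 3).
# SCORES is a hypothetical name chosen for this sketch.
SCORES = {
    "Knowledge": {"MMLU": (99, 76), "GPQA": (97, 75),
                  "SuperGPQA": (95, 73), "OpenBookQA": (93, 71)},
    "Coding": {"HumanEval": (95, 68)},
    "Mathematics": {
        "AIME 2023": (99, 76), "AIME 2024": (99, 78), "AIME 2025": (98, 77),
        "HMMT Feb 2023": (95, 72), "HMMT Feb 2024": (97, 74),
        "HMMT Feb 2025": (96, 73), "BRUMO 2025": (96, 75),
    },
    "Reasoning": {"SimpleQA": (95, 73), "MuSR": (93, 71)},
}


def category_averages(scores):
    """Return {category: (codex_mean, mistral_mean)}, rounded to one decimal.

    Assumes an unweighted mean over the benchmarks in each category.
    """
    result = {}
    for category, benchmarks in scores.items():
        codex = mean(a for a, _ in benchmarks.values())
        mistral = mean(b for _, b in benchmarks.values())
        result[category] = (round(codex, 1), round(mistral, 1))
    return result


def categories_won(averages):
    """Count categories where each model has the strictly higher average."""
    codex_wins = sum(1 for a, b in averages.values() if a > b)
    mistral_wins = sum(1 for a, b in averages.values() if b > a)
    return codex_wins, mistral_wins


if __name__ == "__main__":
    averages = category_averages(SCORES)
    for category, (codex, mistral) in averages.items():
        print(f"{category}: GPT-5.3 Codex {codex}, Mistral Large 3 {mistral}")
    print("Categories won (Codex, Mistral):", categories_won(averages))
```

Running it prints one line per category with the same averages shown on this page (for example, Mathematics: 97.1 vs 75), followed by a category count of (4, 0). An unweighted mean treats a single-benchmark category such as Coding the same way as the seven-benchmark Mathematics category, so each category average rests on a very different amount of evidence.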

Frequently Asked Questions

Which is better, GPT-5.3 Codex or Mistral Large 3?

GPT-5.3 Codex has the higher overall score, 85 vs 61, a 24-point gap. It also scores higher on every individual benchmark in this comparison.

Which is better for knowledge tasks, GPT-5.3 Codex or Mistral Large 3?

GPT-5.3 Codex leads in knowledge tasks with an average score of 96 vs 73.8.

Which is better for coding, GPT-5.3 Codex or Mistral Large 3?

GPT-5.3 Codex leads in coding with a score of 95 vs 68 on HumanEval, the only coding benchmark in this comparison.

Which is better for math, GPT-5.3 Codex or Mistral Large 3?

GPT-5.3 Codex leads in math with an average score of 97.1 vs 75.

Which is better for reasoning, GPT-5.3 Codex or Mistral Large 3?

GPT-5.3 Codex leads in reasoning with an average score of 94 vs 72.