GPT-5 mini vs Llama 4 Behemoth

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

GPT-5 mini wins overall with a score of 68 vs 39 (a 29-point difference). GPT-5 mini wins 4 out of 4 categories.

Knowledge

Benchmark     GPT-5 mini   Llama 4 Behemoth
Average       85           45.8
MMLU          88           48
GPQA          86           47
SuperGPQA     84           45
OpenBookQA    82           43

Coding

Benchmark     GPT-5 mini   Llama 4 Behemoth
Average       80           40
HumanEval     80           40

Mathematics

Benchmark        GPT-5 mini   Llama 4 Behemoth
Average          89           47
AIME 2023        90           48
AIME 2024        92           50
AIME 2025        91           49
HMMT Feb 2023    86           44
HMMT Feb 2024    88           46
HMMT Feb 2025    87           45
BRUMO 2025       89           47

Reasoning

Benchmark     GPT-5 mini   Llama 4 Behemoth
Average       83           45
SimpleQA      84           46
MuSR          82           44

Frequently Asked Questions

Which is better, GPT-5 mini or Llama 4 Behemoth?

GPT-5 mini scores higher overall with 68 vs 39, a difference of 29 points across all benchmarks.

Which is better for knowledge tasks, GPT-5 mini or Llama 4 Behemoth?

GPT-5 mini leads in knowledge tasks with an average score of 85 vs 45.8.

Which is better for coding, GPT-5 mini or Llama 4 Behemoth?

GPT-5 mini leads in coding with an average score of 80 vs 40.

Which is better for math, GPT-5 mini or Llama 4 Behemoth?

GPT-5 mini leads in math with an average score of 89 vs 47.

Which is better for reasoning, GPT-5 mini or Llama 4 Behemoth?

GPT-5 mini leads in reasoning with an average score of 83 vs 45.
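
The category averages quoted in these answers appear consistent with a plain unweighted mean of the per-benchmark scores listed in each category. The sketch below is an illustration under that assumption, not the site's actual scoring method; the names SCORES and category_averages are hypothetical. It does not attempt to reproduce the overall 68 vs 39 score, because the page does not say how that figure is derived.

```python
# Per-benchmark scores copied from the comparison above,
# stored as (GPT-5 mini, Llama 4 Behemoth) pairs.
SCORES = {
    "Knowledge": {
        "MMLU": (88, 48),
        "GPQA": (86, 47),
        "SuperGPQA": (84, 45),
        "OpenBookQA": (82, 43),
    },
    "Coding": {
        "HumanEval": (80, 40),
    },
    "Mathematics": {
        "AIME 2023": (90, 48),
        "AIME 2024": (92, 50),
        "AIME 2025": (91, 49),
        "HMMT Feb 2023": (86, 44),
        "HMMT Feb 2024": (88, 46),
        "HMMT Feb 2025": (87, 45),
        "BRUMO 2025": (89, 47),
    },
    "Reasoning": {
        "SimpleQA": (84, 46),
        "MuSR": (82, 44),
    },
}


def category_averages(scores):
    """Return {category: (gpt_avg, llama_avg)} using an unweighted mean (assumed)."""
    result = {}
    for category, benchmarks in scores.items():
        gpt = [pair[0] for pair in benchmarks.values()]
        llama = [pair[1] for pair in benchmarks.values()]
        result[category] = (sum(gpt) / len(gpt), sum(llama) / len(llama))
    return result


averages = category_averages(SCORES)

# Count the categories in which GPT-5 mini has the higher average.
wins = sum(1 for gpt, llama in averages.values() if gpt > llama)

for category, (gpt, llama) in averages.items():
    print(f"{category}: {gpt:.2f} vs {llama:.2f} (gap {gpt - llama:.2f})")
print(f"GPT-5 mini leads in {wins} of {len(averages)} categories")
```

Under this assumption the Knowledge average for Llama 4 Behemoth comes out at 45.75, which matches the reported 45.8 after rounding to one decimal place.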