GPT-5 mini vs Llama 3.1 405B

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

GPT-5 mini wins overall with a score of 68 vs 58 (a 10-point difference). GPT-5 mini wins 4 out of 4 categories.

Knowledge

Category average: GPT-5 mini 85, Llama 3.1 405B 68.5

Benchmark       GPT-5 mini    Llama 3.1 405B
MMLU            88            70
GPQA            86            70
SuperGPQA       84            68
OpenBookQA      82            66

Coding

Category average: GPT-5 mini 80, Llama 3.1 405B 62

Benchmark       GPT-5 mini    Llama 3.1 405B
HumanEval       80            62

Mathematics

Category average: GPT-5 mini 89, Llama 3.1 405B 69

Benchmark       GPT-5 mini    Llama 3.1 405B
AIME 2023       90            70
AIME 2024       92            72
AIME 2025       91            71
HMMT Feb 2023   86            66
HMMT Feb 2024   88            68
HMMT Feb 2025   87            67
BRUMO 2025      89            69

Reasoning

Category average: GPT-5 mini 83, Llama 3.1 405B 67

Benchmark       GPT-5 mini    Llama 3.1 405B
SimpleQA        84            68
MuSR            82            66

Frequently Asked Questions

Which is better, GPT-5 mini or Llama 3.1 405B?

GPT-5 mini scores higher overall with 68 vs 58, a difference of 10 points across all benchmarks.

Which is better for knowledge tasks, GPT-5 mini or Llama 3.1 405B?

GPT-5 mini leads in knowledge tasks with an average score of 85 vs 68.5.

Which is better for coding, GPT-5 mini or Llama 3.1 405B?

GPT-5 mini leads in coding with an average score of 80 vs 62.

Which is better for math, GPT-5 mini or Llama 3.1 405B?

GPT-5 mini leads in math with an average score of 89 vs 69.

Which is better for reasoning, GPT-5 mini or Llama 3.1 405B?

GPT-5 mini leads in reasoning with an average score of 83 vs 67.
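
How are the category averages calculated?

The category averages quoted in the answers above appear to be unweighted means of the per-benchmark scores listed in each category. The following Python sketch recomputes them under that assumption. The `SCORES` dictionary and the `category_averages` helper are illustrative names, not part of the source. The overall 68 vs 58 figure is not derived here, because the page does not state how it is computed.

```python
from statistics import mean

# Per-benchmark scores copied from the comparison: (GPT-5 mini, Llama 3.1 405B).
SCORES = {
    "Knowledge": {
        "MMLU": (88, 70),
        "GPQA": (86, 70),
        "SuperGPQA": (84, 68),
        "OpenBookQA": (82, 66),
    },
    "Coding": {
        "HumanEval": (80, 62),
    },
    "Mathematics": {
        "AIME 2023": (90, 70),
        "AIME 2024": (92, 72),
        "AIME 2025": (91, 71),
        "HMMT Feb 2023": (86, 66),
        "HMMT Feb 2024": (88, 68),
        "HMMT Feb 2025": (87, 67),
        "BRUMO 2025": (89, 69),
    },
    "Reasoning": {
        "SimpleQA": (84, 68),
        "MuSR": (82, 66),
    },
}


def category_averages(scores):
    """Return {category: (gpt5_mini_avg, llama_avg)} as unweighted means (an assumption)."""
    result = {}
    for category, benchmarks in scores.items():
        # Split the (GPT-5 mini, Llama) pairs into two parallel tuples of scores.
        gpt_scores, llama_scores = zip(*benchmarks.values())
        result[category] = (mean(gpt_scores), mean(llama_scores))
    return result


if __name__ == "__main__":
    for category, (gpt_avg, llama_avg) in category_averages(SCORES).items():
        gap = gpt_avg - llama_avg
        print(f"{category}: {gpt_avg:g} vs {llama_avg:g} (gap {gap:g})")
```

Under this assumption the recomputed averages match the figures quoted above: 85 vs 68.5 for knowledge, 80 vs 62 for coding, 89 vs 69 for math, and 83 vs 67 for reasoning.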