Llama 3 70B vs Mistral Large 3

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

Mistral Large 3 wins overall with a score of 61 vs 48 (a 13-point difference) and leads in all 4 categories.

Knowledge

Winner: Mistral Large 3

Benchmark     Llama 3 70B   Mistral Large 3
Average       56.5          73.8
MMLU          58            76
GPQA          58            75
SuperGPQA     56            73
OpenBookQA    54            71

Coding

Winner: Mistral Large 3

Benchmark     Llama 3 70B   Mistral Large 3
Average       50            68
HumanEval     50            68

Mathematics

Winner: Mistral Large 3

Benchmark       Llama 3 70B   Mistral Large 3
Average         57            75
AIME 2023       58            76
AIME 2024       60            78
AIME 2025       59            77
HMMT Feb 2023   54            72
HMMT Feb 2024   56            74
HMMT Feb 2025   55            73
BRUMO 2025      57            75

Reasoning

Winner: Mistral Large 3

Benchmark     Llama 3 70B   Mistral Large 3
Average       55            72
SimpleQA      56            73
MuSR          54            71
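
The category averages quoted on this page are consistent with a plain unweighted mean of the per-benchmark scores. The sketch below recomputes them under that assumption and counts category wins. The `SCORES` dictionary and the `category_averages` helper are illustrative names, not part of any published tool, and the scores are transcribed from the figures on this page. The overall 61 vs 48 score is not reproduced here, since the page does not state how it is aggregated.

```python
from statistics import mean

# Scores transcribed from this page: (Llama 3 70B, Mistral Large 3).
SCORES = {
    "Knowledge": {
        "MMLU": (58, 76),
        "GPQA": (58, 75),
        "SuperGPQA": (56, 73),
        "OpenBookQA": (54, 71),
    },
    "Coding": {
        "HumanEval": (50, 68),
    },
    "Mathematics": {
        "AIME 2023": (58, 76),
        "AIME 2024": (60, 78),
        "AIME 2025": (59, 77),
        "HMMT Feb 2023": (54, 72),
        "HMMT Feb 2024": (56, 74),
        "HMMT Feb 2025": (55, 73),
        "BRUMO 2025": (57, 75),
    },
    "Reasoning": {
        "SimpleQA": (56, 73),
        "MuSR": (54, 71),
    },
}


def category_averages(scores):
    """Return {category: (llama_mean, mistral_mean)}.

    Assumption: each category average is an unweighted mean of its benchmarks.
    """
    averages = {}
    for category, benchmarks in scores.items():
        llama = mean(pair[0] for pair in benchmarks.values())
        mistral = mean(pair[1] for pair in benchmarks.values())
        averages[category] = (llama, mistral)
    return averages


averages = category_averages(SCORES)

# Count the categories in which Mistral Large 3 has the higher average.
mistral_wins = sum(1 for llama, mistral in averages.values() if mistral > llama)

for category, (llama, mistral) in averages.items():
    print(f"{category}: Llama 3 70B {llama:.2f} vs Mistral Large 3 {mistral:.2f}")
print(f"Mistral Large 3 wins {mistral_wins} of {len(averages)} categories")
```

Under this assumption the Knowledge average for Mistral Large 3 comes out to 73.75, which matches the 73.8 shown on the page after rounding to one decimal place.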

Frequently Asked Questions

Which is better, Llama 3 70B or Mistral Large 3?

Mistral Large 3 scores higher overall across all benchmarks: 61 vs 48, a difference of 13 points.

Which is better for knowledge tasks, Llama 3 70B or Mistral Large 3?

Mistral Large 3 leads in knowledge tasks with an average score of 73.8 vs 56.5.

Which is better for coding, Llama 3 70B or Mistral Large 3?

Mistral Large 3 leads in coding with an average score of 68 vs 50.

Which is better for math, Llama 3 70B or Mistral Large 3?

Mistral Large 3 leads in math with an average score of 75 vs 57.

Which is better for reasoning, Llama 3 70B or Mistral Large 3?

Mistral Large 3 leads in reasoning with an average score of 72 vs 55.