Llama 3.1 405B Benchmark Scores & Performance

Benchmark analysis of Llama 3.1 405B by Meta across 14 tests.

Creator

Meta

Source Type

Open Weight

Reasoning

Non-Reasoning

Context Window

128K

Overall Score

58 (#45 of 88)

Knowledge Benchmarks

MMLU
70
GPQA
70
SuperGPQA
68
OpenBookQA
66

Coding Benchmarks

HumanEval
62

Mathematics Benchmarks

AIME 2023
70
AIME 2024
72
AIME 2025
71
HMMT Feb 2023
66
HMMT Feb 2024
68
HMMT Feb 2025
67
BRUMO 2025
69

Reasoning Benchmarks

SimpleQA
68
MuSR
66

Frequently Asked Questions

How does Llama 3.1 405B perform overall in AI benchmarks?

Llama 3.1 405B ranks #45 out of 88 models with an overall score of 58. It is created by Meta and features a 128K context window.

Is Llama 3.1 405B good for knowledge and understanding?

Llama 3.1 405B averages 68.5 across the knowledge and understanding benchmarks (MMLU, GPQA, SuperGPQA, OpenBookQA). There are stronger options in this category.

Is Llama 3.1 405B good for coding and programming?

Llama 3.1 405B scores 62 on HumanEval, the single coding benchmark tested. There are stronger options in this category.

Is Llama 3.1 405B good for mathematics?

Llama 3.1 405B averages 69 across the seven mathematics benchmarks (AIME 2023–2025, HMMT Feb 2023–2025, and BRUMO 2025). There are stronger options in this category.

Is Llama 3.1 405B good for reasoning and logic?

Llama 3.1 405B averages 67 across the reasoning and logic benchmarks (SimpleQA and MuSR). There are stronger options in this category.
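The category averages quoted in this FAQ can be reproduced directly from the individual benchmark scores listed above; a minimal check:

```python
# Per-category benchmark scores for Llama 3.1 405B, as listed above.
scores = {
    "knowledge": [70, 70, 68, 66],                # MMLU, GPQA, SuperGPQA, OpenBookQA
    "coding": [62],                               # HumanEval
    "mathematics": [70, 72, 71, 66, 68, 67, 69],  # AIME 2023-2025, HMMT Feb 2023-2025, BRUMO 2025
    "reasoning": [68, 66],                        # SimpleQA, MuSR
}

# Simple arithmetic mean per category.
averages = {category: sum(vals) / len(vals) for category, vals in scores.items()}
print(averages)
# {'knowledge': 68.5, 'coding': 62.0, 'mathematics': 69.0, 'reasoning': 67.0}
```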

Is Llama 3.1 405B open source?

Llama 3.1 405B is an open-weight model: Meta publishes the weights, so the model can be downloaded, run locally, or fine-tuned for specific use cases. Note that the weights are released under Meta's community license rather than a traditional open-source license.

What is the context window size of Llama 3.1 405B?

Llama 3.1 405B has a context window of 128K tokens, which determines how much text it can process in a single interaction.
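As an illustration (not part of the benchmark data), a rough way to check whether a document fits in a 128K-token window is to estimate the token count from the character count. The ~4-characters-per-token ratio below is a common heuristic for English text, not an exact figure; precise counts require running the model's actual tokenizer.

```python
CONTEXT_WINDOW = 128_000  # Llama 3.1 context window, in tokens

def fits_in_context(text: str, chars_per_token: float = 4.0) -> bool:
    """Rough check using the ~4 chars/token heuristic for English text.

    For exact counts, tokenize with the model's own tokenizer instead.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= CONTEXT_WINDOW

# A 400,000-character document (~100K estimated tokens) fits;
# an 800,000-character one (~200K estimated tokens) does not.
print(fits_in_context("x" * 400_000))  # True
print(fits_in_context("x" * 800_000))  # False
```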