Llama 4 Scout Benchmark Scores & Performance

Benchmark analysis of Llama 4 Scout by Meta across 14 tests.

Creator

Meta

Source Type

Open Weight

Reasoning

Non-Reasoning

Context Window

32K

Overall Score

38

Rank

#70 of 88

Knowledge Benchmarks

MMLU
47
GPQA
46
SuperGPQA
44
OpenBookQA
42

Coding Benchmarks

HumanEval
39

Mathematics Benchmarks

AIME 2023
47
AIME 2024
49
AIME 2025
48
HMMT Feb 2023
43
HMMT Feb 2024
45
HMMT Feb 2025
44
BRUMO 2025
46

Reasoning Benchmarks

SimpleQA
45
MuSR
43
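The category averages quoted in the FAQ below follow directly from the per-benchmark scores listed above. A minimal sketch recomputing them (the dictionary simply mirrors the tables in this page; the category names are shortened labels, not official ones):

```python
# Per-benchmark scores as listed above, grouped by category.
scores = {
    "Knowledge": {"MMLU": 47, "GPQA": 46, "SuperGPQA": 44, "OpenBookQA": 42},
    "Coding": {"HumanEval": 39},
    "Mathematics": {
        "AIME 2023": 47, "AIME 2024": 49, "AIME 2025": 48,
        "HMMT Feb 2023": 43, "HMMT Feb 2024": 45, "HMMT Feb 2025": 44,
        "BRUMO 2025": 46,
    },
    "Reasoning": {"SimpleQA": 45, "MuSR": 43},
}

# Each category average is the unweighted mean of its benchmark scores.
for category, results in scores.items():
    avg = sum(results.values()) / len(results)
    print(f"{category}: {avg:.1f}")
```

Running this reproduces the averages cited in the FAQ: Knowledge 44.8, Coding 39.0, Mathematics 46.0, Reasoning 44.0.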

Frequently Asked Questions

How does Llama 4 Scout perform overall in AI benchmarks?

Llama 4 Scout ranks #70 out of 88 models with an overall score of 38. It was created by Meta and has a 32K-token context window.

Is Llama 4 Scout good for knowledge and understanding?

Llama 4 Scout ranks #70 out of 88 models in knowledge and understanding benchmarks with an average score of 44.8. There are stronger options in this category.

Is Llama 4 Scout good for coding and programming?

Llama 4 Scout ranks #70 out of 88 models in coding and programming benchmarks with an average score of 39. There are stronger options in this category.

Is Llama 4 Scout good for mathematics?

Llama 4 Scout ranks #70 out of 88 models in mathematics benchmarks with an average score of 46. There are stronger options in this category.

Is Llama 4 Scout good for reasoning and logic?

Llama 4 Scout ranks #70 out of 88 models in reasoning and logic benchmarks with an average score of 44. There are stronger options in this category.

Is Llama 4 Scout open source?

Yes, Llama 4 Scout is an open-weight model from Meta: its weights can be downloaded, run locally, or fine-tuned for specific use cases.

What is the context window size of Llama 4 Scout?

Llama 4 Scout has a context window of 32K tokens, which determines how much text it can process in a single interaction.
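A quick way to check whether a given text is likely to fit in the 32K-token window is to estimate its token count. The sketch below uses the common rough heuristic of about 4 characters per token for English text (the helper name and the heuristic ratio are assumptions for illustration; for exact counts you would use the model's own tokenizer):

```python
def fits_in_context(text: str, context_tokens: int = 32_000,
                    chars_per_token: float = 4.0) -> bool:
    """Rough estimate of whether `text` fits in the context window.

    Assumes ~4 characters per token, a common English-text heuristic;
    actual token counts depend on the model's tokenizer.
    """
    estimated_tokens = len(text) / chars_per_token
    return estimated_tokens <= context_tokens

# ~60,000 characters ≈ 15,000 estimated tokens, well under 32K.
print(fits_in_context("hello " * 10_000))
```

Note that the window covers the whole interaction, so the prompt and the generated response share the same 32K-token budget.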