Llama 4 Maverick Benchmark Scores & Performance

Benchmark analysis of Llama 4 Maverick by Meta across 14 tests.

Creator: Meta
Source Type: Open Weight
Reasoning: Non-Reasoning
Context Window: 32K
Overall Score: 37 (#71 of 88)

Knowledge Benchmarks

MMLU: 46
GPQA: 45
SuperGPQA: 43
OpenBookQA: 41

Coding Benchmarks

HumanEval: 38

Mathematics Benchmarks

AIME 2023: 46
AIME 2024: 48
AIME 2025: 47
HMMT Feb 2023: 42
HMMT Feb 2024: 44
HMMT Feb 2025: 43
BRUMO 2025: 45

Reasoning Benchmarks

SimpleQA: 44
MuSR: 42
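The per-category averages cited in the FAQ below (43.8 knowledge, 38 coding, 45 mathematics, 43 reasoning) are plain means of the scores listed above. A minimal sketch that reproduces them:

```python
# Benchmark scores for Llama 4 Maverick, grouped by category as listed above.
scores = {
    "knowledge": [46, 45, 43, 41],                 # MMLU, GPQA, SuperGPQA, OpenBookQA
    "coding": [38],                                # HumanEval
    "mathematics": [46, 48, 47, 42, 44, 43, 45],   # AIME, HMMT, BRUMO
    "reasoning": [44, 42],                         # SimpleQA, MuSR
}

# Mean score per category, rounded to one decimal place.
averages = {category: round(sum(vals) / len(vals), 1) for category, vals in scores.items()}
print(averages)
```

Running this yields 43.8 for knowledge, 38.0 for coding, 45.0 for mathematics, and 43.0 for reasoning, matching the figures quoted in the FAQ.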

Frequently Asked Questions

How does Llama 4 Maverick perform overall in AI benchmarks?

Llama 4 Maverick ranks #71 out of 88 models with an overall score of 37. It is created by Meta and features a 32K context window.

Is Llama 4 Maverick good for knowledge and understanding?

Llama 4 Maverick ranks #71 out of 88 models in knowledge and understanding benchmarks with an average score of 43.8. There are stronger options in this category.

Is Llama 4 Maverick good for coding and programming?

Llama 4 Maverick ranks #71 out of 88 models in coding and programming benchmarks with an average score of 38. There are stronger options in this category.

Is Llama 4 Maverick good for mathematics?

Llama 4 Maverick ranks #71 out of 88 models in mathematics benchmarks with an average score of 45. There are stronger options in this category.

Is Llama 4 Maverick good for reasoning and logic?

Llama 4 Maverick ranks #71 out of 88 models in reasoning and logic benchmarks with an average score of 43. There are stronger options in this category.

Is Llama 4 Maverick open source?

Llama 4 Maverick is an open-weight model created by Meta, meaning its weights can be downloaded and run locally or fine-tuned for specific use cases.

What is the context window size of Llama 4 Maverick?

Llama 4 Maverick has a context window of 32K tokens; the context window determines how much text (input and output combined) the model can process in a single interaction.
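To get an intuition for what a 32K-token window holds, a crude fit check can be sketched as follows. The four-characters-per-token figure is a rough rule of thumb for English text, not the Llama tokenizer, and the window is taken as 32,000 tokens here; both are assumptions for illustration.

```python
CONTEXT_WINDOW = 32_000   # tokens, per the 32K figure above (taken as 32,000)
CHARS_PER_TOKEN = 4       # rough heuristic for English text, not the Llama tokenizer

def fits_in_context(text: str) -> bool:
    """Estimate the token count from character length and compare to the window."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens <= CONTEXT_WINDOW

print(fits_in_context("A short prompt easily fits."))
print(fits_in_context("x" * 200_000))  # ~50K estimated tokens exceeds the window
```

By this estimate, a 32K window corresponds to roughly 128,000 characters, on the order of a short novel's worth of text; for real budgeting, the model's actual tokenizer should be used instead of the heuristic.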