Claude 4 Sonnet Benchmark Scores & Performance

Benchmark analysis of Claude 4 Sonnet by Anthropic across 14 tests.

Creator: Anthropic
Source Type: Proprietary
Reasoning: Non-Reasoning
Context Window: 200K
Overall Score: 59 (#43 of 88)

Knowledge Benchmarks

MMLU: 73
GPQA: 73
SuperGPQA: 71
OpenBookQA: 69

Coding Benchmarks

HumanEval: 65

Mathematics Benchmarks

AIME 2023: 73
AIME 2024: 75
AIME 2025: 74
HMMT Feb 2023: 69
HMMT Feb 2024: 71
HMMT Feb 2025: 70
BRUMO 2025: 72

Reasoning Benchmarks

SimpleQA: 71
MuSR: 69
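The category averages quoted in the FAQ below are simple unweighted means of the per-benchmark scores listed above. A minimal sketch (not part of the original benchmark methodology) that recomputes them:

```python
# Per-benchmark scores as listed above, grouped by category.
scores = {
    "knowledge": [73, 73, 71, 69],              # MMLU, GPQA, SuperGPQA, OpenBookQA
    "coding": [65],                             # HumanEval
    "math": [73, 75, 74, 69, 71, 70, 72],       # AIME 2023-2025, HMMT Feb 2023-2025, BRUMO 2025
    "reasoning": [71, 69],                      # SimpleQA, MuSR
}

for category, vals in scores.items():
    avg = sum(vals) / len(vals)
    print(f"{category}: {avg:g}")
# knowledge: 71.5, coding: 65, math: 72, reasoning: 70
```

These match the per-category averages (71.5, 65, 72, and 70) cited in the FAQ answers.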

Frequently Asked Questions

How does Claude 4 Sonnet perform overall in AI benchmarks?

Claude 4 Sonnet ranks #43 out of 88 models with an overall score of 59. It is created by Anthropic and features a 200K context window.

Is Claude 4 Sonnet good for knowledge and understanding?

Claude 4 Sonnet ranks #43 out of 88 models in knowledge and understanding benchmarks with an average score of 71.5. There are stronger options in this category.

Is Claude 4 Sonnet good for coding and programming?

Claude 4 Sonnet ranks #43 out of 88 models in coding and programming benchmarks with an average score of 65. There are stronger options in this category.

Is Claude 4 Sonnet good for mathematics?

Claude 4 Sonnet ranks #43 out of 88 models in mathematics benchmarks with an average score of 72. There are stronger options in this category.

Is Claude 4 Sonnet good for reasoning and logic?

Claude 4 Sonnet ranks #42 out of 88 models in reasoning and logic benchmarks with an average score of 70. There are stronger options in this category.

What is the context window size of Claude 4 Sonnet?

Claude 4 Sonnet has a 200K-token context window, which caps how much text (prompt plus response) it can process in a single interaction.