Claude 3.5 Sonnet Benchmark Scores & Performance

A benchmark analysis of Anthropic's Claude 3.5 Sonnet across 14 tests.

Creator

Anthropic

Source Type

Proprietary

Reasoning

Non-Reasoning

Context Window

200K

Overall Score

55 (#51 of 88)

Knowledge Benchmarks

MMLU: 65
GPQA: 65
SuperGPQA: 63
OpenBookQA: 61

Coding Benchmarks

HumanEval: 57

Mathematics Benchmarks

AIME 2023: 65
AIME 2024: 67
AIME 2025: 66
HMMT Feb 2023: 61
HMMT Feb 2024: 63
HMMT Feb 2025: 62
BRUMO 2025: 64

Reasoning Benchmarks

SimpleQA: 63
MuSR: 61

Frequently Asked Questions

How does Claude 3.5 Sonnet perform overall in AI benchmarks?

Claude 3.5 Sonnet ranks #51 out of 88 models with an overall score of 55. It is developed by Anthropic and has a 200K-token context window.

Is Claude 3.5 Sonnet good for knowledge and understanding?

Claude 3.5 Sonnet ranks #51 out of 88 models in knowledge and understanding benchmarks with an average score of 63.5. There are stronger options in this category.

Is Claude 3.5 Sonnet good for coding and programming?

Claude 3.5 Sonnet ranks #52 out of 88 models in coding and programming benchmarks with an average score of 57. There are stronger options in this category.

Is Claude 3.5 Sonnet good for mathematics?

Claude 3.5 Sonnet ranks #52 out of 88 models in mathematics benchmarks with an average score of 64. There are stronger options in this category.

Is Claude 3.5 Sonnet good for reasoning and logic?

Claude 3.5 Sonnet ranks #51 out of 88 models in reasoning and logic benchmarks with an average score of 62. There are stronger options in this category.
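The category averages quoted in these answers can be reproduced from the per-benchmark scores listed earlier on this page. The sketch below (plain Python, using only the numbers above) just takes the arithmetic mean per category:

```python
# Per-benchmark scores as listed on this page, grouped by category.
scores = {
    "knowledge": [65, 65, 63, 61],                 # MMLU, GPQA, SuperGPQA, OpenBookQA
    "coding": [57],                                # HumanEval
    "mathematics": [65, 67, 66, 61, 63, 62, 64],   # AIME 2023-2025, HMMT Feb 2023-2025, BRUMO 2025
    "reasoning": [63, 61],                         # SimpleQA, MuSR
}

def category_average(values):
    """Arithmetic mean of one category's benchmark scores."""
    return sum(values) / len(values)

averages = {name: category_average(vals) for name, vals in scores.items()}
# knowledge -> 63.5, coding -> 57.0, mathematics -> 64.0, reasoning -> 62.0
```

This matches the figures in the answers above (63.5 knowledge, 57 coding, 64 mathematics, 62 reasoning), so each category score is a simple unweighted mean of its benchmarks.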

What is the context window size of Claude 3.5 Sonnet?

Claude 3.5 Sonnet has a context window of 200K tokens, which determines how much text it can process in a single interaction.
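As a rough illustration of what a 200K-token window means in practice, the sketch below estimates whether a text is likely to fit using a common ~4-characters-per-token heuristic. The heuristic and the reserved-output figure are assumptions for illustration, not Anthropic's tokenizer; real token counts vary with content and language.

```python
CONTEXT_WINDOW = 200_000   # tokens, per Claude 3.5 Sonnet's spec
CHARS_PER_TOKEN = 4        # rough heuristic, NOT the actual tokenizer

def estimated_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_context(text: str, reserved_for_output: int = 4_096) -> bool:
    """True if the input likely fits, leaving room for the model's reply."""
    return estimated_tokens(text) <= CONTEXT_WINDOW - reserved_for_output
```

By this estimate, a ~300-page book of roughly 600,000 characters comes to about 150,000 tokens and would fit in a single request, while a text over ~780,000 characters would not.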