Name: Claude 4 Sonnet
Rating: 50 (19 reviews)
Author: Anthropic

Question 1

How does Claude 4 Sonnet perform overall in AI benchmarks?

Accepted Answer

Claude 4 Sonnet currently ranks #67 out of 119 models on BenchLM's provisional leaderboard, with an estimated overall score of 50. It is developed by Anthropic and has a 200K-token context window.

Question 2

Is Claude 4 Sonnet good for knowledge and understanding?

Accepted Answer

Claude 4 Sonnet ranks #58 out of 119 models on knowledge and understanding benchmarks, with an average score of 50.1. That places it near the middle of the field, so there are stronger options in this category.

Question 3

Is Claude 4 Sonnet good for coding and programming?

Accepted Answer

Claude 4 Sonnet ranks #51 out of 119 models on coding and programming benchmarks, with an average score of 53.9. That places it in the upper half of the field, but there are stronger options in this category.

Question 4

Is Claude 4 Sonnet good for reasoning and logic?

Accepted Answer

Claude 4 Sonnet ranks #48 out of 119 models on reasoning and logic benchmarks, with an average score of 54.3. That places it in the upper half of the field, but there are stronger options in this category.

Question 5

Is Claude 4 Sonnet good for agentic tool use and computer tasks?

Accepted Answer

Claude 4 Sonnet ranks #53 out of 119 models on benchmarks for agentic tool use and computer tasks, with an average score of 44.6. This is its lowest average score among the categories listed here, and there are stronger options in this category.

Question 6

Is Claude 4 Sonnet good for multimodal and grounded tasks?

Accepted Answer

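Claude 4 Sonnet ranks #21 out of 119 models on multimodal and grounded task benchmarks, with an average score of 74.7. This is its best rank and highest average score among the categories listed here, placing it in the top fifth of the field, although 20 models still rank above it.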

Question 7

Is Claude 4 Sonnet good for instruction following?

Accepted Answer

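Claude 4 Sonnet ranks #70 out of 119 models on instruction following benchmarks, with an average score of 58.6. This is its lowest rank among the categories listed here, even though 58.6 is its second-highest category score. There are stronger options in this category.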
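
The category ranks quoted in the answers above are easier to compare when each is expressed as a share of the 119-model field. The following is a minimal sketch, not anything BenchLM provides: the dictionary keys and the function names `top_fraction` and `best_and_worst` are made up for illustration, and the ranks are simply copied from the answers above.

```python
# Ranks copied from the answers above; the keys are illustrative labels only.
TOTAL_MODELS = 119

category_ranks = {
    "knowledge": 58,
    "coding": 51,
    "reasoning": 48,
    "agentic": 53,
    "multimodal": 21,
    "instruction_following": 70,
}


def top_fraction(rank, total=TOTAL_MODELS):
    """Share of the field at or above this rank (smaller is better)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return rank / total


def best_and_worst(ranks):
    """Return the category names with the best (lowest) and worst (highest) rank."""
    best = min(ranks, key=ranks.get)
    worst = max(ranks, key=ranks.get)
    return best, worst


if __name__ == "__main__":
    # Print categories from best rank to worst rank.
    for name, rank in sorted(category_ranks.items(), key=lambda kv: kv[1]):
        share = top_fraction(rank)
        print(f"{name:22s} #{rank:>3} of {TOTAL_MODELS}  (top {share:.0%})")
    print(best_and_worst(category_ranks))
```

Ranks and average scores measure different things, so a category can have a relatively high score and still a low rank if most other models also score well on it.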

Question 8

Does Claude 4 Sonnet have full benchmark coverage on BenchLM?

Accepted Answer

Not yet. Claude 4 Sonnet currently has 19 published benchmark scores out of the 225 benchmarks BenchLM tracks. BenchLM exposes only public benchmark rows that are sourced rather than generated, so categories with no such rows stay blank until a sourced evaluation is available.
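
To put the coverage figure in proportion, the two counts quoted above can be turned into a percentage. This is a small sketch using only those two numbers; the function name `coverage` is hypothetical and is not part of any BenchLM tooling.

```python
def coverage(published, tracked):
    """Fraction of tracked benchmarks that have a published score."""
    if tracked <= 0:
        raise ValueError("tracked must be positive")
    if not 0 <= published <= tracked:
        raise ValueError("published must be between 0 and tracked")
    return published / tracked


if __name__ == "__main__":
    published, tracked = 19, 225  # counts quoted in the answer above
    print(f"coverage: {coverage(published, tracked):.1%}")
    print(f"benchmarks without a published score: {tracked - published}")
```

With so few of the tracked benchmarks covered, the category averages and the overall score rest on a small sample and may shift as more sourced results are added.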

Question 9

What is the context window size of Claude 4 Sonnet?

Accepted Answer

Claude 4 Sonnet has a 200K-token context window, which limits how much text it can process in a single interaction.
