Claude 4.1 Opus Benchmark Scores & Performance

Benchmark analysis of Claude 4.1 Opus by Anthropic across 14 tests.

Creator: Anthropic

Source Type: Proprietary

Reasoning: Non-Reasoning

Context Window: 200K tokens

Overall Score: 61 (#40 of 88)

Knowledge Benchmarks

MMLU: 76
GPQA: 76
SuperGPQA: 74
OpenBookQA: 72

Coding Benchmarks

HumanEval: 68

Mathematics Benchmarks

AIME 2023: 76
AIME 2024: 78
AIME 2025: 77
HMMT Feb 2023: 72
HMMT Feb 2024: 74
HMMT Feb 2025: 73
BRUMO 2025: 75

Reasoning Benchmarks

SimpleQA: 74
MuSR: 72

Frequently Asked Questions

How does Claude 4.1 Opus perform overall in AI benchmarks?

Claude 4.1 Opus ranks #40 out of 88 models with an overall score of 61. It is created by Anthropic and features a 200K context window.

Is Claude 4.1 Opus good for knowledge and understanding?

Claude 4.1 Opus ranks #40 out of 88 models in knowledge and understanding benchmarks with an average score of 74.5. There are stronger options in this category.

Is Claude 4.1 Opus good for coding and programming?

Claude 4.1 Opus ranks #40 out of 88 models in coding and programming benchmarks with a score of 68, based on a single benchmark (HumanEval). There are stronger options in this category.

Is Claude 4.1 Opus good for mathematics?

Claude 4.1 Opus ranks #40 out of 88 models in mathematics benchmarks with an average score of 75. There are stronger options in this category.

Is Claude 4.1 Opus good for reasoning and logic?

Claude 4.1 Opus ranks #39 out of 88 models in reasoning and logic benchmarks with an average score of 73. There are stronger options in this category.
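The category averages quoted in these answers follow directly from the per-benchmark scores listed above; a minimal sketch of that arithmetic (scores copied from this page, function name hypothetical):

```python
# Per-category benchmark scores as listed on this page.
scores = {
    "knowledge": [76, 76, 74, 72],                 # MMLU, GPQA, SuperGPQA, OpenBookQA
    "coding": [68],                                # HumanEval
    "mathematics": [76, 78, 77, 72, 74, 73, 75],   # AIME 2023-2025, HMMT Feb 2023-2025, BRUMO 2025
    "reasoning": [74, 72],                         # SimpleQA, MuSR
}

def category_average(values):
    """Unweighted mean of a category's benchmark scores."""
    return sum(values) / len(values)

for name, values in scores.items():
    print(f"{name}: {category_average(values):.1f}")
# knowledge: 74.5, coding: 68.0, mathematics: 75.0, reasoning: 73.0
```

Note that a simple unweighted mean reproduces every category figure cited here (74.5, 68, 75, 73); how the site combines categories into the overall score of 61 is not documented on this page.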

What is the context window size of Claude 4.1 Opus?

Claude 4.1 Opus has a context window of 200K tokens, which determines how much text it can process in a single interaction.
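As a rough illustration of what a 200K-token window holds, a common back-of-the-envelope heuristic for English prose is about 4 characters (or 0.75 words) per token; the true ratio is tokenizer-dependent, so these are estimates, not Anthropic's figures:

```python
CONTEXT_WINDOW_TOKENS = 200_000
CHARS_PER_TOKEN = 4      # rough heuristic for English prose; varies by tokenizer
WORDS_PER_TOKEN = 0.75   # another common rule of thumb

approx_chars = CONTEXT_WINDOW_TOKENS * CHARS_PER_TOKEN
approx_words = int(CONTEXT_WINDOW_TOKENS * WORDS_PER_TOKEN)

print(f"~{approx_chars:,} characters, ~{approx_words:,} words")
# ~800,000 characters, ~150,000 words
```

By this estimate, 200K tokens is on the order of a few hundred pages of text in a single interaction.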