Model profile

Claude Opus 4.7

Name: Claude Opus 4.7
Author: Anthropic

Anthropic · Current · Released Apr 16, 2026

Data verified July 15, 2026

Overall Score

Unranked

Arena Elo

1494

Categories Ranked

0 of 8

Price (1M tokens)

$5 in / $25 out

API pricing

Speed

Not listed

Context

1M tokens

Evidence coverage

22 of 300 tracked benchmarks are published. 8 are verified and 14 provisional. 7 of 8 categories are measured.

Updated July 15, 2026 · Methodology

Published / tracked: 22 / 300
Verified: 8
Provisional: 14
Categories measured: 7 / 8
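
The coverage figures above reduce to a few simple ratios. The following is a minimal sketch that uses only the counts stated on this page; the function and key names are illustrative and are not taken from BenchLM.

```python
def coverage_summary(published, tracked, verified, provisional):
    """Return coverage ratios for a model profile as fractions of 1."""
    if verified + provisional != published:
        raise ValueError("verified + provisional must equal published")
    return {
        "published_share": published / tracked,       # share of tracked benchmarks
        "verified_share": verified / published,       # share of published rows
        "provisional_share": provisional / published,
    }

# Counts copied from the profile: 22 of 300 published, 8 verified, 14 provisional.
summary = coverage_summary(published=22, tracked=300, verified=8, provisional=14)
for name, value in summary.items():
    print(f"{name}: {value:.1%}")
```

With these counts the published share is about 7.3% of tracked benchmarks, and roughly 36.4% of the published rows are verified.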

Agentic: 4 benchmarks · Mixed evidence
Coding: 5 benchmarks · Mixed evidence
Reasoning: 2 benchmarks · Reported
Knowledge: 6 benchmarks · Reported
Math: 2 benchmarks · Verified
Multilingual: 0 benchmarks · Not measured
Multimodal: 2 benchmarks · Reported
Inst. Following: 1 benchmark · Reported

Proprietary · Non-Reasoning

Confidence: Low

base

BenchLM is tracking Claude Opus 4.7, but this profile is currently excluded from the public leaderboard because it still lacks enough non-generated benchmark coverage to rank safely. Only non-generated public benchmark rows appear below.

Claude Opus 4.7 is a proprietary model with a 1M token context window. It processes queries without explicit chain-of-thought reasoning, offering faster response times and lower token usage.

Claude Opus 4.7 sits inside the Claude Opus 4.7 family alongside Claude Opus 4.7 (Adaptive). BenchLM links it directly to Claude Opus 4.6 as the earlier related model in that lineage. This profile currently has 22 of 300 tracked benchmarks. BenchLM only exposes non-generated benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.

Peer position

Exact provisional scores and ranks for the closest listed peers.

Range 68.0–70.0

Model	Author	Rank	Provisional score
DeepSeek V4 Flash (Max)	DeepSeek	#28	70.0
Kimi K2.5 (Reasoning)	Moonshot AI	#29	70.0
Inkling	Thinking Machines Lab	#30	69.0
Grok 4.20	xAI	#31	68.0
Claude Opus 4.7 (current model)	Anthropic	Unranked	69.0
Qwen3.5 397B (Reasoning)	Alibaba	Unranked	69.0
Grok 4.1 Fast	xAI	Unranked	68.0

Category percentile

Relative position among models eligible for each sourced category. A higher percentile means a stronger position within that category's ranked cohort; 100 is highest.

No eligible category percentile is available from the published evidence yet.
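
One common way to define such a percentile is the share of the ranked cohort that a model scores at or above. The sketch below assumes that definition; BenchLM's exact formula is not stated on this page, and the cohort scores are made up purely for illustration.

```python
def percentile_rank(score, cohort_scores):
    """Percent of cohort scores that are <= score (100 means highest)."""
    if not cohort_scores:
        raise ValueError("cohort must not be empty")
    at_or_below = sum(1 for s in cohort_scores if s <= score)
    return 100.0 * at_or_below / len(cohort_scores)

# Hypothetical cohort, not real BenchLM data.
hypothetical_cohort = [50.0, 60.0, 70.0, 80.0, 90.0]
print(percentile_rank(70.0, hypothetical_cohort))  # 60.0
print(percentile_rank(90.0, hypothetical_cohort))  # 100.0
```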

Category evidence

Scores and ranks appear only where this model has published benchmark evidence. Categories without displayable source records remain not measured.

Category scores, ranks, weighting, benchmark coverage, and evidence status
Category	Score	Rank	Percentile	Weight	Benchmarks	Evidence
Agentic	0.0	Not ranked	Not available	22%	4 benchmarks	Mixed sources
Coding	0.0	Not ranked	Not available	20%	5 benchmarks	Mixed sources
Reasoning	0.0	Not ranked	Not available	17%	2 benchmarks	Reported
Knowledge	0.0	Not ranked	Not available	12%	6 benchmarks	Reported
Math	65.7	Not ranked	Not available	5%	2 benchmarks	Verified
Multilingual	Not measured	Not ranked	Not available	7%	0 benchmarks	Not measured
Multimodal	0.0	Not ranked	Not available	12%	2 benchmarks	Reported
Inst. Following	0.0	Not ranked	Not available	5%	1 benchmark	Reported
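
As a quick consistency check on the Weight column, the sketch below sums the eight category weights and the portion that belongs to measured categories. The dictionary simply restates the table; the helper name is illustrative, and nothing here reproduces BenchLM's actual scoring.

```python
# Category weights in percent, copied from the table above.
CATEGORY_WEIGHTS = {
    "Agentic": 22, "Coding": 20, "Reasoning": 17, "Knowledge": 12,
    "Math": 5, "Multilingual": 7, "Multimodal": 12, "Inst. Following": 5,
}

def measured_weight(weights, unmeasured):
    """Total weight (percent) of categories that are not in `unmeasured`."""
    return sum(w for name, w in weights.items() if name not in unmeasured)

total = sum(CATEGORY_WEIGHTS.values())
covered = measured_weight(CATEGORY_WEIGHTS, unmeasured={"Multilingual"})
print(total, covered)  # 100 93
```

The weights sum to 100%, and the seven measured categories account for 93% of that total.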

Chatbot Arena performance

Chatbot Arena Elo, confidence interval, and vote count by evaluation view
View	Elo	Confidence interval	Votes
Text Overall	1494	±4.3	47,222
Coding	1550	±6.7	13,419
Math	1494	±12.6	2,445
Instruction Following	1495	±6.2	16,077
Creative Writing	1482	±8.1	8,037
Multi-turn	1519	±7.9	8,415
Hard Prompts	1521	±5.2	31,335
Hard Prompts (English)	1523	±6.3	15,688
Longer Query	1511	±6.0	21,011
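
To read the ± column, it can help to turn each row into an explicit interval and check whether two views overlap. The sketch below does that for three rows of the table. It also includes the textbook Elo expected-score formula, which is only meaningful for two models rated on the same view; the Arena's own rating fit may differ from this classic formula, so treat it as an assumption.

```python
def interval(elo, half_width):
    """Return the (low, high) bounds of a rating given its +/- half-width."""
    return (elo - half_width, elo + half_width)

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def expected_score(rating_a, rating_b):
    """Classic Elo expected score of A against B on the 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

# Values copied from the table above.
text_overall = interval(1494, 4.3)
coding = interval(1550, 6.7)
math_view = interval(1494, 12.6)

print(intervals_overlap(text_overall, math_view))  # True
print(intervals_overlap(text_overall, coding))     # False
print(expected_score(1500, 1500))                  # 0.5
```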

Benchmark Details

Rows below have a displayable published verification record. Each source link and provenance note remains in the page HTML while its category is closed. Source-unverified manual rows and generated rows stay hidden.

Agentic: 4 benchmarks

τ²-bench results · Reported

τ²-Bench Tool-Agent-User Evaluation

74% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Gert Labs · Benchmark exact

Gert Labs Composite Game Benchmark

65.59% · Display only

Source: Gert Labs rankings
Provenance: Gert Labs reports this composite leaderboard score in the public rankings API. BenchLM scales the source gscore from 0-1 to 0-100 and stores it as a display-only agentic benchmark.
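
The rescaling described in this provenance note is a single multiplication. A minimal sketch follows; the function name is hypothetical, and the input 0.6559 is inferred from the displayed 65.59% rather than read from the Gert Labs API.

```python
def scale_gscore(gscore):
    """Map a 0-1 source score onto a 0-100 display scale, rounded to 2 decimals."""
    if not 0.0 <= gscore <= 1.0:
        raise ValueError("gscore must lie in [0, 1]")
    return round(gscore * 100.0, 2)

# 0.6559 is an inferred input, chosen to match the displayed value.
print(scale_gscore(0.6559))  # 65.59
```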

ResearchClawBench · Benchmark exact

20.7% · Display only

Source: ResearchClawBench leaderboard
Provenance: ResearchClawBench reports this model as ResearchHarness (Claude-Opus-4.7) in the official Pass@1 leaderboard. BenchLM stores the one-decimal RADS average on the local ResearchClawBench display key and excludes it from weighted rankings.

OSWorld 2.0 · Benchmark exact

13.9% · Display only

Source: OSWorld 2.0 paper
Provenance: OSWorld 2.0 reports Claude Opus 4.7 single action on its 500-step main table. BenchLM stores the binary completion score and notes the corresponding partial score was 49.1%.

Coding: 5 benchmarks

Vibe Code Bench · Benchmark exact

Vibe Code Bench v1.1

71.00% · Display only

Source: Vals AI: Vibe Code Bench v1.1
Provenance: Vals Vibe Code Bench v1.1 reports this exact row under anthropic/claude-opus-4-7; BenchLM stores it on the local vibeCodeBench key.

React Native Evals · Benchmark exact

82.8% · Display only

Source: React Native Evals leaderboard
Provenance: React Native Evals reports this exact overall score for Claude Opus 4.7 in the public dashboard run finished on 2026-04-28.

Terminal-Bench Hard · Reported

54.5% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-SciCode · Reported

Artificial Analysis SciCode

50.1% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

FrontierCode · Benchmark exact

FrontierCode Diamond

38.5% · Display only

Source: Cognition: FrontierCode 1.1
Provenance: Cognition reports Claude Opus 4.7 at 38.5% on FrontierCode 1.1 Main, using the best max effort row from the published data JSON.

Reasoning: 2 benchmarks

AA-LCR · Reported

Artificial Analysis Long Context Reasoning

67.0% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

CritPt · Reported

Critical Physics Tasks

5.1% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Knowledge: 6 benchmarks

Artificial Analysis Intelligence Index · Reported

42.7% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-GPQA Diamond · Reported

Artificial Analysis GPQA Diamond

88.5% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-HLE · Reported

Artificial Analysis Humanity's Last Exam

31.2% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-Omniscience Index · Reported

Artificial Analysis Omniscience Index

14.2% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-Omniscience Accuracy · Reported

Artificial Analysis Omniscience Accuracy

43.5% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-Omniscience Hallucination Rate · Reported

Artificial Analysis Omniscience Hallucination Rate

51.9% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Math: 2 benchmarks

FrontierMath v2 (Tiers 1-3) · Benchmark exact

FrontierMath v2 Tiers 1-3

43.793% · Weighted 30%

Source: Epoch AI FrontierMath v2 leaderboard
Provenance: Epoch AI reports FrontierMath v2 Tiers 1-3 at 43.793% for claude-opus-4-7_xhigh. BenchLM selects the highest published thinking effort for the model and stores the v2 benchmark slice separately.

FrontierMath v2 (Tier 4) · Benchmark exact

FrontierMath v2 Tier 4

22.917% · Weighted 10%

Source: Epoch AI FrontierMath v2 leaderboard
Provenance: Epoch AI reports FrontierMath v2 Tier 4 at 22.917% for claude-opus-4-7_xhigh. BenchLM selects the highest published thinking effort for the model and stores the v2 benchmark slice separately.
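
The "Weighted 30%" and "Weighted 10%" labels indicate that the two rows contribute with different weights. The sketch below shows a weight-normalized average over just these two rows. This is an assumption for illustration only: the page does not state BenchLM's formula, and the result is not expected to match the displayed Math score of 65.7, which may involve normalization steps or inputs not shown here.

```python
def weighted_average(rows):
    """Weight-normalized mean of (score, weight) pairs."""
    total_weight = sum(weight for _, weight in rows)
    if total_weight <= 0:
        raise ValueError("total weight must be positive")
    return sum(score * weight for score, weight in rows) / total_weight

# (score in percent, weight) copied from the two FrontierMath v2 rows.
frontiermath_rows = [(43.793, 0.30), (22.917, 0.10)]
naive_math_average = weighted_average(frontiermath_rows)
print(f"{naive_math_average:.2f}")  # 38.57
```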

Multimodal: 2 benchmarks

AA-MMMU-Pro · Reported

Artificial Analysis MMMU-Pro

76.4% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Design Arena Website · Reported

Design Arena Website Elo

1328 · Display only

Source: OpenRouter model benchmarks
Provenance: Display-only Design Arena Website Elo synced from OpenRouter model benchmark metadata. It is excluded from BenchLM weighted scoring.

Inst. Following: 1 benchmark

AA-IFBench · Reported

Artificial Analysis IFBench

43.6% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Claude Opus 4.7 Family

Base entry

Related Earlier Model

Claude Opus 4.6

Claude Opus 4.7 (Adaptive) · Prov. 75

Frequently Asked Questions

How does Claude Opus 4.7 perform overall in AI benchmarks?

Claude Opus 4.7 has 22 published benchmark scores on BenchLM, but it does not yet have enough non-generated coverage to receive a global overall rank.

Is Claude Opus 4.7 good for knowledge and understanding?

Claude Opus 4.7 has visible benchmark coverage in knowledge and understanding, but BenchLM does not currently assign it a global category rank there.

Is Claude Opus 4.7 good for coding and programming?

Claude Opus 4.7 has visible benchmark coverage in coding and programming, but BenchLM does not currently assign it a global category rank there.

Is Claude Opus 4.7 good for mathematics?

Claude Opus 4.7 has visible benchmark coverage in mathematics, but BenchLM does not currently assign it a global category rank there.

Is Claude Opus 4.7 good for reasoning and logic?

Claude Opus 4.7 has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.

Is Claude Opus 4.7 good for agentic tool use and computer tasks?

Claude Opus 4.7 has visible benchmark coverage in agentic tool use and computer tasks, but BenchLM does not currently assign it a global category rank there.

Is Claude Opus 4.7 good for multimodal and grounded tasks?

Claude Opus 4.7 has visible benchmark coverage in multimodal and grounded tasks, but BenchLM does not currently assign it a global category rank there.

Is Claude Opus 4.7 good for instruction following?

Claude Opus 4.7 has visible benchmark coverage in instruction following, but BenchLM does not currently assign it a global category rank there.

Which sibling models are related to Claude Opus 4.7?

Claude Opus 4.7 belongs to the Claude Opus 4.7 family. Related variants on BenchLM include Claude Opus 4.7 (Adaptive).

Does Claude Opus 4.7 have full benchmark coverage on BenchLM?

Not yet. Claude Opus 4.7 currently has 22 published benchmark scores out of the 300 benchmarks BenchLM tracks. BenchLM only exposes non-generated public benchmark rows, so missing categories stay blank until a sourced evaluation is available.

What is the context window size of Claude Opus 4.7?

Claude Opus 4.7 has a context window of 1M tokens, which determines how much text it can process in a single interaction.

Last updated: July 15, 2026 · Runtime metrics stay blank until BenchLM has a sourced snapshot.
