
Claude Opus 4.6 vs GPT-5.4: Where Each Model Actually Wins

A direct benchmark comparison of Claude Opus 4.6 and GPT-5.4 on current BenchLM data. GPT-5.4 now leads overall, while Claude remains highly competitive on coding and still wins on some workflow-specific factors.

Glevd · Published March 7, 2026 · Updated April 8, 2026 · 8 min read


GPT-5.4 now leads Claude Opus 4.6 on BenchLM's current overall score, 94 to 92. The old storyline where Claude clearly beat GPT-5.4 on the blended leaderboard no longer holds. What remains true is that Claude is still close, still preferable for some workflows, and still one of the strongest flagships in the dataset.

Headline comparison

Metric                       Claude Opus 4.6   GPT-5.4
Overall score                92                94
Overall rank                 #4                #3
Coding score                 90.8              90.7
Agentic score                92.6              93.5
Knowledge score              92.4              97.6
Math score                   89.4              94.5
API price (input / output)   $15 / $75         $2.50 / $15
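To see how a blended leaderboard can flip even while Claude keeps individual benchmark wins, here is a minimal sketch. BenchLM's actual weighting is not published in this report, so the sketch assumes an equal-weight mean of the four category scores above; that assumption lands close to, but not exactly on, the published overall numbers.

```python
# Hypothetical reconstruction of the blended overall score.
# Assumes an equal-weight mean of the four category scores from the
# headline table; BenchLM's real weighting is not stated in this report.

CATEGORY_SCORES = {
    "Claude Opus 4.6": {"coding": 90.8, "agentic": 92.6, "knowledge": 92.4, "math": 89.4},
    "GPT-5.4": {"coding": 90.7, "agentic": 93.5, "knowledge": 97.6, "math": 94.5},
}

def blended(scores: dict[str, float]) -> float:
    """Equal-weight mean across categories (an assumed formula, not BenchLM's)."""
    return sum(scores.values()) / len(scores)

for model, scores in CATEGORY_SCORES.items():
    print(f"{model}: {blended(scores):.1f}")
# Claude Opus 4.6: 91.3
# GPT-5.4: 94.1
```

The mechanism is visible in the output: GPT-5.4's large knowledge and math leads outweigh Claude's near-tie on coding, which is exactly how a 94-vs-92 overall gap can coexist with a coding tie.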

Where Claude still wins

  • HLE: 53 vs 48
  • SWE-bench Pro: 74 vs 57.7
  • Interaction style: Claude is a non-reasoning model, so it runs at lower latency and is often better for drafting and editing

These are not trivial edges. HLE is still one of the better hard-knowledge separators, and SWE-bench Pro remains one of the most meaningful software-engineering benchmarks in the public set.

Where GPT-5.4 wins now

  • Overall score: 94 vs 92
  • SWE-bench Verified: 84 vs 80.8
  • LiveCodeBench: 84 vs 76
  • Terminal-Bench 2.0: 75.1 vs 65.4
  • OSWorld-Verified: 75 vs 72.7
  • SimpleQA: 97 vs 72
  • MMLU-Pro: 93 vs 82
  • LongBench v2: 95 vs 92
  • MRCRv2: 97 vs 92

The pattern is straightforward: GPT-5.4 wins more of the broad-purpose benchmark set and does it at a much lower price.
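To make that price gap concrete, the sketch below runs a back-of-the-envelope cost comparison. It assumes the table's API prices are quoted per million tokens (the usual convention for these APIs), and the 200k-input / 50k-output workload is a hypothetical placeholder; substitute your own traffic.

```python
# Back-of-the-envelope API cost comparison using the headline-table prices.
# Assumes prices are USD per 1M tokens (input / output), the usual
# convention; the workload below is a hypothetical placeholder.

PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "Claude Opus 4.6": (15.00, 75.00),
    "GPT-5.4": (2.50, 15.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one workload under a per-1M-token price schedule."""
    input_price, output_price = PRICES[model]
    return input_tokens / 1e6 * input_price + output_tokens / 1e6 * output_price

# Hypothetical workload: 200k input tokens, 50k output tokens.
for model in PRICES:
    print(f"{model}: ${workload_cost(model, 200_000, 50_000):.2f}")
# Claude Opus 4.6: $6.75
# GPT-5.4: $1.25
```

On that workload the gap works out to roughly 5.4x, consistent with the 6x input and 5x output multipliers discussed in the FAQ below.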

Coding: effectively a tie, but for different reasons

Claude and GPT-5.4 are now almost dead even on BenchLM's blended coding score, 90.8 to 90.7. That does not mean they are interchangeable.

  • Pick GPT-5.4 if you care most about raw SWE-bench Verified and LiveCodeBench performance.
  • Pick Claude Opus 4.6 if you care more about SWE-bench Pro and the quality of the interaction around the engineering work.

Verdict

Use GPT-5.4 if you want the stronger broad default and the better cost profile.

Use Claude Opus 4.6 if you want a flagship model that stays very close on coding while still feeling better for writing-heavy, lower-latency, or more collaborative workflows.

This is now a narrow GPT-5.4 lead, not a decisive Claude lead.



Frequently asked questions

Is Claude Opus 4.6 better than GPT-5.4? Not on the current overall score. GPT-5.4 leads 94 to 92.

What is the price difference between Claude Opus 4.6 and GPT-5.4? Claude is priced at $15 / $75 (input / output) against $2.50 / $15 for GPT-5.4 — roughly 6x higher on input and 5x higher on output.

Which is better for coding — Claude or GPT-5.4? They are nearly tied on BenchLM's blended coding score, 90.8 to 90.7. GPT-5.4 wins more raw coding benchmarks; Claude wins SWE-bench Pro.

Does Claude still score higher on HLE? Yes. Claude leads HLE 53 to 48, which remains one of its strongest raw benchmark wins and one of the better hard-knowledge separators in the public set.

Is Claude Opus 4.6 faster than GPT-5.4? Yes. Claude is non-reasoning and usually feels faster in interactive use.


All benchmark data is from our leaderboard. Compare these models on our comparison page.

These rankings update with every new model.