A direct benchmark comparison of Claude Opus 4.6 and GPT-5.4 on current BenchLM data. GPT-5.4 now leads overall, while Claude remains highly competitive on coding and still wins on some workflow-specific factors.
GPT-5.4 now leads Claude Opus 4.6 on BenchLM's current overall score, 94 to 92. The old storyline where Claude clearly beat GPT-5.4 on the blended leaderboard no longer holds. What remains true is that Claude is still close, still preferable for some workflows, and still one of the strongest flagships in the dataset.
| | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|
| Overall score | 92 | 94 |
| Overall rank | #4 | #3 |
| Coding score | 90.8 | 90.7 |
| Agentic score | 92.6 | 93.5 |
| Knowledge score | 92.4 | 97.6 |
| Math score | 89.4 | 94.5 |
| API price (input / output) | $15 / $75 | $2.50 / $15 |
Those category gaps are not trivial, but Claude's remaining benchmark wins are meaningful ones: HLE is still one of the better hard-knowledge separators, and SWE-bench Pro remains one of the most meaningful software-engineering benchmarks in the public set.
The pattern is straightforward: GPT-5.4 wins more of the broad-purpose benchmark set and does it at a much lower price.
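To make the price gap concrete, here is a minimal sketch of the per-request cost difference implied by the table. It assumes the listed prices are USD per million tokens (input / output), which is the common API pricing convention; the token counts in the example are illustrative, not drawn from any benchmark.

```python
# Per-request cost comparison at the listed API prices.
# Assumption: prices are USD per 1M tokens (input / output).
PRICES = {
    "Claude Opus 4.6": {"input": 15.00, "output": 75.00},
    "GPT-5.4": {"input": 2.50, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one request at the listed per-1M-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a 4,000-token prompt with a 1,000-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 4_000, 1_000):.4f}")
```

At these rates the input ratio is 15 / 2.50 = 6x and the output ratio is 75 / 15 = 5x, which is where the "roughly 6x input, 5x output" figure below comes from.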
Claude and GPT-5.4 are now almost dead even on BenchLM's blended coding score, 90.8 to 90.7. That does not mean they are interchangeable.
Use GPT-5.4 if you want the stronger broad default and the better cost profile.
Use Claude Opus 4.6 if you want a flagship model that stays very close on coding while still feeling better for writing-heavy, lower-latency, or more collaborative workflows.
This is now a narrow GPT-5.4 lead, not a decisive Claude lead.
Is Claude Opus 4.6 better than GPT-5.4? Not on the current overall score. GPT-5.4 leads 94 to 92.
What is the price difference between Claude Opus 4.6 and GPT-5.4? Claude is roughly 6x higher on input and 5x higher on output.
Which is better for coding — Claude or GPT-5.4? They are nearly tied on coding category score. GPT-5.4 wins more raw coding benchmarks; Claude wins SWE-bench Pro.
Why does Claude score higher on HLE? Claude still leads HLE 53 to 48, which remains one of its strongest raw benchmark wins.
Is Claude Opus 4.6 faster than GPT-5.4? Yes. Claude is a non-reasoning model and usually feels faster in interactive use.
All benchmark data is from our leaderboard. Compare these models on our comparison page.
These rankings update with every new model.