A direct benchmark comparison of Anthropic's Claude Opus 4.6 and OpenAI's GPT-5.4 across 22 tests. We break down where each model leads and where benchmarks stop telling the full story.
GPT-5.4 scores 91 overall on our leaderboard. Claude Opus 4.6 scores 90. A one-point gap across 22 benchmarks. If you stopped reading here, you'd pick GPT-5.4 and move on. But that one-point gap hides a more interesting story about what each model is actually good at.
We ran both models through every benchmark we track — coding, math, knowledge, reasoning, instruction following, and multilingual. Here's what the numbers say, and a few things they don't.
| | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|
| Overall Score | 90 | 91 |
| Context Window | 1M tokens | 1M tokens |
| Reasoning Type | Non-Reasoning | Reasoning |
| Arena Elo | 1422 | 1442 |
Both models ship with 1M token context windows. The Elo gap (20 points) looks bigger than the benchmark gap, which tells you human preference and benchmark scores measure different things.
One structural difference worth flagging: GPT-5.4 is classified as a reasoning model, meaning it uses chain-of-thought at inference time. Claude Opus 4.6 is a standard (non-reasoning) model. That makes Opus's near-matching score more impressive — it's doing this without the extra inference compute.
| Benchmark | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|
| HumanEval | 91 | 91 |
| SWE-bench Verified | 80 | 81 |
| LiveCodeBench | 75 | 75 |
Tied on HumanEval. Tied on LiveCodeBench. GPT-5.4 has a single-point edge on SWE-bench Verified (81 vs 80). In practice, the coding gap between these two models is noise.
Worth noting that GPT-5.3 Codex (a coding-specialized variant) scores 85 on SWE-bench and 85 on LiveCodeBench — significantly higher than either general model. If coding is your primary use case, the Codex variant matters more than the Opus-vs-5.4 debate.
| Benchmark | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|
| AIME 2025 | 98 | 98 |
| HMMT 2025 | 96 | 96 |
| BRUMO 2025 | 96 | 96 |
| MATH-500 | 98 | 99 |
Both models score identically on every competition math benchmark except MATH-500, where GPT-5.4 leads by a single point. At this level of performance — both clearing 96% on HMMT and 98% on AIME — the differences are within margin of error. Competition math is essentially solved by both.
| Benchmark | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|
| MMLU | 99 | 99 |
| GPQA | 97 | 97 |
| SuperGPQA | 95 | 95 |
| OpenBookQA | 93 | 93 |
| MMLU-Pro | 92 | 91 |
| HLE | 38 | 46 |
Identical across the board except on two benchmarks. Claude takes MMLU-Pro by a point (92 vs 91). GPT-5.4 takes HLE by 8 points (46 vs 38). HLE (Humanity's Last Exam) is the hardest knowledge benchmark we track — questions written by domain experts specifically to stump frontier models. GPT-5.4's lead here is the biggest single-benchmark gap in this comparison.
That 8-point HLE gap is real. It suggests GPT-5.4's reasoning capabilities (chain-of-thought) give it an edge on the hardest questions where working through the problem step by step matters most.
| Benchmark | Claude Opus 4.6 | GPT-5.4 |
|---|---|---|
| SimpleQA | 95 | 95 |
| MuSR | 93 | 93 |
| BBH | 94 | 95 |
One point on BBH (BIG-Bench Hard). Otherwise identical.
Here's what these 22 tests don't measure:
Writing quality. Claude has a reputation for producing more natural prose and following nuanced style instructions. GPT-5.4 is better at structured output and following rigid format specifications. Neither strength shows up in any benchmark we track.
Agent capabilities. Claude Opus 4.6 can spawn sub-agents through Claude Code — Anthropic demonstrated 16 parallel agents building a C compiler. GPT-5.4 operates as a single agent. No benchmark captures this difference, but it matters if you're building agentic workflows.
Cost. GPT-5.4 runs around $2.50/$15 per million input/output tokens. Claude Opus 4.6 costs roughly $5/$25. GPT-5.4 is roughly half the price. For production workloads processing millions of tokens, that ~2x cost difference often matters more than a 1-point benchmark gap.
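To make the cost gap concrete, here's a minimal sketch of a per-workload cost calculation using the list prices quoted above (prices are illustrative and change often; check each provider's pricing page):

```python
# Per-million-token list prices quoted in this article (USD).
PRICES = {
    "gpt-5.4":         {"input": 2.50, "output": 15.00},
    "claude-opus-4.6": {"input": 5.00, "output": 25.00},
}

def workload_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Estimated cost in USD for a given token volume."""
    p = PRICES[model]
    return (input_tokens / 1e6) * p["input"] + (output_tokens / 1e6) * p["output"]

# Example: 100M input tokens and 20M output tokens per month.
gpt_cost = workload_cost("gpt-5.4", 100e6, 20e6)        # 250 + 300 = 550.0
claude_cost = workload_cost("claude-opus-4.6", 100e6, 20e6)  # 500 + 500 = 1000.0
```

At that volume the gap is about $450/month — far more decisive than any one-point benchmark difference.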
Latency. Reasoning models (GPT-5.4) spend extra time on chain-of-thought before responding. For real-time applications — chatbots, autocomplete, live coding assistants — that latency penalty can be a dealbreaker regardless of benchmark scores.
If you're choosing one model for everything: pick whichever is cheaper or faster for your use case. The benchmark gap is too small to decide on performance alone.
If you can route between models: send coding work to GPT-5.3 Codex, the hardest reasoning questions (HLE-style) to GPT-5.4, and prose, agentic, or latency-sensitive work to Claude Opus 4.6.
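A multi-model setup can be as simple as a lookup keyed on task type. This is a minimal sketch based on the strengths discussed above; the task labels and model IDs are illustrative, not real API identifiers:

```python
# Hypothetical task-based router; labels and model names are illustrative.
ROUTES = {
    "coding": "gpt-5.3-codex",        # highest SWE-bench / LiveCodeBench scores
    "hard_reasoning": "gpt-5.4",      # leads HLE by 8 points
    "prose": "claude-opus-4.6",       # stronger natural writing
    "agentic": "claude-opus-4.6",     # sub-agent support via Claude Code
    "realtime": "claude-opus-4.6",    # no chain-of-thought latency penalty
}

def pick_model(task_type: str) -> str:
    # Default to the cheaper general model when the task type is unknown.
    return ROUTES.get(task_type, "gpt-5.4")
```

The useful property of routing is that you stop paying Opus prices (or reasoning-model latency) for tasks where the two models are effectively tied.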
The era where one model dominated every category is over. The top of the leaderboard is now separated by single-digit points, and the real differences between models are in the things benchmarks don't test.
All benchmark data is from our leaderboard. Compare these models directly on our comparison page.