A stronger coding-agent benchmark than SWE-bench Verified, intended to differentiate frontier models on realistic software engineering work.
As of April 29, 2026, Claude Mythos Preview leads the SWE-bench Pro leaderboard with 77.8%, followed by Claude Opus 4.7 (Adaptive) (64.3%) and GPT-5.5 (58.6%).
Claude Mythos Preview, Anthropic: 77.8%
Claude Opus 4.7 (Adaptive), Anthropic: 64.3%
GPT-5.5, OpenAI: 58.6%
According to BenchLM.ai, the leaderboard shows a wide spread: the leader is 13.5 percentage points ahead of second place and 19.2 points ahead of third, which makes SWE-bench Pro effective at differentiating model capabilities.
Thirty models have been evaluated on SWE-bench Pro. The benchmark falls in the Coding category, which carries a 20% weight in BenchLM.ai's overall scoring system. Within that category, SWE-bench Pro contributes 23% of the category score, giving it an effective weight of 4.6% (20% × 23%) of the overall score, so strong performance here directly affects a model's overall ranking.
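To make the weighting concrete, here is a minimal Python sketch of how a SWE-bench Pro score could feed into an overall score. It assumes, hypothetically, that the overall score is a plain weighted sum of benchmark percentages; BenchLM's actual methodology may normalize or aggregate scores differently. The helper names `effective_weight` and `overall_contribution` are illustrative, not a BenchLM API.

```python
# Weights stated on this page. The linear weighted-sum model below is an
# ASSUMPTION for illustration, not BenchLM's documented formula.
CODING_CATEGORY_WEIGHT = 0.20  # Coding category's share of the overall score
SWE_BENCH_PRO_SHARE = 0.23     # SWE-bench Pro's share of the Coding category


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Fraction of the overall score attributable to one benchmark."""
    return category_weight * benchmark_share


def overall_contribution(score_pct: float, weight: float) -> float:
    """Points a benchmark score adds to a 0-100 overall score (hypothetical linear model)."""
    return score_pct * weight


w = effective_weight(CODING_CATEGORY_WEIGHT, SWE_BENCH_PRO_SHARE)

# Top three SWE-bench Pro scores as of April 29, 2026.
leaderboard = {
    "Claude Mythos Preview": 77.8,
    "Claude Opus 4.7 (Adaptive)": 64.3,
    "GPT-5.5": 58.6,
}

for model, score in leaderboard.items():
    print(f"{model}: {overall_contribution(score, w):.2f} overall points")

gap = leaderboard["Claude Mythos Preview"] - leaderboard["Claude Opus 4.7 (Adaptive)"]
print(f"effective weight: {w:.3f}")
print(f"top-two gap in overall points: {gap * w:.2f}")
```

Under this assumption, the 13.5-point gap between the top two models is worth about 0.62 points of overall score.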
Year: 2026
Tasks: Real-world software engineering
Format: Repository task completion
Difficulty: Frontier coding agent
SWE-bench Pro is the more relevant frontier signal for selecting coding agents in 2026, because its tasks reflect more realistic difficulty than the older SWE-bench Verified subset.
Version: SWE-bench Pro 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.