
SWE-Marathon

A long-horizon software engineering benchmark from Abundant AI with multi-hour tasks spanning library reproductions, full-stack product clones, and ML engineering.

How BenchLM shows SWE-Marathon

BenchLM tracks SWE-Marathon as a source-backed external benchmark. The official v1.0 site describes 20 multi-hour software engineering tasks and 1,300 logged trials across library reproductions, full-stack product clones, and ML engineering work.

SWE-Marathon is display only on BenchLM. The public site exposes rich task-level leaderboards and trajectory artifacts, but BenchLM is not mirroring an aggregate model table until there is a stable public feed for those rows.

20 multi-hour tasks · 1,300 logged trials · < 19% task resolution · Apache 2.0 · Source metadata only

About SWE-Marathon

Year

2026

Tasks

20 multi-hour software engineering tasks

Format

Task resolution and trajectory review

Difficulty

Ultra-long-horizon software engineering

BenchLM tracks SWE-Marathon as a display-only external benchmark. The official v1.0 site reports 20 multi-hour tasks, 1,300 logged trials, task-level leaderboards, and replayable trajectory artifacts; BenchLM keeps it source-metadata-only until there is a stable public aggregate feed.

BenchLM freshness & provenance

Version

SWE-Marathon 2026

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Task resolution table (0 models)

FAQ

What does SWE-Marathon measure?

SWE-Marathon is a long-horizon software engineering benchmark from Abundant AI. Its multi-hour tasks span library reproductions, full-stack product clones, and ML engineering.

Which model leads the published SWE-Marathon snapshot?

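BenchLM does not list a leading model for SWE-Marathon. The official site publishes task-level leaderboards, but BenchLM has not mirrored any model rows yet.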

How many models are evaluated on SWE-Marathon?

BenchLM's SWE-Marathon snapshot currently includes 0 AI models. The benchmark is display only on BenchLM, so no rows from the public SWE-Marathon v1.0 leaderboard are mirrored.

Last updated: SWE-Marathon v1.0 · mirrored from the public benchmark leaderboard
