Multi-SWE Bench

A multi-language software-engineering benchmark that measures repository-level bug fixing and implementation across multiple programming language ecosystems.

Top Models on Multi-SWE Bench — March 2026

As of March 2026, MiniMax M2.7 leads the Multi-SWE Bench leaderboard with 52.7%.

1 model · Coding · Updated March 18, 2026

About Multi-SWE Bench

Year: 2026

Tasks: Multi-language repo tasks

Format: Repository task completion

Difficulty: Professional software engineering

MiniMax positions Multi-SWE Bench as a benchmark closer to real engineering work than isolated code generation, emphasizing multi-language repository workflows.

MiniMax M2.7: Early Echoes of Self-Evolution

Leaderboard (1 model)

#1 MiniMax M2.7
52.7%

FAQ

What does Multi-SWE Bench measure?

Multi-SWE Bench is a multi-language software-engineering benchmark that measures repository-level bug fixing and implementation across multiple programming language ecosystems.

Which model scores highest on Multi-SWE Bench?

MiniMax M2.7 by MiniMax currently leads with a score of 52.7% on Multi-SWE Bench.

How many models are evaluated on Multi-SWE Bench?

1 AI model has been evaluated on Multi-SWE Bench on BenchLM.

Last updated: March 18, 2026
