A multi-language software-engineering benchmark that measures repository-level bug fixing and implementation across more than one programming ecosystem.
As of March 2026, MiniMax M2.7 leads the Multi-SWE Bench leaderboard with 52.7%.
Year: 2026
Tasks: Multi-language repo tasks
Format: Repository task completion
Difficulty: Professional software engineering
MiniMax positions Multi-SWE Bench as a benchmark closer to real engineering work than isolated code generation, emphasizing multi-language repository workflows.
One AI model has been evaluated on Multi-SWE Bench on BenchLM.