A repository-understanding benchmark that measures whether models can map natural-language requests onto the right code locations and system changes.
As of March 2026, MiniMax M2.7 leads the NL2Repo leaderboard with 39.8%.
Year: 2026
Tasks: Natural language to repository tasks
Format: Repository understanding benchmark
Difficulty: System-level software comprehension
MiniMax cites NL2Repo as a system-level engineering benchmark that rewards deep understanding of complex repositories and their operational structure.
MiniMax M2.7: Early Echoes of Self-Evolution
MiniMax M2.7 by MiniMax currently leads NL2Repo with a score of 39.8%. It is the only AI model evaluated on NL2Repo on BenchLM.