A lightweight machine-learning competition benchmark that measures whether models can iteratively train, evaluate, and improve ML systems in low-resource settings.
BenchLM mirrors the published score view for MLE-Bench Lite. MiniMax M2.7 leads the public snapshot at 66.6%. BenchLM does not factor these results into its overall model rankings.
Year
2026
Tasks
Low-resource ML competitions
Format
Autonomous iterative ML optimization
Difficulty
Agentic machine learning
MiniMax reports MLE-Bench Lite results from autonomous multi-round optimization on low-resource machine-learning competitions, making it a useful signal for agentic ML workflows.
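The "autonomous multi-round optimization" described above boils down to a propose-train-evaluate-keep loop. A minimal sketch of that loop, using a synthetic scoring function as a stand-in for real model training (all names and the toy objective here are illustrative assumptions, not MLE-Bench Lite internals):

```python
import random

def train_and_score(lr: float) -> float:
    """Toy stand-in for training a model and scoring it on a held-out
    split; a real MLE-Bench Lite run would train an actual ML system."""
    # Synthetic objective that peaks near lr = 0.1.
    return 1.0 - abs(lr - 0.1) * 2

def iterative_optimize(rounds: int = 10, seed: int = 0) -> tuple[float, float]:
    """Multi-round loop: perturb the best config, evaluate, keep improvements."""
    rng = random.Random(seed)
    best_lr = 0.5
    best_score = train_and_score(best_lr)
    for _ in range(rounds):
        candidate = max(1e-4, best_lr + rng.uniform(-0.1, 0.1))
        score = train_and_score(candidate)
        if score > best_score:  # only accept strict improvements
            best_lr, best_score = candidate, score
    return best_lr, best_score

lr, score = iterative_optimize()
```

Because the loop only accepts improvements, the final score is never worse than the initial configuration's; the agentic part of the benchmark lies in how the model proposes each candidate.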
Version
MLE-Bench Lite 2026
Refresh cadence
Quarterly
Staleness state
Current
Question availability
Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
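The three-way freshness split above can be sketched as a simple tiering rule. This is a hypothetical reconstruction keyed off the metadata fields shown on this page (refresh cadence, staleness); BenchLM's actual policy lives on its methodology page and may differ:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkFreshness:
    """Freshness metadata of the kind shown on a BenchLM benchmark page."""
    refresh_cadence_days: int   # e.g. quarterly ≈ 90
    days_since_refresh: int
    questions_public: bool

def scoring_tier(meta: BenchmarkFreshness) -> str:
    """Hypothetical mapping from freshness metadata to a scoring tier."""
    if meta.days_since_refresh <= meta.refresh_cadence_days:
        return "strong differentiator"
    if meta.days_since_refresh <= 2 * meta.refresh_cadence_days:
        return "watch"
    return "display-only"

# A quarterly benchmark refreshed 30 days ago is still current.
tier = scoring_tier(BenchmarkFreshness(90, 30, True))
```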
MiniMax M2.7 currently leads MLE-Bench Lite with a score of 66.6%.
1 AI model has been evaluated on MLE-Bench Lite on BenchLM.