An OpenClaw-derived agent benchmark covering practical work and life tasks such as office document delivery, research, planning, and code maintenance.
BenchLM mirrors the published score view for MM-ClawBench. MiniMax M2.7 leads the public snapshot at 62.7%, followed by MiMo-V2.5 at 23.8%. BenchLM does not use these results to rank models overall.
Year: 2026
Tasks: OpenClaw-style real-world tasks
Format: Agent workflow evaluation
Difficulty: Broad real-world agentic execution
MiniMax built MM-ClawBench from commonly used OpenClaw tasks to evaluate how well models handle broad real-world agent scenarios across work and personal productivity.
Version: MM-ClawBench 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
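To make the policy concrete, here is a minimal sketch of how freshness metadata could map to the three tiers named above. The field names, thresholds, and tier labels are assumptions for illustration; BenchLM's actual schema and scoring rules are documented only on its methodology page.

```python
from dataclasses import dataclass

# Hypothetical schema; BenchLM's real field names are not published here.
@dataclass
class BenchmarkMeta:
    refresh_cadence: str   # e.g. "Quarterly"
    staleness_state: str   # e.g. "Current", "Aging", "Stale"

def scoring_tier(meta: BenchmarkMeta) -> str:
    """Map staleness metadata to a display tier (illustrative rules,
    not BenchLM's actual policy)."""
    if meta.staleness_state == "Current":
        return "strong differentiator"
    if meta.staleness_state == "Aging":
        return "benchmark to watch"
    return "display-only reference"

# MM-ClawBench's published metadata would land in the top tier:
print(scoring_tier(BenchmarkMeta("Quarterly", "Current")))
# -> strong differentiator
```

Under this sketch, a benchmark keeps its "strong differentiator" status only while its staleness state remains Current, which is what the quarterly refresh cadence is meant to maintain.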
MiniMax's M2.7 currently leads MM-ClawBench with a score of 62.7%. Two AI models have been evaluated on MM-ClawBench via BenchLM to date.