A comprehensive benchmark for multimodal large language models on video understanding, covering temporal reasoning, perception, and question answering over videos.
BenchLM mirrors the published score view for Video-MME. Kimi K2.5 leads the public snapshot at 87.4%. BenchLM does not use these results to rank models overall.
Year: 2024
Tasks: Video understanding
Format: Video QA and analysis
Difficulty: Broad multimodal video reasoning
BenchLM tracks the aggregate Video-MME row as a display-oriented video benchmark when providers publish a single overall score rather than separate with-subtitle and without-subtitle splits.
Version: Video-MME 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
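As a rough illustration of how freshness metadata could feed such a decision, the sketch below models the four fields listed for Video-MME and looks up a role from the staleness state. The names `Freshness`, `Role`, `classify`, and `HYPOTHETICAL_POLICY` are assumptions made for this example, and the state-to-role mapping is invented; BenchLM's actual rules are described only on its methodology page.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """The three treatments named in the text above."""
    STRONG_DIFFERENTIATOR = "strong differentiator"
    WATCH = "benchmark to watch"
    DISPLAY_ONLY = "display-only reference"


@dataclass(frozen=True)
class Freshness:
    """Freshness fields as they appear on the benchmark page."""
    version: str
    refresh_cadence: str
    staleness_state: str
    question_availability: str


# Hypothetical policy, NOT BenchLM's real one. Only "Refreshing" appears in
# the source; "Current" and "Aging" are placeholder state names, and every
# role assignment here is an illustrative guess.
HYPOTHETICAL_POLICY = {
    "Current": Role.STRONG_DIFFERENTIATOR,
    "Aging": Role.WATCH,
    "Refreshing": Role.DISPLAY_ONLY,
}


def classify(meta: Freshness, policy: dict) -> Role:
    """Look up a role by staleness state; unknown states fall back to display-only."""
    return policy.get(meta.staleness_state, Role.DISPLAY_ONLY)


# Field values copied from the Video-MME entry.
video_mme = Freshness(
    version="Video-MME 2024",
    refresh_cadence="Annual",
    staleness_state="Refreshing",
    question_availability="Public benchmark set",
)

print(classify(video_mme, HYPOTHETICAL_POLICY).value)
```

Passing the policy as an argument keeps the invented mapping separate from the record itself, so a real policy could be swapped in without touching the data model.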
Kimi K2.5 by Moonshot AI currently leads with a score of 87.4% on Video-MME.
1 AI model has been evaluated on Video-MME on BenchLM.