Video-MME

A comprehensive benchmark for multimodal large language models on video understanding, covering temporal reasoning, perception, and question answering over videos.

Benchmark score on Video-MME — April 8, 2026

BenchLM mirrors the published score view for Video-MME. Kimi K2.5 leads the public snapshot at 87.4%. BenchLM does not use these results to rank models overall.

1 model · Multimodal & Grounded · Refreshing · Display only · Updated April 8, 2026

About Video-MME

Year

2024

Tasks

Video understanding

Format

Video QA and analysis

Difficulty

Broad multimodal video reasoning

BenchLM tracks the aggregate Video-MME row as a display-oriented video benchmark when providers publish a single overall score rather than separate with-subtitle and without-subtitle splits.

BenchLM freshness & provenance

Version

Video-MME 2024

Refresh cadence

Annual

Staleness state

Refreshing

Question availability

Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
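As a rough illustration of that policy, the tier decision can be sketched as a small lookup over the two metadata fields shown above. This is a hypothetical sketch only: the names (`BenchmarkMeta`, `scoring_tier`) and the exact precedence rules are assumptions, not BenchLM's actual implementation.

```python
# Hypothetical sketch of the freshness-tier decision described above.
# Class and function names are assumed for illustration.
from dataclasses import dataclass

@dataclass
class BenchmarkMeta:
    staleness: str      # e.g. "fresh", "refreshing", "stale"
    display_only: bool  # provider publishes only an aggregate score

def scoring_tier(meta: BenchmarkMeta) -> str:
    """Map freshness metadata to one of the three treatment tiers."""
    if meta.display_only:
        return "display-only reference"
    if meta.staleness == "fresh":
        return "strong differentiator"
    return "benchmark to watch"

# Video-MME's current state on this page: refreshing, display only.
print(scoring_tier(BenchmarkMeta(staleness="refreshing", display_only=True)))
# → display-only reference
```

Under this sketch, Video-MME lands in the display-only tier regardless of its refresh state, matching how the page presents it.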

Benchmark score table (1 model)

Rank  Model                      Score
1     Kimi K2.5 (Moonshot AI)    87.4%

FAQ

What does Video-MME measure?

A comprehensive benchmark for multimodal large language models on video understanding, covering temporal reasoning, perception, and question answering over videos.

Which model scores highest on Video-MME?

Kimi K2.5 by Moonshot AI currently leads with a score of 87.4% on Video-MME.

How many models are evaluated on Video-MME?

1 AI model has been evaluated on Video-MME on BenchLM.

Last updated: April 8, 2026 · BenchLM version Video-MME 2024
