Video-MME

A comprehensive benchmark for multimodal large language models on video understanding, covering temporal reasoning, perception, and question answering over videos.

Benchmark score on Video-MME — April 10, 2026

BenchLM mirrors the published score view for Video-MME. Kimi K2.5 leads the public snapshot at 87.4%. BenchLM does not use these results to rank models overall.

1 model · Multimodal & Grounded · Refreshing · Display only · Updated April 10, 2026

About Video-MME

Year

2024

Tasks

Video understanding

Format

Video QA and analysis

Difficulty

Broad multimodal video reasoning

BenchLM tracks the aggregate Video-MME row as a display-oriented video benchmark when providers publish a single overall score rather than separate with-subtitle and without-subtitle splits.

BenchLM freshness & provenance

Version

Video-MME 2024

Refresh cadence

Annual

Staleness state

Refreshing

Question availability

Public benchmark set

Refreshing · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (1 model)

1 · Kimi K2.5 (Moonshot AI) · 87.4%

FAQ

What does Video-MME measure?

A comprehensive benchmark for multimodal large language models on video understanding, covering temporal reasoning, perception, and question answering over videos.

Which model scores highest on Video-MME?

Kimi K2.5 by Moonshot AI currently leads with a score of 87.4% on Video-MME.

How many models are evaluated on Video-MME?

1 AI model has been evaluated on Video-MME on BenchLM.

Last updated: April 10, 2026 · BenchLM version Video-MME 2024

AI models change fast. We track them for you.

For engineers, researchers, and the plain curious — a weekly brief on new models, ranking shifts, and pricing changes.

Free. No spam. Unsubscribe anytime.