Flame-VLM-Code

Name: Flame-VLM-Code
Creator: BenchLM

A vision-language coding benchmark for generating correct code from visual and multimodal inputs.

How BenchLM shows Flame-VLM-Code right now

BenchLM is tracking Flame-VLM-Code in the local dataset, but exact-source verification records for these rows are still being attached. To avoid a blank benchmark page, BenchLM shows the current tracked rows below as a display-only reference table.

These tracked rows are useful for inspection and spot-checking, but until exact-source attachments are completed they should not be treated as fully verified public benchmark rows.

3 tracked models · Local tracked rows · Awaiting exact-source attachments · Display only


Tracked score on Flame-VLM-Code — May 13, 2026

BenchLM mirrors the published tracked score view for Flame-VLM-Code. Claude Opus 4.6 leads the public snapshot at 98.8%, followed by GLM-5V-Turbo (93.8%) and Kimi K2.5 (88.8%). BenchLM does not use these results to rank models overall.

Claude Opus 4.6 (Anthropic, claude-opus-4-6): 98.8% · Overall: -

GLM-5V-Turbo (Z.AI, glm-5v-turbo): 93.8% · Overall: -

Kimi K2.5 (Moonshot AI, kimi-k2-5): 88.8% · Overall: -

3 models · Multimodal & Grounded · Current · Display only · Updated May 13, 2026

The published Flame-VLM-Code snapshot spans 10.0 points across its three tracked rows: Claude Opus 4.6 sits at 98.8%, GLM-5V-Turbo at 93.8%, and Kimi K2.5 at 88.8%, with a 5.0-point gap between adjacent rows. Because only three models are tracked, this 10.0-point spread covers the whole snapshot rather than a broader top-10 field, so the benchmark still separates the listed models clearly.
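The spread figures for this snapshot can be checked with a few lines of arithmetic. The following is a minimal sketch using the three tracked scores listed on this page; the function names spread and adjacent_gaps are illustrative and are not part of any BenchLM tooling.

```python
# Tracked Flame-VLM-Code scores (percent), copied from the snapshot on this page.
scores = {
    "Claude Opus 4.6": 98.8,
    "GLM-5V-Turbo": 93.8,
    "Kimi K2.5": 88.8,
}


def spread(values):
    """Return the gap between the highest and lowest score, in points."""
    values = list(values)
    return round(max(values) - min(values), 1)


def adjacent_gaps(values):
    """Return the gaps between consecutive rows after sorting high to low."""
    ranked = sorted(values, reverse=True)
    return [round(a - b, 1) for a, b in zip(ranked, ranked[1:])]


# Hypothetical helper usage: find the leader, the total spread, and per-row gaps.
leader = max(scores, key=scores.get)
print(leader, spread(scores.values()), adjacent_gaps(scores.values()))
# prints: Claude Opus 4.6 10.0 [5.0, 5.0]
```

Rounding to one decimal place matches the precision of the published scores and avoids floating-point noise in the differences.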

3 models have been evaluated on Flame-VLM-Code. The benchmark falls in the Multimodal & Grounded category. This category carries a 12% weight in BenchLM.ai's overall scoring system. Flame-VLM-Code is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.

About Flame-VLM-Code

Year: 2026
Tasks: Multimodal coding tasks
Format: Vision-language code generation
Difficulty: Multimodal coding

BenchLM tracks Flame-VLM-Code as a display-only multimodal coding benchmark reference.

GLM-5V-Turbo Public benchmark source

BenchLM freshness & provenance

Version: Flame-VLM-Code 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Tracked score table (3 models)

Model             ID                Creator       Tracked score
Claude Opus 4.6   claude-opus-4-6   Anthropic     98.8%
GLM-5V-Turbo      glm-5v-turbo      Z.AI          93.8%
Kimi K2.5         kimi-k2-5         Moonshot AI   88.8%

FAQ

What does Flame-VLM-Code measure?

A vision-language coding benchmark for generating correct code from visual and multimodal inputs.

Which model leads the published Flame-VLM-Code snapshot?

Claude Opus 4.6 currently leads the published Flame-VLM-Code snapshot with a tracked score of 98.8%. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on Flame-VLM-Code?

3 AI models are included in BenchLM's mirrored Flame-VLM-Code snapshot, based on the public leaderboard captured on May 13, 2026.

Last updated: May 13, 2026 · mirrored from the public benchmark leaderboard
