This reporting page isolates visual reasoning and image understanding from the broader multimodal category. It prioritizes sourced benchmarks for diagrams, grounding, counting, real-world image QA, and multimodal math.
This page ranks models using only sourced image-understanding benchmarks in the reporting family.
Bottom line: Image understanding spans 13 benchmarks, but sourced coverage is still sparse. MMMU-Pro is the best single signal; check the multimodal leaderboard for ranked models.
The top model on this sourced reporting-family slice is Qwen3.6-35B-A3B by Alibaba with an average of 80.6.
The best open-weight model is Qwen3.6-35B-A3B at position #1.
1 model is listed with sourced benchmark coverage in this reporting family.
This is a reporting family ranking, not a weighted category. It averages sourced image understanding and visual reasoning benchmarks to give a focused view of this capability.
Models must have sourced results on at least a quarter of the benchmarks in this family to be included. Coverage varies: an average built from 2 benchmark scores is less reliable than one built from 5.
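The inclusion rule and the unweighted average described above can be sketched roughly as follows. This is an illustrative sketch, not the site's actual implementation: the function names (`min_required`, `family_average`, `rank_models`) are hypothetical, and rounding a fractional threshold up (so a quarter of 13 benchmarks becomes 4) is an assumption the page does not state.

```python
from math import ceil
from statistics import mean
from typing import Optional


def min_required(family_size: int, fraction: float = 0.25) -> int:
    """Smallest number of sourced benchmarks a model needs to be ranked.

    Assumption: a fractional threshold rounds up (a quarter of 13 -> 4).
    """
    return max(1, ceil(family_size * fraction))


def family_average(
    scores: dict[str, float], family: list[str], fraction: float = 0.25
) -> Optional[float]:
    """Unweighted mean over the sourced benchmarks in the family.

    Returns None when coverage falls below the inclusion threshold.
    """
    sourced = [scores[name] for name in family if name in scores]
    if len(sourced) < min_required(len(family), fraction):
        return None
    return mean(sourced)


def rank_models(
    models: dict[str, dict[str, float]], family: list[str]
) -> list[tuple[str, float, int]]:
    """Return (model, average, benchmarks_covered) sorted by average, best first."""
    rows = []
    for model, scores in models.items():
        avg = family_average(scores, family)
        if avg is not None:
            covered = sum(1 for name in family if name in scores)
            rows.append((model, avg, covered))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Returning the covered-benchmark count alongside the average keeps the reliability caveat visible: two models with the same average are not equally well supported if one is backed by more sourced results.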