A visual mathematics benchmark that tests whether a model can solve math problems grounded in diagrams, equations, figures, and other visual inputs.
As of March 2026, Qwen3.5 397B leads the MathVision leaderboard with 88.6%, followed by Qwen3.6 Plus (88.0%) and Gemini 3 Pro (86.6%).
Qwen3.5 397B (Alibaba): 88.6%
Qwen3.6 Plus (Alibaba): 88.0%
Gemini 3 Pro: 86.6%
According to BenchLM.ai, the top three models are clustered within 2.0 points of one another (88.6% down to 86.6%), which suggests that this benchmark is nearing saturation for frontier models.
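The spread figure can be checked directly from the three published scores. The following is a minimal sketch, not BenchLM's own code: the function names and the 2.0-point saturation threshold are assumptions made for illustration, and only the three scores come from the text.

```python
# Top three MathVision scores as reported in the text (percent).
scores = {
    "Qwen3.5 397B": 88.6,
    "Qwen3.6 Plus": 88.0,
    "Gemini 3 Pro": 86.6,
}


def top_spread(scores: dict[str, float]) -> float:
    """Return the gap, in points, between the highest and lowest score."""
    values = list(scores.values())
    # Round to one decimal to avoid floating-point noise in the difference.
    return round(max(values) - min(values), 1)


def is_near_saturation(scores: dict[str, float], threshold: float = 2.0) -> bool:
    """Flag a benchmark whose top scores fall within `threshold` points.

    The threshold is a hypothetical cutoff chosen for this sketch; the
    source does not state the rule BenchLM applies.
    """
    return top_spread(scores) <= threshold


leader = max(scores, key=scores.get)
spread = top_spread(scores)
print(f"{leader} leads; top-3 spread = {spread} points")
print("Near saturation:", is_near_saturation(scores))
```

A tight spread among leaders only hints at saturation; it says nothing about how much headroom remains below 100%.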
Six models have been evaluated on MathVision. The benchmark falls in the Multimodal & Grounded category, which carries a 12% weight in BenchLM.ai's overall scoring system. MathVision is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
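To illustrate what "displayed for reference but excluded from the scoring formula" could mean in practice, here is a hedged sketch. It assumes a category score is a plain mean of its scored benchmarks, which the source does not confirm. The `Benchmark` class, both functions, and the two `hypothetical_benchmark_*` entries with their scores are invented for illustration. Only the 12% category weight and the MathVision score of 88.6% come from the text.

```python
from dataclasses import dataclass
from typing import Optional

CATEGORY_WEIGHT = 0.12  # Multimodal & Grounded weight stated in the text


@dataclass
class Benchmark:
    name: str
    score: float        # model's score on this benchmark, in percent
    display_only: bool  # True if shown for reference but not scored


def category_score(benchmarks: list[Benchmark]) -> Optional[float]:
    """Mean of the scored benchmarks (assumed aggregation rule)."""
    scored = [b.score for b in benchmarks if not b.display_only]
    if not scored:
        return None
    return sum(scored) / len(scored)


def weighted_contribution(benchmarks: list[Benchmark],
                          weight: float = CATEGORY_WEIGHT) -> float:
    """Points this category would add to an overall score."""
    score = category_score(benchmarks)
    return 0.0 if score is None else weight * score


# Illustrative inputs: only the MathVision score is taken from the text.
results = [
    Benchmark("MathVision", 88.6, display_only=True),
    Benchmark("hypothetical_benchmark_a", 80.0, display_only=False),
    Benchmark("hypothetical_benchmark_b", 70.0, display_only=False),
]

# Changing a display-only score leaves the contribution unchanged.
altered = [Benchmark("MathVision", 50.0, display_only=True)] + results[1:]
print(weighted_contribution(results) == weighted_contribution(altered))
```

Under these assumptions, a display-only benchmark can still appear on a model's page without moving its overall rank.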
Year: 2026
Tasks: Visually grounded math problems
Format: Image + math reasoning
Difficulty: Advanced multimodal mathematics
MathVision matters because text-only math ability does not guarantee strong performance when the relevant information is embedded in images, geometry diagrams, or formatted equations.
Qwen3.6 launch benchmarks
Version: MathVision 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.