A scientific chart reasoning benchmark that tests whether models can understand, interpret, and reason about complex scientific visualizations including plots, diagrams, and data charts.
As of May 22, 2026, Claude Mythos Preview leads the CharXiv leaderboard with 93.2%, followed by Claude Opus 4.7 (Adaptive) at 91% and Muse Spark at 86.4%.
Claude Mythos Preview, Anthropic
Claude Opus 4.7 (Adaptive), Anthropic
Muse Spark, Meta
According to BenchLM.ai, the top three scores span 6.8 percentage points: Claude Mythos Preview (93.2%) leads Claude Opus 4.7 (Adaptive) (91%) by 2.2 points, and Muse Spark (86.4%) trails a further 4.6 points behind. The scores show moderate spread, with meaningful differences between the top-tier and mid-tier models.
BenchLM.ai has evaluated 21 models on CharXiv. The benchmark falls in the Multimodal & Grounded category, which carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, CharXiv contributes 20% of the category score, so performance here feeds directly into a model's overall ranking.
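The weighting above can be turned into a quick back-of-the-envelope calculation. The sketch below assumes the overall score is a plain linear weighted sum on a 0-100 scale, which the page does not confirm; BenchLM.ai's actual aggregation may normalize or adjust scores differently. The function names are hypothetical and chosen for illustration only.

```python
# Weights stated on the page.
CATEGORY_WEIGHT = 0.12  # Multimodal & Grounded share of the overall score
BENCHMARK_SHARE = 0.20  # CharXiv share of the category score

# Top three CharXiv scores quoted on the page (percent).
TOP_SCORES = {
    "Claude Mythos Preview": 93.2,
    "Claude Opus 4.7 (Adaptive)": 91.0,
    "Muse Spark": 86.4,
}


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Fraction of the overall score attributable to a single benchmark."""
    return category_weight * benchmark_share


def overall_contribution(score_pct: float, weight: float) -> float:
    """Points added to a 0-100 overall score.

    ASSUMPTION: the overall score is a linear weighted sum of benchmark scores.
    """
    return score_pct * weight


if __name__ == "__main__":
    weight = effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)
    print(f"Effective CharXiv weight: {weight:.1%}")
    for model, score in TOP_SCORES.items():
        points = overall_contribution(score, weight)
        print(f"{model}: {points:.2f} overall points from CharXiv")
```

Under that assumption CharXiv accounts for about 2.4% of the overall score, so the 2.2-point gap between the top two models translates to roughly 0.05 points overall. A single benchmark of this weight can break ties but is unlikely to reorder models that differ substantially elsewhere.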
Year: 2024
Tasks: Scientific chart reasoning
Format: Chart understanding and reasoning
Difficulty: Scientific visualization reasoning
CharXiv evaluates a model's ability to reason about real-world scientific charts rather than simple visual QA. With-tools and without-tools variants isolate raw visual reasoning from tool-augmented performance.
Version: CharXiv 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Claude Mythos Preview by Anthropic currently leads with a score of 93.2% on CharXiv.
21 AI models have been evaluated on CharXiv on BenchLM.