CharXiv Reasoning (CharXiv)

A scientific chart reasoning benchmark that tests whether models can understand, interpret, and reason about complex scientific visualizations including plots, diagrams, and data charts.

Benchmark score on CharXiv — April 7, 2026

BenchLM mirrors the published score view for CharXiv. Claude Mythos Preview leads the public snapshot at 93.2%. BenchLM does not use these results to rank models overall.

1 model · Multimodal & Grounded · Refreshing · Display only · Updated April 7, 2026

About CharXiv

Year: 2024
Tasks: Scientific chart reasoning
Format: Chart understanding and reasoning
Difficulty: Scientific visualization reasoning

CharXiv evaluates a model's ability to reason about real-world scientific charts rather than simple visual QA. With-tools and without-tools variants isolate raw visual reasoning from tool-augmented performance.

BenchLM freshness & provenance

Version: CharXiv 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (1 model)

#1 · Claude Mythos Preview (Anthropic) · 93.2%

FAQ

What does CharXiv measure?

A scientific chart reasoning benchmark that tests whether models can understand, interpret, and reason about complex scientific visualizations including plots, diagrams, and data charts.

Which model scores highest on CharXiv?

Claude Mythos Preview by Anthropic currently leads with a score of 93.2% on CharXiv.

How many models are evaluated on CharXiv?

One AI model has been evaluated on CharXiv on BenchLM.

Last updated: April 7, 2026 · BenchLM version CharXiv 2024
