Skip to main content

CharXiv Reasoning without tools (CharXiv w/o tools)

Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.

Top models on CharXiv w/o tools — May 22, 2026

As of May 22, 2026, Claude Mythos Preview leads the CharXiv w/o tools leaderboard with 86.1% , followed by Claude Opus 4.7 (Adaptive) (82.1%).

2 modelsMultimodal & Grounded5% of category scoreRefreshingUpdated May 22, 2026

About CharXiv w/o tools

Year

2024

Tasks

Scientific chart reasoning (tool-free)

Format

Chart understanding without tools

Difficulty

Scientific visualization reasoning

The tool-free CharXiv variant measures pure multimodal reasoning. Mythos Preview scores 86.1% without tools vs 93.2% with tools, demonstrating strong baseline chart reasoning.

BenchLM freshness & provenance

Version

CharXiv w/o tools 2024

Refresh cadence

Annual

Staleness state

Refreshing

Question availability

Public benchmark set

Refreshing

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (2 models)

1
86.1%
2
82.1%

FAQ

What does CharXiv w/o tools measure?

Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.

Which model scores highest on CharXiv w/o tools?

Claude Mythos Preview by Anthropic currently leads with a score of 86.1% on CharXiv w/o tools.

How many models are evaluated on CharXiv w/o tools?

2 AI models have been evaluated on CharXiv w/o tools on BenchLM.

Compare Top Models on CharXiv w/o tools

Last updated: May 22, 2026 · BenchLM version CharXiv w/o tools 2024

The AI models change fast. We track them for you.

For engineers, researchers, and the plain curious — a weekly brief on new models, ranking shifts, and pricing changes.

Free. No spam. Unsubscribe anytime.