CharXiv Reasoning without tools (CharXiv w/o tools)

Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.

Benchmark score on CharXiv w/o tools — April 10, 2026

BenchLM mirrors the published score view for CharXiv w/o tools. Claude Mythos Preview leads the public snapshot at 86.1%. BenchLM does not use these results to rank models overall.

1 model · Multimodal & Grounded · Refreshing · Display only · Updated April 10, 2026

About CharXiv w/o tools

Year

2024

Tasks

Scientific chart reasoning (tool-free)

Format

Chart understanding without tools

Difficulty

Scientific visualization reasoning

The tool-free CharXiv variant measures multimodal reasoning without tool assistance. Mythos Preview scores 86.1% without tools versus 93.2% with tools, a gap of 7.1 percentage points, which indicates strong baseline chart reasoning.
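As a quick check on the two scores quoted above, the comparison can be sketched in a few lines of Python. The function name `tool_gap` and the "share retained" metric are illustrative assumptions, not part of BenchLM or CharXiv; only the two input scores come from this page.

```python
# Scores reported on this page for Mythos Preview (percent correct).
SCORE_WITHOUT_TOOLS = 86.1
SCORE_WITH_TOOLS = 93.2


def tool_gap(without_tools: float, with_tools: float) -> tuple[float, float]:
    """Return (gap in percentage points, percent of the with-tools score retained).

    Hypothetical helper for illustration; rounding to one decimal
    matches the precision of the published scores.
    """
    gap_points = round(with_tools - without_tools, 1)
    retained_pct = round(100 * without_tools / with_tools, 1)
    return gap_points, retained_pct


gap, retained = tool_gap(SCORE_WITHOUT_TOOLS, SCORE_WITH_TOOLS)
print(f"Gap: {gap} points; tool-free score retains {retained}% of the with-tools score")
```

The absolute gap is measured in percentage points, while the retained share is a relative figure. The two should not be mixed when comparing models.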

BenchLM freshness & provenance

Version

CharXiv w/o tools 2024

Refresh cadence

Annual

Staleness state

Refreshing

Question availability

Public benchmark set

Refreshing · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (1 model)

1. Claude Mythos Preview (Anthropic): 86.1%

FAQ

What does CharXiv w/o tools measure?

CharXiv w/o tools is the tool-free variant of CharXiv. It isolates raw visual reasoning ability, without code execution or tool augmentation.

Which model scores highest on CharXiv w/o tools?

Claude Mythos Preview by Anthropic currently leads with a score of 86.1% on CharXiv w/o tools.

How many models are evaluated on CharXiv w/o tools?

1 AI model has been evaluated on CharXiv w/o tools on BenchLM.

Last updated: April 10, 2026 · BenchLM version CharXiv w/o tools 2024
