CharXiv Reasoning without tools (CharXiv w/o tools)

Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.

Benchmark score on CharXiv w/o tools — April 7, 2026

BenchLM mirrors the published score view for CharXiv w/o tools. Claude Mythos Preview leads the public snapshot at 86.1%. BenchLM does not use these results to rank models overall.

1 modelsMultimodal & GroundedRefreshingDisplay onlyUpdated April 7, 2026

About CharXiv w/o tools

Year

2024

Tasks

Scientific chart reasoning (tool-free)

Format

Chart understanding without tools

Difficulty

Scientific visualization reasoning

The tool-free CharXiv variant measures pure multimodal reasoning. Mythos Preview scores 86.1% without tools vs 93.2% with tools, demonstrating strong baseline chart reasoning.

BenchLM freshness & provenance

Version

CharXiv w/o tools 2024

Refresh cadence

Annual

Staleness state

Refreshing

Question availability

Public benchmark set

RefreshingDisplay only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (1 models)

#1
86.1%

FAQ

What does CharXiv w/o tools measure?

Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.

Which model scores highest on CharXiv w/o tools?

Claude Mythos Preview by Anthropic currently leads with a score of 86.1% on CharXiv w/o tools.

How many models are evaluated on CharXiv w/o tools?

1 AI models have been evaluated on CharXiv w/o tools on BenchLM.

Last updated: April 7, 2026 · BenchLM version CharXiv w/o tools 2024

Weekly LLM Benchmark Digest

Get notified when new models drop, benchmark scores change, or the leaderboard shifts. One email per week.

Free. No spam. Unsubscribe anytime. We only store derived location metadata for consent routing.