Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.
As of May 22, 2026, Claude Mythos Preview leads the CharXiv w/o tools leaderboard with 86.1% , followed by Claude Opus 4.7 (Adaptive) (82.1%).
Claude Mythos Preview
Anthropic
Claude Opus 4.7 (Adaptive)
Anthropic
Year
2024
Tasks
Scientific chart reasoning (tool-free)
Format
Chart understanding without tools
Difficulty
Scientific visualization reasoning
The tool-free CharXiv variant measures pure multimodal reasoning. Mythos Preview scores 86.1% without tools vs 93.2% with tools, demonstrating strong baseline chart reasoning.
Version
CharXiv w/o tools 2024
Refresh cadence
Annual
Staleness state
Refreshing
Question availability
Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.
Claude Mythos Preview by Anthropic currently leads with a score of 86.1% on CharXiv w/o tools.
2 AI models have been evaluated on CharXiv w/o tools on BenchLM.
For engineers, researchers, and the plain curious — a weekly brief on new models, ranking shifts, and pricing changes.
Free. No spam. Unsubscribe anytime.