Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.
BenchLM mirrors the published scores for CharXiv w/o tools. Claude Mythos Preview leads the public snapshot at 86.1%. BenchLM does not use these results to rank models overall.
Year: 2024
Tasks: Scientific chart reasoning (tool-free)
Format: Chart understanding without tools
Difficulty: Scientific visualization reasoning
The tool-free CharXiv variant measures multimodal reasoning without tool assistance. Claude Mythos Preview scores 86.1% without tools versus 93.2% with tools, a gap of 7.1 percentage points, which suggests strong baseline chart reasoning.
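The comparison between the two scores quoted above can be made explicit with a little arithmetic. The sketch below is illustrative only: the function name `tool_uplift` is hypothetical and not part of BenchLM, and the only inputs are the two percentages stated in the text.

```python
def tool_uplift(without_tools: float, with_tools: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent).

    Hypothetical helper for illustration; not a BenchLM function.
    """
    absolute = with_tools - without_tools      # gap in percentage points
    relative = absolute / without_tools * 100  # gain relative to the tool-free score
    return round(absolute, 1), round(relative, 1)


# Scores quoted in the text: 86.1% without tools, 93.2% with tools.
absolute, relative = tool_uplift(86.1, 93.2)
print(f"Tools add {absolute} points ({relative}% relative gain)")
```

Reporting the gap in percentage points keeps it distinct from the relative gain, which depends on the tool-free baseline.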
Version: CharXiv w/o tools 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Claude Mythos Preview by Anthropic currently leads with a score of 86.1% on CharXiv w/o tools.
One AI model has been evaluated on CharXiv w/o tools on BenchLM.