Tool-free variant of CharXiv that isolates raw visual reasoning ability without code execution or tool augmentation.
BenchLM mirrors the published score view for CharXiv w/o tools. Claude Mythos Preview leads the public snapshot at 86.1%. BenchLM does not use these results to rank models overall.
Year: 2024
Tasks: Scientific chart reasoning (tool-free)
Format: Chart understanding without tools
Difficulty: Scientific visualization reasoning
The tool-free CharXiv variant measures pure multimodal reasoning. Claude Mythos Preview scores 86.1% without tools versus 93.2% with tools, a 7.1-point gap, which suggests strong baseline chart reasoning even without tool support.
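The gap between the two reported Claude Mythos Preview scores can be computed directly. Below is a minimal Python sketch that uses only the two figures quoted above. The helper name `tool_gap` and the "retained" ratio (the share of the tool-assisted score kept without tools) are illustrative assumptions, not metrics that BenchLM reports.

```python
# Scores reported on this page for Claude Mythos Preview (percent accuracy).
WITHOUT_TOOLS = 86.1
WITH_TOOLS = 93.2


def tool_gap(without_tools: float, with_tools: float) -> tuple[float, float]:
    """Hypothetical helper: return (absolute gap in percentage points,
    fraction of the tool-assisted score retained without tools)."""
    # Round to one decimal so float subtraction noise does not leak into the result.
    gap_points = round(with_tools - without_tools, 1)
    retained = without_tools / with_tools
    return gap_points, retained


gap, retained = tool_gap(WITHOUT_TOOLS, WITH_TOOLS)
print(f"gap: {gap} points, retained: {retained:.1%}")
# prints "gap: 7.1 points, retained: 92.4%"
```

Read this way, the tool-free run keeps roughly nine-tenths of the tool-assisted accuracy.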
Version: CharXiv w/o tools 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Claude Mythos Preview by Anthropic currently leads with a score of 86.1% on CharXiv w/o tools.
One AI model has been evaluated on CharXiv w/o tools on BenchLM.