The BenchLM dataset
BenchLM tracks 257 AI language models across 237 benchmarks, alongside API pricing, context windows, and runtime metrics. Everything rendered on the site is available as machine-readable JSON.
Last verified: June 12, 2026
What the dataset contains
- Benchmark scores — per-model results across coding, agentic, reasoning, knowledge, math, multilingual, instruction-following, and multimodal benchmarks, each tagged with provenance (official, verified, reported, or estimated).
- Two ranking lanes — a provisional leaderboard that includes well-sourced public rows, and a stricter verified leaderboard limited to exact-source coverage.
- Pricing — input/output cost per million tokens for every tracked model with a public API.
- Runtime metrics — median tokens per second and first-answer latency, sourced from Artificial Analysis.
- Release metadata — release dates, model families, variants, and supersession chains.
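As a rough illustration of how the fields above might be consumed, the sketch below models one score row with its provenance tag and works out a request cost from per-million-token prices. The names `ScoreRow`, `filter_by_provenance`, and `request_cost`, the field layout, and all sample values are hypothetical and are not taken from the BenchLM files. Only the four provenance labels and the per-million-token pricing unit come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    # The four provenance labels named in the dataset description.
    OFFICIAL = "official"
    VERIFIED = "verified"
    REPORTED = "reported"
    ESTIMATED = "estimated"


@dataclass(frozen=True)
class ScoreRow:
    # Hypothetical record shape; the real JSON field names may differ.
    model: str
    benchmark: str
    score: float
    provenance: Provenance


def filter_by_provenance(rows, allowed):
    """Keep only the rows whose provenance tag is in the allowed set."""
    return [row for row in rows if row.provenance in allowed]


def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request, with both prices quoted per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Made-up sample rows, for illustration only.
rows = [
    ScoreRow("model-a", "bench-x", 71.5, Provenance("official")),
    ScoreRow("model-b", "bench-x", 68.0, Provenance("estimated")),
    ScoreRow("model-c", "bench-x", 70.2, Provenance("reported")),
]

# The caller decides which tags to trust; this set is an example choice,
# not BenchLM's actual rule for either ranking lane.
trusted = filter_by_provenance(rows, {Provenance.OFFICIAL, Provenance.VERIFIED})
print([row.model for row in trusted])

# 200,000 input tokens and 50,000 output tokens at made-up prices.
print(request_cost(200_000, 50_000, input_price=3.0, output_price=15.0))
```

Leaving the allowed set as a parameter keeps the sketch neutral about which provenance tags a given analysis should accept.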
Machine-readable files
Also available: llms.txt and llms-full.txt for LLM crawlers, RSS / Atom feeds, and updates.json for data-change events.
Update cadence
Benchmark scores, pricing, and runtime metrics are refreshed as sources publish new results, typically several times per week and within hours of major model launches. Every data file carries the same build timestamp as the site itself, and the visible “Last verified” date on ranking pages reflects the most recent data sync (June 12, 2026).
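A consumer that caches the data may want to check how old its copy is. The following is a minimal sketch that assumes the build timestamp is exposed as an ISO 8601 string under a key called `builtAt`. Both the key name and the format are assumptions, since the actual field is not documented here, and the seven-day threshold is an arbitrary example.

```python
import json
from datetime import datetime, timedelta, timezone


def is_stale(payload: str, now: datetime, max_age: timedelta) -> bool:
    """Return True if the payload's build timestamp is older than max_age."""
    data = json.loads(payload)
    # "builtAt" is a hypothetical key; adjust to the real field name.
    built_at = datetime.fromisoformat(data["builtAt"])
    return now - built_at > max_age


# Made-up payload and clock values, for illustration only.
sample = '{"builtAt": "2026-06-12T08:00:00+00:00"}'
now = datetime(2026, 6, 20, 8, 0, tzinfo=timezone.utc)
print(is_stale(sample, now, timedelta(days=7)))
```

Passing `now` in explicitly rather than reading the system clock inside the function keeps the check easy to test.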
Licensing & citation
The BenchLM dataset is free to use under the MIT license. You may reproduce tables, scores, and charts in articles, papers, and products. We ask for attribution with a link:
Source: BenchLM.ai (https://benchlm.ai), retrieved June 12, 2026
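For pages that generate citations automatically, the attribution line above can be produced from a retrieval date. This is a small sketch; the function name `attribution` is illustrative and not part of any BenchLM tooling.

```python
from datetime import date

# Month names are spelled out here so the output does not depend on locale.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def attribution(retrieved: date) -> str:
    """Build the attribution line in the format shown above."""
    stamp = f"{MONTHS[retrieved.month - 1]} {retrieved.day}, {retrieved.year}"
    return f"Source: BenchLM.ai (https://benchlm.ai), retrieved {stamp}"


print(attribution(date(2026, 6, 12)))
# prints: Source: BenchLM.ai (https://benchlm.ai), retrieved June 12, 2026
```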
Underlying benchmark results belong to their original publishers; BenchLM aggregates, normalizes, and verifies them. See the methodology page for how scores are weighted and verified, and the benchmark confidence page for per-source provenance.
Embedding
Want a live leaderboard in your article or dashboard? Use the embeddable widget — it stays current automatically and includes attribution.