Last verified: July 29, 2026

What the dataset contains

Benchmark scores — per-model results across coding, agentic, reasoning, knowledge, math, multilingual, instruction-following, and multimodal benchmarks, each tagged with provenance (official, verified, reported, or estimated).
Two ranking lanes — a provisional leaderboard that includes well-sourced public rows, and a stricter verified leaderboard limited to exact-source coverage.
Pricing — input/output cost per million tokens for every tracked model with a public API.
Runtime metrics — median tokens per second and first-answer latency, sourced from Artificial Analysis.
Release metadata — release dates, model families, variants, and supersession chains.
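Because pricing is quoted per million tokens, the cost of a single request follows from simple arithmetic. The sketch below is illustrative only: the function name, the USD currency, and the example prices are assumptions, not values taken from the dataset.

```python
def request_cost(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Cost of one request, given prices quoted per million tokens.

    The currency is whatever the prices are quoted in (assumed USD here).
    """
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Made-up prices for illustration; look up real values in the pricing data.
cost = request_cost(
    input_tokens=12_000,
    output_tokens=800,
    input_price_per_m=3.00,
    output_price_per_m=15.00,
)
print(f"{cost:.4f}")  # 0.0480
```

Input and output tokens are priced separately, so a prompt-heavy workload and a generation-heavy workload can rank the same models in a different cost order.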

Machine-readable files

leaderboard.json: Ranked models with overall and per-category scores, provisional and verified lanes.
models.json: Full model catalog: benchmark scores, context windows, release metadata, provenance.
benchmarks.json: Benchmark definitions, category weights, and per-benchmark metadata.
pricing.json: API pricing per million input/output tokens for every tracked model.
speed.json: Runtime metrics: tokens per second and first-answer latency.
comparisons.json: Precomputed head-to-head model comparison data.

Also available: llms.txt and llms-full.txt for LLM crawlers, RSS / Atom feeds, and updates.json for data-change events.
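As a rough sketch of how a consumer might read the ranked data, the snippet below parses a small inline sample and lists the top models in each lane. The schema shown (top-level "provisional" and "verified" arrays with "model" and "overall" fields) is purely hypothetical, as are the model names and scores; inspect the real leaderboard.json before relying on any field name.

```python
import json

# Hypothetical sample shaped the way this sketch ASSUMES the file might look.
# The real schema may differ; names and scores here are made up.
SAMPLE = """
{
  "provisional": [
    {"model": "model-a", "overall": 71.5},
    {"model": "model-b", "overall": 83.0},
    {"model": "model-c", "overall": 64.2}
  ],
  "verified": [
    {"model": "model-b", "overall": 80.1}
  ]
}
"""


def top_models(raw_json, lane, n=3):
    """Return (model, overall) pairs for one lane, best score first."""
    data = json.loads(raw_json)
    rows = data.get(lane, [])  # a missing lane yields an empty list
    ranked = sorted(rows, key=lambda row: row["overall"], reverse=True)
    return [(row["model"], row["overall"]) for row in ranked[:n]]


provisional_top = top_models(SAMPLE, "provisional", n=2)
verified_top = top_models(SAMPLE, "verified")
print(provisional_top)  # [('model-b', 83.0), ('model-a', 71.5)]
print(verified_top)     # [('model-b', 80.1)]
```

With a downloaded copy of the file, the same function can be fed the text returned by reading that file, once the field names are adjusted to the actual schema.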

Update cadence

Benchmark scores, pricing, and runtime metrics are refreshed as sources publish new results, typically several times per week and within hours of a major model launch. Every data file carries the same build timestamp as the site itself, and the visible “Last verified” date on ranking pages reflects the most recent data sync (July 29, 2026).

Licensing & citation

The BenchLM dataset is free to use under the MIT license. You may reproduce tables, scores, and charts in articles, papers, and products. We ask for attribution with a link:

Source: BenchLM.ai (https://benchlm.ai), retrieved July 29, 2026
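If you generate citations programmatically, a small helper can produce the attribution line above with the actual retrieval date. This is a minimal sketch; the function name is hypothetical, and the month name relies on Python's default (English) locale.

```python
from datetime import date


def benchlm_attribution(retrieved):
    """Format the attribution line with a given retrieval date."""
    # Use .day directly so single-digit days are not zero-padded.
    when = f"{retrieved:%B} {retrieved.day}, {retrieved.year}"
    return f"Source: BenchLM.ai (https://benchlm.ai), retrieved {when}"


line = benchlm_attribution(date(2026, 7, 29))
print(line)
```

Pass date.today() at the moment you fetch the data so the citation records when your copy was retrieved.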

Underlying benchmark results belong to their original publishers; BenchLM aggregates, normalizes, and verifies them. See the methodology page for how scores are weighted and verified, and benchmark confidence for per-source provenance.

Embedding

Want a live leaderboard in your article or dashboard? Use the embeddable widget — it stays current automatically and includes attribution.