LLM Price vs Performance Chart
Find the most cost-effective AI model. Each dot is an LLM plotted by its provisional benchmark score (higher is better) against output token price (lower is better). Models on the efficiency frontier offer the best value at their price point.
DeepSeek V4 Flash (Max): Score 76 · Score/$ 271.4 · $0.28/1M out
Claude Mythos Preview: Score 99 · $125.00/1M out
Top 10 Best Value Models (Overall)
Ranked by Score/$ ratio (benchmark score divided by the output price in dollars per 1M tokens)
| # | Model | Provider | Score | Output $/1M tokens | Score/$ |
|---|---|---|---|---|---|
| 1 | DeepSeek V4 Flash (Max) | DeepSeek | 76 | $0.28 | 271.4 |
| 2 | DeepSeek V4 Flash (High) | DeepSeek | 71 | $0.28 | 253.6 |
| 3 | DeepSeek V4 Flash | DeepSeek | 59 | $0.28 | 210.7 |
| 4 | Grok 4.1 Fast | xAI | 70 | $0.50 | 140.0 |
| 5 | DeepSeek V3.2 | DeepSeek | 58 | $0.42 | 138.1 |
| 6 | GPT-4o mini | OpenAI | 50 | $0.60 | 83.3 |
| 7 | GPT-4.1 nano | OpenAI | 27 | $0.40 | 67.5 |
| 8 | MiniMax M2.7 | MiniMax | 62 | $1.20 | 51.7 |
| 9 | DeepSeek V3 | DeepSeek | 36 | $1.10 | 32.7 |
| 10 | Mistral Large 3 | Mistral | 49 | $1.50 | 32.7 |
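The ranking above can be reproduced with a few lines of Python. This is a minimal sketch, assuming Score/$ is simply the score divided by the output price per 1M tokens, as the column description suggests. The helper name `score_per_dollar` and the tuple layout are illustrative choices, not anything the site publishes.

```python
# Rows copied from the table above:
# (model, benchmark score, output price in USD per 1M tokens).
rows = [
    ("DeepSeek V4 Flash (Max)", 76, 0.28),
    ("DeepSeek V4 Flash (High)", 71, 0.28),
    ("DeepSeek V4 Flash", 59, 0.28),
    ("Grok 4.1 Fast", 70, 0.50),
    ("DeepSeek V3.2", 58, 0.42),
    ("GPT-4o mini", 50, 0.60),
    ("GPT-4.1 nano", 27, 0.40),
    ("MiniMax M2.7", 62, 1.20),
    ("DeepSeek V3", 36, 1.10),
    ("Mistral Large 3", 49, 1.50),
]


def score_per_dollar(score, price_per_million):
    """Assumed formula: benchmark score divided by output price per 1M tokens."""
    return score / price_per_million


# Sort by the ratio, highest first, to get the value ranking.
ranked = sorted(rows, key=lambda r: score_per_dollar(r[1], r[2]), reverse=True)

for rank, (model, score, price) in enumerate(ranked, start=1):
    ratio = score_per_dollar(score, price)
    print(f"{rank:>2}  {model:<26} {score:>3}  ${price:.2f}  {ratio:6.1f}")
```

Ranks 9 and 10 both display as 32.7 after rounding to one decimal place. The unrounded ratios differ slightly, which is why DeepSeek V3 is listed ahead of Mistral Large 3.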
Frequently Asked Questions
What is the LLM price-performance chart?
This chart plots each AI model by its benchmark score (vertical axis) against its API output price per million tokens (horizontal axis). Models in the upper-left quadrant offer the best value: high performance at low cost. The efficiency frontier line connects the best-value models at each price point.
What is the efficiency frontier?
The efficiency frontier (Pareto frontier) connects the models that no other model beats on both axes: no alternative scores higher at the same or a lower price. Models on this line represent the optimal price-performance tradeoff. If a model sits below and to the right of the frontier, some other model scores higher without costing more.
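The frontier described above can be sketched as a single pass over the models sorted by price. This is an illustrative example rather than the site's actual implementation: the function name `pareto_frontier` is hypothetical, and the input is only the subset of models quoted on this page, so the frontier drawn on the full chart likely contains more points.

```python
def pareto_frontier(models):
    """Return the models that no other model dominates.

    Each model is a (name, score, price) tuple. A model is dominated when
    another model scores at least as high and costs no more. Exact ties on
    both score and price keep only the first model encountered.
    """
    # Cheapest first; at equal price, the highest score comes first.
    ordered = sorted(models, key=lambda m: (m[2], -m[1]))
    frontier = []
    best_score = float("-inf")
    for name, score, price in ordered:
        # A pricier model earns a place only if it beats every cheaper score.
        if score > best_score:
            frontier.append((name, score, price))
            best_score = score
    return frontier


# Subset of models quoted on this page: (name, score, USD per 1M output tokens).
models = [
    ("DeepSeek V4 Flash (Max)", 76, 0.28),
    ("DeepSeek V4 Flash (High)", 71, 0.28),
    ("DeepSeek V4 Flash", 59, 0.28),
    ("GPT-4.1 nano", 27, 0.40),
    ("DeepSeek V3.2", 58, 0.42),
    ("Grok 4.1 Fast", 70, 0.50),
    ("GPT-4o mini", 50, 0.60),
    ("DeepSeek V3", 36, 1.10),
    ("MiniMax M2.7", 62, 1.20),
    ("Mistral Large 3", 49, 1.50),
    ("Claude Mythos Preview", 99, 125.00),
]

frontier = pareto_frontier(models)
for name, score, price in frontier:
    print(f"{name}: score {score} at ${price:.2f}/1M out")
```

On this subset only two models survive: DeepSeek V4 Flash (Max), because nothing listed is cheaper or scores higher at its price, and Claude Mythos Preview, because nothing listed scores higher at any price.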
Which LLM has the best price-to-performance ratio?
Currently, DeepSeek V4 Flash (Max) by DeepSeek offers the best overall value with a Score/$ ratio of 271.4. That figure is its benchmark score of 76 divided by its output price of $0.28 per 1M tokens (76 / 0.28 ≈ 271.4).
How are scores calculated?
Overall scores shown in this chart use BenchLM's provisional ranking lane: a normalized weighted average across 8 benchmark categories, with bounded external calibration. The verified leaderboard is stricter and sourced-only, but this price-performance surface intentionally stays broader so value comparisons cover more models.