Raw benchmark scores tell only half the story. This ranking divides each model's weighted coding score by its output token price, surfacing the models that deliver the most coding capability per dollar. A model scoring 70 at $1/1M tokens outranks one scoring 80 at $15/1M tokens here (70 versus roughly 5.3 points per dollar), because the same budget buys far more coding work. Use this alongside the standard coding leaderboard to find the sweet spot between performance and cost for your coding workflows.
Unless noted otherwise, ranking surfaces on this page use BenchLM's provisional leaderboard lane rather than the stricter sourced-only verified leaderboard.
Bottom line: Gemini 3.1 Flash-Lite dominates coding value at $0.40/1M output. DeepSeek Coder 2.0 offers the best absolute coding performance per dollar among serious coding models.
According to BenchLM.ai, Gemini 3.1 Flash-Lite leads this ranking with a value score (Score/$) of 71.38, followed by DeepSeek Coder 2.0 (56.49) and MiniMax M2.7 (49.73). There is a significant gap between the leading models and the rest of the field.
The best open-weight option is DeepSeek Coder 2.0 (ranked #2 with a value score of 56.49). Open-weight models are highly competitive in this category, so self-hosting is a viable alternative to proprietary APIs.
This ranking is based on provisional weighted averages across the scoring benchmarks in coding tracked by BenchLM.ai.
Gemini 3.1 Flash-Lite
Google · 1M
Score: 28.6 · $0.40/1M output tokens
Best coding value. Extremely low cost at $0.40/1M output tokens.
DeepSeek Coder 2.0
DeepSeek · 128K
Score: 62.1 · $1.10/1M output tokens
Best value among serious coding models. Strong raw coding scores.
MiniMax M2.7
MiniMax · 200K
Score: 59.7 · $1.20/1M output tokens
Gemini 3.1 Flash-Lite: leads coding value, with the most coding capability per dollar spent.
DeepSeek Coder 2.0: best value among dedicated coding models, with strong raw scores.
Gemini 2.5 Flash: good coding value with broader model capabilities.
The best value model is Gemini 3.1 Flash-Lite by Google, with a provisional Score/$ ratio of 71.38 (raw coding score: 28.6, output price: $0.40/1M tokens).
The best open-weight model is DeepSeek Coder 2.0 at position #2.
25 models are included in this ranking.
Value scores divide the weighted coding score by output token price (per 1M tokens). Higher means more capability per dollar. Models with no listed price are excluded.
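The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not BenchLM's actual implementation: the `Model` type, the function names, and the "Unpriced Example" entry are hypothetical, and the inputs are the rounded scores and prices displayed on this page.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    coding_score: float            # weighted coding score as displayed on the page
    output_price: Optional[float]  # USD per 1M output tokens; None if unlisted


def value_score(model: Model) -> float:
    """Weighted coding score divided by output price per 1M tokens."""
    return model.coding_score / model.output_price


def rank_by_value(models: list[Model]) -> list[Model]:
    # Drop models with no listed price, as the methodology describes.
    # Assumption: a price of 0 is also dropped to avoid dividing by zero.
    priced = [m for m in models if m.output_price]
    return sorted(priced, key=value_score, reverse=True)


models = [
    Model("DeepSeek Coder 2.0", 62.1, 1.1),
    Model("MiniMax M2.7", 59.7, 1.2),
    Model("Gemini 3.1 Flash-Lite", 28.6, 0.4),
    Model("Unpriced Example", 90.0, None),  # hypothetical entry, not from the page
]

ranking = rank_by_value(models)
for position, m in enumerate(ranking, start=1):
    print(f"#{position} {m.name}: {value_score(m):.2f}")
```

Dividing the rounded values shown on the page gives ratios close to, but not identical to, the published 71.38, 56.49, and 49.73. One plausible explanation is that the published ratios are computed from unrounded scores, though the page does not say so.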
Value rankings favor cheap models even if absolute performance is modest. A model scoring half as well at one-tenth the price wins on value — but may not meet your quality bar. Always check raw scores alongside value rankings.
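One way to act on this caveat is to apply a minimum raw score before ranking by value. The sketch below assumes a hypothetical `best_value_above` helper and an arbitrary quality bar of 50; neither comes from BenchLM, and the inputs are the rounded figures shown on this page.

```python
# (name, raw weighted coding score, output price in USD per 1M tokens)
MODELS = [
    ("Gemini 3.1 Flash-Lite", 28.6, 0.4),
    ("DeepSeek Coder 2.0", 62.1, 1.1),
    ("MiniMax M2.7", 59.7, 1.2),
]


def best_value_above(models, min_score):
    """Rank by score per dollar, keeping only models at or above min_score."""
    eligible = [m for m in models if m[1] >= min_score]
    return sorted(eligible, key=lambda m: m[1] / m[2], reverse=True)


no_floor = best_value_above(MODELS, min_score=0)
with_floor = best_value_above(MODELS, min_score=50)  # 50 is an arbitrary example bar

print([name for name, _, _ in no_floor])
print([name for name, _, _ in with_floor])
```

Without a floor, the cheapest model leads. With the example floor of 50, it drops out and the ranking starts with the next-best ratio. The same arithmetic explains the warning above: half the score at one-tenth the price yields a ratio five times higher.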