Best Value LLM for Coding in 2026 — Cost-Adjusted Rankings

Raw benchmark scores only tell half the story. This ranking divides each model's weighted coding score by its output token price, surfacing models that deliver the most coding capability per dollar spent. A model scoring 70 at $1/1M tokens outranks one scoring 80 at $15/1M tokens here — because for the same budget you get far more coding work done. Use this alongside the standard coding leaderboard to find the sweet spot between performance and cost for your coding workflows.

Unless noted otherwise, ranking surfaces on this page use BenchLM's provisional leaderboard lane rather than the stricter sourced-only verified leaderboard.

Bottom line: Gemini 3.1 Flash-Lite dominates coding value at $0.40/1M output. DeepSeek Coder 2.0 offers the best absolute coding performance per dollar among serious coding models.

According to BenchLM.ai, DeepSeek V4 Flash (High) leads this ranking with a score of 300.15, followed by DeepSeek V4 Flash (Max) (297.96) and DeepSeek V4 Flash (228.21). There is a significant gap between the leading models and the rest of the field.

The best open-weight option is DeepSeek V4 Flash (High) (ranked #1 with a score of 300.15). Open-weight models are highly competitive in this category — self-hosting is a viable alternative to proprietary APIs.

This ranking is based on provisional weighted averages across the scoring benchmarks in coding tracked by BenchLM.ai. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.

1Open

DeepSeek V4 Flash (High)

DeepSeek · 1M

300.15Score/$

Score: 84 · $0.28/1M

2Open

DeepSeek V4 Flash (Max)

DeepSeek · 1M

297.96Score/$

Score: 83.4 · $0.28/1M

3Open

DeepSeek V4 Flash

DeepSeek · 1M

228.21Score/$

Score: 63.9 · $0.28/1M

What changed

Gemini 3.1 Flash-Lite leads coding value — most coding capability per dollar spent.

DeepSeek Coder 2.0 best value among dedicated coding models with strong raw scores.

Gemini 2.5 Flash good coding value with broader model capabilities.

How to choose

Maximum coding per dollar?

Gemini 3.1 Flash-Lite — lowest cost with decent coding

Strong coding at good value?

DeepSeek Coder 2.0 — best raw coding scores per dollar

Best raw coding regardless of cost?

See the standard coding leaderboard

Full Rankings (52 models)

DeepSeek V4 Flash (High)

DeepSeek·Open Weight·1M

300.15

Score/$

Score: 84 · $0.28/1M

vs #2

DeepSeek V4 Flash (Max)

DeepSeek·Open Weight·1M

297.96

Score/$

Score: 83.4 · $0.28/1M

vs #3

DeepSeek V4 Flash

DeepSeek·Open Weight·1M

228.21

Score/$

Score: 63.9 · $0.28/1M

vs #4

DeepSeek V3.2

DeepSeek·Open Weight·128K

140.1

Score/$

Score: 58.8 · $0.42/1M

vs #5

Grok 4.1 Fast

xAI·Proprietary·1M

122.04

Score/$

Score: 61 · $0.5/1M

vs #6

MiniMax M2.7

MiniMax·Open Weight·200K

49.06

Score/$

Score: 58.9 · $1.2/1M

vs #7

Grok Code Fast 1

xAI·Proprietary·256K

42.66

Score/$

Score: 64 · $1.5/1M

vs #8

Kimi K2.5 (Reasoning)

Moonshot AI·Proprietary·128K

Score/$

Score: 87 · $3/1M

vs #9

Kimi K2.5

Moonshot AI·Open Weight·256K

27.52

Score/$

Score: 82.6 · $3/1M

vs #10

Mistral Large 3

Mistral·Proprietary·128K

26.13

Score/$

Score: 39.2 · $1.5/1M

vs #11

DeepSeek V4 Pro (Max)

DeepSeek·Open Weight·1M

Score/$

Score: 90.5 · $3.48/1M

vs #12

DeepSeek V4 Pro (High)

DeepSeek·Open Weight·1M

25.05

Score/$

Score: 87.2 · $3.48/1M

vs #13

DeepSeek V3.2 (Thinking)

DeepSeek·Open Weight·128K

24.39

Score/$

Score: 53.4 · $2.19/1M

vs #14

GLM-5

Z.AI·Open Weight·200K

24.07

Score/$

Score: 77 · $3.2/1M

vs #15

GLM-5 (Reasoning)

Z.AI·Open Weight·200K

23.36

Score/$

Score: 74.7 · $3.2/1M

vs #16

Kimi K2.6

Moonshot AI·Open Weight·256K

22.28

Score/$

Score: 89.1 · $4/1M

vs #17

Gemini 3 Flash

Google·Proprietary·1M

19.41

Score/$

Score: 58.2 · $3/1M

vs #18

DeepSeek V4 Pro

DeepSeek·Open Weight·1M

19.11

Score/$

Score: 66.5 · $3.48/1M

vs #19

GLM-5.1

Z.AI·Open Weight·203K

19.06

Score/$

Score: 83.9 · $4.4/1M

vs #20

Qwen3.5 397B

Alibaba·Open Weight·128K

18.98

Score/$

Score: 68.3 · $3.6/1M

vs #21

Gemini 3.1 Flash-Lite

Google·Proprietary·1M

18.02

Score/$

Score: 27 · $1.5/1M

vs #22

DeepSeek-R1

DeepSeek·Open Weight·128K

12.65

Score/$

Score: 27.7 · $2.19/1M

vs #23

GLM-4.5-Air

Z.AI·Proprietary·128K

11.9

Score/$

Score: 13.1 · $1.1/1M

vs #24

Claude Haiku 4.5

Anthropic·Proprietary·200K

10.86

Score/$

Score: 54.3 · $5/1M

vs #25

OpenAI·Proprietary·200K

8.33

Score/$

Score: 66.7 · $8/1M

vs #26

GPT-5.1

OpenAI·Proprietary·200K

7.87

Score/$

Score: 78.7 · $10/1M

vs #27

Gemini 3.1 Pro

Google·Proprietary·1M

7.82

Score/$

Score: 93.9 · $12/1M

vs #28

GPT-5 (high)

OpenAI·Proprietary·128K

7.18

Score/$

Score: 71.8 · $10/1M

vs #29

Gemini 2.5 Flash

Google·Proprietary·1M

6.82

Score/$

Score: 17.1 · $2.5/1M

vs #30

Claude 3 Haiku

Anthropic·Proprietary·200K

6.6

Score/$

Score: 8.2 · $1.25/1M

vs #31

GPT-5.3 Codex

OpenAI·Proprietary·400K

6.34

Score/$

Score: 88.7 · $14/1M

vs #32

Gemini 3 Pro

Google·Proprietary·2M

6.17

Score/$

Score: 74.1 · $12/1M

vs #33

GPT-5.4

OpenAI·Proprietary·1.05M

5.95

Score/$

Score: 89.3 · $15/1M

vs #34

GPT-5.2

OpenAI·Proprietary·400K

5.89

Score/$

Score: 82.4 · $14/1M

vs #35

GPT-5.2-Codex

OpenAI·Proprietary·400K

5.68

Score/$

Score: 79.5 · $14/1M

vs #36

Claude Sonnet 4.6

Anthropic·Proprietary·200K

5.54

Score/$

Score: 83.1 · $15/1M

vs #37

Claude Sonnet 4.5

Anthropic·Proprietary·200K

5.29

Score/$

Score: 79.3 · $15/1M

vs #38

Gemini 2.5 Pro

Google·Proprietary·1M

Score/$

Score: 50 · $10/1M

vs #39

GLM-4.5

Z.AI·Proprietary·128K

4.26

Score/$

Score: 9.4 · $2.2/1M

vs #40

Claude Opus 4.7 (Adaptive)

Anthropic·Proprietary·1M

3.81

Score/$

Score: 95.3 · $25/1M

vs #41

Claude 4 Sonnet

Anthropic·Proprietary·200K

3.65

Score/$

Score: 54.7 · $15/1M

vs #42

Claude Opus 4.6

Anthropic·Proprietary·1M

3.47

Score/$

Score: 86.9 · $25/1M

vs #43

Gemini 1.5 Pro

Google·Proprietary·2M

3.22

Score/$

Score: 16.1 · $5/1M

vs #44

Claude Opus 4.5

Anthropic·Proprietary·200K

3.11

Score/$

Score: 77.6 · $25/1M

vs #45

Claude 3.5 Sonnet

Anthropic·Proprietary·200K

2.78

Score/$

Score: 41.7 · $15/1M

vs #46

GPT-4o

OpenAI·Proprietary·128K

2.6

Score/$

Score: 26 · $10/1M

vs #47

o1-preview

OpenAI·Proprietary·200K

1.34

Score/$

Score: 80.5 · $60/1M

vs #48

Claude Mythos Preview

Anthropic·Proprietary·1M

0.8

Score/$

Score: 100 · $125/1M

vs #49

Claude 4.1 Opus

Anthropic·Proprietary·200K

0.77

Score/$

Score: 57.6 · $75/1M

vs #50

o3-pro

OpenAI·Proprietary·200K

0.7

Score/$

Score: 55.8 · $80/1M

vs #51

GPT-4 Turbo

OpenAI·Proprietary·128K

0.43

Score/$

Score: 13 · $30/1M

vs #52

Claude 3 Opus

Anthropic·Proprietary·200K

0.19

Score/$

Score: 14.2 · $75/1M

These rankings update weekly

Get notified when models move. One email a week with what changed and why.

Free. No spam. Unsubscribe anytime.

Key Takeaways

The best value model is DeepSeek V4 Flash (High) by DeepSeek with a provisional Score/$ ratio of 300.15 (score: 84, output: $0.28/1M tokens).

The best open-weight model is DeepSeek V4 Flash (High) at position #1.

52 models are included in this ranking.

Score in Context

What these scores mean

Value scores divide the weighted coding score by output token price (per 1M tokens). Higher means more capability per dollar. Models with no listed price are excluded.

Known limitations

Value rankings favor cheap models even if absolute performance is modest. A model scoring half as well at one-tenth the price wins on value — but may not meet your quality bar. Always check raw scores alongside value rankings.

Explore More

Price vs Performance Chart Compare Pricing Which LLM Should I Use? Benchmark Explainers

Last updated: May 13, 2026

The AI models change fast. We track them for you.

For engineers, researchers, and the plain curious — a weekly brief on new models, ranking shifts, and pricing changes.

Free. No spam. Unsubscribe anytime.