Reasoning models tend to be the most expensive tier — they use chain-of-thought, produce more output tokens, and are priced accordingly. This ranking divides each model's weighted reasoning score by output token price, revealing which models deliver the best abstract reasoning, long-context comprehension, and multi-step logic per dollar. For applications that need strong reasoning without frontier-model budgets, the value leaders here are worth serious consideration.
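The value metric described above can be sketched as a simple ratio. The model names, scores, and prices below are hypothetical placeholders, not BenchLM.ai's actual data:

```python
# Hypothetical entries: (name, weighted reasoning score, output price in USD per 1M tokens).
models = [
    ("model-a", 62.5, 0.40),
    ("model-b", 71.2, 2.00),
    ("model-c", 55.0, 0.25),
]

# Value score = weighted reasoning score / output token price,
# so cheap models with decent scores can outrank pricier frontier models.
ranked = sorted(
    ((name, score / price) for name, score, price in models),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, value in ranked:
    print(f"{name}: {value:.2f}")
```

Note how the cheapest model wins here despite the lowest raw score, which mirrors why small, inexpensive models dominate the top of this ranking.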
According to BenchLM.ai, GPT-4.1 nano leads this ranking with a score of 185.23, followed by Gemini 3.1 Flash-Lite (168.5) and Gemini 2.5 Flash (103.56). There is a significant gap between the leading models and the rest of the field.
The best open-weight option is DeepSeek Coder 2.0 (ranked #5 with a score of 66.48). While proprietary models lead, open-weight options are within striking distance for teams willing to trade a few points of performance for full model control.
This ranking is based on a weighted average across the reasoning benchmarks tracked by BenchLM.ai. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.
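The weighted average behind each model's reasoning score can be sketched as follows. The benchmark names and weights are illustrative assumptions, not BenchLM.ai's published weighting:

```python
# Hypothetical per-benchmark scores and weights for a single model.
scores = {"abstract": 78.0, "long_context": 64.0, "multi_step": 70.0}
weights = {"abstract": 0.4, "long_context": 0.3, "multi_step": 0.3}

# Weighted average: sum of score * weight, normalized by total weight.
weighted = sum(scores[b] * weights[b] for b in scores) / sum(weights.values())
print(f"{weighted:.2f}")
```

Dividing a weighted score like this by the model's output token price yields the value figures shown in the ranking above.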