Best Value LLM for Reasoning in 2026 — Cost-Adjusted Rankings

Reasoning models tend to be the most expensive tier — they use chain-of-thought, produce more output tokens, and are priced accordingly. This ranking divides each model's weighted reasoning score by output token price, revealing which models deliver the best abstract reasoning, long-context comprehension, and multi-step logic per dollar. For applications that need strong reasoning without frontier-model budgets, the value leaders here are worth serious consideration.

According to BenchLM.ai, GPT-4.1 nano leads this ranking with a score of 185.23, followed by Gemini 3.1 Flash-Lite (168.5) and Gemini 2.5 Flash (103.56). There is a significant gap between the leading models and the rest of the field.

The best open-weight option is DeepSeek Coder 2.0 (ranked #5 with a score of 66.48). While proprietary models lead, open-weight options are within striking distance for teams willing to trade a few points of performance for full model control.

This ranking is based on weighted averages across the scoring benchmarks in reasoning tracked by BenchLM.ai. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.

GPT-4.1 nano
OpenAIProprietary1M

185.23

Score/$

Score: 74.1 · $0.4/1M

Gemini 3.1 Flash-Lite
GoogleProprietary1M

168.5

Score/$

Score: 67.4 · $0.4/1M

Gemini 2.5 Flash
GoogleProprietary1M

103.56

Score/$

Score: 62.1 · $0.6/1M

4
GPT-4o mini
OpenAIProprietary128K

82.42

Score/$

Score: 49.5 · $0.6/1M

5
DeepSeek Coder 2.0
DeepSeekOpen Weight128K

66.48

Score/$

Score: 73.1 · $1.1/1M

6
GPT-4.1 mini
OpenAIProprietary1M

50.57

Score/$

Score: 80.9 · $1.6/1M

7
Gemini 3 Flash
GoogleProprietary1M

24.22

Score/$

Score: 72.7 · $3/1M

8
Kimi K2.5
Moonshot AIOpen Weight128K

23.9

Score/$

Score: 66.9 · $2.8/1M

9
o3-mini
OpenAIProprietary200K

18.43

Score/$

Score: 81.1 · $4.4/1M

10
DeepSeek-R1
DeepSeekOpen Weight128K

18.25

Score/$

Score: 40 · $2.19/1M

11
Gemini 3.1 Pro
GoogleProprietary1M

17.66

Score/$

Score: 88.3 · $5/1M

12
Claude Haiku 4.5
AnthropicProprietary200K

17.23

Score/$

Score: 68.9 · $4/1M

13
Gemini 2.5 Pro
GoogleProprietary1M

12.36

Score/$

Score: 61.8 · $5/1M

14
GPT-5.1
OpenAIProprietary200K

11.47

Score/$

Score: 68.8 · $6/1M

15
GPT-5.1-Codex-Max
OpenAIProprietary400K

11.44

Score/$

Score: 91.5 · $8/1M

16
GPT-5.2-Codex
OpenAIProprietary400K

11.39

Score/$

Score: 91.1 · $8/1M

17
Mistral Large 3
MistralProprietary128K

11.34

Score/$

Score: 68.1 · $6/1M

18
GPT-5.2
OpenAIProprietary400K

10.3

Score/$

Score: 82.4 · $8/1M

19
GPT-4.1
OpenAIProprietary1M

10.11

Score/$

Score: 80.9 · $8/1M

20
GPT-5.3 Codex
OpenAIProprietary400K

9.26

Score/$

Score: 92.6 · $10/1M

21
GPT-4o
OpenAIProprietary128K

6.23

Score/$

Score: 62.3 · $10/1M

22
Grok 4.1
xAIProprietary1M

6.03

Score/$

Score: 90.5 · $15/1M

23
GPT-5.4
OpenAIProprietary1.05M

5.99

Score/$

Score: 89.9 · $15/1M

24
Claude Sonnet 4.6
AnthropicProprietary200K

5.8

Score/$

Score: 87 · $15/1M

25
Claude Sonnet 4.5
AnthropicProprietary200K

4.02

Score/$

Score: 60.3 · $15/1M

26
o3
OpenAIProprietary200K

1.55

Score/$

Score: 62 · $40/1M

27
o1
OpenAIProprietary200K

1.3

Score/$

Score: 78.1 · $60/1M

28
Claude Opus 4.6
AnthropicProprietary1M

1.1

Score/$

Score: 82.4 · $75/1M

29
GPT-5.4 Pro
OpenAIProprietary1.05M

0.53

Score/$

Score: 95.7 · $180/1M

30
o1-pro
OpenAIProprietary200K

0.09

Score/$

Score: 56.3 · $600/1M

Key Takeaways

  • The best value model is GPT-4.1 nano by OpenAI with a Score/$ ratio of 185.23 (score: 74.1, output: $0.4/1M tokens).
  • The best open-weight model in this ranking is DeepSeek Coder 2.0 at position #5.
  • 30 models are included in this ranking.
Last updated: April 3, 2026

Weekly LLM Benchmark Digest

Get notified when new models drop, benchmark scores change, or the leaderboard shifts. One email per week.

Free. No spam. Unsubscribe anytime. We only store derived location metadata for consent routing.