Best Value LLM for Knowledge in 2026 — Cost-Adjusted Rankings

Knowledge benchmarks test factual accuracy, expert-level science, and professional comprehension. This ranking divides each model's weighted knowledge score (GPQA, SuperGPQA, MMLU-Pro, HLE, FrontierScience, SimpleQA) by output token price. For RAG pipelines, research assistants, and Q&A systems where accurate knowledge retrieval matters but API costs add up, the value leaders here offer the best accuracy per dollar.

Unless noted otherwise, ranking surfaces on this page use BenchLM's provisional leaderboard lane rather than the stricter sourced-only verified leaderboard.

Bottom line: Knowledge tasks don't require expensive reasoning models. Gemini 3.1 Flash-Lite leads value, and DeepSeek Coder 2.0 offers strong raw knowledge at low cost.

According to BenchLM.ai, DeepSeek V4 Flash (Max) leads this ranking with a score of 240.61, followed by DeepSeek V4 Flash (High) (223.12) and DeepSeek V4 Flash (158.78). There is a significant gap between the leading models and the rest of the field.

The best open-weight option is DeepSeek V4 Flash (Max) (ranked #1 with a score of 240.61). Open-weight models are highly competitive in this category — self-hosting is a viable alternative to proprietary APIs.

This ranking is based on provisional weighted averages across the scoring benchmarks in knowledge tracked by BenchLM.ai. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.

1Open

DeepSeek V4 Flash (Max)

DeepSeek · 1M

240.61Score/$

Score: 67.4 · $0.28/1M

2Open

DeepSeek V4 Flash (High)

DeepSeek · 1M

223.12Score/$

Score: 62.5 · $0.28/1M

3Open

DeepSeek V4 Flash

DeepSeek · 1M

158.78Score/$

Score: 44.5 · $0.28/1M

What changed

Gemini 3.1 Flash-Lite leads knowledge value — most knowledge capability per dollar.

DeepSeek Coder 2.0 strong raw knowledge scores at very low cost.

Gemini 2.5 Flash good knowledge value with broader model capabilities.

How to choose

RAG pipeline on a budget?

Gemini 3.1 Flash-Lite — most knowledge per dollar

Strong knowledge at low cost?

DeepSeek Coder 2.0 — good raw scores, cheap

Best raw knowledge regardless of cost?

See the standard knowledge leaderboard

Full Rankings (53 models)

DeepSeek V4 Flash (Max)

DeepSeek·Open Weight·1M

240.61

Score/$

Score: 67.4 · $0.28/1M

vs #2

DeepSeek V4 Flash (High)

DeepSeek·Open Weight·1M

223.12

Score/$

Score: 62.5 · $0.28/1M

vs #3

DeepSeek V4 Flash

DeepSeek·Open Weight·1M

158.78

Score/$

Score: 44.5 · $0.28/1M

vs #4

Grok 4.1 Fast

xAI·Proprietary·1M

150.57

Score/$

Score: 75.3 · $0.5/1M

vs #5

DeepSeek V3.2

DeepSeek·Open Weight·128K

142.23

Score/$

Score: 59.7 · $0.42/1M

vs #6

DeepSeek V3

DeepSeek·Open Weight·128K

38.21

Score/$

Score: 42 · $1.1/1M

vs #7

DeepSeek V3.2 (Thinking)

DeepSeek·Open Weight·128K

30.46

Score/$

Score: 66.7 · $2.19/1M

vs #8

Mistral Large 3

Mistral·Proprietary·128K

27.37

Score/$

Score: 41.1 · $1.5/1M

vs #9

GLM-5

Z.AI·Open Weight·200K

25.88

Score/$

Score: 82.8 · $3.2/1M

vs #10

GLM-5 (Reasoning)

Z.AI·Open Weight·200K

25.52

Score/$

Score: 81.7 · $3.2/1M

vs #11

Kimi K2.5 (Reasoning)

Moonshot AI·Proprietary·128K

24.77

Score/$

Score: 74.3 · $3/1M

vs #12

Kimi K2.5

Moonshot AI·Open Weight·256K

23.62

Score/$

Score: 70.9 · $3/1M

vs #13

Gemini 3.1 Flash-Lite

Google·Proprietary·1M

23.27

Score/$

Score: 34.9 · $1.5/1M

vs #14

Grok Code Fast 1

xAI·Proprietary·256K

22.72

Score/$

Score: 34.1 · $1.5/1M

vs #15

Kimi K2

Moonshot AI·Proprietary·128K

22.56

Score/$

Score: 56.4 · $2.5/1M

vs #16

DeepSeek V4 Pro (Max)

DeepSeek·Open Weight·1M

22.21

Score/$

Score: 77.3 · $3.48/1M

vs #17

Qwen3.5 397B (Reasoning)

Alibaba·Open Weight·128K

21.72

Score/$

Score: 78.2 · $3.6/1M

vs #18

DeepSeek V4 Pro (High)

DeepSeek·Open Weight·1M

20.54

Score/$

Score: 71.5 · $3.48/1M

vs #19

Qwen3.5 397B

Alibaba·Open Weight·128K

19.94

Score/$

Score: 71.8 · $3.6/1M

vs #20

Claude 3 Haiku

Anthropic·Proprietary·200K

19.71

Score/$

Score: 24.6 · $1.25/1M

vs #21

GLM-5.1

Z.AI·Open Weight·203K

19.19

Score/$

Score: 84.4 · $4.4/1M

vs #22

DeepSeek-R1

DeepSeek·Open Weight·128K

17.75

Score/$

Score: 38.9 · $2.19/1M

vs #23

Gemini 3 Flash

Google·Proprietary·1M

15.22

Score/$

Score: 45.7 · $3/1M

vs #24

DeepSeek V4 Pro

DeepSeek·Open Weight·1M

14.47

Score/$

Score: 50.4 · $3.48/1M

vs #25

Gemini 2.5 Flash

Google·Proprietary·1M

10.43

Score/$

Score: 26.1 · $2.5/1M

vs #26

Claude Haiku 4.5

Anthropic·Proprietary·200K

9.62

Score/$

Score: 48.1 · $5/1M

vs #27

GPT-5.1

OpenAI·Proprietary·200K

8.18

Score/$

Score: 81.8 · $10/1M

vs #28

OpenAI·Proprietary·200K

8.17

Score/$

Score: 65.4 · $8/1M

vs #29

GPT-5 (high)

OpenAI·Proprietary·128K

7.92

Score/$

Score: 79.2 · $10/1M

vs #30

Gemini 3.1 Pro

Google·Proprietary·1M

7.89

Score/$

Score: 94.7 · $12/1M

vs #31

GPT-5.1-Codex-Max

OpenAI·Proprietary·400K

7.85

Score/$

Score: 78.5 · $10/1M

vs #32

GLM-4.5-Air

Z.AI·Proprietary·128K

7.29

Score/$

Score: 8 · $1.1/1M

vs #33

Gemini 1.5 Pro

Google·Proprietary·2M

6.88

Score/$

Score: 34.4 · $5/1M

vs #34

Gemini 3 Pro

Google·Proprietary·2M

6.87

Score/$

Score: 82.5 · $12/1M

vs #35

GPT-5.3 Codex

OpenAI·Proprietary·400K

6.64

Score/$

Score: 93 · $14/1M

vs #36

GPT-5.4

OpenAI·Proprietary·1.05M

6.62

Score/$

Score: 99.3 · $15/1M

vs #37

GPT-5.2

OpenAI·Proprietary·400K

6.56

Score/$

Score: 91.8 · $14/1M

vs #38

Gemini 2.5 Pro

Google·Proprietary·1M

6.48

Score/$

Score: 64.8 · $10/1M

vs #39

GPT-5.2-Codex

OpenAI·Proprietary·400K

5.55

Score/$

Score: 77.8 · $14/1M

vs #40

Claude Sonnet 4.6

Anthropic·Proprietary·200K

5.5

Score/$

Score: 82.5 · $15/1M

vs #41

GLM-4.5

Z.AI·Proprietary·128K

4.96

Score/$

Score: 10.9 · $2.2/1M

vs #42

Claude Sonnet 4.5

Anthropic·Proprietary·200K

4.93

Score/$

Score: 73.9 · $15/1M

vs #43

Claude Opus 4.7 (Adaptive)

Anthropic·Proprietary·1M

3.96

Score/$

Score: 99.1 · $25/1M

vs #44

Claude Opus 4.6

Anthropic·Proprietary·1M

3.62

Score/$

Score: 90.4 · $25/1M

vs #45

GPT-4o

OpenAI·Proprietary·128K

3.47

Score/$

Score: 34.7 · $10/1M

vs #46

Claude Opus 4.5

Anthropic·Proprietary·200K

3.34

Score/$

Score: 83.4 · $25/1M

vs #47

Claude 4 Sonnet

Anthropic·Proprietary·200K

3.32

Score/$

Score: 49.8 · $15/1M

vs #48

Claude 3.5 Sonnet

Anthropic·Proprietary·200K

2.58

Score/$

Score: 38.7 · $15/1M

vs #49

o1-preview

OpenAI·Proprietary·200K

1.33

Score/$

Score: 79.9 · $60/1M

vs #50

GPT-4 Turbo

OpenAI·Proprietary·128K

0.91

Score/$

Score: 27.2 · $30/1M

vs #51

o3-pro

OpenAI·Proprietary·200K

0.83

Score/$

Score: 66.5 · $80/1M

vs #52

Claude 4.1 Opus

Anthropic·Proprietary·200K

0.69

Score/$

Score: 51.5 · $75/1M

vs #53

Claude 3 Opus

Anthropic·Proprietary·200K

0.41

Score/$

Score: 30.8 · $75/1M

These rankings update weekly

Get notified when models move. One email a week with what changed and why.

Free. No spam. Unsubscribe anytime.

Key Takeaways

The best value model is DeepSeek V4 Flash (Max) by DeepSeek with a provisional Score/$ ratio of 240.61 (score: 67.4, output: $0.28/1M tokens).

The best open-weight model is DeepSeek V4 Flash (Max) at position #1.

53 models are included in this ranking.

Score in Context

What these scores mean

Value scores divide the weighted knowledge score by output token price (per 1M tokens). Higher means more capability per dollar. Models with no listed price are excluded.

Known limitations

Value rankings favor cheap models even if absolute performance is modest. A model scoring half as well at one-tenth the price wins on value — but may not meet your quality bar. Always check raw scores alongside value rankings.

Explore More

Price vs Performance Chart Compare Pricing Which LLM Should I Use? Benchmark Explainers

Last updated: May 20, 2026

The AI models change fast. We track them for you.

For engineers, researchers, and the plain curious — a weekly brief on new models, ranking shifts, and pricing changes.

Free. No spam. Unsubscribe anytime.