Agentic workloads are token-intensive — agents loop, retry, and chain multiple calls. That makes cost-per-token a critical factor alongside raw capability. This ranking divides each model's weighted agentic score (Terminal-Bench 2.0, BrowseComp, OSWorld-Verified) by its output token price. The result shows which models give you the most agent capability per dollar. If you're building production AI agents with budget constraints, this is where you start.
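The ranking arithmetic can be sketched as follows. The equal benchmark weights, the per-benchmark scores, and the $0.50 per million output tokens price are hypothetical illustrations; the page names the benchmarks but does not publish BenchLM.ai's actual weights or the prices used.

```python
# Minimal sketch of the cost-efficiency ranking: weighted agentic score
# divided by output token price. All numbers below are illustrative
# assumptions, not BenchLM.ai's published values.

def weighted_agentic_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-benchmark scores (0-100 scale)."""
    total_weight = sum(weights.values())
    return sum(scores[b] * weights[b] for b in scores) / total_weight

def efficiency(weighted_score: float, price_per_mtok_out: float) -> float:
    """Capability per dollar: weighted score / output price ($ per 1M tokens)."""
    return weighted_score / price_per_mtok_out

# Hypothetical model with equal benchmark weights:
scores = {"Terminal-Bench 2.0": 52.0, "BrowseComp": 38.0, "OSWorld-Verified": 45.0}
weights = {b: 1.0 for b in scores}

agentic = weighted_agentic_score(scores, weights)   # (52 + 38 + 45) / 3 = 45.0
print(efficiency(agentic, 0.50))                    # 45.0 / 0.50 = 90.0
```

Note that dividing by price makes cheap models dominate: halving the output price doubles the efficiency score, which is why a small, inexpensive model can outrank a stronger but pricier one on this metric.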
According to BenchLM.ai, Gemini 3.1 Flash-Lite leads this ranking with a score of 123.03, followed by GPT-4.1 nano (87.33) and GPT-4o mini (82.35). There is a significant gap between the leading models and the rest of the field.
The best open-weight option is DeepSeek Coder 2.0 (ranked #5 with a score of 57.15). While proprietary models lead, open-weight options are within striking distance for teams willing to trade a few points of performance for full model control.
This ranking is based on weighted averages across the agentic benchmarks tracked by BenchLM.ai. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.