Best Value Agentic AI Model in 2026 — Cost-Adjusted Rankings

Agentic workloads are token-intensive: agents loop, retry, and chain multiple calls. That makes cost-per-token a critical factor alongside raw capability. This ranking divides each model's weighted agentic score (Terminal-Bench 2.0, BrowseComp, OSWorld-Verified) by its output token price in dollars per 1M tokens. The result, shown as Score/$, indicates which models give you the most agent capability per dollar. If you're building production AI agents with budget constraints, this is where you start.
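As a rough illustration of the ratio described above, here is a minimal sketch that divides a weighted agentic score by an output price. The function name `score_per_dollar` and the `rows` list are hypothetical; the three data rows are copied from this ranking, and the sketch assumes the price is the one quoted per 1M output tokens.

```python
def score_per_dollar(agentic_score: float, output_price_per_million: float) -> float:
    """Divide a weighted agentic score by the output price (USD per 1M tokens)."""
    if output_price_per_million <= 0:
        raise ValueError("output price must be positive")
    return agentic_score / output_price_per_million


# (model, weighted agentic score, output price in USD per 1M tokens),
# copied from the ranking; the list name and layout are illustrative only.
rows = [
    ("Gemini 3.1 Flash-Lite", 49.2, 0.4),
    ("Gemini 3.1 Pro", 81.4, 5.0),
    ("GPT-5.4 Pro", 88.9, 180.0),
]

# Round to two decimals, matching the precision shown in the ranking.
ratios = {name: round(score_per_dollar(score, price), 2) for name, score, price in rows}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

Recomputing from the displayed values can differ slightly from the published ratio (for example 123.0 here versus 123.03 for Gemini 3.1 Flash-Lite), presumably because the displayed scores are rounded.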

According to BenchLM.ai, Gemini 3.1 Flash-Lite leads this ranking with a Score/$ ratio of 123.03, followed by GPT-4.1 nano (87.33) and GPT-4o mini (82.35). The leader's ratio is roughly 1.4 times the runner-up's, and ratios fall below 20 from rank #9 onward.

The best open-weight option is DeepSeek Coder 2.0 (ranked #5 with a Score/$ ratio of 57.15). Proprietary models hold the top four spots, but DeepSeek Coder 2.0's raw score (62.9) is higher than that of any of them; it ranks lower because of its higher output price ($1.1/1M). That keeps open-weight options in contention for teams that want full model control.
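To make the open-weight comparison concrete, the following sketch sorts a subset of the ranking by Score/$ and finds the first open-weight entry. The `Model` type, the `best_open_weight` helper, and the choice of subset are hypothetical; the scores and prices are copied from this ranking, and positions are only meaningful here because the subset includes the top five models.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Model(NamedTuple):
    name: str
    license: str          # "Proprietary" or "Open Weight", as labeled in the ranking
    score: float          # weighted agentic score
    output_price: float   # USD per 1M output tokens

    @property
    def ratio(self) -> float:
        """Score per dollar of output price."""
        return self.score / self.output_price


# Subset of the ranking (deliberately unordered to show the sort).
models = [
    Model("Kimi K2.5", "Open Weight", 66.2, 2.8),
    Model("GPT-4o mini", "Proprietary", 49.4, 0.6),
    Model("DeepSeek Coder 2.0", "Open Weight", 62.9, 1.1),
    Model("Gemini 3.1 Flash-Lite", "Proprietary", 49.2, 0.4),
    Model("Gemini 2.5 Flash", "Proprietary", 42.4, 0.6),
    Model("GPT-4.1 nano", "Proprietary", 34.9, 0.4),
]

# Highest Score/$ first.
ranked = sorted(models, key=lambda m: m.ratio, reverse=True)


def best_open_weight(ranked_models: Sequence[Model]) -> Optional[Tuple[int, Model]]:
    """Return (1-based position, model) for the first open-weight entry, if any."""
    for position, model in enumerate(ranked_models, start=1):
        if model.license == "Open Weight":
            return position, model
    return None


result = best_open_weight(ranked)
if result is not None:
    position, model = result
    print(f"#{position} {model.name}: {model.ratio:.2f}")
```

Sorting by raw `score` instead of `ratio` would put the two open-weight models first in this subset, which shows how much the ordering depends on price.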

This ranking is based on weighted averages across the scoring benchmarks in the agentic category tracked by BenchLM.ai.

1. Gemini 3.1 Flash-Lite · Google · Proprietary · 1M · Score/$: 123.03 · Score: 49.2 · $0.4/1M
2. GPT-4.1 nano · OpenAI · Proprietary · 1M · Score/$: 87.33 · Score: 34.9 · $0.4/1M
3. GPT-4o mini · OpenAI · Proprietary · 128K · Score/$: 82.35 · Score: 49.4 · $0.6/1M
4. Gemini 2.5 Flash · Google · Proprietary · 1M · Score/$: 70.73 · Score: 42.4 · $0.6/1M
5. DeepSeek Coder 2.0 · DeepSeek · Open Weight · 128K · Score/$: 57.15 · Score: 62.9 · $1.1/1M
6. GPT-5.4 nano · OpenAI · Proprietary · 400K · Score/$: 44.42 · Score: 55.5 · $1.25/1M
7. GPT-4.1 mini · OpenAI · Proprietary · 1M · Score/$: 31.67 · Score: 50.7 · $1.6/1M
8. Kimi K2.5 · Moonshot AI · Open Weight · 128K · Score/$: 23.63 · Score: 66.2 · $2.8/1M
9. Gemini 3 Flash · Google · Proprietary · 1M · Score/$: 19.57 · Score: 58.7 · $3/1M
10. DeepSeek-R1 · DeepSeek · Open Weight · 128K · Score/$: 19.23 · Score: 42.1 · $2.19/1M
11. Gemini 3.1 Pro · Google · Proprietary · 1M · Score/$: 16.28 · Score: 81.4 · $5/1M
12. GPT-5.4 mini · OpenAI · Proprietary · 400K · Score/$: 16.03 · Score: 72.2 · $4.5/1M
13. o3-mini · OpenAI · Proprietary · 200K · Score/$: 14.56 · Score: 64.1 · $4.4/1M
14. Claude Haiku 4.5 · Anthropic · Proprietary · 200K · Score/$: 13.47 · Score: 53.9 · $4/1M
15. GPT-5.1 · OpenAI · Proprietary · 200K · Score/$: 12.45 · Score: 74.7 · $6/1M
16. Gemini 2.5 Pro · Google · Proprietary · 1M · Score/$: 11.93 · Score: 59.6 · $5/1M
17. GPT-5.2-Codex · OpenAI · Proprietary · 400K · Score/$: 10.17 · Score: 81.3 · $8/1M
18. GPT-5.1-Codex-Max · OpenAI · Proprietary · 400K · Score/$: 9.65 · Score: 77.2 · $8/1M
19. GPT-5.2 · OpenAI · Proprietary · 400K · Score/$: 9.28 · Score: 74.3 · $8/1M
20. GPT-5.3 Codex · OpenAI · Proprietary · 400K · Score/$: 7.96 · Score: 79.6 · $10/1M
21. GPT-4.1 · OpenAI · Proprietary · 1M · Score/$: 7.26 · Score: 58 · $8/1M
22. Grok 4.1 · xAI · Proprietary · 1M · Score/$: 5.34 · Score: 80.1 · $15/1M
23. GPT-5.4 · OpenAI · Proprietary · 1.05M · Score/$: 4.82 · Score: 72.3 · $15/1M
24. GPT-4o · OpenAI · Proprietary · 128K · Score/$: 3.91 · Score: 39.1 · $10/1M
25. Claude Sonnet 4.5 · Anthropic · Proprietary · 200K · Score/$: 3.87 · Score: 58.1 · $15/1M
26. o3 · OpenAI · Proprietary · 200K · Score/$: 1.53 · Score: 61.1 · $40/1M
27. Claude Opus 4.6 · Anthropic · Proprietary · 1M · Score/$: 1.11 · Score: 83 · $75/1M
28. o1 · OpenAI · Proprietary · 200K · Score/$: 0.97 · Score: 58.3 · $60/1M
29. GPT-5.4 Pro · OpenAI · Proprietary · 1.05M · Score/$: 0.49 · Score: 88.9 · $180/1M
30. o1-pro · OpenAI · Proprietary · 200K · Score/$: 0.07 · Score: 39.7 · $600/1M

Key Takeaways

  • The best value model is Gemini 3.1 Flash-Lite by Google with a Score/$ ratio of 123.03 (score: 49.2, output: $0.4/1M tokens).
  • The best open-weight model in this ranking is DeepSeek Coder 2.0 at position #5.
  • 30 models are included in this ranking.
Last updated: April 3, 2026
