Best Value LLM for Math in 2026 — Cost-Adjusted Rankings

Math is the most saturated benchmark category — top models all score 95%+ on competition math. That makes price the main differentiator for math-heavy workloads. This ranking divides each model's weighted math score by output token price. If you need strong math reasoning (AIME, BRUMO, MATH-500) and the top 10 models all deliver similar accuracy, the value ranking here helps you pick the most cost-effective option.

According to BenchLM.ai, Gemini 3.1 Flash-Lite leads this ranking with a score of 162.62, followed by Gemini 2.5 Flash (88.77) and DeepSeek Coder 2.0 (73.68). There is a significant gap between the leading models and the rest of the field.

The best open-weight option is DeepSeek Coder 2.0 (ranked #3 with a score of 73.68). Open-weight models are highly competitive in this category — self-hosting is a viable alternative to proprietary APIs.

This ranking is based on weighted averages across the scoring benchmarks in math tracked by BenchLM.ai. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.

Gemini 3.1 Flash-Lite
GoogleProprietary1M

162.62

Score/$

Score: 65.1 · $0.4/1M

Gemini 2.5 Flash
GoogleProprietary1M

88.77

Score/$

Score: 53.3 · $0.6/1M

DeepSeek Coder 2.0
DeepSeekOpen Weight128K

73.68

Score/$

Score: 81.1 · $1.1/1M

4
Kimi K2.5
Moonshot AIOpen Weight128K

27.96

Score/$

Score: 78.3 · $2.8/1M

5
DeepSeek-R1
DeepSeekOpen Weight128K

25.25

Score/$

Score: 55.3 · $2.19/1M

6
Gemini 3 Flash
GoogleProprietary1M

24.78

Score/$

Score: 74.3 · $3/1M

7
Gemini 3.1 Pro
GoogleProprietary1M

19.39

Score/$

Score: 97 · $5/1M

8
Claude Haiku 4.5
AnthropicProprietary200K

17.83

Score/$

Score: 71.3 · $4/1M

9
Gemini 2.5 Pro
GoogleProprietary1M

16.48

Score/$

Score: 82.4 · $5/1M

10
GPT-5.1
OpenAIProprietary200K

15.18

Score/$

Score: 91.1 · $6/1M

11
Mistral Large 3
MistralProprietary128K

12.48

Score/$

Score: 74.9 · $6/1M

12
GPT-5.2-Codex
OpenAIProprietary400K

12.04

Score/$

Score: 96.3 · $8/1M

13
GPT-5.1-Codex-Max
OpenAIProprietary400K

12.01

Score/$

Score: 96 · $8/1M

14
GPT-5.2
OpenAIProprietary400K

11.72

Score/$

Score: 93.7 · $8/1M

15
GPT-5.3 Codex
OpenAIProprietary400K

9.75

Score/$

Score: 97.6 · $10/1M

16
GPT-4o
OpenAIProprietary128K

6.61

Score/$

Score: 66.1 · $10/1M

17
GPT-5.4
OpenAIProprietary1.05M

6.45

Score/$

Score: 96.7 · $15/1M

18
Claude Sonnet 4.6
AnthropicProprietary200K

6.39

Score/$

Score: 95.8 · $15/1M

19
Claude Sonnet 4.5
AnthropicProprietary200K

6.18

Score/$

Score: 92.7 · $15/1M

20
Grok 4.1
xAIProprietary1M

6.16

Score/$

Score: 92.3 · $15/1M

21
o3
OpenAIProprietary200K

2.2

Score/$

Score: 88.1 · $40/1M

22
Claude Opus 4.6
AnthropicProprietary1M

1.3

Score/$

Score: 97.4 · $75/1M

23
GPT-5.4 Pro
OpenAIProprietary1.05M

0.55

Score/$

Score: 98.5 · $180/1M

Key Takeaways

  • The best value model is Gemini 3.1 Flash-Lite by Google with a Score/$ ratio of 162.62 (score: 65.1, output: $0.4/1M tokens).
  • The best open-weight model in this ranking is DeepSeek Coder 2.0 at position #3.
  • 23 models are included in this ranking.
Last updated: April 3, 2026

Weekly LLM Benchmark Digest

Get notified when new models drop, benchmark scores change, or the leaderboard shifts. One email per week.

Free. No spam. Unsubscribe anytime. We only store derived location metadata for consent routing.