
Best Budget LLMs in 2026: GPT-5.4 Mini, Nano, MiniMax M2.7, and Every Cheap Model Ranked

Which budget LLM should you use in 2026? We rank GPT-5.4 mini, GPT-5.4 nano, MiniMax M2.7, Claude Haiku 4.5, Gemini Flash, DeepSeek, and more by benchmarks and price.

Glevd · Published March 18, 2026 · 12 min read

GPT-5.4 mini and nano just landed alongside MiniMax M2.7 — three new budget models in 48 hours. The capability floor keeps rising while prices drop. GPT-5.4 mini brings reasoning-class intelligence to $0.75/M input. MiniMax M2.7 quietly beats it on SWE-bench Pro at less than half the price.

This guide ranks every major LLM under $1.50 per million input tokens by benchmark performance, with pricing breakdowns and use-case recommendations. All scores from the BenchLM.ai leaderboard and pricing page.

The budget tier landscape (March 2026)

There are now more than 15 models priced under $1.50/M input tokens. The quality range is enormous: GPT-5 nano at $0.05/M input scores 36 overall, while Gemini 3.1 Pro at $1.25/M scores 94.

Ultra-budget: under $0.50/M input

| Model | Creator | Input/Output ($/M tokens) | Context | Overall Score | Type |
|---|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05/$0.40 | 400K | 36 | Reasoning |
| Seed 1.6 Flash | ByteDance | $0.08/$0.30 | 256K | | |
| Gemini 3.1 Flash-Lite | Google | $0.10/$0.40 | 1M | | |
| Step 3.5 Flash | StepFun | $0.10/$0.30 | 256K | | |
| GPT-5.4 nano | OpenAI | $0.20/$1.25 | 400K | 58 | Reasoning |
| Mercury 2 | Inception | $0.25/$0.75 | 128K | | |
| DeepSeek V3 | DeepSeek | $0.27/$1.10 | 128K | 49 | Non-Reasoning |
| DeepSeek Coder 2.0 | DeepSeek | $0.27/$1.10 | 128K | 62 | Non-Reasoning |
| MiniMax M2.7 | MiniMax | $0.30/$1.20 | 200K | 60* | Non-Reasoning |
| Grok 3 Mini | xAI | $0.30/$0.50 | 128K | 49* | Non-Reasoning |

*MiniMax M2.7 and Grok 3 Mini still have sparse coverage relative to the best-supported frontier rows, so treat their overall scores as directional rather than definitive.

Budget-frontier: $0.50–$1.50/M input

| Model | Creator | Input/Output ($/M tokens) | Context | Overall Score | Type |
|---|---|---|---|---|---|
| Gemini 3 Flash | Google | $0.50/$3.00 | 1M | 67 | Non-Reasoning |
| Kimi K2.5 | Moonshot | $0.50/$2.80 | 128K | 72 | Non-Reasoning |
| DeepSeek R1 | DeepSeek | $0.55/$2.19 | 128K | 45 | Reasoning |
| GPT-5.4 mini | OpenAI | $0.75/$4.50 | 400K | 73 | Reasoning |
| Claude Haiku 4.5 | Anthropic | $1.00/$5.00 | 200K | 63 | Non-Reasoning |
| GLM-5-Turbo | Zhipu | $1.20/$4.00 | 200K | | |
| Gemini 3.1 Pro | Google | $1.25/$5.00 | 1M | 94 | Non-Reasoning |

For reference, the full GPT-5.4 costs $2.50/$15.00 and scores 94 overall. Stepping up from Gemini 3.1 Pro at $1.25 to GPT-5.4 at $2.50 buys no overall-score gain: both sit at 94, so Gemini wins on price per point, though GPT-5.4 still holds strong frontier-class scores on individual benchmarks.

GPT-5.4 mini: reasoning on a budget

GPT-5.4 mini is OpenAI's reasoning model at budget pricing — $0.75/M input, 3.3x cheaper than GPT-5.4. It now scores 73 overall with a 400K context window.

Where mini stands out:

  • Agentic tasks: OSWorld-Verified 72.2 (vs full GPT-5.4's 85), Terminal-Bench 2.0 at 60, Tau2Bench 93.4. These are strong agentic scores for a budget model — OSWorld 72.2 beats Claude Haiku 4.5 (57) and Gemini 3 Flash (53) comfortably.
  • Knowledge: GPQA 88 is solid — above Gemini 3 Flash (69) and Claude Haiku 4.5 (67), though below Gemini 3.1 Pro (97). HLE 41.5 is a standout — this is a hard benchmark where most budget models score in single digits.
  • Multimodal: MMMU-Pro 76.6 is competitive, though it trails Gemini 3.1 Pro (95), Claude Haiku 4.5 (82), and Gemini 3 Flash (80) in the budget tier.

Where mini falls short:

  • Coding: SWE-bench Pro 54.4 is a big step down from GPT-5.4's 85. For a coding-focused workload, the quality gap is real.
  • Long-context reasoning: MRCR v2 at 40.7 shows where the ceiling is. Full GPT-5.4 scores 97 here.
  • Sparse math data: No AIME or HMMT scores published yet, making it hard to evaluate math capability.

The pitch: GPT-5.4 mini makes sense when you need a reasoning model with agentic capability at budget pricing. For pure knowledge or coding tasks, Gemini 3.1 Pro at $1.25 is stronger across the board.

GPT-5.4 nano: how low can you go?

GPT-5.4 nano costs $0.20/M input — 12.5x cheaper than full GPT-5.4. It now lands in the high-50s on BenchLM's overall score and materially outperforms the older GPT-5 nano budget row, but with a different capability profile.

Key scores:

| Benchmark | GPT-5.4 nano | GPT-5 nano | GPT-5.4 mini |
|---|---|---|---|
| GPQA | 82.8 | 71.2 | 88 |
| HLE | 37.7 | | 41.5 |
| SWE-bench Pro | 52.4 | 22 | 54.4 |
| Terminal-Bench 2.0 | 46.3 | 38 | 60 |
| OSWorld-Verified | 39 | 30 | 72.2 |
| MMMU-Pro | 66.1 | 58 | 76.6 |

GPT-5.4 nano beats GPT-5 nano on every available benchmark — especially coding (SWE-bench Pro 52.4 vs 22) and knowledge (GPQA 82.8 vs 71.2). The gap is large enough that GPT-5.4 nano effectively replaces GPT-5 nano for anything beyond the cheapest possible classification tasks.

The cost math: At $0.20/M input, nano processes 5 million input tokens per dollar. For a classification pipeline handling 100M tokens/month, GPT-5.4 nano costs $20/month. GPT-5.4 mini would cost $75/month for the same volume. That 3.75x multiplier matters at scale.
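
If you want to sanity-check this math for your own volumes, here is a minimal sketch of the input-side cost arithmetic. The per-million-token rates are the ones quoted in this guide; the model identifier strings are illustrative labels, not official API model names, and output-token costs are ignored for simplicity.

```python
# Minimal sketch of the input-token cost math above.
# Rates are the $/M input prices quoted in this guide; the model ID strings
# are illustrative labels, not official API model names. Output tokens ignored.

INPUT_PRICE_PER_M = {
    "gpt-5.4-nano": 0.20,
    "gpt-5.4-mini": 0.75,
    "minimax-m2.7": 0.30,
}

def monthly_input_cost(tokens_per_month: int, price_per_m: float) -> float:
    """Dollar cost of input tokens for one month at a given $/M rate."""
    return tokens_per_month / 1_000_000 * price_per_m

volume = 100_000_000  # the hypothetical 100M-token/month classification pipeline
for model, price in INPUT_PRICE_PER_M.items():
    print(f"{model}: ${monthly_input_cost(volume, price):.2f}/month")
# gpt-5.4-nano: $20.00/month
# gpt-5.4-mini: $75.00/month
# minimax-m2.7: $30.00/month
```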

Where nano makes sense: High-volume tasks where cost dominates — classification, tagging, simple extraction, content filtering. For anything requiring strong reasoning or coding, the step up to mini ($0.75) is worth the extra cost.

MiniMax M2.7: the coding wildcard

MiniMax M2.7 is the surprise of this batch. At $0.30/M input, 2.5x cheaper than GPT-5.4 mini and only $0.10 above GPT-5.4 nano, it posts the highest SWE-bench Pro score in the budget tier: 56.22.

| Benchmark | MiniMax M2.7 | GPT-5.4 mini | GPT-5.4 nano | Claude Haiku 4.5 |
|---|---|---|---|---|
| SWE-bench Pro | 56.22 | 54.4 | 52.4 | 46 |
| Terminal-Bench 2.0 | 57 | 60 | 46.3 | 53 |
| SWE-Multilingual | 76.5 | | | |
| MLE-Bench-Lite | 66.6 | | | |
| Toolathlon | 46.3 | 42.9 | 35.5 | |

MiniMax M2.7 beats GPT-5.4 mini on SWE-bench Pro by nearly 2 points while costing 2.5x less on input tokens. On SWE-Multilingual (76.5) and MLE-Bench-Lite (66.6), it shows strong coding breadth that the OpenAI budget models haven't been tested on yet.

The caveat: MiniMax M2.7 still has sparse coverage relative to the best-supported frontier rows. Its 60 overall score is directionally useful, but it still rests on a narrow slice of coding and agentic evidence rather than broad cross-category coverage.

200K context is another differentiator. At $0.30/M input, feeding large codebases into M2.7 is dramatically cheaper than any alternative with comparable SWE-bench scores: a full 200K-token context costs roughly $0.06 per request, versus about $0.15 with GPT-5.4 mini or $0.20 with Claude Haiku 4.5.

Head-to-head: which budget model wins?

Coding

| Model | SWE-bench Pro | LiveCodeBench | Price (in/out, $/M) |
|---|---|---|---|
| MiniMax M2.7 | 56.22 | | $0.30/$1.20 |
| GPT-5.4 mini | 54.4 | | $0.75/$4.50 |
| GPT-5.4 nano | 52.4 | | $0.20/$1.25 |
| Claude Haiku 4.5 | 46 | 36 | $1.00/$5.00 |
| Gemini 3 Flash | 44 | 36 | $0.50/$3.00 |
| DeepSeek V3 | 37.6 | | $0.27/$1.10 |

MiniMax M2.7 leads. For budget coding workloads — code review, bug fixing, refactors — it's the best value option in the tier. GPT-5.4 mini is close behind with the added benefit of being a reasoning model.

Agentic tasks

| Model | Terminal-Bench 2.0 | OSWorld-Verified | Price (in/out, $/M) |
|---|---|---|---|
| GPT-5.4 mini | 60 | 72.2 | $0.75/$4.50 |
| MiniMax M2.7 | 57 | | $0.30/$1.20 |
| Gemini 3 Flash | 56 | 53 | $0.50/$3.00 |
| Claude Haiku 4.5 | 53 | 57 | $1.00/$5.00 |
| GPT-5.4 nano | 46.3 | 39 | $0.20/$1.25 |
| GPT-5 nano | 38 | 30 | $0.05/$0.40 |

GPT-5.4 mini dominates agentic benchmarks in this tier. OSWorld-Verified 72.2 is a standout — closer to full GPT-5.4 (85) than any other budget model gets to its flagship sibling. If you're building an agent on a budget, mini is the pick.

Knowledge

| Model | GPQA | HLE | Price (in/out, $/M) |
|---|---|---|---|
| GPT-5.4 mini | 88 | 41.5 | $0.75/$4.50 |
| GPT-5.4 nano | 82.8 | 37.7 | $0.20/$1.25 |
| GPT-5 nano | 71.2 | | $0.05/$0.40 |
| Gemini 3 Flash | 69 | 6 | $0.50/$3.00 |
| Claude Haiku 4.5 | 67 | 11 | $1.00/$5.00 |
| DeepSeek V3 | 59.1 | | $0.27/$1.10 |

GPT-5.4 mini and nano dominate knowledge benchmarks in the budget tier. HLE scores of 41.5 and 37.7 are particularly impressive — Claude Haiku 4.5 scores 11 and Gemini 3 Flash scores 6 on the same benchmark.

Multimodal

| Model | MMMU-Pro | Price (in/out, $/M) |
|---|---|---|
| Claude Haiku 4.5 | 82 | $1.00/$5.00 |
| Gemini 3 Flash | 80 | $0.50/$3.00 |
| GPT-5.4 mini | 76.6 | $0.75/$4.50 |
| GPT-5.4 nano | 66.1 | $0.20/$1.25 |
| GPT-5 nano | 58 | $0.05/$0.40 |

Claude Haiku 4.5 and Gemini 3 Flash lead the budget tier on multimodal. MiniMax M2.7 has no MMMU-Pro score — another gap in its benchmark coverage.

When to use what

High-volume classification and tagging — GPT-5 nano ($0.05/$0.40) or GPT-5.4 nano ($0.20/$1.25). If you're processing millions of tokens daily on simple tasks, nano-tier pricing is hard to argue with. GPT-5.4 nano is substantially better on quality if the 4x price increase fits your budget.

Budget coding assistant — MiniMax M2.7 ($0.30/$1.20). Highest SWE-bench Pro in the tier (56.22) at the second-lowest price. The 200K context window handles large codebases well. The caveat: limited benchmark coverage outside coding, so evaluate on your specific tasks.

Budget AI agent — GPT-5.4 mini ($0.75/$4.50). OSWorld-Verified 72.2 and Terminal-Bench 60 are the best agentic scores in the budget tier by a wide margin. The reasoning capability helps with multi-step agent workflows.

Long-context workloads — Gemini 3 Flash ($0.50/$3.00) with 1M context, or GPT-5.4 mini ($0.75/$4.50) with 400K. If you need 1M tokens of context at the cheapest possible price, Gemini 3 Flash is the only option. Gemini 3.1 Pro ($1.25/$5.00) also offers 1M context with much stronger benchmark scores.

Best budget all-rounder — Gemini 3.1 Pro ($1.25/$5.00). At 94 overall, it ties the full GPT-5.4 and scores higher than every other model in this guide. Full benchmark coverage across all categories, 1M context, and $1.25/M input. If you can afford $1.25 instead of $0.30, this is still the safest choice.

Cheapest credible reasoning model — GPT-5.4 nano ($0.20/$1.25). GPT-5 nano ($0.05/$0.40) is also listed as a reasoning model and is cheaper, but at 36 overall it is only fit for the simplest tasks. GPT-5.4 nano's GPQA 82.8 and HLE 37.7 show real reasoning capability at an ultra-budget price. The sketch below shows one way to encode these picks in code.
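
Here is one way those picks could be wired into a simple routing layer. This is a minimal sketch, not a recommendation engine: the task categories and model identifier strings are placeholders of our own, not the providers' official API model names.

```python
# Minimal routing sketch that mirrors the recommendations above.
# Task categories and model ID strings are placeholders, not official API names.

BUDGET_ROUTES = {
    "classification": "gpt-5-nano",      # ultra-cheap, high-volume simple tasks
    "tagging":        "gpt-5.4-nano",    # better quality, still $0.20/M input
    "coding":         "minimax-m2.7",    # highest SWE-bench Pro in the tier
    "agent":          "gpt-5.4-mini",    # best budget agentic scores
    "long_context":   "gemini-3-flash",  # cheapest 1M-token context window
    "general":        "gemini-3.1-pro",  # strongest overall budget score
}

def pick_model(task_type: str) -> str:
    """Return the recommended budget model for a task, defaulting to the all-rounder."""
    return BUDGET_ROUTES.get(task_type, BUDGET_ROUTES["general"])

print(pick_model("coding"))     # minimax-m2.7
print(pick_model("summarize"))  # unknown task -> falls back to gemini-3.1-pro
```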

The data gap problem

MiniMax M2.7's current overall score still does not tell the full story. BenchLM.ai's ranking methodology requires breadth of benchmark coverage to produce a reliable overall score, and M2.7's published coding results are much stronger than its thinly supported general-purpose score suggests.

On the benchmarks that do exist, M2.7 is competitive with or better than GPT-5.4 mini. SWE-bench Pro 56.22 and Terminal-Bench 57 are strong numbers. But without GPQA, HLE, AIME, MMMU-Pro, or instruction-following scores, it's impossible to rank M2.7 fairly against models with full coverage.

This is a recurring problem in AI benchmarking. As we covered in Are AI Benchmarks Reliable?, benchmark coverage and provenance matter as much as the scores themselves. A model with 10 strong scores and 20 unknowns is a riskier choice than a model with 25 moderate scores.
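
To make that risk concrete, here is a toy illustration of coverage-penalized scoring. This is not BenchLM.ai's actual methodology, and the score lists are invented placeholders; it only shows why a naive average over published scores flatters sparsely tested models.

```python
# Toy illustration of coverage risk -- NOT BenchLM.ai's actual methodology.
# The score lists below are invented placeholders.

def naive_average(scores: list[float]) -> float:
    """Average over only the benchmarks a model has published scores for."""
    return sum(scores) / len(scores)

def coverage_penalized(scores: list[float], total_benchmarks: int,
                       prior: float = 40.0) -> float:
    """Fill untested benchmarks with a conservative prior before averaging."""
    missing = total_benchmarks - len(scores)
    return (sum(scores) + missing * prior) / total_benchmarks

TOTAL = 30
sparse = [80.0] * 10   # 10 strong scores, 20 unknowns
broad  = [65.0] * 25   # 25 moderate scores, 5 unknowns

print(naive_average(sparse), naive_average(broad))                        # 80.0 vs 65.0
print(coverage_penalized(sparse, TOTAL), coverage_penalized(broad, TOTAL))
# ~53.3 vs ~60.8 -- the broadly tested model wins once unknowns are priced in
```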

The practical takeaway: If your workload is coding or agentic tasks, MiniMax M2.7's published scores justify trying it. For general-purpose use, stick with models that have full benchmark coverage until more M2.7 data is available.

What this means for the market

Three takeaways from this week's releases:

1. The reasoning gap is closing at the bottom. GPT-5.4 mini and nano bring reasoning-class capability to the budget tier. A year ago, reasoning models started at $2.50/M input. Now you can get HLE 37.7 for $0.20/M input.

2. Chinese models keep punching above their weight on coding. MiniMax M2.7 posting the highest SWE-bench Pro score in the budget tier, above both GPT-5.4 mini and nano, continues the trend of Chinese labs producing strong coding models at aggressive price points.

3. Budget doesn't mean weak anymore. GPT-5.4 mini's OSWorld-Verified 72.2 would have been a frontier-class score 12 months ago. The models that cost $0.30–$0.75/M input today are materially better than the $15/M models of early 2025.

Check the BenchLM.ai leaderboard for the latest scores as more benchmarks roll in for these models. Prices and capabilities shift fast — what's budget today is obsolete tomorrow.

New models drop every week. We send one email a week with what moved and why.