budget · comparison · pricing · guide · ranking

Best Budget LLMs in 2026: GPT-5.4 Mini, Nano, MiniMax M2.7, and Every Cheap Model Ranked

Which budget LLM should you use in 2026? We rank GPT-5.4 mini, GPT-5.4 nano, MiniMax M2.7, Claude Haiku 4.5, Gemini Flash, DeepSeek, and more by benchmarks and price.

Glevd · March 18, 2026 · 12 min read

GPT-5.4 mini and nano just landed alongside MiniMax M2.7 — three new budget models in 48 hours. The capability floor keeps rising while prices drop. GPT-5.4 mini brings reasoning-class intelligence to $0.75/M input. MiniMax M2.7 quietly beats it on SWE-bench Pro at less than half the price.

This guide ranks every major LLM under $1.50 per million input tokens by benchmark performance, with pricing breakdowns and use-case recommendations. All scores from the BenchLM.ai leaderboard and pricing page.

The budget tier landscape (March 2026)

There are now more than 15 models priced under $1.50/M input tokens. The quality range is enormous — from GPT-5 nano at $0.05/M input and 40 overall to Gemini 3.1 Pro at $1.25/M and 84 overall.

Ultra-budget: under $0.50/M input

| Model | Creator | Input/Output (per M) | Context | Overall Score | Type |
|---|---|---|---|---|---|
| GPT-5 nano | OpenAI | $0.05/$0.40 | 400K | 40 | Reasoning |
| Seed 1.6 Flash | ByteDance | $0.08/$0.30 | 256K | — | — |
| Gemini 3.1 Flash-Lite | Google | $0.10/$0.40 | 1M | — | — |
| Step 3.5 Flash | StepFun | $0.10/$0.30 | 256K | — | — |
| GPT-5.4 nano | OpenAI | $0.20/$1.25 | 400K | 41 | Reasoning |
| Mercury 2 | Inception | $0.25/$0.75 | 128K | — | — |
| DeepSeek V3 | DeepSeek | $0.27/$1.10 | 128K | 31 | Non-Reasoning |
| DeepSeek Coder 2.0 | DeepSeek | $0.27/$1.10 | 128K | — | — |
| MiniMax M2.7 | MiniMax | $0.30/$1.20 | 200K | 24* | Non-Reasoning |
| Grok 3 Mini | xAI | $0.30/$0.50 | 128K | 21 | Non-Reasoning |

*MiniMax M2.7 overall score based on only 10 benchmarks — low confidence.

Budget-frontier: $0.50–$1.50/M input

| Model | Creator | Input/Output (per M) | Context | Overall Score | Type |
|---|---|---|---|---|---|
| Gemini 3 Flash | Google | $0.50/$3.00 | 1M | 62 | Non-Reasoning |
| Kimi K2.5 | Moonshot | $0.50/$2.80 | 128K | — | — |
| DeepSeek R1 | DeepSeek | $0.55/$2.19 | 128K | — | Reasoning |
| GPT-5.4 mini | OpenAI | $0.75/$4.50 | 400K | 49 | Reasoning |
| Claude Haiku 4.5 | Anthropic | $0.80/$4.00 | 200K | 62 | Non-Reasoning |
| GLM-5-Turbo | Zhipu | $1.20/$4.00 | 200K | — | — |
| Gemini 3.1 Pro | Google | $1.25/$5.00 | 1M | 84 | Non-Reasoning |

For reference, the full GPT-5.4 costs $2.50/$15.00 and scores 90 overall. The jump from $1.25 (Gemini 3.1 Pro, score 84) to $2.50 (GPT-5.4, score 90) is where the budget tier ends and the production-frontier begins.

GPT-5.4 mini: reasoning on a budget

GPT-5.4 mini is OpenAI's reasoning model at budget pricing — $0.75/M input, 3.3x cheaper than GPT-5.4. It scores 49 overall with a 400K context window.

Where mini stands out:

  • Agentic tasks: OSWorld-Verified 72.1 (vs full GPT-5.4's 85), Terminal-Bench 2.0 at 60, Tau2Bench 93.4. These are strong agentic scores for a budget model — OSWorld 72.1 beats Claude Haiku 4.5 (57) and Gemini 3 Flash (53) comfortably.
  • Knowledge: GPQA 88 is solid — above Gemini 3 Flash (69) and Claude Haiku 4.5 (67), though below Gemini 3.1 Pro (97). HLE 41.5 is a standout — this is a hard benchmark where most budget models score in single digits.
  • Multimodal: MMMU-Pro 76.6 is competitive, trailing Gemini 3.1 Pro (95), Claude Haiku 4.5 (82), and Gemini 3 Flash (80) in the budget tier.

Where mini falls short:

  • Coding: SWE-bench Pro 54.4 is a big step down from GPT-5.4's 85. For a coding-focused workload, the quality gap is real.
  • Reasoning depth: MRCR v2 at 40.7 shows the long-context reasoning ceiling. Full GPT-5.4 scores 97 here.
  • Sparse math data: No AIME or HMMT scores published yet, making it hard to evaluate math capability.

The pitch: GPT-5.4 mini makes sense when you need a reasoning model with agentic capability at budget pricing. For pure knowledge or coding tasks, Gemini 3.1 Pro at $1.25 is stronger across the board.

GPT-5.4 nano: how low can you go?

GPT-5.4 nano costs $0.20/M input — 12.5x cheaper than full GPT-5.4. It scores 41 overall, roughly matching GPT-5 nano (40) but with a different capability profile.

Key scores:

| Benchmark | GPT-5.4 nano | GPT-5 nano | GPT-5.4 mini |
|---|---|---|---|
| GPQA | 82.8 | 71.2 | 88 |
| HLE | 37.7 | — | 41.5 |
| SWE-bench Pro | 52.4 | 22 | 54.4 |
| Terminal-Bench 2.0 | 46.3 | 38 | 60 |
| OSWorld-Verified | 39 | 30 | 72.1 |
| MMMU-Pro | 66.1 | 58 | 76.6 |

GPT-5.4 nano beats GPT-5 nano on every available benchmark — especially coding (SWE-bench Pro 52.4 vs 22) and knowledge (GPQA 82.8 vs 71.2). The gap is large enough that GPT-5.4 nano effectively replaces GPT-5 nano for anything beyond the cheapest possible classification tasks.

The cost math: At $0.20/M input, nano processes 5 million input tokens per dollar. For a classification pipeline handling 100M tokens/month, GPT-5.4 nano costs $20/month. GPT-5.4 mini would cost $75/month for the same volume. That 3.75x multiplier matters at scale.
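If you want to reproduce that math for your own volumes, here is a minimal sketch using the per-million-token rates quoted in this post. The helper function and model-name strings are illustrative, not any provider's SDK:

```python
# Back-of-the-envelope cost math using the rates quoted in this post.
# monthly_cost() is a hypothetical utility, not a provider SDK call.
PRICING = {  # model: (input $/M tokens, output $/M tokens)
    "gpt-5.4-nano": (0.20, 1.25),
    "gpt-5.4-mini": (0.75, 4.50),
    "minimax-m2.7": (0.30, 1.20),
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimated monthly spend in dollars for a given token volume."""
    price_in, price_out = PRICING[model]
    return (input_tokens / 1e6) * price_in + (output_tokens / 1e6) * price_out

# 100M input tokens/month on a classification pipeline:
print(monthly_cost("gpt-5.4-nano", 100_000_000))  # 20.0
print(monthly_cost("gpt-5.4-mini", 100_000_000))  # 75.0
```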

Where nano makes sense: High-volume tasks where cost dominates — classification, tagging, simple extraction, content filtering. For anything requiring strong reasoning or coding, the step up to mini ($0.75) is worth the extra cost.

MiniMax M2.7: the coding wildcard

MiniMax M2.7 is the surprise of this batch. At $0.30/M input — less than half of GPT-5.4 mini's input price, though a shade above nano's — it posts the highest SWE-bench Pro score in the budget tier: 56.22.

| Benchmark | MiniMax M2.7 | GPT-5.4 mini | GPT-5.4 nano | Claude Haiku 4.5 |
|---|---|---|---|---|
| SWE-bench Pro | 56.22 | 54.4 | 52.4 | 46 |
| Terminal-Bench 2.0 | 57 | 60 | 46.3 | 53 |
| SWE-Multilingual | 76.5 | — | — | — |
| MLE-Bench-Lite | 66.6 | — | — | — |
| Toolathlon | 46.3 | 42.9 | 35.5 | — |

MiniMax M2.7 beats GPT-5.4 mini on SWE-bench Pro by nearly 2 points while costing 2.5x less on input tokens. On SWE-Multilingual (76.5) and MLE-Bench-Lite (66.6), it shows strong coding breadth that the OpenAI budget models haven't been tested on yet.

The caveat: MiniMax M2.7 has only 10 benchmark results published. There's no GPQA, no HLE, no MMMU-Pro, no AIME data. The 24 overall score reflects sparse coverage, not necessarily weak performance. For coding and agentic tasks specifically, the data that exists is strong. For everything else — knowledge, math, reasoning, instruction following — we simply don't have enough signal.

200K context is another differentiator. At $0.30/M input, a full 200K-token context call costs roughly $0.06 in input tokens (versus about $0.15 on GPT-5.4 mini), so feeding large codebases into M2.7 is dramatically cheaper than any alternative with comparable SWE-bench scores.

Head-to-head: which budget model wins?

Coding

| Model | SWE-bench Pro | LiveCodeBench | Price (in/out) |
|---|---|---|---|
| MiniMax M2.7 | 56.22 | — | $0.30/$1.20 |
| GPT-5.4 mini | 54.4 | — | $0.75/$4.50 |
| GPT-5.4 nano | 52.4 | — | $0.20/$1.25 |
| Claude Haiku 4.5 | 46 | 36 | $0.80/$4.00 |
| Gemini 3 Flash | 44 | 36 | $0.50/$3.00 |
| DeepSeek V3 | 37.6 | — | $0.27/$1.10 |

MiniMax M2.7 leads. For budget coding workloads — code review, bug fixing, refactors — it's the best value option in the tier. GPT-5.4 mini is close behind with the added benefit of being a reasoning model.

Agentic tasks

| Model | Terminal-Bench 2.0 | OSWorld-Verified | Price (in/out) |
|---|---|---|---|
| GPT-5.4 mini | 60 | 72.1 | $0.75/$4.50 |
| MiniMax M2.7 | 57 | — | $0.30/$1.20 |
| Gemini 3 Flash | 56 | 53 | $0.50/$3.00 |
| Claude Haiku 4.5 | 53 | 57 | $0.80/$4.00 |
| GPT-5.4 nano | 46.3 | 39 | $0.20/$1.25 |
| GPT-5 nano | 38 | 30 | $0.05/$0.40 |

GPT-5.4 mini dominates agentic benchmarks in this tier. OSWorld-Verified 72.1 is a standout — closer to full GPT-5.4 (85) than any other budget model gets to its flagship sibling. If you're building an agent on a budget, mini is the pick.

Knowledge

| Model | GPQA | HLE | Price (in/out) |
|---|---|---|---|
| GPT-5.4 mini | 88 | 41.5 | $0.75/$4.50 |
| GPT-5.4 nano | 82.8 | 37.7 | $0.20/$1.25 |
| GPT-5 nano | 71.2 | — | $0.05/$0.40 |
| Gemini 3 Flash | 69 | 6 | $0.50/$3.00 |
| Claude Haiku 4.5 | 67 | 11 | $0.80/$4.00 |
| DeepSeek V3 | 59.1 | — | $0.27/$1.10 |

GPT-5.4 mini and nano dominate knowledge benchmarks in the budget tier. HLE scores of 41.5 and 37.7 are particularly impressive — Claude Haiku 4.5 scores 11 and Gemini 3 Flash scores 6 on the same benchmark.

Multimodal

| Model | MMMU-Pro | Price (in/out) |
|---|---|---|
| Claude Haiku 4.5 | 82 | $0.80/$4.00 |
| Gemini 3 Flash | 80 | $0.50/$3.00 |
| GPT-5.4 mini | 76.6 | $0.75/$4.50 |
| GPT-5.4 nano | 66.1 | $0.20/$1.25 |
| GPT-5 nano | 58 | $0.05/$0.40 |

Claude Haiku 4.5 and Gemini 3 Flash lead the budget tier on multimodal. MiniMax M2.7 has no MMMU-Pro score — another gap in its benchmark coverage.

When to use what

High-volume classification and tagging — GPT-5 nano ($0.05/$0.40) or GPT-5.4 nano ($0.20/$1.25). If you're processing millions of tokens daily on simple tasks, nano-tier pricing is hard to argue with. GPT-5.4 nano is substantially better on quality if the 4x price increase fits your budget.

Budget coding assistant — MiniMax M2.7 ($0.30/$1.20). Highest SWE-bench Pro in the tier (56.22) at the second-lowest price. The 200K context window handles large codebases well. The caveat: limited benchmark coverage outside coding, so evaluate on your specific tasks.

Budget AI agent — GPT-5.4 mini ($0.75/$4.50). OSWorld-Verified 72.1 and Terminal-Bench 60 are the best agentic scores in the budget tier by a wide margin. The reasoning capability helps with multi-step agent workflows.

Long-context workloads — Gemini 3 Flash ($0.50/$3.00) with 1M context, or GPT-5.4 mini ($0.75/$4.50) with 400K. If you need 1M tokens of context at the lowest price with published benchmark scores, Gemini 3 Flash is the pick — Gemini 3.1 Flash-Lite ($0.10/$0.40) also offers 1M context for less, but has no scores in this guide. Gemini 3.1 Pro ($1.25/$5.00) offers 1M context too, with much stronger benchmark scores.

Best budget all-rounder — Gemini 3.1 Pro ($1.25/$5.00). At 84 overall, it outscores every other model in this guide. Full benchmark coverage across all categories, 1M context, and $1.25/M input. If you can afford $1.25 instead of $0.30, this is the safest choice.

Cheapest capable reasoning model — GPT-5.4 nano ($0.20/$1.25). GPT-5 nano is cheaper, but GPT-5.4 nano's GPQA 82.8 and HLE 37.7 show real reasoning capability at an ultra-budget price.
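If these recommendations end up in code, the simplest encoding is a static routing table. A minimal sketch, assuming your application already labels workloads by category — the task labels and model ID strings are placeholders, not provider API names:

```python
# The recommendations above as a static routing table. Model ID strings are
# placeholders; adapt them to whatever SDK or gateway you actually call.
ROUTES = {
    "classification": "gpt-5-nano",     # swap in gpt-5.4-nano if quality matters
    "coding":         "minimax-m2.7",
    "agent":          "gpt-5.4-mini",
    "long_context":   "gemini-3-flash",
    "reasoning":      "gpt-5.4-nano",
    "general":        "gemini-3.1-pro",
}

def pick_budget_model(task: str) -> str:
    """Map a workload category to the budget model suggested in this guide."""
    return ROUTES.get(task, ROUTES["general"])

print(pick_budget_model("coding"))   # minimax-m2.7
print(pick_budget_model("unknown"))  # falls back to gemini-3.1-pro
```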

The data gap problem

MiniMax M2.7's 24 overall score doesn't tell the full story. BenchLM.ai's ranking methodology requires breadth of benchmark coverage to produce a reliable overall score. With only 10 benchmarks — all concentrated in coding and agentic tasks — the 24 reflects data sparsity, not necessarily model quality.
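To see why sparse coverage drags down an aggregate, consider the toy calculation below, which simply treats missing benchmarks as zeros. This is purely illustrative — it is not BenchLM.ai's actual methodology — but it shows the mechanism:

```python
# Toy illustration only -- NOT BenchLM.ai's published methodology. It shows
# why a handful of good scores out of many tracked benchmarks can still
# produce a low aggregate when missing results count against the average.
def coverage_adjusted_score(scores: dict[str, float], tracked_benchmarks: int) -> float:
    """Average over every tracked benchmark, treating missing results as zero."""
    return sum(scores.values()) / tracked_benchmarks

# Three of the published MiniMax M2.7 coding scores cited in this post:
m2_7 = {"swe_bench_pro": 56.22, "terminal_bench_2": 57.0, "swe_multilingual": 76.5}
print(round(coverage_adjusted_score(m2_7, 30), 1))  # 6.3 despite strong per-test scores
```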

On the benchmarks that do exist, M2.7 is competitive with or better than GPT-5.4 mini. SWE-bench Pro 56.22 and Terminal-Bench 57 are strong numbers. But without GPQA, HLE, AIME, MMMU-Pro, or instruction-following scores, it's impossible to rank M2.7 fairly against models with full coverage.

This is a recurring problem in AI benchmarking. As we covered in Are AI Benchmarks Reliable?, benchmark coverage and provenance matter as much as the scores themselves. A model with 10 strong scores and 20 unknowns is a riskier choice than a model with 25 moderate scores.

The practical takeaway: If your workload is coding or agentic tasks, MiniMax M2.7's published scores justify trying it. For general-purpose use, stick with models that have full benchmark coverage until more M2.7 data is available.

What this means for the market

Three takeaways from this week's releases:

1. The reasoning gap is closing at the bottom. GPT-5.4 mini and nano bring reasoning-class capability to the budget tier. A year ago, reasoning models started at $2.50/M input. Now you can get HLE 37.7 for $0.20/M input.

2. Chinese models keep punching above on coding. MiniMax M2.7 posting the highest SWE-bench Pro score in the budget tier — above both GPT-5.4 mini and nano — continues the trend of Chinese labs producing strong coding models at aggressive price points.

3. Budget doesn't mean weak anymore. GPT-5.4 mini's OSWorld-Verified 72.1 would have been a frontier-class score 12 months ago. The models that cost $0.30–$0.75/M input today are materially better than the $15/M models of early 2025.

Check the BenchLM.ai leaderboard for the latest scores as more benchmarks roll in for these models. Prices and capabilities shift fast — what's budget today is obsolete tomorrow.
