What is the best budget LLM in 2026?

For the best overall score at budget pricing, Gemini 3.1 Pro ($1.25/$5) still leads at 94. For coding specifically, MiniMax M2.7 remains compelling at $0.30/$1.20, but the gap is narrower than earlier snapshots suggested because the coding methodology now gives real weight to SWE-Rebench as well.

How does GPT-5.4 mini compare to GPT-5.4?

GPT-5.4 mini now scores 71 overall vs GPT-5.4's 88. On coding, mini still trails the full model, but where it shines is agentic work: 72.2 on OSWorld-Verified and 60 on Terminal-Bench 2.0. At $0.75/$4.50 vs $2.50/$15, mini is still 3.3x cheaper on input tokens.

Is GPT-5.4 nano good enough for production?

GPT-5.4 nano now scores 58 overall and still makes sense for classification, summarization, and simple Q&A at $0.20/$1.25. For coding or complex reasoning, the quality gap vs mini or full GPT-5.4 is still real.

Is MiniMax M2.7 available outside China?

MiniMax M2.7 is available via API with international access. At $0.30/$1.20 per million tokens with a 200K context window, it offers competitive pricing. The main caveat is sparse benchmark coverage — only 10 benchmarks vs 20+ for GPT models — making it harder to assess across all use cases.

What is the cheapest LLM API in 2026?

GPT-5 nano is the cheapest major LLM API at $0.05 per million input tokens and $0.40 per million output tokens. Seed 1.6 Flash ($0.08/$0.30) and Gemini 3.1 Flash-Lite ($0.10/$0.40) are close behind. These ultra-budget models are best suited for high-volume, lower-stakes tasks where cost matters more than peak quality.

Should I use Gemini 3.1 Pro or GPT-5.4 mini?

Gemini 3.1 Pro ($1.25/$5) scores 94 overall vs GPT-5.4 mini's 71, so Gemini is still the better broad default. GPT-5.4 mini's advantages are a lower input cost ($0.75 vs $1.25) and a reasoning-first profile with stronger agentic scores.

Best Budget LLMs in 2026: GPT-5.4 Mini, Nano, MiniMax M2.7, and Every Cheap Model Ranked

GPT-5.4 mini and nano just landed alongside MiniMax M2.7 — three new budget models in 48 hours. The capability floor keeps rising while prices drop. GPT-5.4 mini brings reasoning-class intelligence to $0.75/M input. MiniMax M2.7 quietly beats it on SWE-bench Pro at less than half the price.

This guide ranks every major LLM under $1.50 per million input tokens by benchmark performance, with pricing breakdowns and use-case recommendations. All scores come from the BenchLM.ai leaderboard and pricing page.

The budget tier landscape (March 2026)

There are now more than 15 models priced under $1.50/M input tokens. The quality range is enormous — from GPT-5 nano at $0.05/M input to Gemini 3.1 Pro at $1.25/M scoring 94 overall.

Ultra-budget: under $0.50/M input

Model	Creator	Input/Output	Context	Overall Score	Type
GPT-5 nano	OpenAI	$0.05/$0.40	400K	36	Reasoning
Seed 1.6 Flash	ByteDance	$0.08/$0.30	256K	—	—
Gemini 3.1 Flash-Lite	Google	$0.10/$0.40	1M	—	—
Step 3.5 Flash	StepFun	$0.10/$0.30	256K	—	—
GPT-5.4 nano	OpenAI	$0.20/$1.25	400K	58	Reasoning
Mercury 2	Inception	$0.25/$0.75	128K	—	—
DeepSeek V3	DeepSeek	$0.27/$1.10	128K	49	Non-Reasoning
DeepSeek Coder 2.0	DeepSeek	$0.27/$1.10	128K	62	Non-Reasoning
MiniMax M2.7	MiniMax	$0.30/$1.20	200K	60*	Non-Reasoning
Grok 3 Mini	xAI	$0.30/$0.50	128K	49*	Non-Reasoning

*MiniMax M2.7 and Grok 3 Mini still have sparse coverage relative to the best-supported frontier rows, so treat their overall scores as directional rather than definitive.
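Input price alone does not settle which ultra-budget model is cheapest for a given workload, because output prices differ too. The sketch below is illustrative only: the prices come from the table above, while the function names and the example token volumes are assumptions rather than anything published by BenchLM.ai.

```python
# (input, output) prices in USD per million tokens, from the ultra-budget table.
PRICES = {
    "GPT-5 nano": (0.05, 0.40),
    "Seed 1.6 Flash": (0.08, 0.30),
    "Gemini 3.1 Flash-Lite": (0.10, 0.40),
    "Step 3.5 Flash": (0.10, 0.30),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a workload with the given token counts."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> str:
    """Name of the cheapest model in PRICES for this input/output mix."""
    return min(PRICES, key=lambda m: workload_cost(m, input_tokens, output_tokens))


# Hypothetical workloads: input-heavy (classification) vs. output-heavy (generation).
print(cheapest(1_000_000, 100_000))
print(cheapest(1_000_000, 1_000_000))
```

At these list prices, GPT-5 nano is cheapest when output is a small fraction of input, while Seed 1.6 Flash becomes cheaper once output exceeds roughly 30% of input tokens.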

Budget-frontier: $0.50–$1.50/M input

Model	Creator	Input/Output	Context	Overall Score	Type
Gemini 3 Flash	Google	$0.50/$3.00	1M	67	Non-Reasoning
Kimi K2.5	Moonshot	$0.50/$2.80	256K	68	Non-Reasoning
DeepSeek R1	DeepSeek	$0.55/$2.19	128K	45	Reasoning
GPT-5.4 mini	OpenAI	$0.75/$4.50	400K	71	Reasoning
Claude Haiku 4.5	Anthropic	$1.00/$5.00	200K	63	Non-Reasoning
GLM-5-Turbo	Zhipu	$1.20/$4.00	200K	—	—
Gemini 3.1 Pro	Google	$1.25/$5.00	1M	94	Non-Reasoning

For reference, the full GPT-5.4 costs $2.50/$15.00 and scores 88 overall. Doubling the input price from $1.25 (Gemini 3.1 Pro, score 94) to $2.50 (GPT-5.4, score 88) does not buy a higher overall score, though GPT-5.4 still posts frontier-class results on individual benchmarks.

GPT-5.4 mini: reasoning on a budget

GPT-5.4 mini is OpenAI's reasoning model at budget pricing — $0.75/M input, 3.3x cheaper than GPT-5.4. It now scores 71 overall with a 400K context window.

Where mini stands out:

Agentic tasks: OSWorld-Verified 72.2 (vs full GPT-5.4's 85), Terminal-Bench 2.0 at 60, Tau2Bench 93.4. These are strong agentic scores for a budget model — OSWorld 72.2 beats Claude Haiku 4.5 (57) and Gemini 3 Flash (53) comfortably.
Knowledge: GPQA 88 is solid, above Gemini 3 Flash (69) and Claude Haiku 4.5 (67), though below Gemini 3.1 Pro (97). HLE 41.5 is a standout on a hard benchmark where Claude Haiku 4.5 scores 11 and Gemini 3 Flash scores 6.
Multimodal: MMMU-Pro 76.6 is competitive but trails Gemini 3.1 Pro (95), Claude Haiku 4.5 (82), and Gemini 3 Flash (80) in the budget tier.

Where mini falls short:

Coding: SWE-bench Pro 54.4 is a big step down from GPT-5.4's 85. For a coding-focused workload, the quality gap is real.
Reasoning depth: MRCR v2 at 40.7 shows the long-context reasoning ceiling. Full GPT-5.4 scores 97 here.
Sparse math data: No AIME or HMMT scores published yet, making it hard to evaluate math capability.

The pitch: GPT-5.4 mini makes sense when you need a reasoning model with agentic capability at budget pricing. For pure knowledge or coding tasks, Gemini 3.1 Pro at $1.25 is stronger across the board.

GPT-5.4 nano: how low can you go?

GPT-5.4 nano costs $0.20/M input, 12.5x cheaper than full GPT-5.4. It scores 58 on BenchLM's overall score, well above the older GPT-5 nano (36), though with a different capability profile.

Key scores:

Benchmark	GPT-5.4 nano	GPT-5 nano	GPT-5.4 mini
GPQA	82.8	71.2	88
HLE	37.7	—	41.5
SWE-bench Pro	52.4	22	54.4
Terminal-Bench 2.0	46.3	38	60
OSWorld-Verified	39	30	72.2
MMMU-Pro	66.1	58	76.6

GPT-5.4 nano beats GPT-5 nano on every available benchmark — especially coding (SWE-bench Pro 52.4 vs 22) and knowledge (GPQA 82.8 vs 71.2). The gap is large enough that GPT-5.4 nano effectively replaces GPT-5 nano for anything beyond the cheapest possible classification tasks.

The cost math: At $0.20/M input, nano processes 5 million input tokens per dollar. For a classification pipeline handling 100M tokens/month, GPT-5.4 nano costs $20/month. GPT-5.4 mini would cost $75/month for the same volume. That 3.75x multiplier matters at scale.
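The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch that counts input tokens only, as the paragraph does; the function name is an assumption, and the prices are the input prices quoted in this guide.

```python
# Input prices in USD per million tokens, as quoted in this guide.
INPUT_PRICE_PER_M = {
    "GPT-5.4 nano": 0.20,
    "GPT-5.4 mini": 0.75,
    "GPT-5.4": 2.50,
}


def monthly_input_cost(model: str, tokens_per_month: int) -> float:
    """Monthly input-token cost in USD (output tokens are ignored)."""
    return INPUT_PRICE_PER_M[model] * tokens_per_month / 1_000_000


volume = 100_000_000  # 100M input tokens per month
nano = monthly_input_cost("GPT-5.4 nano", volume)
mini = monthly_input_cost("GPT-5.4 mini", volume)
print(f"nano: ${nano:.2f}/month, mini: ${mini:.2f}/month, ratio: {mini / nano:.2f}x")
```

A real bill also includes output tokens ($1.25/M for nano, $4.50/M for mini), so treat these figures as a lower bound.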

Where nano makes sense: High-volume tasks where cost dominates — classification, tagging, simple extraction, content filtering. For anything requiring strong reasoning or coding, the step up to mini ($0.75) is worth the extra cost.

MiniMax M2.7: the coding wildcard

MiniMax M2.7 is the surprise of this batch. At $0.30/M input, less than half the price of GPT-5.4 mini and only $0.10 more than nano, it posts the highest SWE-bench Pro score in the budget tier: 56.22.

Benchmark	MiniMax M2.7	GPT-5.4 mini	GPT-5.4 nano	Claude Haiku 4.5
SWE-bench Pro	56.22	54.4	52.4	46
Terminal-Bench 2.0	57	60	46.3	53
SWE-Multilingual	76.5	—	—	—
MLE-Bench-Lite	66.6	—	—	—
Toolathlon	46.3	42.9	35.5	—

MiniMax M2.7 beats GPT-5.4 mini on SWE-bench Pro by nearly 2 points while costing 2.5x less on input tokens. On SWE-Multilingual (76.5) and MLE-Bench-Lite (66.6), it shows strong coding breadth that the OpenAI budget models haven't been tested on yet.
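The score gap and price ratio quoted above follow directly from the table. The snippet below is a small sketch for checking them, or for comparing any other pair of rows; the `compare` helper is a hypothetical name, not part of any BenchLM.ai tooling.

```python
# (SWE-bench Pro score, input price in USD per million tokens), from the table above.
MODELS = {
    "MiniMax M2.7": (56.22, 0.30),
    "GPT-5.4 mini": (54.4, 0.75),
    "GPT-5.4 nano": (52.4, 0.20),
    "Claude Haiku 4.5": (46.0, 1.00),
}


def compare(a: str, b: str) -> tuple:
    """Return (score gap of a over b, how many times more b costs than a)."""
    score_a, price_a = MODELS[a]
    score_b, price_b = MODELS[b]
    return score_a - score_b, price_b / price_a


gap, price_ratio = compare("MiniMax M2.7", "GPT-5.4 mini")
print(f"SWE-bench Pro gap: {gap:.2f} points, input price ratio: {price_ratio:.1f}x")
```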

The caveat: MiniMax M2.7 still has sparse coverage relative to the best-supported frontier rows. Its 60 overall score is directionally useful, but it still rests on a narrow slice of coding and agentic evidence rather than broad cross-category coverage.

The 200K context window also helps. At $0.30/M input, feeding large codebases into M2.7 costs 2.5x less than GPT-5.4 mini, the next-highest SWE-bench Pro scorer in the tier.

Head-to-head: which budget model wins?

Coding

Model	SWE-bench Pro	LiveCodeBench	Price (in/out)
MiniMax M2.7	56.22	—	$0.30/$1.20
GPT-5.4 mini	54.4	—	$0.75/$4.50
GPT-5.4 nano	52.4	—	$0.20/$1.25
Claude Haiku 4.5	46	36	$1.00/$5.00
Gemini 3 Flash	44	36	$0.50/$3.00
DeepSeek V3	—	37.6	$0.27/$1.10

MiniMax M2.7 leads. For budget coding workloads — code review, bug fixing, refactors — it's the best value option in the tier. GPT-5.4 mini is close behind with the added benefit of being a reasoning model.

Agentic tasks

Model	Terminal-Bench 2.0	OSWorld-Verified	Price (in/out)
GPT-5.4 mini	60	72.2	$0.75/$4.50
MiniMax M2.7	57	—	$0.30/$1.20
Gemini 3 Flash	56	53	$0.50/$3.00
Claude Haiku 4.5	53	57	$1.00/$5.00
GPT-5.4 nano	46.3	39	$0.20/$1.25
GPT-5 nano	38	30	$0.05/$0.40

GPT-5.4 mini dominates agentic benchmarks in this tier. OSWorld-Verified 72.2 is a standout — closer to full GPT-5.4 (85) than any other budget model gets to its flagship sibling. If you're building an agent on a budget, mini is the pick.

Knowledge

Model	GPQA	HLE	Price (in/out)
GPT-5.4 mini	88	41.5	$0.75/$4.50
GPT-5.4 nano	82.8	37.7	$0.20/$1.25
GPT-5 nano	71.2	—	$0.05/$0.40
Gemini 3 Flash	69	6	$0.50/$3.00
Claude Haiku 4.5	67	11	$1.00/$5.00
DeepSeek V3	59.1	—	$0.27/$1.10

GPT-5.4 mini and nano dominate knowledge benchmarks in the budget tier. HLE scores of 41.5 and 37.7 are particularly impressive — Claude Haiku 4.5 scores 11 and Gemini 3 Flash scores 6 on the same benchmark.

Multimodal

Model	MMMU-Pro	Price (in/out)
Claude Haiku 4.5	82	$1.00/$5.00
Gemini 3 Flash	80	$0.50/$3.00
GPT-5.4 mini	76.6	$0.75/$4.50
GPT-5.4 nano	66.1	$0.20/$1.25
GPT-5 nano	58	$0.05/$0.40

Claude Haiku 4.5 and Gemini 3 Flash lead the budget tier on multimodal. MiniMax M2.7 has no MMMU-Pro score — another gap in its benchmark coverage.

When to use what

High-volume classification and tagging — GPT-5 nano ($0.05/$0.40) or GPT-5.4 nano ($0.20/$1.25). If you're processing millions of tokens daily on simple tasks, nano-tier pricing is hard to argue with. GPT-5.4 nano is substantially better on quality if the 4x price increase fits your budget.

Budget coding assistant — MiniMax M2.7 ($0.30/$1.20). Highest SWE-bench Pro in the tier (56.22) at less than half the input price of GPT-5.4 mini. The 200K context window handles large codebases well. The caveat: limited benchmark coverage outside coding, so evaluate on your specific tasks.

Budget AI agent — GPT-5.4 mini ($0.75/$4.50). OSWorld-Verified 72.2 is the best score in the budget tier by a wide margin, and Terminal-Bench 2.0 at 60 narrowly leads MiniMax M2.7 (57). The reasoning capability helps with multi-step agent workflows.

Long-context workloads — Gemini 3 Flash ($0.50/$3.00) with 1M context, or GPT-5.4 mini ($0.75/$4.50) with 400K. Gemini 3 Flash is the cheapest 1M-context model with a published overall score (67); Gemini 3.1 Flash-Lite ($0.10/$0.40) also offers 1M context at a lower price but has no overall score yet. Gemini 3.1 Pro ($1.25/$5.00) offers 1M context with much stronger benchmark scores.

Best budget all-rounder — Gemini 3.1 Pro ($1.25/$5.00). At 94 overall, it still scores higher than every other model in this guide. Full benchmark coverage across all categories, 1M context, and $1.25/M input. If you can afford $1.25 instead of $0.30, this is still the safest choice.

Cheapest capable reasoning model — GPT-5.4 nano ($0.20/$1.25). GPT-5 nano ($0.05/$0.40) is also a reasoning model and costs less, but it scores only 36 overall. GPT-5.4 nano's GPQA 82.8 and HLE 37.7 show real reasoning capability at an ultra-budget price.
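The recommendations above can be roughly approximated as a filter over price and context window. The sketch below is deliberately simplistic: it ranks by overall score only, so it ignores the coding and agentic nuances discussed in this guide, and the `pick_model` helper and its parameters are hypothetical. The data comes from the two pricing tables, with unscored models omitted.

```python
from typing import NamedTuple, Optional


class Model(NamedTuple):
    name: str
    input_price: float  # USD per million input tokens
    context_k: int      # context window in thousands of tokens
    overall: int        # BenchLM overall score as listed in this guide


# Scored rows from the pricing tables above (models without a score are left out).
MODELS = [
    Model("GPT-5 nano", 0.05, 400, 36),
    Model("GPT-5.4 nano", 0.20, 400, 58),
    Model("DeepSeek Coder 2.0", 0.27, 128, 62),
    Model("MiniMax M2.7", 0.30, 200, 60),  # sparse coverage: score is directional
    Model("Gemini 3 Flash", 0.50, 1000, 67),
    Model("Kimi K2.5", 0.50, 256, 68),
    Model("GPT-5.4 mini", 0.75, 400, 71),
    Model("Claude Haiku 4.5", 1.00, 200, 63),
    Model("Gemini 3.1 Pro", 1.25, 1000, 94),
]


def pick_model(max_input_price: float, min_context_k: int = 0) -> Optional[str]:
    """Highest overall-scoring model within the price cap and context floor."""
    candidates = [
        m for m in MODELS
        if m.input_price <= max_input_price and m.context_k >= min_context_k
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.overall).name


print(pick_model(1.50))         # no context requirement
print(pick_model(0.30, 200))    # tight budget, at least 200K context
print(pick_model(1.00, 1000))   # needs 1M context under $1.00/M input
```

For a real decision, swap the overall score for the benchmark closest to your workload, such as SWE-bench Pro for coding or OSWorld-Verified for agents.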

The data gap problem

MiniMax M2.7's current overall score does not tell the full story. BenchLM.ai's ranking methodology requires breadth of benchmark coverage to produce a reliable overall score, and M2.7's coding results are stronger than its sparse-coverage overall score suggests.

On the benchmarks that do exist, M2.7 is competitive with or better than GPT-5.4 mini. SWE-bench Pro 56.22 and Terminal-Bench 57 are strong numbers. But without GPQA, HLE, AIME, MMMU-Pro, or instruction-following scores, it's impossible to rank M2.7 fairly against models with full coverage.

This is a recurring problem in AI benchmarking. As we covered in Are AI Benchmarks Reliable?, benchmark coverage and provenance matter as much as the scores themselves. A model with 10 strong scores and 20 unknowns is a riskier choice than a model with 25 moderate scores.
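One way to make the coverage gap concrete is to count how many benchmarks have a published score for each model. The sketch below uses only the six benchmarks from this guide's head-to-head tables, which is an assumption made for illustration and not BenchLM.ai's actual methodology; the `coverage` function is likewise hypothetical.

```python
from typing import Dict, Optional

# Scores from this guide's head-to-head tables; None marks a missing score.
SCORES: Dict[str, Dict[str, Optional[float]]] = {
    "MiniMax M2.7": {
        "SWE-bench Pro": 56.22,
        "Terminal-Bench 2.0": 57,
        "OSWorld-Verified": None,
        "GPQA": None,
        "HLE": None,
        "MMMU-Pro": None,
    },
    "GPT-5.4 mini": {
        "SWE-bench Pro": 54.4,
        "Terminal-Bench 2.0": 60,
        "OSWorld-Verified": 72.2,
        "GPQA": 88,
        "HLE": 41.5,
        "MMMU-Pro": 76.6,
    },
}


def coverage(model: str) -> float:
    """Fraction of the tracked benchmarks that have a published score."""
    scores = SCORES[model]
    return sum(value is not None for value in scores.values()) / len(scores)


for name in SCORES:
    print(f"{name}: {coverage(name):.0%} of tracked benchmarks covered")
```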

The practical takeaway: If your workload is coding or agentic tasks, MiniMax M2.7's published scores justify trying it. For general-purpose use, stick with models that have full benchmark coverage until more M2.7 data is available.

What this means for the market

Three takeaways from this week's releases:

1. The reasoning gap is closing at the bottom. GPT-5.4 mini and nano bring reasoning-class capability to the budget tier. A year ago, reasoning models started at $2.50/M input. Now you can get HLE 37.7 for $0.20/M input.

2. Chinese models keep punching above on coding. MiniMax M2.7 posting the highest SWE-bench Pro score in the budget tier — above both GPT-5.4 mini and nano — continues the trend of Chinese labs producing strong coding models at aggressive price points.

3. Budget doesn't mean weak anymore. GPT-5.4 mini's OSWorld-Verified 72.2 would have been a frontier-class score 12 months ago. The models that cost $0.30–$0.75/M input today are materially better than the $15/M models of early 2025.

Check the BenchLM.ai leaderboard for the latest scores as more benchmarks roll in for these models. Prices and capabilities shift fast — what's budget today is obsolete tomorrow.