OpenAI API Pricing: GPT-5.4, GPT-5.2, and GPT-5.1 (April 2026)

Current OpenAI API pricing from official docs: GPT-5.4, GPT-5.2, GPT-5.1, cached input rates, Batch API discounts, and the pricing details that actually matter.

Glevd·Published April 13, 2026·14 min read

OpenAI's API pricing is simpler than it looks — but most comparison tables miss the detail that matters most. Every GPT-5.4 family model has a cached input rate at 10% of normal pricing, and the Batch API cuts everything by 50%. That means the effective price of GPT-5.4 for a well-architected application is often half or less of the headline rate. This guide covers the real pricing, the decision tree for choosing the right model, and when OpenAI is (and isn't) the best value.

All prices below come from two official OpenAI sources: the live API pricing page for the GPT-5.4 family, and the GPT-5.2 launch page for GPT-5.2, GPT-5.1, and GPT-5 Pro pricing. Use our cost calculator for quick estimates and our token counter to sanity-check prompt size before you ship.

Current OpenAI pricing at a glance

GPT-5.4 family on the live pricing page

Model Input $/M Cached Input $/M Output $/M Batch Input $/M Batch Output $/M
GPT-5.4 $2.50 $0.25 $15.00 $1.25 $7.50
GPT-5.4 mini $0.75 $0.075 $4.50 $0.375 $2.25
GPT-5.4 nano $0.20 $0.02 $1.25 $0.10 $0.625

OpenAI notes that these are the standard processing rates for context lengths under 270K tokens. The Batch columns reflect the flat 50% discount OpenAI applies to both input and output on the Batch API.
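The Batch columns are plain arithmetic on the standard rates. A minimal sketch, assuming the flat 50% discount described above applies uniformly to input and output; `STANDARD_RATES` and `batch_rates` are illustrative names, not part of any OpenAI SDK:

```python
# Standard GPT-5.4 family rates in USD per 1M tokens (input, output),
# copied from the table above (context under 270K tokens).
STANDARD_RATES = {
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.4 nano": (0.20, 1.25),
}

BATCH_DISCOUNT = 0.50  # flat Batch API discount on both input and output


def batch_rates(input_per_m: float, output_per_m: float,
                discount: float = BATCH_DISCOUNT) -> tuple[float, float]:
    """Return (input, output) USD per 1M tokens after the Batch discount."""
    return input_per_m * (1 - discount), output_per_m * (1 - discount)


for model, (inp, out) in STANDARD_RATES.items():
    b_in, b_out = batch_rates(inp, out)
    print(f"{model}: batch input ${b_in:.3f}/M, batch output ${b_out:.3f}/M")
```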

GPT-5.2 and earlier GPT-5 pricing still published by OpenAI

Model Input $/M Cached Input $/M Output $/M
GPT-5.2 $1.75 $0.175 $14.00
GPT-5.2 Pro $21.00 n/a $168.00
GPT-5.1 $1.25 $0.125 $10.00
GPT-5 Pro $15.00 n/a $120.00

OpenAI says it has no current plans to deprecate GPT-5.1, GPT-5, or GPT-4.1 in the API. Those older rows still matter for real production systems — especially GPT-5.1, which we will come back to later.

The cached input advantage — OpenAI's real pricing

This is the single most important detail that other pricing guides skip over: OpenAI's cached input rate is 10% of the standard input price, and it applies automatically. No code changes. No special API flag. If your request reuses a prefix that OpenAI has already computed, the cached tokens cost 90% less.

How it works: OpenAI stores the computed state of prompt prefixes on its side. When a subsequent request starts with exactly the same tokens (same system prompt, same few-shot examples, same preamble), those tokens hit the cache. They still count toward the context window, but OpenAI bills them at the cached rate instead of the standard rate. The output price is unchanged.

This is not an exotic optimization. It is the default behavior for any application that sends a consistent system prompt.
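The billing rule can be written as a per-request cost function. A sketch, assuming the GPT-5.4 rates above; the `Usage` fields mirror how OpenAI's Chat Completions responses report tokens (`prompt_tokens`, `completion_tokens`, and `cached_tokens` under `prompt_tokens_details`), but confirm the exact field names for the endpoint and SDK version you use. The token counts in the usage line are hypothetical:

```python
from dataclasses import dataclass

# GPT-5.4 standard rates in USD per 1M tokens, from the table above.
INPUT_RATE = 2.50
CACHED_INPUT_RATE = 0.25
OUTPUT_RATE = 15.00


@dataclass
class Usage:
    prompt_tokens: int      # all input tokens, cached ones included
    cached_tokens: int      # portion of prompt_tokens served from the cache
    completion_tokens: int  # output tokens (never discounted by caching)


def request_cost(u: Usage) -> float:
    """USD cost of one request: uncached input at the standard rate,
    cached input at 10% of it, output at the normal output rate."""
    if not 0 <= u.cached_tokens <= u.prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = u.prompt_tokens - u.cached_tokens
    dollars_per_m = (uncached * INPUT_RATE
                     + u.cached_tokens * CACHED_INPUT_RATE
                     + u.completion_tokens * OUTPUT_RATE)
    return dollars_per_m / 1_000_000


# Hypothetical request: a 2,000-token prompt whose first 1,280 tokens hit the cache.
print(f"${request_cost(Usage(2000, 1280, 500)):.6f}")
```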

What effective input pricing actually looks like

The table below shows what you really pay per million input tokens at different cache hit rates. Most production apps with a system prompt and a few examples achieve 40%–70% cache hit rates without any deliberate optimization.

Model Standard Input Cached Input 50% cache hits 80% cache hits
GPT-5.4 $2.50 $0.25 $1.375 $0.70
GPT-5.4 mini $0.75 $0.075 $0.413 $0.21
GPT-5.4 nano $0.20 $0.02 $0.11 $0.056

The math: effective input = (cache hit rate x cached price) + ((1 - cache hit rate) x standard price). At 50% hits on GPT-5.4: (0.5 x $0.25) + (0.5 x $2.50) = $1.375. At 80% hits: (0.8 x $0.25) + (0.2 x $2.50) = $0.70.
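The same formula as a small function, applied to the rates above. GPT-5.1 is included for comparison, and the table above rounds mini's 50% figure ($0.4125) to $0.413:

```python
def effective_input_rate(hit_rate: float, standard: float, cached: float) -> float:
    """Blended USD per 1M input tokens for a given cache hit rate (0..1)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached + (1 - hit_rate) * standard


# (standard input, cached input) in USD per 1M tokens, from the tables above.
RATES = {
    "GPT-5.4": (2.50, 0.25),
    "GPT-5.4 mini": (0.75, 0.075),
    "GPT-5.4 nano": (0.20, 0.02),
    "GPT-5.1": (1.25, 0.125),
}

for model, (std, cached) in RATES.items():
    at_50 = effective_input_rate(0.5, std, cached)
    at_80 = effective_input_rate(0.8, std, cached)
    print(f"{model}: 50% hits ${at_50:.4f}/M, 80% hits ${at_80:.4f}/M")
```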

The key insight most teams miss

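At 80% cache hits, GPT-5.4's effective input cost ($0.70/M) is lower than GPT-5.1's uncached input rate ($1.25/M). Caching can make the newer, better model cheaper on input than the older one running without cache hits, which breaks the usual "newer = more expensive" assumption. Two caveats: GPT-5.1 gets the same discount (at 80% hits its effective input is $0.35/M), and GPT-5.4's output rate is still $15.00/M against GPT-5.1's $10.00/M.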

At 50% hits, GPT-5.4 nano's effective input cost ($0.11/M) approaches the marginal cost of running a local model — except you get OpenAI's infrastructure, uptime, and no GPU management.

What drives cache hit rates

The following patterns naturally produce high cache hit rates:

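  • Consistent system prompts. If every request starts with the same system prompt, that prefix is cached after the first call, provided the shared prefix is long enough: OpenAI's prompt caching guide has set a 1,024-token minimum, so a 500-token system prompt only pays off once it is followed by other stable content such as few-shot examples. With a long enough stable prefix, many chat applications hit 30%–50% cache rates from the system prompt alone.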
  • Few-shot examples. Stable few-shot blocks that do not change between requests are perfect cache candidates.
  • Shared document context. RAG applications that retrieve the same popular documents repeatedly see high hit rates on the document prefix.
  • Multi-turn conversations. Each turn reuses the full history of the conversation, which is already cached from the previous turn.

You do not need to architect for caching. You need to avoid defeating it — for example, by randomizing prompt order or injecting unique tokens at the start of every request.
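A minimal sketch of the "don't defeat it" rule, comparing how much of two consecutive prompts is identical from the first character. Characters stand in for tokens here, and the helper names and prompt text are hypothetical, so treat the numbers as a layout check rather than a billing prediction:

```python
def build_prompt(system: str, examples: list[str], user_input: str) -> str:
    """Stable content first, per-request content last, so consecutive
    requests share the longest possible prefix."""
    return "\n\n".join([system, *examples, user_input])


def build_prompt_bad(request_id: str, system: str, examples: list[str],
                     user_input: str) -> str:
    """Anti-pattern: a unique token at the very start changes the prefix
    on every request, so nothing after it can match a cached prefix."""
    return f"[{request_id}] " + build_prompt(system, examples, user_input)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


SYSTEM = "You are a support assistant for ExampleCo. Answer concisely."
EXAMPLES = ["Q: How do I reset my password?\nA: Use Settings > Security."]

good = shared_prefix_len(build_prompt(SYSTEM, EXAMPLES, "Where is my invoice?"),
                         build_prompt(SYSTEM, EXAMPLES, "Can I change my plan?"))
bad = shared_prefix_len(build_prompt_bad("7f3a", SYSTEM, EXAMPLES, "Where is my invoice?"),
                        build_prompt_bad("c91e", SYSTEM, EXAMPLES, "Can I change my plan?"))
print(good, bad)
```

The well-ordered prompts share everything up to the user input; the version with a request ID up front shares a single character.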

The model decision tree

Most teams pick the wrong OpenAI model because they start at the top. The right process is bottom-up: start cheap, step up only when your eval set demands it.

Step 1: Start with GPT-5.4 nano ($0.20 / $1.25)

GPT-5.4 nano is OpenAI's cheapest current-generation model. Run your eval set against it first. For classification, entity extraction, tagging, simple summarization, and structured-output tasks, nano is often good enough. At $0.20/$1.25, the cost floor is so low that quality has to be measurably bad before stepping up makes financial sense.

Step 2: Step up to GPT-5.4 mini ($0.75 / $4.50)

If nano misses on quality — if your eval set shows accuracy drops or reasoning failures — move to GPT-5.4 mini. This is the sweet spot for most production tasks that need real reasoning without frontier costs. Mini is 3.75x more expensive than nano on input and 3.6x on output, so make sure the quality gain justifies the jump.

Step 3: Step up to GPT-5.4 ($2.50 / $15.00)

GPT-5.4 is the flagship. Do not default here — earn your way up. Use GPT-5.4 when mini falls short on your evals, when you need the strongest coding performance (SWE-bench Verified: 84), the best math reasoning (AIME 2025: 99), or the highest long-context accuracy (MRCRv2: 97). For tasks where you have measured a clear win, the 3.3x price jump from mini is justified.

Step 4: Consider GPT-5.1 for value ($1.25 / $10.00)

GPT-5.1 is not deprecated. If you need a cheaper base model and GPT-5.4's latest capabilities are not required, GPT-5.1 at $1.25/$10.00 is still actively supported. With cached input at $0.125/M, it is absurdly cheap for tasks it handles well. We cover this in more depth below.

Step 5: GPT-5.2 Pro / GPT-5 Pro — frontier premium

GPT-5.2 Pro at $21/$168 and GPT-5 Pro at $15/$120 are the premium reasoning tiers. Only reach for these when you have measured a clear quality gain that justifies paying roughly 6x to 11x the flagship's rates ($15 to $21 input and $120 to $168 output per million tokens, vs. $2.50/$15.00 for GPT-5.4). If you are not running competition-level math or frontier research tasks, you almost certainly do not need Pro pricing.
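Steps 1 to 3 reduce to one loop: walk the tiers from cheapest to most expensive and stop at the first model that clears your quality bar. A sketch, assuming you already have a per-model score from your own eval set; the scores and the 0.92 threshold below are made up for illustration, and the Pro tiers and GPT-5.1 are left out to keep the ladder short:

```python
from typing import Optional

# (label, input USD/1M, output USD/1M) from the tables above.
TIERS = [
    ("GPT-5.4", 2.50, 15.00),
    ("GPT-5.4 mini", 0.75, 4.50),
    ("GPT-5.4 nano", 0.20, 1.25),
]


def pick_cheapest_passing(eval_scores: dict[str, float],
                          threshold: float) -> Optional[str]:
    """Walk the tiers bottom-up (cheapest first) and return the first model
    whose eval score meets the threshold; None means even the flagship
    fell short."""
    for label, _inp, _out in sorted(TIERS, key=lambda t: (t[1], t[2])):
        if eval_scores.get(label, 0.0) >= threshold:
            return label
    return None


# Hypothetical accuracy numbers from your own eval set.
scores = {"GPT-5.4 nano": 0.88, "GPT-5.4 mini": 0.94, "GPT-5.4": 0.95}
print(pick_cheapest_passing(scores, threshold=0.92))
```

Sorting by price inside the function keeps the bottom-up order even if someone later appends a tier out of order.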

The anti-pattern

The most common mistake: teams start with GPT-5.4, never test cheaper models, and overspend by 3–12x. We see this constantly. For example, a classification task that costs about $375/month on GPT-5.4 might score identically on nano at about $31/month. Run your evals before you commit to a tier.

Benchmark-adjusted cost — is GPT-5.4 actually good value?

Raw pricing does not tell you whether a model is a good deal. What matters is how much capability you get per dollar. The table below uses BenchLM's overall score to calculate output cost per score point — a rough but useful proxy for value.

Model BenchLM Score Input $/M Output $/M Output cost per point
GPT-5.4 84 $2.50 $15.00 $0.179
Gemini 3.1 Pro Preview 83 $2.00 $12.00 $0.145
Claude Opus 4.6 80 $5.00 $25.00 $0.313
DeepSeek V3.2 (chat) 62 $0.28 $0.42 $0.007

On aggregate cost-per-benchmark-point, Gemini 3.1 Pro Preview is a little cheaper than GPT-5.4 while sitting just behind it in BenchLM's current overall score. DeepSeek V3.2 is in a different league on raw cost efficiency, though the quality gap is still material.
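The last column is just output price divided by score. Recomputing it from the table makes the ranking easy to redo when prices or scores move; like the column itself, this sketch ignores input price:

```python
# (BenchLM overall score, output USD per 1M tokens) from the table above.
MODELS = {
    "GPT-5.4": (84, 15.00),
    "Gemini 3.1 Pro Preview": (83, 12.00),
    "Claude Opus 4.6": (80, 25.00),
    "DeepSeek V3.2 (chat)": (62, 0.42),
}


def output_cost_per_point(score: float, output_per_m: float) -> float:
    """Output price divided by benchmark score: lower is better value."""
    return output_per_m / score


ranked = sorted(MODELS.items(), key=lambda kv: output_cost_per_point(*kv[1]))
for name, (score, out) in ranked:
    print(f"{name}: ${output_cost_per_point(score, out):.4f} per point")
```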

Where GPT-5.4 wins on specific dimensions

Aggregate scores can be misleading. GPT-5.4's value proposition is strongest when your workload maps to specific benchmark dimensions where it leads:

  • SWE-bench Verified: 84 — strong coding performance, especially on real-world software engineering tasks
  • AIME 2025: 99 — the strongest published math reasoning score among mainstream models
  • MRCRv2: 97 — long-context accuracy, meaning GPT-5.4 reliably finds and uses information across very large inputs
  • BrowseComp: 82.7 — agentic web browsing tasks, relevant for autonomous agent workloads

If your workload maps to coding, math reasoning, long-context retrieval, or agentic tasks, GPT-5.4 is the right choice regardless of aggregate cost-per-point. If your workload is more general-purpose and price-sensitive, Gemini 3.1 Pro Preview sits close on overall quality at a lower price — see our Gemini API pricing guide for details.

Real cost examples — with and without caching

Example 1: Coding assistant

Assume a coding assistant handling 1,000 requests per day, with 2,000 input tokens and 500 output tokens per request, running on GPT-5.4.

Baseline (no caching, no Batch):

  • Daily input: 1,000 x 2,000 x $2.50 / 1M = $5.00
  • Daily output: 1,000 x 500 x $15.00 / 1M = $7.50
  • Monthly total: ~$375

With 60% cache hits:

  • Effective input rate: (0.6 x $0.25) + (0.4 x $2.50) = $1.15/M
  • Daily input: 1,000 x 2,000 x $1.15 / 1M = $2.30
  • Daily output: unchanged at $7.50
  • Monthly total: ~$294 (22% savings from caching alone)

With Batch API (no caching):

  • Daily input: 1,000 x 2,000 x $1.25 / 1M = $2.50
  • Daily output: 1,000 x 500 x $7.50 / 1M = $3.75
  • Monthly total: ~$187.50

With both 60% cache hits + Batch API:

  • Effective batch input rate: (0.6 x $0.125) + (0.4 x $1.25) = $0.575/M
  • Daily input: 1,000 x 2,000 x $0.575 / 1M = $1.15
  • Daily output: 1,000 x 500 x $7.50 / 1M = $3.75
  • Monthly total: ~$147 (61% savings vs. baseline)

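The same application goes from $375/month to $147/month by stacking two features that require zero prompt changes. One caveat: Batch requests run asynchronously with a 24-hour completion window, so the Batch rows only apply to the parts of a coding-assistant workload that can wait (bulk code review, test generation, offline refactors), not to interactive completions.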
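The four scenarios come from one formula. A minimal sketch that reproduces them, assuming a 30-day month and, as the example does, that cached tokens submitted through Batch are billed at half the cached rate ($0.125/M); check that assumption against OpenAI's current pricing page before relying on it:

```python
def monthly_cost(requests_per_day: int, input_tokens: int, output_tokens: int,
                 input_rate: float, cached_rate: float, output_rate: float,
                 hit_rate: float = 0.0, batch: bool = False,
                 days: int = 30) -> float:
    """Monthly USD cost; rates are USD per 1M tokens.

    Mirrors the worked example: the cached rate is blended in first, then
    the 50% Batch discount (if used) applies to input and output alike.
    """
    multiplier = 0.5 if batch else 1.0
    blended_input = hit_rate * cached_rate + (1 - hit_rate) * input_rate
    daily_input = requests_per_day * input_tokens * blended_input * multiplier / 1e6
    daily_output = requests_per_day * output_tokens * output_rate * multiplier / 1e6
    return (daily_input + daily_output) * days


# Example 1: coding assistant on GPT-5.4 ($2.50 / $0.25 cached / $15.00).
args = (1_000, 2_000, 500, 2.50, 0.25, 15.00)
for label, kwargs in [("baseline", {}),
                      ("60% cache hits", {"hit_rate": 0.6}),
                      ("Batch", {"batch": True}),
                      ("60% cache hits + Batch", {"hit_rate": 0.6, "batch": True})]:
    print(f"{label}: ${monthly_cost(*args, **kwargs):,.2f}/month")
```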

Example 2: RAG application with high cache hits

Assume a RAG-powered customer support bot handling 5,000 requests per day, with 4,000 input tokens (system prompt + retrieved documents + user query) and 300 output tokens per response, running on GPT-5.4 mini.

RAG applications tend to have higher cache hit rates because the system prompt is large and stable, and popular documents get retrieved repeatedly. Assume 70% cache hits.

Baseline (no caching):

  • Daily input: 5,000 x 4,000 x $0.75 / 1M = $15.00
  • Daily output: 5,000 x 300 x $4.50 / 1M = $6.75
  • Monthly total: ~$652.50

With 70% cache hits:

  • Effective input rate: (0.7 x $0.075) + (0.3 x $0.75) = $0.2775/M
  • Daily input: 5,000 x 4,000 x $0.2775 / 1M = $5.55
  • Daily output: unchanged at $6.75
  • Monthly total: ~$369 (43% savings)

That is a $283/month saving on a single workload, from a feature that requires no code changes. For input-heavy RAG workloads, caching is the single highest-leverage cost optimization available on OpenAI's platform.
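Rather than assuming a hit rate like the 70% above, you can measure it from the usage data your requests already return. A sketch, assuming you log each request's total prompt tokens and cached tokens; the log values are hypothetical. Weighting by tokens rather than by request matches the effective-input formula, which blends rates by token share:

```python
def observed_hit_rate(records: list[tuple[int, int]]) -> float:
    """Token-weighted cache hit rate from (prompt_tokens, cached_tokens)
    pairs, e.g. pulled from logged API usage."""
    total = sum(prompt for prompt, _ in records)
    cached = sum(hit for _, hit in records)
    return cached / total if total else 0.0


# Hypothetical log of three 4,000-token RAG requests.
log = [(4_000, 2_816), (4_000, 3_200), (4_000, 2_384)]
rate = observed_hit_rate(log)

# Plug the measured rate into GPT-5.4 mini pricing ($0.75 / $0.075 cached).
effective = rate * 0.075 + (1 - rate) * 0.75
print(f"hit rate {rate:.0%}, effective input ${effective:.4f}/M")
```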

What about GPT-5.4 nano?

The same coding assistant on GPT-5.4 nano — no caching, no Batch:

  • Daily input: 1,000 x 2,000 x $0.20 / 1M = $0.40
  • Daily output: 1,000 x 500 x $1.25 / 1M = $0.625
  • Monthly total: ~$30.75

That is 12x cheaper than GPT-5.4's baseline. If your eval set shows nano handles your task, the cost difference is hard to argue with.
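Running the same workload through all three GPT-5.4 tiers shows where mini lands between the two figures above. A sketch using the standard (uncached, non-Batch) rates and a 30-day month:

```python
# Coding-assistant workload: 1,000 requests/day, 2,000 input + 500 output
# tokens each, no caching, no Batch, 30-day month.
REQUESTS, IN_TOK, OUT_TOK, DAYS = 1_000, 2_000, 500, 30

RATES = {  # (input, output) USD per 1M tokens
    "GPT-5.4": (2.50, 15.00),
    "GPT-5.4 mini": (0.75, 4.50),
    "GPT-5.4 nano": (0.20, 1.25),
}

monthly = {
    model: REQUESTS * (IN_TOK * inp + OUT_TOK * out) / 1e6 * DAYS
    for model, (inp, out) in RATES.items()
}
for model, cost in monthly.items():
    print(f"{model}: ${cost:,.2f}/month")
print(f"flagship vs nano: {monthly['GPT-5.4'] / monthly['GPT-5.4 nano']:.1f}x")
```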

The legacy model opportunity

There is a common assumption that once a newer model ships, older models become irrelevant. On OpenAI's platform, that assumption costs money.

GPT-5.1 at $1.25/$10.00 is still actively supported. OpenAI has explicitly said it has no current plans to deprecate it. With cached input at $0.125/M, GPT-5.1 is one of the cheapest capable models available from any major provider.

When to stay on GPT-5.1

  • Your eval set shows comparable quality. If GPT-5.1 handles your task as well as GPT-5.4, there is no reason to pay 2x for input and 1.5x for output.
  • You do not need GPT-5.4's coding and reasoning improvements. GPT-5.4's strongest gains are on SWE-bench, AIME, and long-context benchmarks. If your workload is straightforward text processing, GPT-5.1 may perform identically.
  • Your budget does not justify the upgrade. For teams processing tens of millions of tokens per month, the gap between $1.25 and $2.50 per million input tokens adds up fast.

When to upgrade from GPT-5.1

  • Your eval set shows measurable quality regressions on GPT-5.1.
  • You need GPT-5.4's improved agentic capabilities (BrowseComp, Terminal-Bench).
  • You are building for coding tasks where SWE-bench Verified matters.
  • You want the latest safety and instruction-following improvements.

The bottom line: do not confuse "newer" with "necessary." GPT-5.1 is still a strong model at a great price. Test before you upgrade.

The data residency surcharge

OpenAI's pricing page says data residency and Regional Processing endpoints add 10% for all models released after March 5, 2026. This affects the entire GPT-5.4 family.

What that means in practice:

Model Standard Input $/M With Data Residency (+10%)
GPT-5.4 $2.50 $2.75
GPT-5.4 mini $0.75 $0.825
GPT-5.4 nano $0.20 $0.22

The 10% surcharge applies to both input and output prices. For high-volume workloads with EU or regulatory compliance requirements, this adds up.

One underappreciated detail: GPT-5.1 and earlier are not affected by the data residency surcharge, because they were released before the March 5, 2026 cutoff. If you have compliance-driven routing requirements and GPT-5.1 handles your task, the combination of lower base pricing and no surcharge makes it even more attractive.
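A sketch of how the surcharge changes the rates, including the GPT-5.1 case. The `surcharged` flag encodes the March 5, 2026 cutoff as this guide describes it; which models are actually offered on data residency endpoints is not modeled here:

```python
RESIDENCY_SURCHARGE = 0.10  # +10% on input and output for affected models

# (input, output) USD per 1M tokens and whether the model falls under the
# surcharge (released after March 5, 2026), per the pricing notes above.
MODELS = {
    "GPT-5.4": (2.50, 15.00, True),
    "GPT-5.4 mini": (0.75, 4.50, True),
    "GPT-5.4 nano": (0.20, 1.25, True),
    "GPT-5.1": (1.25, 10.00, False),
}


def residency_rates(model: str) -> tuple[float, float]:
    """(input, output) rates when routed through data residency endpoints."""
    inp, out, surcharged = MODELS[model]
    factor = 1 + RESIDENCY_SURCHARGE if surcharged else 1.0
    return inp * factor, out * factor


for name in MODELS:
    inp, out = residency_rates(name)
    print(f"{name}: ${inp:.3f} / ${out:.3f} per 1M tokens with data residency")
```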

When OpenAI isn't the right choice

No pricing guide is honest if it only argues for its subject. Here is where other providers beat OpenAI on specific dimensions.

Best writing quality

Claude Opus 4.6 — Arena IF score of 1500 vs. GPT-5.4's 1470. For long-form writing, editing, and prose where the output quality ceiling matters more than cost, Claude is still the preferred choice for many teams. The $5/$25 pricing is higher, but if your product is the output text, Opus often justifies the premium. See our Claude API pricing guide.

Cheapest frontier-tier model

Gemini 2.5 Pro at $1.25/$10 for standard context under 200K tokens. Google's pricing is aggressive, and Gemini 3.1 Pro Preview stays close to GPT-5.4 on BenchLM's current overall score while costing less. Google also offers its own context caching and batch pricing, so unless you depend on OpenAI-specific models or tooling, Gemini is a strong value option. See our Gemini API pricing guide.

Maximum cost efficiency

DeepSeek at $0.028/M for cache hits. If your workload is high-volume and quality requirements are moderate, DeepSeek's pricing is in a different category entirely. The quality gap relative to GPT-5.4 is still large in BenchLM's current data (62 vs. 84), but for tasks where DeepSeek's quality is sufficient, nothing else comes close on cost.

Free tier for prototyping

Google AI Studio offers ongoing free access to Gemini models. Unlike OpenAI's credit-based trial (which expires), Google's free tier is a genuine zero-cost option for prototyping and low-volume experimentation.

The practical takeaway

If you just want the shortest path to a sane OpenAI bill:

  1. Test GPT-5.4 nano first. At $0.20/$1.25, the cost floor is low enough that you should always start here and step up based on eval results.
  2. Step up to GPT-5.4 mini before the flagship. Mini at $0.75/$4.50 handles most production tasks that need real reasoning.
  3. Use GPT-5.4 when your evals prove the upgrade is worth it. Do not default to the flagship because it is the newest model.
  4. Do not forget GPT-5.1. At $1.25/$10.00 with no deprecation plans, it is the best value in OpenAI's lineup for workloads it handles well.
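  5. Protect your cache hit rate and move deferrable work to the Batch API before you start hunting for smaller prompt wins. Caching is automatic as long as your prompt prefixes stay stable, and Batch is a flat 50% off, so together they can cut your bill by 50%+ with zero prompt engineering effort.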
  6. Factor in data residency if you have compliance requirements — +10% on GPT-5.4 family models, but GPT-5.1 and earlier are unaffected.

The real story of OpenAI's pricing is not the headline rate. It is the combination of cached input (10% of normal), Batch API (50% off), and a model lineup deep enough to match the right tier to your workload. Most teams that complain about OpenAI's cost have never tested nano, never checked their cache hit rate, and never tried Batch. Start there before you optimize anything else.

For a provider-level comparison across multiple vendors, see our LLM pricing overview. For deeper dives into other providers, see our guides on Claude API pricing, Gemini API pricing, and DeepSeek API pricing. To understand how token pricing works at a fundamental level, see how LLM token pricing works.
