Current OpenAI API pricing from official docs: GPT-5.4, GPT-5.2, GPT-5.1, cached input rates, Batch API discounts, and the pricing details that actually matter.
Share This Report
Copy the link, post it, or save a PDF version.
OpenAI's API pricing is simpler than it looks — but most comparison tables miss the detail that matters most. Every GPT-5.4 family model has a cached input rate at 10% of normal pricing, and the Batch API cuts everything by 50%. That means the effective price of GPT-5.4 for a well-architected application is often half or less of the headline rate. This guide covers the real pricing, the decision tree for choosing the right model, and when OpenAI is (and isn't) the best value.
All prices below come from two official OpenAI sources: the live API pricing page for the GPT-5.4 family, and the GPT-5.2 launch page for GPT-5.2, GPT-5.1, and GPT-5 Pro pricing. Use our cost calculator for quick estimates and our token counter to sanity-check prompt size before you ship.
| Model | Input $/M | Cached Input $/M | Output $/M | Batch Input $/M | Batch Output $/M |
|---|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $15.00 | $1.25 | $7.50 |
| GPT-5.4 mini | $0.75 | $0.075 | $4.50 | $0.375 | $2.25 |
| GPT-5.4 nano | $0.20 | $0.02 | $1.25 | $0.10 | $0.625 |
OpenAI notes that those rates are the standard processing rates for context lengths under 270K. The Batch columns reflect the flat 50% discount OpenAI applies to both input and output on the Batch API.
| Model | Input $/M | Cached Input $/M | Output $/M |
|---|---|---|---|
| GPT-5.2 | $1.75 | $0.175 | $14.00 |
| GPT-5.2 Pro | $21.00 | — | $168.00 |
| GPT-5.1 | $1.25 | $0.125 | $10.00 |
| GPT-5 Pro | $15.00 | — | $120.00 |
OpenAI says it has no current plans to deprecate GPT-5.1, GPT-5, or GPT-4.1 in the API. Those older rows still matter for real production systems — especially GPT-5.1, which we will come back to later.
This is the single most important detail that other pricing guides skip over: OpenAI's cached input rate is 10% of the standard input price, and it applies automatically. No code changes. No special API flag. If your request reuses a prefix that OpenAI has already computed, the cached tokens cost 90% less.
How it works: OpenAI stores the computed state of prompt prefixes on their side. When a subsequent request starts with the same bytes — same system prompt, same few-shot examples, same preamble — those tokens hit the cache. The model still processes them, but OpenAI charges you the cached rate instead of the standard rate. The output price is unchanged.
This is not an exotic optimization. It is the default behavior for any application that sends a consistent system prompt.
The table below shows what you really pay per million input tokens at different cache hit rates. Most production apps with a system prompt and a few examples achieve 40%–70% cache hit rates without any deliberate optimization.
| Model | Standard Input | Cached Input | 50% cache hits | 80% cache hits |
|---|---|---|---|---|
| GPT-5.4 | $2.50 | $0.25 | $1.375 | $0.70 |
| GPT-5.4 mini | $0.75 | $0.075 | $0.413 | $0.21 |
| GPT-5.4 nano | $0.20 | $0.02 | $0.11 | $0.056 |
The math: effective input = (cache hit rate x cached price) + ((1 - cache hit rate) x standard price). At 50% hits on GPT-5.4: (0.5 x $0.25) + (0.5 x $2.50) = $1.375. At 80% hits: (0.8 x $0.25) + (0.2 x $2.50) = $0.70.
At 80% cache hits, GPT-5.4's effective input cost ($0.70/M) is cheaper than GPT-5.1's standard input rate ($1.25/M). Caching can make the newer, better model cheaper than the older one. That is a counterintuitive result that breaks the usual "newer = more expensive" assumption.
At 50% hits, GPT-5.4 nano's effective input cost ($0.11/M) approaches the marginal cost of running a local model — except you get OpenAI's infrastructure, uptime, and no GPU management.
The following patterns naturally produce high cache hit rates:
You do not need to architect for caching. You need to avoid defeating it — for example, by randomizing prompt order or injecting unique tokens at the start of every request.
Most teams pick the wrong OpenAI model because they start at the top. The right process is bottom-up: start cheap, step up only when your eval set demands it.
GPT-5.4 nano is OpenAI's cheapest current-generation model. Run your eval set against it first. For classification, entity extraction, tagging, simple summarization, and structured output tasks — nano is often good enough. At $0.20/$1.25, the cost floor is so low that quality has to be measurably bad before stepping up makes financial sense.
If nano misses on quality — if your eval set shows accuracy drops or reasoning failures — move to GPT-5.4 mini. This is the sweet spot for most production tasks that need real reasoning without frontier costs. Mini is 3.75x more expensive than nano on input and 3.6x on output, so make sure the quality gain justifies the jump.
GPT-5.4 is the flagship. Do not default here — earn your way up. Use GPT-5.4 when mini falls short on your evals, when you need the strongest coding performance (SWE-bench Verified: 84), the best math reasoning (AIME 2025: 99), or the highest long-context accuracy (MRCRv2: 97). For tasks where you have measured a clear win, the 3.3x price jump from mini is justified.
GPT-5.1 is not deprecated. If you need a cheaper base model and GPT-5.4's latest capabilities are not required, GPT-5.1 at $1.25/$10.00 is still actively supported. With cached input at $0.125/M, it is absurdly cheap for tasks it handles well. We cover this in more depth below.
GPT-5.2 Pro at $21/$168 and GPT-5 Pro at $15/$120 are the premium reasoning tiers. Only reach for these when you have measured a clear quality gain that justifies the 8–10x price jump over the flagship. If you are not running competition-level math or frontier research tasks, you almost certainly do not need Pro pricing.
The most common mistake: teams that start with GPT-5.4, never test cheaper models, and overspend by 3–12x. We see this constantly. A classification task running on GPT-5.4 at $375/month that would perform identically on nano at $31/month. Run your evals before you commit to a tier.
Raw pricing does not tell you whether a model is a good deal. What matters is how much capability you get per dollar. The table below uses BenchLM's overall score to calculate output cost per score point — a rough but useful proxy for value.
| Model | BenchLM Score | Input $/M | Output $/M | Output cost per point |
|---|---|---|---|---|
| GPT-5.4 | 84 | $2.50 | $15.00 | $0.179 |
| Gemini 3.1 Pro Preview | 83 | $2.00 | $12.00 | $0.145 |
| Claude Opus 4.6 | 80 | $5.00 | $25.00 | $0.313 |
| DeepSeek V3.2 (chat) | 62 | $0.28 | $0.42 | $0.007 |
On aggregate cost-per-benchmark-point, Gemini 3.1 Pro Preview is a little cheaper than GPT-5.4 while sitting just behind it in BenchLM's current overall score. DeepSeek V3.2 is in a different league on raw cost efficiency, though the quality gap is still material.
Aggregate scores can be misleading. GPT-5.4's value proposition is strongest when your workload maps to specific benchmark dimensions where it leads:
If your workload maps to coding, math reasoning, long-context retrieval, or agentic tasks, GPT-5.4 is the right choice regardless of aggregate cost-per-point. If your workload is more general-purpose and price-sensitive, Gemini 3.1 Pro Preview sits close on overall quality at a lower price — see our Gemini API pricing guide for details.
Assume a coding assistant handling 1,000 requests per day, with 2,000 input tokens and 500 output tokens per request, running on GPT-5.4.
Baseline (no caching, no Batch):
With 60% cache hits:
With Batch API (no caching):
With both 60% cache hits + Batch API:
The same application goes from $375/month to $147/month just by enabling two features that require zero prompt changes. That is the power of stacking caching with Batch.
Assume a RAG-powered customer support bot handling 5,000 requests per day, with 4,000 input tokens (system prompt + retrieved documents + user query) and 300 output tokens per response, running on GPT-5.4 mini.
RAG applications tend to have higher cache hit rates because the system prompt is large and stable, and popular documents get retrieved repeatedly. Assume 70% cache hits.
Baseline (no caching):
With 70% cache hits:
That is a $283/month saving on a single workload, from a feature that requires no code changes. For input-heavy RAG workloads, caching is the single highest-leverage cost optimization available on OpenAI's platform.
The same coding assistant on GPT-5.4 nano — no caching, no Batch:
That is 12x cheaper than GPT-5.4's baseline. If your eval set shows nano handles your task, the cost difference is hard to argue with.
There is a common assumption that once a newer model ships, older models become irrelevant. On OpenAI's platform, that assumption costs money.
GPT-5.1 at $1.25/$10.00 is still actively supported. OpenAI has explicitly said it has no current plans to deprecate it. With cached input at $0.125/M, GPT-5.1 is one of the cheapest capable models available from any major provider.
The bottom line: do not confuse "newer" with "necessary." GPT-5.1 is still a strong model at a great price. Test before you upgrade.
OpenAI's pricing page says data residency and Regional Processing endpoints add 10% for all models released after March 5, 2026. This affects the entire GPT-5.4 family.
What that means in practice:
| Model | Standard Input $/M | With Data Residency (+10%) |
|---|---|---|
| GPT-5.4 | $2.50 | $2.75 |
| GPT-5.4 mini | $0.75 | $0.825 |
| GPT-5.4 nano | $0.20 | $0.22 |
The 10% surcharge applies to both input and output prices. For high-volume workloads with EU or regulatory compliance requirements, this adds up.
One underappreciated detail: GPT-5.1 and earlier are not affected by the data residency surcharge, because they were released before the March 5, 2026 cutoff. If you have compliance-driven routing requirements and GPT-5.1 handles your task, the combination of lower base pricing and no surcharge makes it even more attractive.
No pricing guide is honest if it only argues for its subject. Here is where other providers beat OpenAI on specific dimensions.
Claude Opus 4.6 — Arena IF score of 1500 vs. GPT-5.4's 1470. For long-form writing, editing, and prose where the output quality ceiling matters more than cost, Claude is still the preferred choice for many teams. The $5/$25 pricing is higher, but if your product is the output text, Opus often justifies the premium. See our Claude API pricing guide.
Gemini 2.5 Pro at $1.25/$10 for standard context under 200K tokens. Google's pricing is aggressive, and Gemini 3.1 Pro Preview stays close to GPT-5.4 on BenchLM's current overall score while costing less. If you do not need OpenAI-specific features like cached-input pricing and the Batch API workflow, Gemini is a strong value option. See our Gemini API pricing guide.
DeepSeek at $0.028/M for cache hits. If your workload is high-volume and quality requirements are moderate, DeepSeek's pricing is in a different category entirely. The quality gap relative to GPT-5.4 is still large in BenchLM's current data (62 vs. 84), but for tasks where DeepSeek's quality is sufficient, nothing else comes close on cost.
Google AI Studio offers ongoing free access to Gemini models. Unlike OpenAI's credit-based trial (which expires), Google's free tier is a genuine zero-cost option for prototyping and low-volume experimentation.
If you just want the shortest path to a sane OpenAI bill:
The real story of OpenAI's pricing is not the headline rate. It is the combination of cached input (10% of normal), Batch API (50% off), and a model lineup deep enough to match the right tier to your workload. Most teams that complain about OpenAI's cost have never tested nano, never checked their cache hit rate, and never tried Batch. Start there before you optimize anything else.
For a provider-level comparison across multiple vendors, see our LLM pricing overview. For deeper dives into other providers, see our guides on Claude API pricing, Gemini API pricing, and DeepSeek API pricing. To understand how token pricing works at a fundamental level, see how LLM token pricing works.
Model pricing changes frequently. We send one email a week with what moved and why.
Share This Report
Copy the link, post it, or save a PDF version.
On this page
Which models moved up, what’s new, and what it costs. One email a week, 3-min read.
Free. One email per week.
Current Anthropic Claude API pricing from official model pages and the Claude Opus 4.7 launch announcement, including prompt caching, batch discounts, and current long-context notes.
Current DeepSeek API pricing from the official docs: deepseek-chat and deepseek-reasoner, cache-hit vs cache-miss pricing, output pricing, and the current V3.2 endpoint mapping.
Current Gemini API pricing from Google's official docs: 3.1 Pro Preview, 3.1 Flash-Lite Preview, 3 Flash Preview, 2.5 Flash, 2.5 Pro, plus Batch and Flex pricing.