OpenAI API Pricing (July 2026): GPT-5.5 & 5.6 Sol Rates

OpenAI API pricing as of July 2026: the new GPT-5.6 Sol flagship costs $5/$30 per million input/output tokens, GPT-5.6 Terra $2.50/$15, and GPT-5.6 Luna $1/$6, with cached-input reads discounted 90%. GPT-5.5 matches Sol at $5/$30; the Batch API halves everything.

Last synced: July 12, 2026 — the pricing tables on this page rebuild from BenchLM's pricing dataset on every site build.

OpenAI's API pricing is simpler than it looks — but most comparison tables miss the detail that matters most. Every current OpenAI model has a cached input rate at 10% of normal pricing (GPT-5.6 additionally bills cache writes at 1.25x with a 30-minute minimum cache life), and the Batch API cuts everything by 50%. That means the effective price of a well-architected application is often half or less of the headline rate. This guide covers the real pricing, the decision tree for choosing the right model, and when OpenAI is (and isn't) the best value.

All prices below come from official OpenAI sources: the live API pricing page, the July 9, 2026 GPT-5.6 GA launch (Sol, Terra, Luna), and the GPT-5.5 launch page. Use our cost calculator for quick estimates and our token counter to sanity-check prompt size before you ship. For the full per-token comparison, see the new OpenAI API pricing hub.

Current OpenAI prices (auto-updated)

The table and changelog below render from our pricing dataset on every site build — last refreshed July 12, 2026. The analysis further down references provider announcements with their original dates; when the two disagree, this table is current.

Model	Creator	Input	Output	Context	Overall Score
GPT-5.6 Luna	OpenAI	$1	$6	1M	73
GPT-5 (high)	OpenAI	$1.25	$10	400K	72
GPT-5.1-Codex-Max	OpenAI	$1.25	$10	400K	72
GPT-5.2-Codex	OpenAI	$1.75	$14	400K	72
GPT-5.3 Codex	OpenAI	$1.75	$14	400K	65
GPT-5.6 Terra	OpenAI	$2.5	$15	1M	76
GPT-5.4	OpenAI	$2.5	$15	1.05M	66
GPT-5.6 Sol	OpenAI	$5	$30	1M	78
GPT-5.5	OpenAI	$5	$30	1M	74
o1-preview	OpenAI	$15	$60	200K	78
GPT-5.4 Pro	OpenAI	$30	$180	1.05M	82
GPT-5.5 Pro	OpenAI	$30	$180	1M	77

Recent OpenAI price changes in our registry:

2026-07-02: o3 cut from $10/$40 to $2/$8 per 1M tokens
2026-07-02: GPT-5.1 raised from $1.5/$6 to $1.25/$10 per 1M tokens
2026-07-02: GPT-5.2 raised from $2/$8 to $1.75/$14 per 1M tokens
2026-04-01: GPT-5.4 listed at $2.5/$15 per 1M tokens
2026-04-01: GPT-5.4 Pro listed at $30/$180 per 1M tokens
2026-04-01: GPT-5.4 mini listed at $0.75/$4.5 per 1M tokens
2026-02-15: GPT-5.2 listed at $2/$8 per 1M tokens

Current OpenAI pricing at a glance

GPT-5.6 and GPT-5.5: the current flagships (July 2026)

Model	Input $/M	Cached Input $/M	Output $/M	Notes
GPT-5.6 Sol	$5.00	$0.50	$30.00	Flagship; cache writes 1.25x, 30-min minimum cache life
GPT-5.6 Terra	$2.50	$0.25	$15.00	Balanced mid-tier, roughly half of Sol
GPT-5.6 Luna	$1.00	$0.10	$6.00	Fast, low-cost tier
GPT-5.5	$5.00	$0.50	$30.00	Batch & Flex at half rate; Priority at 2.5x
GPT-5.5 Pro	$30.00	—	$180.00	Highest-effort reasoning tier

The GPT-5.6 family went GA on July 9, 2026 with a 1M-token context window across all three tiers. Cached-input reads keep the standard 90% discount.

GPT-5.4 family on the live pricing page

Model	Input $/M	Cached Input $/M	Output $/M	Batch Input $/M	Batch Output $/M
GPT-5.4	$2.50	$0.25	$15.00	$1.25	$7.50
GPT-5.4 mini	$0.75	$0.075	$4.50	$0.375	$2.25
GPT-5.4 nano	$0.20	$0.02	$1.25	$0.10	$0.625

OpenAI notes that those rates are the standard processing rates for context lengths under 270K. The Batch columns reflect the flat 50% discount OpenAI applies to both input and output on the Batch API.

GPT-5.2 and earlier GPT-5 pricing still published by OpenAI

Model	Input $/M	Cached Input $/M	Output $/M
GPT-5.2	$1.75	$0.175	$14.00
GPT-5.2 Pro	$21.00	—	$168.00
GPT-5.1	$1.25	$0.125	$10.00
GPT-5 Pro	$15.00	—	$120.00

OpenAI says it has no current plans to deprecate GPT-5.1, GPT-5, or GPT-4.1 in the API. Those older rows still matter for real production systems — especially GPT-5.1, which we will come back to later.

The cached input advantage: OpenAI's real pricing

This is the single most important detail that other pricing guides skip over: OpenAI's cached input rate is 10% of the standard input price, and it applies automatically. No code changes. No special API flag. If your request reuses a prefix that OpenAI has already computed, the cached tokens cost 90% less.

How it works: OpenAI stores the computed state of prompt prefixes on their side. When a subsequent request starts with the same bytes — same system prompt, same few-shot examples, same preamble — those tokens hit the cache. The model still processes them, but OpenAI charges you the cached rate instead of the standard rate. The output price is unchanged.

This is not an exotic optimization. It is the default behavior for any application that sends a consistent system prompt.

What effective input pricing actually looks like

The table below shows what you really pay per million input tokens at different cache hit rates. Most production apps with a system prompt and a few examples achieve 40%–70% cache hit rates without any deliberate optimization.

Model	Standard Input	Cached Input	50% cache hits	80% cache hits
GPT-5.4	$2.50	$0.25	$1.375	$0.70
GPT-5.4 mini	$0.75	$0.075	$0.413	$0.21
GPT-5.4 nano	$0.20	$0.02	$0.11	$0.056

The math: effective input = (cache hit rate x cached price) + ((1 - cache hit rate) x standard price). At 50% hits on GPT-5.4: (0.5 x $0.25) + (0.5 x $2.50) = $1.375. At 80% hits: (0.8 x $0.25) + (0.2 x $2.50) = $0.70.

The key insight most teams miss

At 80% cache hits, GPT-5.4's effective input cost ($0.70/M) is cheaper than GPT-5.1's standard input rate ($1.25/M). Caching can make the newer, better model cheaper than the older one. That is a counterintuitive result that breaks the usual "newer = more expensive" assumption.

At 50% hits, GPT-5.4 nano's effective input cost ($0.11/M) approaches the marginal cost of running a local model — except you get OpenAI's infrastructure, uptime, and no GPU management.

What drives cache hit rates

The following patterns naturally produce high cache hit rates:

Four workload shapes cache well. Consistent system prompts are the biggest: if every request starts with the same 500-token prefix, it's cached after the first call, and most chat applications hit 30-50% cache rates from the system prompt alone. Stable few-shot blocks that don't change between requests are perfect candidates. RAG applications that retrieve the same popular documents repeatedly see high hit rates on the document prefix. And multi-turn conversations reuse the full history each turn, which is already cached from the turn before.

You do not need to architect for caching. You need to avoid defeating it — for example, by randomizing prompt order or injecting unique tokens at the start of every request.

The model decision tree

Most teams pick the wrong OpenAI model because they start at the top. The right process is bottom-up: start cheap, step up only when your eval set demands it.

Step 1: Start with GPT-5.4 nano ($0.20 / $1.25)

GPT-5.4 nano is OpenAI's cheapest current-generation model. Run your eval set against it first. For classification, entity extraction, tagging, simple summarization, and structured output tasks — nano is often good enough. At $0.20/$1.25, the cost floor is so low that quality has to be measurably bad before stepping up makes financial sense.

Step 2: Step up to GPT-5.4 mini ($0.75 / $4.50)

If nano misses on quality — if your eval set shows accuracy drops or reasoning failures — move to GPT-5.4 mini. This is the sweet spot for most production tasks that need real reasoning without frontier costs. Mini is 3.75x more expensive than nano on input and 3.6x on output, so make sure the quality gain justifies the jump.

Step 3: Step up to GPT-5.4 ($2.50 / $15.00)

GPT-5.4 is the flagship. Do not default here — earn your way up. Use GPT-5.4 when mini falls short on your evals, when you need the strongest coding performance (SWE-bench Verified: 84), the best math reasoning (AIME 2025: 99), or the highest long-context accuracy (MRCRv2: 97). For tasks where you have measured a clear win, the 3.3x price jump from mini is justified.

Step 4: Consider GPT-5.1 for value ($1.25 / $10.00)

GPT-5.1 is not deprecated. If you need a cheaper base model and GPT-5.4's latest capabilities are not required, GPT-5.1 at $1.25/$10.00 is still actively supported. With cached input at $0.125/M, it is absurdly cheap for tasks it handles well. We cover this in more depth below.

Step 5: GPT-5.2 Pro / GPT-5 Pro: frontier premium

GPT-5.2 Pro at $21/$168 and GPT-5 Pro at $15/$120 are the premium reasoning tiers. Only reach for these when you have measured a clear quality gain that justifies the 8–10x price jump over the flagship. If you are not running competition-level math or frontier research tasks, you almost certainly do not need Pro pricing.

The anti-pattern

The most common mistake: teams that start with GPT-5.4, never test cheaper models, and overspend by 3–12x. We see this constantly. A classification task running on GPT-5.4 at $375/month that would perform identically on nano at $31/month. Run your evals before you commit to a tier.

Benchmark-adjusted cost: is GPT-5.4 actually good value?

Raw pricing does not tell you whether a model is a good deal. What matters is how much capability you get per dollar. The table below uses our overall score to calculate output cost per score point — a rough but useful proxy for value.

Model	Overall Score	Input $/M	Output $/M	Output cost per point
GPT-5.4	84	$2.50	$15.00	$0.179
Gemini 3.1 Pro Preview	83	$2.00	$12.00	$0.145
Claude Opus 4.6	80	$5.00	$25.00	$0.313
DeepSeek V3.2 (chat)	62	$0.28	$0.42	$0.007

On aggregate cost-per-benchmark-point, Gemini 3.1 Pro Preview is a little cheaper than GPT-5.4 while sitting just behind it on the current overall score. DeepSeek V3.2 is in a different league on raw cost efficiency, though the quality gap is still material.

Where GPT-5.4 wins on specific dimensions

Aggregate scores can be misleading. GPT-5.4's value proposition is strongest when your workload maps to specific benchmark dimensions where it leads:

GPT-5.4 posts SWE-bench Verified: 84 (strong real-world software engineering), AIME 2025: 99 (the strongest published math reasoning score among mainstream models), MRCRv2: 97 (it reliably finds and uses information across very large inputs), and BrowseComp: 82.7 (agentic web browsing, relevant for autonomous agent workloads).

If your workload maps to coding, math reasoning, long-context retrieval, or agentic tasks, GPT-5.4 is the right choice regardless of aggregate cost-per-point. If your workload is more general-purpose and price-sensitive, Gemini 3.1 Pro Preview sits close on overall quality at a lower price — see our Gemini API pricing guide for details.

Real cost examples: with and without caching

Example 1: Coding assistant

Assume a coding assistant handling 1,000 requests per day, with 2,000 input tokens and 500 output tokens per request, running on GPT-5.4.

Configuration	Effective input rate	Daily input	Daily output	Monthly
Baseline (no caching, no Batch)	$2.50/M	$5.00	$7.50	~$375
60% cache hits	$1.15/M	$2.30	$7.50	~$294
Batch API only	$1.25/M	$2.50	$3.75	~$187.50
60% cache hits + Batch API	$0.575/M	$1.15	$3.75	~$147

The blended rates come straight from the discount math: a 60% cache-hit rate mixes $0.25/M cached input with $2.50/M uncached (0.6 x $0.25 + 0.4 x $2.50 = $1.15/M), and Batch halves both sides of the meter.

The same application goes from $375/month to $147/month just by enabling two features that require zero prompt changes. That is the power of stacking caching with Batch.

Example 2: RAG application with high cache hits

Assume a RAG-powered customer support bot handling 5,000 requests per day, with 4,000 input tokens (system prompt + retrieved documents + user query) and 300 output tokens per response, running on GPT-5.4 mini.

RAG applications tend to have higher cache hit rates because the system prompt is large and stable, and popular documents get retrieved repeatedly. Assume 70% cache hits.

Configuration	Effective input rate	Daily input	Daily output	Monthly
Baseline (no caching)	$0.75/M	$15.00	$6.75	~$652.50
70% cache hits	$0.2775/M	$5.55	$6.75	~$369

That is a $283/month saving on a single workload, from a feature that requires no code changes. For input-heavy RAG workloads, caching is the single highest-leverage cost optimization available on OpenAI's platform.

What about GPT-5.4 nano?

The same coding assistant on GPT-5.4 nano, with no caching and no Batch, runs $0.40/day of input (1,000 x 2,000 x $0.20 / 1M) plus $0.625/day of output (1,000 x 500 x $1.25 / 1M): ~$30.75/month.

That is 12x cheaper than GPT-5.4's baseline. If your eval set shows nano handles your task, the cost difference is hard to argue with.

The legacy model opportunity

There is a common assumption that once a newer model ships, older models become irrelevant. On OpenAI's platform, that assumption costs money.

GPT-5.1 at $1.25/$10.00 is still actively supported. OpenAI has explicitly said it has no current plans to deprecate it. With cached input at $0.125/M, GPT-5.1 is one of the cheapest capable models available from any major provider.

When to stay on GPT-5.1

Stay on GPT-5.1 when your eval set shows comparable quality: if it handles your task as well as GPT-5.4, there is no reason to pay 2x for input and 1.5x for output. GPT-5.4's strongest gains are on SWE-bench, AIME, and long-context benchmarks, so straightforward text processing may perform identically on the older model. And at tens of millions of tokens per month, the gap between $1.25 and $2.50 per million input tokens adds up fast.

When to upgrade from GPT-5.1

Your eval set shows measurable quality regressions on GPT-5.1.
You need GPT-5.4's improved agentic capabilities (BrowseComp, Terminal-Bench).
You are building for coding tasks where SWE-bench Verified matters.
You want the latest safety and instruction-following improvements.

The bottom line: do not confuse "newer" with "necessary." GPT-5.1 is still a strong model at a great price. Test before you upgrade.

The data residency surcharge

OpenAI's pricing page says data residency and Regional Processing endpoints add 10% for all models released after March 5, 2026. This affects the entire GPT-5.4 family.

What that means in practice:

Model	Standard Input $/M	With Data Residency (+10%)
GPT-5.4	$2.50	$2.75
GPT-5.4 mini	$0.75	$0.825
GPT-5.4 nano	$0.20	$0.22

The 10% surcharge applies to both input and output prices. For high-volume workloads with EU or regulatory compliance requirements, this adds up.

One underappreciated detail: GPT-5.1 and earlier are not affected by the data residency surcharge, because they were released before the March 5, 2026 cutoff. If you have compliance-driven routing requirements and GPT-5.1 handles your task, the combination of lower base pricing and no surcharge makes it even more attractive.

When OpenAI isn't the right choice

No pricing guide is honest if it only argues for its subject. Here is where other providers beat OpenAI on specific dimensions.

Best writing quality

Claude Opus 4.6 — Arena IF score of 1500 vs. GPT-5.4's 1470. For long-form writing, editing, and prose where the output quality ceiling matters more than cost, Claude is still the preferred choice for many teams. The $5/$25 pricing is higher, but if your product is the output text, Opus often justifies the premium. See our Claude API pricing guide.

Cheapest frontier-tier model

Gemini 2.5 Pro at $1.25/$10 for standard context under 200K tokens. Google's pricing is aggressive, and Gemini 3.1 Pro Preview stays close to GPT-5.4 on the current overall score while costing less. If you do not need OpenAI-specific features like cached-input pricing and the Batch API workflow, Gemini is a strong value option. See our Gemini API pricing guide.

Maximum cost efficiency

DeepSeek at $0.028/M for cache hits. If your workload is high-volume and quality requirements are moderate, DeepSeek's pricing is in a different category entirely. The quality gap relative to GPT-5.4 is still large in our current data (62 vs. 84), but for tasks where DeepSeek's quality is sufficient, nothing else comes close on cost.

Free tier for prototyping

Google AI Studio offers ongoing free access to Gemini models. Unlike OpenAI's credit-based trial (which expires), Google's free tier is a genuine zero-cost option for prototyping and low-volume experimentation.

The shortest path to a sane OpenAI bill

If you only read one section, make it this one:

Test GPT-5.4 nano first. At $0.20/$1.25, the cost floor is low enough that you should always start here and step up based on eval results.
Step up to GPT-5.4 mini before the flagship. Mini at $0.75/$4.50 handles most production tasks that need real reasoning.
Use GPT-5.4 when your evals prove the upgrade is worth it. Do not default to the flagship because it is the newest model.
Do not forget GPT-5.1. At $1.25/$10.00 with no deprecation plans, it is the best value in OpenAI's lineup for workloads it handles well.
Enable cached inputs and the Batch API before you start hunting for smaller prompt wins. These two features can cut your bill by 50%+ with zero prompt engineering effort.
Factor in data residency if you have compliance requirements: +10% on GPT-5.4 family models, but GPT-5.1 and earlier are unaffected.

The real story of OpenAI's pricing is not the headline rate. It is the combination of cached input (10% of normal), Batch API (50% off), and a model lineup deep enough to match the right tier to your workload. Most teams that complain about OpenAI's cost have never tested nano, never checked their cache hit rate, and never tried Batch. Start there before you optimize anything else.

For a provider-level comparison across multiple vendors, see our LLM pricing overview. For deeper dives into other providers, see our guides on Claude API pricing, Gemini API pricing, and DeepSeek API pricing. To understand how token pricing works at a fundamental level, see how LLM token pricing works.

Reader questions

Frequently asked questions

01How much does the OpenAI API cost right now?

OpenAI's current flagship GPT-5.6 Sol costs $5.00/$30.00 per million input/output tokens, GPT-5.6 Terra $2.50/$15.00, and GPT-5.6 Luna $1.00/$6.00, with cached-input reads discounted 90%. GPT-5.5 matches Sol at $5/$30. The GPT-5.4 family remains available: GPT-5.4 at $2.50/$15.00, mini at $0.75/$4.50, nano at $0.20/$1.25. Batch API cuts prices by 50%.

02What is cached input pricing on OpenAI?

Cached input is OpenAI's automatic discount for repeated prompt prefixes. It costs exactly 10% of normal input pricing — $0.25/M for GPT-5.4, $0.075/M for mini, $0.02/M for nano. If your application reuses a system prompt or shared context, cached tokens automatically cost 90% less with no code changes.

03Which OpenAI model should I use?

Start with GPT-5.4 nano ($0.20/$1.25) and step up only when quality requires it. Most teams overspend by defaulting to the flagship. The decision tree: nano for classification and simple tasks, mini for balanced quality/cost, GPT-5.4 for production workloads, GPT-5.2 Pro or GPT-5 Pro only when you've measured a clear win.

04Is GPT-5.1 still worth using?

Yes. GPT-5.1 at $1.25/$10.00 with cached input at $0.125 is OpenAI's best value for workloads that don't need GPT-5.4's latest capabilities. OpenAI says it has no current plans to deprecate GPT-5.1 in the API. If your eval set shows GPT-5.1 handles your task, there's no reason to pay 2x for GPT-5.4.

05How much does OpenAI Batch API save?

OpenAI's Batch API cuts both input and output prices by 50%. GPT-5.4 batch rates are $1.25/$7.50 per million tokens — cheaper than GPT-5.1's standard rate. If your workload can tolerate async processing (content generation, data extraction, offline analysis), Batch is usually the first optimization to enable.

06Does OpenAI charge extra for data residency?

Yes. OpenAI's live pricing page says data residency and Regional Processing endpoints add 10% for all models released after March 5, 2026. Factor this surcharge into your cost model if you have compliance-driven routing requirements.

Source ledger

External sources linked in this article

01API pricing pageopenai.com

Model pricing changes frequently. We send one email a week with what moved and why.

Share or save

Share on X Share on LinkedIn