How much does Claude API cost?

Anthropic's current pricing lists Claude Haiku 4.5 at $1/$5, Sonnet 4.6 at $3/$15, and the newly launched Opus 4.7 at $5/$25 per million input/output tokens. Anthropic kept Opus pricing unchanged versus Opus 4.6. All three tiers maintain a consistent 5x output-to-input price ratio. Prompt caching saves up to 90% and batch processing saves 50%.

Is Claude worth the price compared to cheaper models?

It depends on the cost of errors. Claude Opus 4.7 is Anthropic's current premium tier and keeps the same $5/$25 pricing as Opus 4.6 while improving software-engineering benchmarks like SWE-bench Pro and SWE-bench Verified in Anthropic's launch chart. If your workflow involves drafting, editing, or code review where getting it right the first time saves rework, the quality premium can pay for itself. For classification or extraction, cheaper models like Haiku or GPT-5.4 nano are better value.

Which Claude model should I start with?

Start with Haiku 4.5 for high-volume or latency-sensitive work, Sonnet 4.6 for the default production tier, and Opus 4.7 only when you have a measured quality gap that justifies the premium. Most teams should live on Sonnet and route simple tasks to Haiku.

How much does Claude prompt caching save?

Anthropic says prompt caching can cut costs by up to 90% on cached input tokens. If your application uses a consistent system prompt or repeated context, the cached portion costs roughly 10% of normal input pricing. Combined with batch processing (50% off), caching is the biggest cost lever before model selection.

Does Claude offer a 1M context window?

Anthropic's current docs explicitly list a 1M token context window for Sonnet 4.6 and Opus 4.6. Anthropic's Opus 4.7 announcement kept Opus pricing unchanged at $5/$25, but did not introduce a separate new long-context price tier. For long-document workflows, always verify the current model-overview docs before assuming 1M support on a newly launched variant.

How does Claude Opus 4.7 compare to GPT-5.4?

Claude Opus 4.7 at $5/$25 costs more than GPT-5.4 at $2.50/$15, especially on input. Anthropic's launch chart shows Opus 4.7 improving over Opus 4.6 on SWE-bench Pro, SWE-bench Verified, Terminal-Bench 2.0, OSWorld-Verified, and GPQA Diamond, while GPT-5.4 still remains cheaper and broadly strong on coding and math. The right choice depends on whether your workload values Claude's instruction-following and writing style enough to justify the premium.

Claude API Pricing: Haiku 4.5, Sonnet 4.6, and Opus 4.7 (April 2026)

Claude's pricing tells a simpler story than most comparison tables suggest. Three tiers, one consistent 5x output-to-input ratio, two discount levers. But the interesting question isn't "what does Claude cost?" — it's "when does Claude's quality premium pay for itself?" Cheaper models exist. Some of them score higher on aggregate benchmarks. The case for Claude has never been about being the cheapest option — it's about whether the quality gap on instruction following, writing, and precision tasks saves you enough rework to justify the price difference.

Anthropic just launched Claude Opus 4.7 on April 16, 2026 and kept Opus pricing unchanged at $5/$25 per million input/output tokens. That makes the pricing story simpler than many third-party tables suggest: Haiku 4.5 at $1/$5, Sonnet 4.6 at $3/$15, and now Opus 4.7 at $5/$25.

This guide uses Anthropic's current public model pages for Haiku 4.5 and Sonnet 4.6, plus the official Claude Opus 4.7 launch announcement, combined with benchmark data from BenchLM.ai and Arena Elo scores, to help you decide whether Claude's pricing makes economic sense for your workload.

Claude pricing at a glance

Model	Input $/M	Output $/M	Notes
Claude Haiku 4.5	$1.00	$5.00	Fastest, cheapest Claude tier; also available in Claude Code
Claude Sonnet 4.6	$3.00	$15.00	Default production tier; 1M context beta on API only
Claude Opus 4.7	$5.00	$25.00	New premium tier; pricing unchanged from Opus 4.6

Every current Claude tier keeps the same 5x output-to-input ratio. That consistency makes back-of-the-envelope budgeting easy: if you know your input cost, multiply by five for output. No other major provider is this predictable — OpenAI's ratios range from 3x to 8x depending on the model, and Gemini's vary by context length tier.

The tier spacing is also clean. Sonnet costs 3x Haiku on both input and output. Opus costs 1.67x Sonnet. That Opus-to-Sonnet gap is worth remembering — it's much smaller than you might expect if you're used to older Claude pricing.

What Anthropic says about discounts

Anthropic repeats the same two cost levers across its current model pages:

Prompt caching: up to 90% cost savings on cached input tokens
Batch processing: 50% cost savings for asynchronous workloads

Prompt caching

If your application sends the same system prompt, few-shot examples, or document prefix with every request, prompt caching is almost certainly your biggest cost lever. The cached portion of your input costs roughly 10% of the standard input rate. For a typical RAG application with a 2,000-token system prompt repeated across thousands of requests, caching that prefix alone can cut your total input cost by 30-50%.

The math: if 60% of your input tokens are cacheable, your effective input rate on Sonnet drops from $3.00/M to roughly $1.50/M — putting it in the same ballpark as Gemini 3.1 Pro's $2.00/M standard pricing, before Gemini's own caching discounts.

Batch processing

If your workload is asynchronous — offline analysis, nightly document processing, bulk classification — the Batch API cuts costs by 50%. This stacks conceptually with prompt caching: cache your repeated context, batch your requests, and the effective per-token cost drops substantially below the list price.

Before you optimize prompt wording or debate model tiers, turn on these two levers. They're the fastest path to a lower Claude bill.

The quality case for Claude — benchmark-adjusted cost

Users can get price tables anywhere. The question that actually matters for budgeting is: how much quality am I getting per dollar?

Here's the comparison that pricing pages don't show — cost efficiency measured against BenchLM's overall score, which weights coding, reasoning, math, knowledge, multimodal, long-context, instruction following, and agentic capabilities:

Model	BenchLM Score	Input $/M	Output $/M	Cost per point (output $/M / score)
Gemini 3.1 Pro	83	$2.00	$12.00	$0.145
GPT-5.4	84	$2.50	$15.00	$0.179
Claude Sonnet 4.6	76	$3.00	$15.00	$0.197
Claude Opus 4.6	80	$5.00	$25.00	$0.313
DeepSeek V3.2	62	$0.28	$0.42	$0.007

BenchLM overall scores from BenchLM.ai. Gemini 3.1 Pro pricing is for standard context (<=200K tokens). Prices per million tokens.

On raw cost-per-benchmark-point, Gemini 3.1 Pro and GPT-5.4 beat Claude in BenchLM's current public data. DeepSeek is still in a different universe on cost efficiency, though its overall quality sits well below the frontier.

So why would anyone pay the Claude premium?

Benchmarks don't capture everything

Aggregate scores miss the specific dimensions where Claude leads. Anthropic's new Claude Opus 4.7 keeps the same premium price as Opus 4.6 while improving on Anthropic's own software-engineering benchmarks, including SWE-bench Pro: 64.3, SWE-bench Verified: 87.6, Terminal-Bench 2.0: 69.4, and OSWorld-Verified: 78.0. Older Arena preference data still shows Claude's strength on instruction following and writing quality more broadly. These aren't niche metrics. Instruction following determines whether the model does what you asked, the way you asked it. Writing quality determines whether the output needs a human rewrite.

For workloads where these dimensions matter — drafting content, editing documents, following complex multi-step instructions, maintaining brand voice — Claude's advantage is concrete, even if it doesn't show up in a single overall number.

GPT-5.4 leads on coding benchmarks (SWE-bench Verified: 84) and competition math (AIME: 99 vs Claude Opus's 98). Gemini 3.1 Pro leads on multimodal tasks (MMMU-Pro: 83.9) and offers the best frontier price. Each model has a domain where it wins. Claude's domain is precision and prose.

The rework cost argument

Here's the economic case for paying more per token. Suppose Claude Opus produces a usable first draft — one that needs minimal human editing — 80% of the time, while a model at half the price produces a usable first draft 60% of the time.

For a content team processing 50 documents per day, that 20-percentage-point gap means 10 fewer documents requiring significant human rework each day. If each rework takes 15 minutes of a $50/hour editor's time, that's $125/day in saved labor — $2,500/month. The Claude API premium over a cheaper model for that same volume might be $200-400/month.

The premium pays for itself five to ten times over. Not because Claude is cheap, but because human time is expensive.

This logic applies to any workflow where output quality translates to downstream labor: code review, legal document drafting, compliance documentation, customer-facing communications. The more expensive the human reviewer, the faster the quality premium pays for itself.

When Claude isn't worth the premium

The rework argument breaks down when quality differences don't matter — or when the cheapest model that clears a quality threshold wins:

High-volume classification and tagging — if you're routing support tickets or categorizing documents, Haiku 4.5 at $1/$5 or GPT-5.4 nano at $0.20/$1.25 will clear the accuracy bar. Don't use Opus for tagging emails.
Simple extraction — pulling structured data from forms or invoices doesn't need frontier-tier instruction following. DeepSeek V3.2 at $0.28/$0.42 handles this affordably.
Tasks with automated evaluation — if you can programmatically verify output quality (unit tests, schema validation, regression checks), you don't need to pay for first-draft perfection. Use a cheaper model and retry failures.

The principle: pay the quality premium only when the cost of a bad output exceeds the cost of the premium.

Choosing the right Claude tier — a decision framework

Don't start with Opus and work down. Start with Haiku and work up. Each tier jump should be justified by a measured quality gap that costs more in rework or lost value than the price increase.

Step 1: Start with Haiku 4.5

Haiku 4.5 at $1/$5 is your baseline. Try it on your actual workload — not hypothetical prompts, but real production queries. Measure the output quality against your acceptance criteria.

If Haiku clears the bar, stop. You've found your model. Many high-volume workloads — chat routing, simple summarization, content moderation, entity extraction — run perfectly well on Haiku.

Step 2: Move to Sonnet when Haiku falls short

Sonnet 4.6 at $3/$15 costs 3x Haiku. The quality jump buys you meaningfully better reasoning, coding, and writing. Sonnet is Anthropic's default production tier for a reason — it handles the middle 80% of workloads where you need more than basic capability but don't need the absolute frontier.

Move to Sonnet when: your Haiku outputs need frequent correction, your tasks require multi-step reasoning, you need reliable code generation, or your users notice quality issues in customer-facing output.

Step 3: Move to Opus only with evidence

Opus 4.7 at $5/$25 costs 1.67x Sonnet — a smaller jump than Haiku to Sonnet. But it should be the exception, not the default. Move to Opus when:

Instruction following precision matters. Legal, compliance, brand voice, and style-constrained tasks where "close enough" isn't good enough. Opus's Arena IF lead (1500 vs Sonnet's 1479) is measurable in these workflows.
You need the extended context beta. Opus offers 1M context on Claude Platform — useful for full codebase analysis or book-length documents.
You've measured the quality gap. Run 50 representative prompts through both Sonnet and Opus. If the Opus outputs are measurably better on your evaluation criteria, the 1.67x premium is easy to justify. If they're roughly equivalent, stay on Sonnet.

The $500/month budget exercise

Most teams don't need a single tier — they need a routing strategy. Here's how a $500/month Claude budget might break down:

60% to Haiku (~$300 worth of requests) — classification, routing, summarization, simple Q&A
35% to Sonnet (~$175 worth) — coding tasks, complex reasoning, customer-facing content
5% to Opus (~$25 worth) — high-stakes drafting, compliance review, difficult edge cases

At Haiku's $1/$5 rate, $300 buys roughly 60M input tokens or 300M output tokens — enough for tens of thousands of simple requests. The Sonnet and Opus allocations handle the smaller volume of complex tasks where quality matters.

The effective blended rate of this routing strategy is significantly lower than running everything on Sonnet, while maintaining Opus-tier quality for the requests that need it most. This is how production Claude deployments should work.

Real cost examples — the same workload across all three tiers

Assume a RAG workflow serving 3,000 requests per day, each with 2,400 input tokens and 350 output tokens.

Haiku 4.5

Daily input: 3,000 x 2,400 x $1.00 / 1M = $7.20
Daily output: 3,000 x 350 x $5.00 / 1M = $5.25
Monthly total: about $373.50

Sonnet 4.6

Daily input: 3,000 x 2,400 x $3.00 / 1M = $21.60
Daily output: 3,000 x 350 x $15.00 / 1M = $15.75
Monthly total: about $1,120.50

Opus 4.7

Daily input: 3,000 x 2,400 x $5.00 / 1M = $36.00
Daily output: 3,000 x 350 x $25.00 / 1M = $26.25
Monthly total: about $1,867.50

With routing

Now apply the tier-routing strategy from above. Assume 60% of those 3,000 daily requests are simple enough for Haiku, 35% need Sonnet, and 5% justify Opus:

1,800 Haiku requests: (1,800 x 2,400 x $1 / 1M) + (1,800 x 350 x $5 / 1M) = $4.32 + $3.15 = $7.47/day
1,050 Sonnet requests: (1,050 x 2,400 x $3 / 1M) + (1,050 x 350 x $15 / 1M) = $7.56 + $5.51 = $13.07/day
150 Opus requests: (150 x 2,400 x $5 / 1M) + (150 x 350 x $25 / 1M) = $1.80 + $1.31 = $3.11/day

Routed monthly total: about $710 — 37% cheaper than running everything on Sonnet, while still using Opus where quality matters most.

Use the cost calculator to model your own workload, or the token counter to estimate token counts from your actual prompts.

The 1M context beta — what it means for cost

Both Sonnet 4.6 and Opus 4.6 now support a 1M token context window in beta — Sonnet on the API, Opus on Claude Platform. This opens use cases that were previously impossible in a single request: full codebase analysis, entire legal contracts, book-length documents, large dataset inspection.

But 1M context has a direct cost implication that's easy to underestimate.

The input cost at scale

Filling the full 1M context window on Sonnet costs $3.00 per request just for input tokens — before any output. On Opus, that's $5.00 per request for input alone. If your application makes dozens of these requests per day, the cost adds up fast.

For comparison, filling 1M context on Gemini 3.1 Pro costs $2.00 at the standard rate (<=200K tokens) or more at the extended context tier. On GPT-5.4, 1M of input costs $2.50.

The practical strategy

Use the 1M window selectively. For most applications, RAG with targeted retrieval is dramatically cheaper than dumping everything into context. Instead of sending an entire codebase as context, retrieve the 5-10 most relevant files and send those.

Reserve the full 1M context for tasks where you genuinely need holistic understanding across the entire document — cross-referencing clauses in a legal contract, understanding architectural patterns across a full codebase, or analyzing narrative structure across a complete manuscript. For everything else, smaller targeted context windows save 90%+ on input costs.

If you do use long context regularly, prompt caching becomes even more critical. Caching a 500K-token document prefix reduces subsequent requests against that document from $1.50 to roughly $0.15 on Sonnet — a 10x saving that makes repeated long-context queries economically viable.

Opus 4.7 pricing in context — it's not what old tables say

If you've seen comparison tables listing Claude Opus at $15/$75, those numbers are stale. The current Opus 4.7 public pricing is $5/$25 — unchanged from Opus 4.6 and still far below the older figures that many third-party sites continue to quote.

This changes the decision calculus significantly:

The Opus-to-Sonnet jump is 1.67x, not 5x. At $15/$75, Opus was a luxury tier reserved for the most quality-sensitive tasks. At $5/$25, it's a reasonable upgrade when you have evidence that the quality gap matters.
Opus is not cheap relative to GPT-5.4. Opus at $5/M input and $25/M output is 2x GPT-5.4's input price and 1.67x its output price. The decision to pay that premium only makes sense when Opus's writing and instruction-following edge reduces enough downstream rework to justify it.
US-only inference at 1.1x pricing. Anthropic's platform pricing policy for post-February 1, 2026 models means US-only inference remains relevant for current Opus-class deployments. For compliance-sensitive workloads in finance, healthcare, or government, this can be a meaningful option that avoids the complexity of self-hosting.

Always verify pricing against the official model pages before budgeting. Third-party comparison sites lag behind pricing changes, sometimes by months.

When not to use Claude

An honest pricing guide tells you when to look elsewhere. Claude is not the best choice for every workload, and pretending otherwise would waste your money.

Ultra-high-volume classification

If you're processing millions of simple classification or routing requests, GPT-5.4 nano at $0.20/$1.25 or DeepSeek V3.2 at $0.28/$0.42 (with cache hits as low as $0.028 on input) will handle the task at a fraction of even Haiku's cost. The quality difference on simple classification tasks is negligible — you're paying for capabilities you won't use.

Budget-constrained prototyping

If you're experimenting with LLM integration and need to keep costs near zero while you iterate on prompts and architecture, Gemini's free tier or DeepSeek's low-cost API are better starting points. Switch to Claude once you've validated your approach and need production-grade quality.

Maximum reasoning benchmark scores

GPT-5.4 leads on AIME (99 vs Opus's 98), BRUMO 2025 (97), and MRCRv2 (97). If your workload is dominated by complex mathematical reasoning or competition-style problem solving, GPT-5.4 has the stronger benchmark profile at a lower price point ($2.50/$15 vs $5/$25).

Large context on a budget

If you regularly process long documents and cost matters more than Claude's instruction-following advantage, Gemini 3.1 Pro at $2.00/$12 (standard context) beats Sonnet's $3/$15 with a comparable 1M context window and a stronger current BenchLM overall score (83 vs 76). For large-context workloads where you're sending hundreds of thousands of tokens per request, that pricing gap compounds quickly.

Coding-dominated workloads

GPT-5.4 leads on SWE-bench Verified (84), and Gemini 3.1 Pro is also strong on coding-adjacent tasks while costing less than Opus. Claude Opus is strong at coding, but if coding is your primary use case and you're not leveraging Claude's writing or instruction-following advantages, the premium is harder to justify.

For a full cross-vendor comparison, see the LLM pricing overview or the token pricing breakdown.

The practical takeaway

Claude's pricing story is straightforward: three tiers with a consistent 5x ratio, two discount levers that can cut costs dramatically, and a quality premium that pays for itself on precision-dependent workloads.

The decision framework:

Start with Haiku 4.5 for high-volume pipelines, routing, and latency-sensitive applications. At $1/$5, it's competitive with budget-tier models from other providers while offering Claude's instruction-following DNA.
Default to Sonnet 4.6 as your standard production tier. At $3/$15, it handles the vast majority of workloads — coding, reasoning, content generation — without the Opus premium. The 1M context beta on the API makes it viable for long-document workflows.
Upgrade to Opus 4.7 only with measured evidence. Run comparative evaluations on your actual tasks. If Opus outputs are measurably better on your criteria, the 1.67x jump from Sonnet is easy to justify — especially at the current $5/$25 pricing, which is far more accessible than the $15/$75 that older tables still quote.
Enable prompt caching and batch processing before you optimize anything else. These two levers — up to 90% and 50% savings respectively — often dwarf the savings from switching tiers or rewriting prompts.
Route across tiers. The most cost-effective Claude deployment isn't one that picks a single tier — it's one that routes simple requests to Haiku, standard work to Sonnet, and high-stakes tasks to Opus. A well-tuned routing strategy can cut costs 30-40% compared to running everything on Sonnet.

Claude is not the cheapest LLM. It's not trying to be. The question isn't whether you can find a lower price per token — you can, easily. The question is whether the quality gap on the tasks you care about saves enough rework, enough editing time, and enough downstream cost to make the premium worth it. For precision-dependent workloads, it usually does.

For a broader vendor comparison, see the LLM pricing overview. For provider-specific deep dives: OpenAI pricing, Gemini pricing, DeepSeek pricing. Use the cost calculator to model your workload or the token counter to estimate token volumes from your prompts.

Pricing from Anthropic's official model pages for Haiku 4.5 and Sonnet 4.6, plus the official Claude Opus 4.7 launch announcement. Benchmark scores from BenchLM.ai. Arena Elo from arena.ai. Current as of April 16, 2026.