Current Anthropic Claude API pricing from official model pages and the Claude Opus 4.7 launch announcement, including prompt caching, batch discounts, and current long-context notes.
Share This Report
Copy the link, post it, or save a PDF version.
Claude's pricing tells a simpler story than most comparison tables suggest. Three tiers, one consistent 5x output-to-input ratio, two discount levers. But the interesting question isn't "what does Claude cost?" — it's "when does Claude's quality premium pay for itself?" Cheaper models exist. Some of them score higher on aggregate benchmarks. The case for Claude has never been about being the cheapest option — it's about whether the quality gap on instruction following, writing, and precision tasks saves you enough rework to justify the price difference.
Anthropic just launched Claude Opus 4.7 on April 16, 2026 and kept Opus pricing unchanged at $5/$25 per million input/output tokens. That makes the pricing story simpler than many third-party tables suggest: Haiku 4.5 at $1/$5, Sonnet 4.6 at $3/$15, and now Opus 4.7 at $5/$25.
This guide uses Anthropic's current public model pages for Haiku 4.5 and Sonnet 4.6, plus the official Claude Opus 4.7 launch announcement, combined with benchmark data from BenchLM.ai and Arena Elo scores, to help you decide whether Claude's pricing makes economic sense for your workload.
| Model | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | Fastest, cheapest Claude tier; also available in Claude Code |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Default production tier; 1M context beta on API only |
| Claude Opus 4.7 | $5.00 | $25.00 | New premium tier; pricing unchanged from Opus 4.6 |
Every current Claude tier keeps the same 5x output-to-input ratio. That consistency makes back-of-the-envelope budgeting easy: if you know your input cost, multiply by five for output. No other major provider is this predictable — OpenAI's ratios range from 3x to 8x depending on the model, and Gemini's vary by context length tier.
The tier spacing is also clean. Sonnet costs 3x Haiku on both input and output. Opus costs 1.67x Sonnet. That Opus-to-Sonnet gap is worth remembering — it's much smaller than you might expect if you're used to older Claude pricing.
Anthropic repeats the same two cost levers across its current model pages:
If your application sends the same system prompt, few-shot examples, or document prefix with every request, prompt caching is almost certainly your biggest cost lever. The cached portion of your input costs roughly 10% of the standard input rate. For a typical RAG application with a 2,000-token system prompt repeated across thousands of requests, caching that prefix alone can cut your total input cost by 30-50%.
The math: if 60% of your input tokens are cacheable, your effective input rate on Sonnet drops from $3.00/M to roughly $1.50/M — putting it in the same ballpark as Gemini 3.1 Pro's $2.00/M standard pricing, before Gemini's own caching discounts.
If your workload is asynchronous — offline analysis, nightly document processing, bulk classification — the Batch API cuts costs by 50%. This stacks conceptually with prompt caching: cache your repeated context, batch your requests, and the effective per-token cost drops substantially below the list price.
Before you optimize prompt wording or debate model tiers, turn on these two levers. They're the fastest path to a lower Claude bill.
Users can get price tables anywhere. The question that actually matters for budgeting is: how much quality am I getting per dollar?
Here's the comparison that pricing pages don't show — cost efficiency measured against BenchLM's overall score, which weights coding, reasoning, math, knowledge, multimodal, long-context, instruction following, and agentic capabilities:
| Model | BenchLM Score | Input $/M | Output $/M | Cost per point (output $/M / score) |
|---|---|---|---|---|
| Gemini 3.1 Pro | 83 | $2.00 | $12.00 | $0.145 |
| GPT-5.4 | 84 | $2.50 | $15.00 | $0.179 |
| Claude Sonnet 4.6 | 76 | $3.00 | $15.00 | $0.197 |
| Claude Opus 4.6 | 80 | $5.00 | $25.00 | $0.313 |
| DeepSeek V3.2 | 62 | $0.28 | $0.42 | $0.007 |
BenchLM overall scores from BenchLM.ai. Gemini 3.1 Pro pricing is for standard context (<=200K tokens). Prices per million tokens.
On raw cost-per-benchmark-point, Gemini 3.1 Pro and GPT-5.4 beat Claude in BenchLM's current public data. DeepSeek is still in a different universe on cost efficiency, though its overall quality sits well below the frontier.
So why would anyone pay the Claude premium?
Aggregate scores miss the specific dimensions where Claude leads. Anthropic's new Claude Opus 4.7 keeps the same premium price as Opus 4.6 while improving on Anthropic's own software-engineering benchmarks, including SWE-bench Pro: 64.3, SWE-bench Verified: 87.6, Terminal-Bench 2.0: 69.4, and OSWorld-Verified: 78.0. Older Arena preference data still shows Claude's strength on instruction following and writing quality more broadly. These aren't niche metrics. Instruction following determines whether the model does what you asked, the way you asked it. Writing quality determines whether the output needs a human rewrite.
For workloads where these dimensions matter — drafting content, editing documents, following complex multi-step instructions, maintaining brand voice — Claude's advantage is concrete, even if it doesn't show up in a single overall number.
GPT-5.4 leads on coding benchmarks (SWE-bench Verified: 84) and competition math (AIME: 99 vs Claude Opus's 98). Gemini 3.1 Pro leads on multimodal tasks (MMMU-Pro: 83.9) and offers the best frontier price. Each model has a domain where it wins. Claude's domain is precision and prose.
Here's the economic case for paying more per token. Suppose Claude Opus produces a usable first draft — one that needs minimal human editing — 80% of the time, while a model at half the price produces a usable first draft 60% of the time.
For a content team processing 50 documents per day, that 20-percentage-point gap means 10 fewer documents requiring significant human rework each day. If each rework takes 15 minutes of a $50/hour editor's time, that's $125/day in saved labor — $2,500/month. The Claude API premium over a cheaper model for that same volume might be $200-400/month.
The premium pays for itself five to ten times over. Not because Claude is cheap, but because human time is expensive.
This logic applies to any workflow where output quality translates to downstream labor: code review, legal document drafting, compliance documentation, customer-facing communications. The more expensive the human reviewer, the faster the quality premium pays for itself.
The rework argument breaks down when quality differences don't matter — or when the cheapest model that clears a quality threshold wins:
The principle: pay the quality premium only when the cost of a bad output exceeds the cost of the premium.
Don't start with Opus and work down. Start with Haiku and work up. Each tier jump should be justified by a measured quality gap that costs more in rework or lost value than the price increase.
Haiku 4.5 at $1/$5 is your baseline. Try it on your actual workload — not hypothetical prompts, but real production queries. Measure the output quality against your acceptance criteria.
If Haiku clears the bar, stop. You've found your model. Many high-volume workloads — chat routing, simple summarization, content moderation, entity extraction — run perfectly well on Haiku.
Sonnet 4.6 at $3/$15 costs 3x Haiku. The quality jump buys you meaningfully better reasoning, coding, and writing. Sonnet is Anthropic's default production tier for a reason — it handles the middle 80% of workloads where you need more than basic capability but don't need the absolute frontier.
Move to Sonnet when: your Haiku outputs need frequent correction, your tasks require multi-step reasoning, you need reliable code generation, or your users notice quality issues in customer-facing output.
Opus 4.7 at $5/$25 costs 1.67x Sonnet — a smaller jump than Haiku to Sonnet. But it should be the exception, not the default. Move to Opus when:
Most teams don't need a single tier — they need a routing strategy. Here's how a $500/month Claude budget might break down:
At Haiku's $1/$5 rate, $300 buys roughly 60M input tokens or 300M output tokens — enough for tens of thousands of simple requests. The Sonnet and Opus allocations handle the smaller volume of complex tasks where quality matters.
The effective blended rate of this routing strategy is significantly lower than running everything on Sonnet, while maintaining Opus-tier quality for the requests that need it most. This is how production Claude deployments should work.
Assume a RAG workflow serving 3,000 requests per day, each with 2,400 input tokens and 350 output tokens.
Now apply the tier-routing strategy from above. Assume 60% of those 3,000 daily requests are simple enough for Haiku, 35% need Sonnet, and 5% justify Opus:
Routed monthly total: about $710 — 37% cheaper than running everything on Sonnet, while still using Opus where quality matters most.
Use the cost calculator to model your own workload, or the token counter to estimate token counts from your actual prompts.
Both Sonnet 4.6 and Opus 4.6 now support a 1M token context window in beta — Sonnet on the API, Opus on Claude Platform. This opens use cases that were previously impossible in a single request: full codebase analysis, entire legal contracts, book-length documents, large dataset inspection.
But 1M context has a direct cost implication that's easy to underestimate.
Filling the full 1M context window on Sonnet costs $3.00 per request just for input tokens — before any output. On Opus, that's $5.00 per request for input alone. If your application makes dozens of these requests per day, the cost adds up fast.
For comparison, filling 1M context on Gemini 3.1 Pro costs $2.00 at the standard rate (<=200K tokens) or more at the extended context tier. On GPT-5.4, 1M of input costs $2.50.
Use the 1M window selectively. For most applications, RAG with targeted retrieval is dramatically cheaper than dumping everything into context. Instead of sending an entire codebase as context, retrieve the 5-10 most relevant files and send those.
Reserve the full 1M context for tasks where you genuinely need holistic understanding across the entire document — cross-referencing clauses in a legal contract, understanding architectural patterns across a full codebase, or analyzing narrative structure across a complete manuscript. For everything else, smaller targeted context windows save 90%+ on input costs.
If you do use long context regularly, prompt caching becomes even more critical. Caching a 500K-token document prefix reduces subsequent requests against that document from $1.50 to roughly $0.15 on Sonnet — a 10x saving that makes repeated long-context queries economically viable.
If you've seen comparison tables listing Claude Opus at $15/$75, those numbers are stale. The current Opus 4.7 public pricing is $5/$25 — unchanged from Opus 4.6 and still far below the older figures that many third-party sites continue to quote.
This changes the decision calculus significantly:
Always verify pricing against the official model pages before budgeting. Third-party comparison sites lag behind pricing changes, sometimes by months.
An honest pricing guide tells you when to look elsewhere. Claude is not the best choice for every workload, and pretending otherwise would waste your money.
If you're processing millions of simple classification or routing requests, GPT-5.4 nano at $0.20/$1.25 or DeepSeek V3.2 at $0.28/$0.42 (with cache hits as low as $0.028 on input) will handle the task at a fraction of even Haiku's cost. The quality difference on simple classification tasks is negligible — you're paying for capabilities you won't use.
If you're experimenting with LLM integration and need to keep costs near zero while you iterate on prompts and architecture, Gemini's free tier or DeepSeek's low-cost API are better starting points. Switch to Claude once you've validated your approach and need production-grade quality.
GPT-5.4 leads on AIME (99 vs Opus's 98), BRUMO 2025 (97), and MRCRv2 (97). If your workload is dominated by complex mathematical reasoning or competition-style problem solving, GPT-5.4 has the stronger benchmark profile at a lower price point ($2.50/$15 vs $5/$25).
If you regularly process long documents and cost matters more than Claude's instruction-following advantage, Gemini 3.1 Pro at $2.00/$12 (standard context) beats Sonnet's $3/$15 with a comparable 1M context window and a stronger current BenchLM overall score (83 vs 76). For large-context workloads where you're sending hundreds of thousands of tokens per request, that pricing gap compounds quickly.
GPT-5.4 leads on SWE-bench Verified (84), and Gemini 3.1 Pro is also strong on coding-adjacent tasks while costing less than Opus. Claude Opus is strong at coding, but if coding is your primary use case and you're not leveraging Claude's writing or instruction-following advantages, the premium is harder to justify.
For a full cross-vendor comparison, see the LLM pricing overview or the token pricing breakdown.
Claude's pricing story is straightforward: three tiers with a consistent 5x ratio, two discount levers that can cut costs dramatically, and a quality premium that pays for itself on precision-dependent workloads.
The decision framework:
Start with Haiku 4.5 for high-volume pipelines, routing, and latency-sensitive applications. At $1/$5, it's competitive with budget-tier models from other providers while offering Claude's instruction-following DNA.
Default to Sonnet 4.6 as your standard production tier. At $3/$15, it handles the vast majority of workloads — coding, reasoning, content generation — without the Opus premium. The 1M context beta on the API makes it viable for long-document workflows.
Upgrade to Opus 4.7 only with measured evidence. Run comparative evaluations on your actual tasks. If Opus outputs are measurably better on your criteria, the 1.67x jump from Sonnet is easy to justify — especially at the current $5/$25 pricing, which is far more accessible than the $15/$75 that older tables still quote.
Enable prompt caching and batch processing before you optimize anything else. These two levers — up to 90% and 50% savings respectively — often dwarf the savings from switching tiers or rewriting prompts.
Route across tiers. The most cost-effective Claude deployment isn't one that picks a single tier — it's one that routes simple requests to Haiku, standard work to Sonnet, and high-stakes tasks to Opus. A well-tuned routing strategy can cut costs 30-40% compared to running everything on Sonnet.
Claude is not the cheapest LLM. It's not trying to be. The question isn't whether you can find a lower price per token — you can, easily. The question is whether the quality gap on the tasks you care about saves enough rework, enough editing time, and enough downstream cost to make the premium worth it. For precision-dependent workloads, it usually does.
For a broader vendor comparison, see the LLM pricing overview. For provider-specific deep dives: OpenAI pricing, Gemini pricing, DeepSeek pricing. Use the cost calculator to model your workload or the token counter to estimate token volumes from your prompts.
Pricing from Anthropic's official model pages for Haiku 4.5 and Sonnet 4.6, plus the official Claude Opus 4.7 launch announcement. Benchmark scores from BenchLM.ai. Arena Elo from arena.ai. Current as of April 16, 2026.
Model pricing changes frequently. We send one email a week with what moved and why.
Share This Report
Copy the link, post it, or save a PDF version.
On this page
Which models moved up, what’s new, and what it costs. One email a week, 3-min read.
Free. One email per week.
Current DeepSeek API pricing from the official docs: deepseek-chat and deepseek-reasoner, cache-hit vs cache-miss pricing, output pricing, and the current V3.2 endpoint mapping.
Current Gemini API pricing from Google's official docs: 3.1 Pro Preview, 3.1 Flash-Lite Preview, 3 Flash Preview, 2.5 Flash, 2.5 Pro, plus Batch and Flex pricing.
Current OpenAI API pricing from official docs: GPT-5.4, GPT-5.2, GPT-5.1, cached input rates, Batch API discounts, and the pricing details that actually matter.