Current Anthropic Claude API pricing from official model pages, including prompt caching, batch discounts, and the current 1M context beta notes.
Anthropic's public Claude pricing is much simpler than many comparison pages make it look. The current public model pages expose three standard API tiers: Haiku 4.5, Sonnet 4.6, and Opus 4.6. Those are the rows that matter for most teams building on Claude today.
This guide uses Anthropic's current public model pages for Haiku 4.5, Sonnet 4.6, and Opus 4.6.
| Model | Input $/M | Output $/M | Notes |
|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $5.00 | Fastest, cheapest Claude tier; also available in Claude Code |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Default production tier; 1M context beta on API only |
| Claude Opus 4.6 | $5.00 | $25.00 | Premium tier; 1M context beta on Claude Platform only |
Every current public Claude tier still keeps the same 5x output-to-input ratio, which makes back-of-the-envelope budgeting easy.
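Because every tier prices output at exactly 5x input, a budget estimate collapses to a one-line formula: multiply the input rate by (input tokens + 5 x output tokens). A minimal sketch of that shortcut, using the Sonnet 4.6 rate from the table above:

```python
def claude_cost(input_rate_per_m: float, input_tokens: int, output_tokens: int) -> float:
    """Back-of-the-envelope cost in dollars, using the 5x output-to-input ratio.

    Because output is priced at 5x input on every current public tier,
    only the input rate is needed.
    """
    return input_rate_per_m * (input_tokens + 5 * output_tokens) / 1_000_000

# Sonnet 4.6 at $3.00/M input: 1M input tokens + 200k output tokens
print(claude_cost(3.00, 1_000_000, 200_000))  # 6.0 (= $3 input + $3 output)
```

The same call works for any tier; only the first argument changes ($1.00 for Haiku 4.5, $5.00 for Opus 4.6).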
Anthropic repeats the same two cost levers across the current model pages: prompt caching and batch processing. If you reuse long system prompts, stable few-shot prefixes, or repeated document headers, prompt caching is usually the fastest way to shrink your bill. If your workload is asynchronous, the Batch API is the bigger blunt instrument.
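To see how the two levers stack, here is a rough sketch of the blended input rate. It assumes Anthropic's published multipliers of 0.1x base input for cache reads and a 50% batch discount, and it ignores the one-time cache-write surcharge for simplicity:

```python
def effective_input_rate(base_rate: float, cached_fraction: float, batch: bool = False) -> float:
    """Blended input $/M with prompt caching and the batch discount.

    Assumptions: cache reads bill at 0.1x the base input rate, batch
    processing halves the bill, and cache-write surcharges are ignored
    for this rough estimate.
    """
    rate = base_rate * (cached_fraction * 0.1 + (1 - cached_fraction))
    return rate * 0.5 if batch else rate

# Sonnet 4.6 ($3.00/M) with 80% of input tokens served from cache
print(round(effective_input_rate(3.00, 0.8), 2))              # 0.84
# Same workload also routed through the Batch API
print(round(effective_input_rate(3.00, 0.8, batch=True), 2))  # 0.42
```

Stacked, the two discounts take Sonnet's nominal $3.00/M input rate down by roughly 7x for a cache-heavy asynchronous workload.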
Anthropic positions Haiku as the fast, cost-efficient model for scaled deployments, real-time applications, and coding sub-agents. If your workload is latency-sensitive or cost-sensitive, this is the first Claude model to try.
Sonnet 4.6 is the default production tier. It is the current middle ground for coding, agents, and professional workflows, and Anthropic says the 1M token context window is available in beta on the API only.
Opus 4.6 is no longer priced as an extreme outlier relative to Sonnet. It is still the premium tier, but the published public price is $5/$25, not the much higher numbers that older comparison tables often repeat. Anthropic also says US-only inference is available at 1.1x pricing for Opus 4.6.
Assume a RAG workflow serving 3,000 requests per day, each with 2,400 input tokens and 350 output tokens.
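Plugging that workload into the published rates gives the daily and 30-day bill for each tier. The rates are the public table numbers above; the workload figures are the assumed example:

```python
REQUESTS_PER_DAY = 3_000
INPUT_TOKENS = 2_400   # per request
OUTPUT_TOKENS = 350    # per request

# Public per-million rates (input $/M, output $/M) from the table above
RATES = {
    "Haiku 4.5": (1.00, 5.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Opus 4.6": (5.00, 25.00),
}

daily_input_m = REQUESTS_PER_DAY * INPUT_TOKENS / 1_000_000    # 7.2M tokens/day
daily_output_m = REQUESTS_PER_DAY * OUTPUT_TOKENS / 1_000_000  # 1.05M tokens/day

for model, (in_rate, out_rate) in RATES.items():
    daily = daily_input_m * in_rate + daily_output_m * out_rate
    print(f"{model}: ${daily:.2f}/day, ${daily * 30:.2f}/30 days")
```

That works out to roughly $12.45/day on Haiku 4.5, $37.35/day on Sonnet 4.6, and $62.25/day on Opus 4.6 before any caching or batch discounts.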
That is the pricing shape to remember: input rates of $1, $3, and $5 per million tokens, with output at 5x input on every tier.
If you want a default, start with Sonnet 4.6 and drop to Haiku 4.5 when latency or volume dominates. And before you obsess over prompt wording, turn on the two discounts Anthropic is already telling you to use: prompt caching and batch processing.
For a broader vendor comparison, see our LLM pricing overview.