Current DeepSeek API pricing from the official docs: deepseek-chat and deepseek-reasoner, cache-hit vs cache-miss pricing, output pricing, and the current V3.2 endpoint mapping.
The current DeepSeek pricing page is narrower than many comparison posts imply. Today, the official public API docs expose two priced endpoints: deepseek-chat and deepseek-reasoner. DeepSeek says both currently correspond to DeepSeek-V3.2 and use a 128K context limit.
This guide sticks to the current official DeepSeek pricing page.
| Endpoint | Model Version | Context | Input Cache Hit $/M | Input Cache Miss $/M | Output $/M |
|---|---|---|---|---|---|
| deepseek-chat | DeepSeek-V3.2 | 128K | $0.028 | $0.28 | $0.42 |
| deepseek-reasoner | DeepSeek-V3.2 | 128K | $0.028 | $0.28 | $0.42 |
That means the big cost split in DeepSeek's current public pricing is not chat versus reasoner. It is cache hit versus cache miss.
The current docs list the same base pricing for both endpoints, but they are not identical operationally.
If you are choosing between deepseek-chat and deepseek-reasoner, the current practical question is feature behavior and output budget, not token price.
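To make that concrete, here is a minimal cost sketch in Python using the rates from the table above. The rates are current as of this writing and should be re-checked against the live docs; the assumption that the reasoner's reasoning tokens bill at the output rate is ours, not something this table states.

```python
# Rates from the pricing table above, USD per million tokens (subject to change).
PRICING = {
    "deepseek-chat":     {"cache_hit": 0.028, "cache_miss": 0.28, "output": 0.42},
    "deepseek-reasoner": {"cache_hit": 0.028, "cache_miss": 0.28, "output": 0.42},
}

def request_cost_usd(model: str, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Cost of one request, splitting input into cached and uncached tokens."""
    p = PRICING[model]
    return (hit_tokens * p["cache_hit"]
            + miss_tokens * p["cache_miss"]
            + output_tokens * p["output"]) / 1_000_000

# Identical input pricing on both endpoints; the reasoner's bill grows only
# because it tends to emit far more output tokens per request
# (assumption: reasoning tokens are billed at the output rate).
chat_cost = request_cost_usd("deepseek-chat", 1_500, 500, 300)
reasoner_cost = request_cost_usd("deepseek-reasoner", 1_500, 500, 5_000)
```

With identical base rates, the per-request gap comes entirely from how many output tokens each endpoint actually produces.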
DeepSeek currently charges $0.028 per million input tokens on a cache hit versus $0.28 per million on a cache miss. That is a straight 90% reduction for cached input. On workloads with repeated system prompts or reused shared prefixes, caching dominates every other cost optimization.
Assume 10,000 requests per day, each with 2,000 input tokens and 300 output tokens.
Same request volume. Same output. At a 0% cache-hit rate the bill is about $6.86 per day; if every input token hits the cache, it drops to about $1.82. Very different bill.
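The arithmetic behind that example can be sketched in Python. The request volume and token counts come from the scenario above; the cache-hit fraction is a parameter you would estimate for your own workload.

```python
REQUESTS_PER_DAY = 10_000
INPUT_TOKENS = 2_000    # input tokens per request
OUTPUT_TOKENS = 300     # output tokens per request

CACHE_HIT_RATE = 0.028  # $/M input tokens, cache hit
CACHE_MISS_RATE = 0.28  # $/M input tokens, cache miss
OUTPUT_RATE = 0.42      # $/M output tokens

def daily_bill_usd(cache_hit_fraction: float) -> float:
    """Daily cost assuming a given fraction of input tokens hit the cache."""
    input_m = REQUESTS_PER_DAY * INPUT_TOKENS / 1_000_000    # 20M tokens/day
    output_m = REQUESTS_PER_DAY * OUTPUT_TOKENS / 1_000_000  # 3M tokens/day
    input_cost = input_m * (cache_hit_fraction * CACHE_HIT_RATE
                            + (1 - cache_hit_fraction) * CACHE_MISS_RATE)
    return input_cost + output_m * OUTPUT_RATE

worst = daily_bill_usd(0.0)  # all input billed at the cache-miss rate
best = daily_bill_usd(1.0)   # all input billed at the cache-hit rate
```

Output spend is identical in both cases; the entire swing comes from the input side, which is why cache-hit rate is the first number worth measuring.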
If you are reading older DeepSeek comparisons, be careful: they often reflect a more complicated pricing structure than the one DeepSeek publishes today. The current public page is simpler than that.
If you are budgeting the current public DeepSeek API, treat deepseek-chat and deepseek-reasoner as the two priced endpoints that matter.

For broader cross-provider comparisons, see our LLM pricing overview.