
DeepSeek API Pricing: deepseek-chat vs deepseek-reasoner (April 2026)

Current DeepSeek API pricing from the official docs: deepseek-chat and deepseek-reasoner, cache-hit vs cache-miss pricing, output pricing, and the current V3.2 endpoint mapping.

Glevd · Published April 13, 2026 · 6 min read


The current DeepSeek pricing page is narrower than many comparison posts imply. Today, the official public API docs expose two priced endpoints: deepseek-chat and deepseek-reasoner. DeepSeek says both currently correspond to DeepSeek-V3.2 and use a 128K context limit.

This guide sticks to the current official DeepSeek pricing page.

DeepSeek pricing at a glance

Endpoint          | Model Version | Context | Input (Cache Hit) $/M | Input (Cache Miss) $/M | Output $/M
deepseek-chat     | DeepSeek-V3.2 | 128K    | $0.028                | $0.28                  | $0.42
deepseek-reasoner | DeepSeek-V3.2 | 128K    | $0.028                | $0.28                  | $0.42

That means the big cost split in DeepSeek's current public pricing is not chat versus reasoner. It is cache hit versus cache miss.

What changes between the two endpoints

The current docs list the same base pricing for both endpoints, but they are not identical operationally.

deepseek-chat

  • Non-thinking mode
  • Default max output: 4K
  • Maximum output: 8K
  • Supports JSON output
  • Supports tool calls
  • Supports chat prefix completion (beta)
  • Supports FIM completion (beta)

deepseek-reasoner

  • Thinking mode
  • Default max output: 32K
  • Maximum output: 64K
  • Supports JSON output
  • Supports tool calls
  • Supports chat prefix completion (beta)
  • FIM completion (beta) is not listed for this endpoint on the current pricing page

If you are choosing between the two, the current practical question is feature behavior and output budget, not token price.

The pricing lever that matters most: caching

DeepSeek currently charges:

  • $0.28 / 1M input tokens on cache misses
  • $0.028 / 1M input tokens on cache hits

That is a straight 90% reduction for cached input. On workloads with repeated system prompts or reused shared prefixes, caching dominates every other cost optimization.
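Because the hit and miss rates differ by 10x, the effective input price is really a weighted average driven by your cache-hit rate. A minimal sketch of that arithmetic, using the list prices from the table above (the blended-price formula is just a weighted average for budgeting, not anything DeepSeek publishes):

```python
# DeepSeek list prices, USD per 1M input tokens (from the current pricing page).
CACHE_HIT = 0.028   # input tokens served from cache
CACHE_MISS = 0.28   # input tokens not in cache

def blended_input_price(hit_rate: float) -> float:
    """Effective $/1M input tokens for a workload with the given cache-hit rate."""
    return hit_rate * CACHE_HIT + (1 - hit_rate) * CACHE_MISS

print(round(blended_input_price(0.0), 3))  # 0.28  (all misses)
print(round(blended_input_price(0.5), 3))  # 0.154 (half cached already cuts input cost 45%)
print(round(blended_input_price(1.0), 3))  # 0.028 (fully cached)
```

Even a modest hit rate moves the needle more than most prompt-trimming work: at 50% hits, input spend drops 45% with no change to the prompts themselves.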

Real cost example: same workload, cached versus uncached

Assume 10,000 requests per day, each with 2,000 input tokens and 300 output tokens.

If the input is all cache misses

  • Daily input: 10,000 × 2,000 × $0.28 / 1M = $5.60
  • Daily output: 10,000 × 300 × $0.42 / 1M = $1.26
  • Monthly total (30 days): about $205.80

If the input is fully cached

  • Daily input: 10,000 × 2,000 × $0.028 / 1M = $0.56
  • Daily output: 10,000 × 300 × $0.42 / 1M = $1.26
  • Monthly total (30 days): about $54.60

Same request volume. Same output. Very different bill.
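The worked example above can be reproduced with a small cost model. This is a budgeting sketch under the same assumptions (30-day month, list prices from the pricing table; the function name and structure are ours, not DeepSeek's):

```python
# Prices in USD per 1M tokens, from DeepSeek's current pricing page.
INPUT_MISS = 0.28
INPUT_HIT = 0.028
OUTPUT = 0.42

def monthly_cost(requests_per_day: int, in_tokens: int, out_tokens: int,
                 input_price: float, days: int = 30) -> float:
    """Monthly spend for a uniform workload at a fixed input price tier."""
    daily_input = requests_per_day * in_tokens / 1e6 * input_price
    daily_output = requests_per_day * out_tokens / 1e6 * OUTPUT
    return round((daily_input + daily_output) * days, 2)

# 10,000 requests/day, 2,000 input + 300 output tokens each:
print(monthly_cost(10_000, 2_000, 300, INPUT_MISS))  # 205.8 (all cache misses)
print(monthly_cost(10_000, 2_000, 300, INPUT_HIT))   # 54.6  (fully cached input)
```

Swapping only the input price tier reproduces the roughly 4x gap between the two bills; real workloads land somewhere in between, depending on the cache-hit rate.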

What not to assume from stale DeepSeek pricing posts

If you are reading older DeepSeek comparisons, be careful with three common mistakes:

  1. treating today's public API page as if it still published a broad V3 versus R1 versus Coder pricing matrix
  2. assuming reasoner pricing is currently higher than chat pricing in the official docs
  3. ignoring the difference between cache-hit and cache-miss pricing

The current public page is simpler than that.

The practical takeaway

If you are budgeting the current public DeepSeek API:

  • treat deepseek-chat and deepseek-reasoner as the two priced endpoints that matter
  • model spend using cache-hit and cache-miss assumptions, not just a single input number
  • choose between the endpoints based on reasoning behavior, output allowance, and feature support, not because one is currently listed as much more expensive

For broader cross-provider comparisons, see our LLM pricing overview.

Model pricing changes frequently. We send one email a week with what moved and why.