DeepSeek API Pricing: Current deepseek-chat & deepseek-reasoner Rates

DeepSeek's pricing page is the simplest in the industry: two endpoints, one pricing table, three numbers. But those three numbers tell a story that changes how you should think about LLM cost optimization. At $0.028 per million input tokens on cache hits, DeepSeek makes input tokens essentially free. The real question becomes: what's the quality trade-off, and when does it matter?

This guide uses the current official DeepSeek pricing page, combined with benchmark data from BenchLM.ai and cross-provider pricing from sibling posts on Claude, OpenAI, and Gemini, to help you decide when DeepSeek's pricing makes it the right (and wrong) choice.

Current DeepSeek prices (auto-updated)

The table and changelog below render from our pricing dataset on every site build, last refreshed July 12, 2026. The analysis further down references provider announcements with their original dates; when the two disagree, this table is current.

Model	Creator	Input	Output	Context	Overall Score
DeepSeek V4 Flash (Max)	DeepSeek	$0.14	$0.28	1M	59
DeepSeek V4 Flash (High)	DeepSeek	$0.14	$0.28	1M	52
DeepSeek V4 Flash	DeepSeek	$0.14	$0.28	1M	40
DeepSeek V3	DeepSeek	$0.27	$1.1	128K	44
DeepSeek V3.2	DeepSeek	$0.28	$0.42	128K	66
DeepSeek V3.2 (Thinking)	DeepSeek	$0.55	$2.19	128K	62
DeepSeek R1	DeepSeek	$0.55	$2.19	128K	—
DeepSeek V4 Pro (Max)	DeepSeek	$1.74	$3.48	1M	69
DeepSeek V4 Pro (High)	DeepSeek	$1.74	$3.48	1M	64
DeepSeek V4 Pro	DeepSeek	$1.74	$3.48	1M	46

Recent DeepSeek price changes in our registry:

No DeepSeek price changes recorded in BenchLM's registry over the last 6 months.

DeepSeek pricing: the simplest table in the industry

Endpoint	Model Version	Context	Input Cache Hit $/M	Input Cache Miss $/M	Output $/M
`deepseek-chat`	DeepSeek-V3.2	128K	$0.028	$0.28	$0.42
`deepseek-reasoner`	DeepSeek-V3.2	128K	$0.028	$0.28	$0.42

Two endpoints. Same underlying model. Same price. The real cost split in DeepSeek's current pricing is not chat versus reasoner. It is cache hit versus cache miss, a 10x difference on input tokens.

Compare this to the pricing complexity at other providers. OpenAI publishes separate rates for GPT-5.4, GPT-5.4 nano, GPT-5.4 mini, o3, and o4-mini, each with different input, output, and reasoning token prices. Anthropic has three Claude tiers with different ratios. Gemini has context-length-dependent pricing tiers. DeepSeek has one table with three numbers. That simplicity is worth appreciating, even if the model isn't competing at the frontier.

Output pricing is flat at $0.42 per million tokens regardless of caching or endpoint choice. There are no separate reasoning token charges, no context-length surcharges, no batch pricing tiers. What you see is what you pay.

The $0.028 cache hit: why this number changes everything

This is the number that should reshape how you architect on DeepSeek. At $0.028 per million input tokens on a cache hit, a 2,000-token prompt costs $0.000056. That is $0.056 per thousand requests. Input becomes a rounding error.

To put that in perspective: sending a 2,000-token prompt on GPT-5.4 costs $0.005. On Claude Sonnet 4.6, $0.006. On DeepSeek with a cache hit, $0.000056. DeepSeek's cached input is roughly 90x cheaper than frontier model input.

This changes your prompt engineering strategy

The standard advice for expensive models is to minimize input tokens. Shorter system prompts, fewer few-shot examples, compressed context. Every token you add to a GPT-5.4 or Claude Opus request costs real money at scale.

On DeepSeek with caching, that logic inverts. Input tokens are so cheap that you should optimize for more context, not less. Longer system prompts with detailed instructions. More few-shot examples to demonstrate the exact output format you want. Richer context from your retrieval pipeline. The marginal cost of an extra 1,000 input tokens on a cache hit is $0.000028, effectively zero. If adding those tokens improves output quality by even 1%, it's the best ROI in your entire stack.

Design your prompts for cache hits

DeepSeek's caching works on shared prefixes. The key design pattern: structure your prompts so the prefix (system prompt, few-shot examples, stable context) is as large and consistent as possible. The variable part, the user's actual query, goes at the end.

This means:

Your system prompt should be detailed and static across requests
Few-shot examples belong before the user query, not after
Any shared document context or retrieved knowledge should be placed in a stable position in the prompt
Only the final user message should vary between requests

Cache hit rate drives your real cost

The difference between a workload with 0% cache hits and 90% cache hits is enormous:

Cache Hit Rate	Effective Input Cost per M Tokens
0% (all misses)	$0.280
25%	$0.217
50%	$0.154
75%	$0.091
90%	$0.053
100% (all hits)	$0.028

A well-designed application with consistent system prompts should achieve 75-90% cache hit rates. At 90%, your effective input cost is $0.053 per million tokens, less than a fifth of the already-cheap cache miss rate.

Real savings: the same workload, three scenarios

Assume 10,000 requests per day, each with 2,000 input tokens and 300 output tokens.

Scenario	Daily input	Daily output	Monthly
DeepSeek, all cache misses	$5.60	$1.26	~$205.80
DeepSeek, 90% cache hits	$1.06	$1.26	~$69.60
GPT-5.4, no caching	$50.00	$45.00	~$2,850.00

The cache-hit row is where the design pattern from the previous section pays off: nine of every ten requests bill input at $0.028/M instead of $0.28/M.

Same request volume. DeepSeek with caching costs $69.60/month. GPT-5.4 costs $2,850/month. That's a 41x cost difference. Even DeepSeek without caching ($205.80) is nearly 14x cheaper than GPT-5.4.

Use the cost calculator to model your own workload, or the token counter to estimate token counts from your actual prompts.

Chat vs Reasoner: same price, different behavior

Both endpoints currently map to DeepSeek-V3.2 and cost exactly the same per token. The choice between them is about capability and behavior, not price.

`deepseek-chat`

Non-thinking mode
Default max output: 4K / Maximum output: 8K
Supports JSON output, tool calls, chat prefix completion (beta)
Supports FIM completion (beta), the only endpoint with fill-in-the-middle

`deepseek-reasoner`

Thinking mode: generates chain-of-thought before the final answer
Default max output: 32K / Maximum output: 64K
Supports JSON output, tool calls, chat prefix completion (beta)
Does not support FIM completion

When to use which

Use deepseek-chat for the majority of workloads: general Q&A, content generation, code completion, classification, extraction, and any task where a direct answer is sufficient. It's faster because it doesn't generate thinking tokens, and the 4-8K output cap is enough for most use cases.

Use deepseek-reasoner when the task benefits from explicit chain-of-thought: multi-step math, logic puzzles, complex analysis, and problems where showing the work improves accuracy. The 32-64K output cap also matters: if your task requires long-form generation beyond 8K tokens, reasoner is your only option.

The hidden cost of thinking tokens

One detail to watch: reasoner generates thinking tokens that count toward output cost. A reasoning request might produce 5,000 thinking tokens plus 500 visible output tokens, so 5,500 output tokens are billed at $0.42/M, costing $0.0023 per request.

At DeepSeek's prices, this is still dirt cheap. The same kind of reasoning on o3 or Claude Opus would cost 50-100x more. But if you're running reasoner on millions of requests, the thinking token multiplier adds up. Monitor your actual output token counts, not just the visible response length.

Benchmark-adjusted value: the quality trade-off nobody should ignore

Here's where the pricing story gets complicated. DeepSeek is extraordinarily cheap. But cheap tokens that produce wrong answers aren't saving you money. They're costing you rework.

Model	Overall Score	Input $/M (cache miss)	Output $/M	Score per dollar (output)
DeepSeek V3.2 (chat)	62	$0.28	$0.42	148
GPT-5.4 nano	49	$0.20	$1.25	39.2
Gemini 3.1 Flash-Lite	54	$0.25	$1.50	36.0
GPT-5.4	84	$2.50	$15.00	5.6
Claude Opus 4.6	80	$5.00	$25.00	3.2

Overall scores from our leaderboard. Prices per million tokens.

On raw benchmark-points-per-dollar, DeepSeek wins by an absurd margin. At roughly 148 points per output dollar, it delivers vastly more benchmark score per dollar than frontier-priced models.

But an overall score of 62 versus 84 isn't a minor gap; it's still a fundamentally different quality tier. Here's what that gap means in practice:

Where DeepSeek is good enough

DeepSeek earns its keep on general Q&A and chatbots (approximate answers acceptable, human spot-checks available), internal content drafts (summaries, notes, brainstorming, first drafts that will be edited anyway), simple code generation (boilerplate, repetitive patterns, straightforward implementations), classification and extraction (tagging, routing, pulling structured data out of unstructured text), and high-volume preprocessing, meaning any pipeline step where you process thousands of items and the occasional error is tolerable.

Where the quality gap bites

Hard reasoning tasks: DeepSeek misses problems that frontier models solve correctly. If your task involves multi-step logical inference or mathematical reasoning, the error rate is measurably higher.
Complex instruction following: frontier models like Claude Opus (Arena IF: 1500) and GPT-5.4 (Arena IF: 1470) are significantly more reliable at following detailed, multi-constraint instructions. DeepSeek is more likely to ignore constraints or produce partially compliant output.
Agentic workflows: long tool-use chains where each step depends on the previous one. Errors compound, and a model that's 90% accurate per step becomes 35% accurate over 10 steps. Frontier models' higher per-step accuracy matters exponentially in these chains.
Safety-critical output: legal analysis, medical information, compliance documentation, anything where a wrong answer has real consequences.

The honest assessment: DeepSeek is excellent for tasks where "good enough" is good enough. It is not the right choice when errors are expensive. The 41x cost savings only matter if the output is actually usable. Test on your specific task before committing.

Building reliable systems on DeepSeek

DeepSeek has experienced outages during high-demand periods. If you're building production systems on DeepSeek, you need a fallback architecture, not because DeepSeek is unreliable by default, but because any cost-optimized system should handle provider downtime gracefully.

The fallback pattern

Primary: DeepSeek. Secondary: a cheap model from a provider with high uptime guarantees. The two natural choices:

The two natural secondaries are GPT-5.4 nano at $0.20/$1.25 (OpenAI's infrastructure reliability, reasonable quality at low cost) and Gemini 3.1 Flash-Lite at $0.25/$1.50 (Google's infrastructure, competitive pricing).

Blended cost with fallback

Assume 10% of your requests hit the fallback model due to DeepSeek outages or rate limits:

Route 90% of traffic to DeepSeek at cache-miss rates ($0.28 input / $0.42 output) and 10% to GPT-5.4 nano ($0.20 / $1.25), and the blended rate lands at $0.272 input / $0.503 output.

That blended output rate of $0.503/M is still 30x cheaper than GPT-5.4's $15/M and 50x cheaper than Claude Opus's $25/M. The cost of reliability insurance is negligible compared to your savings.

Implementation

Libraries like LiteLLM make automatic failover straightforward. Define DeepSeek as your primary model, set a timeout threshold, and configure one or two fallback models. The routing logic adds minimal latency on the happy path and saves you from building manual retry logic.

The key design principle: your fallback model should be cheap enough that you never hesitate to use it, and reliable enough that it doesn't need its own fallback. GPT-5.4 nano and Gemini Flash-Lite both fit this profile.

The self-hosting question

DeepSeek models are open-weight: you can download and run them on your own infrastructure for free. This is a genuine advantage that no closed-source provider offers. But at DeepSeek's current API prices, the economics of self-hosting are unusual.

The break-even math

A single A100 GPU rents for roughly $1-3/hour from cloud providers. At $2/hour, that's $1,440/month in GPU costs before any engineering time, networking, or storage.

To break even against the API at cache-miss rates, you'd need to process enough tokens to accumulate $1,440 in API fees. At $0.28/M input tokens and 2,000 tokens per request, that's roughly 2.5 million requests per month, about 83,000 per day.

At cache-hit rates ($0.028/M), the break-even is even higher: you'd need 25 million requests per month just on input costs to justify the GPU rental.

For most teams, the API is cheaper than self-hosting unless you're operating at genuine scale — tens of thousands of requests per day sustained.

When to self-host anyway

The decision to self-host isn't always about cost:

Self-hosting still wins on four non-price grounds. Data residency is the hard one: if your data cannot leave your infrastructure for regulatory or compliance reasons, self-hosting is the only option. The others are latency guarantees (no network round-trips or API queue times), customization (fine-tuning, custom tokenizers, and model modifications require your own infrastructure), and freedom from rate limits, since self-hosted inference scales with your hardware rather than with the API's peak-demand throttles.

The middle ground

If you need DeepSeek in a specific geographic region but don't want to manage GPU infrastructure, third-party inference providers like Together AI and Fireworks host DeepSeek models in US and EU data centers. Their prices are higher than DeepSeek's own API — typically $0.50-1.50/M for input — but still far cheaper than frontier models, and you get the data residency and reliability of established cloud providers.

When DeepSeek isn't the right choice

An honest pricing guide should tell you when to spend more. DeepSeek's cost advantage is real, but there are workloads where choosing the cheapest model is a false economy.

When quality gaps have consequences

If wrong answers cost more than the savings — legal analysis, medical triage, financial compliance, safety-critical systems — the gap between 62 and 84 on the overall score translates directly to error rates you can't afford. Spend the extra money on GPT-5.4 or Claude Opus 4.6 for these tasks.

When instruction following precision matters

Claude Opus 4.6 leads the industry on instruction following with an Arena IF score of 1500. If your workflow depends on the model respecting complex formatting constraints, multi-step instructions, or brand voice guidelines, DeepSeek will require more prompt iteration and produce more non-compliant outputs. The debugging time can exceed the cost savings.

When you need the largest context window

DeepSeek's 128K context window is generous. Gemini 3.1 Pro, though, offers a 1M-token window — nearly 8x more. For workloads involving full codebases, long legal documents, or book-length analysis in a single pass, Gemini's context advantage is worth the higher price.

When reliability is non-negotiable

If your application has strict uptime SLAs that preclude any provider downtime, you need a provider with formal SLA guarantees. DeepSeek's API has historically experienced periods of degraded performance during high demand. Building a fallback architecture mitigates this, but if even brief outages are unacceptable, a primary deployment on OpenAI or Google's infrastructure is safer.

When data residency matters and you can't self-host

DeepSeek's API routes through infrastructure subject to Chinese data handling regulations. For enterprises with strict data sovereignty requirements — EU GDPR, US government, healthcare — this may be a non-starter unless you self-host or use a third-party inference provider in your required jurisdiction.

Four rules for a sane DeepSeek bill

DeepSeek's pricing makes it the default choice for cost-sensitive workloads where quality is "good enough", and that covers a surprisingly large share of real-world LLM use cases.

Four rules for getting the most out of DeepSeek:

Design for cache hits. Long, stable system prompts at the beginning. Variable user queries at the end. Aim for 75%+ cache hit rates. At 90% cache hits, your effective input cost drops to $0.053/M, essentially free.
Build with fallbacks. DeepSeek primary, GPT-5.4 nano or Gemini Flash-Lite secondary. Your blended cost stays far below frontier pricing, and you gain reliability insurance that costs almost nothing.
Choose chat vs reasoner by behavior, not price. They cost the same. Use chat for speed and simplicity, reasoner for tasks that benefit from chain-of-thought. Watch thinking token costs on reasoner if you're running high volume.
Test quality on YOUR task before committing. An overall score of 62 still means some workloads will show a meaningful gap versus frontier models. Run 50 representative prompts through both DeepSeek and a stronger alternative. If the outputs are equivalent for your use case, the savings are real. If they're not, no amount of caching makes up for wrong answers.

DeepSeek is not trying to be the best model. It's trying to be the model where the price-to-quality ratio is so extreme that you can't justify using anything else for the bottom 60% of your workload. On that metric, nothing else comes close.

For a broader vendor comparison, see the LLM pricing overview. For provider-specific deep dives: Claude pricing, OpenAI pricing, Gemini pricing. Use the cost calculator to model your workload, the token counter to estimate token volumes from your prompts, or read how token pricing works for a primer on LLM cost mechanics.

Pricing from DeepSeek's official pricing page. Benchmark scores from BenchLM.ai. Arena scores from arena.ai. Current as of April 2026.

Reader questions

Frequently asked questions

01How much does DeepSeek API cost?

DeepSeek's current pricing page lists both deepseek-chat and deepseek-reasoner at $0.028 per million input tokens on cache hits, $0.28 on cache misses, and $0.42 per million output tokens. The real cost driver is your cache hit rate — a 90% hit rate makes effective input cost $0.053 per million tokens.

02What is the difference between deepseek-chat and deepseek-reasoner?

Both endpoints map to DeepSeek-V3.2 at the same token price. deepseek-chat is non-thinking mode with 4-8K max output and FIM completion support. deepseek-reasoner is thinking mode with 32-64K max output. Choose based on behavior, not price — they cost the same.

03How much does DeepSeek caching save?

A cache hit costs $0.028 per million input tokens versus $0.28 for a cache miss — a 90% reduction. On workloads with consistent system prompts or repeated context, caching is the single biggest cost lever and can cut your effective input cost by 50-80% depending on your cache hit rate.

04Is DeepSeek cheaper than GPT-5.4?

Dramatically. DeepSeek cache-miss input ($0.28/M) is 9x cheaper than GPT-5.4 ($2.50/M). DeepSeek output ($0.42/M) is 36x cheaper than GPT-5.4 ($15/M). But GPT-5.4 still leads DeepSeek V3.2 on the current overall score (84 vs 62), so you're trading significant quality for significant savings. The right choice depends on whether your task needs frontier quality.

05Should I use DeepSeek or self-host it?

At DeepSeek's current API prices, self-hosting only makes economic sense at very high volumes (50,000+ daily requests) or when data cannot leave your infrastructure. The API at $0.028-$0.28/M input is so cheap that GPU rental typically costs more unless you're running at sustained high throughput.

06Is DeepSeek API reliable enough for production?

DeepSeek has experienced outages during high-demand periods. For production workloads, architect with a fallback — GPT-5.4 nano ($0.20/$1.25) or Gemini 3.1 Flash-Lite ($0.25/$1.50) as a secondary model. The cost savings from DeepSeek's normal pricing more than cover the occasional fallback to a pricier model.

Source ledger

External sources linked in this article

01DeepSeek pricing pageapi-docs.deepseek.com
02LiteLLMgithub.com
03arena.aiarena.ai

Model pricing changes frequently. We send one email a week with what moved and why.

Share or save

Share on X Share on LinkedIn