Current Gemini API pricing from Google's official docs: 3.1 Pro Preview, 3.1 Flash-Lite Preview, 3 Flash Preview, 2.5 Flash, 2.5 Pro, plus Batch and Flex pricing.
Gemini API pricing is more complex than any other major provider's. Google splits pricing by model, by service tier (Standard, Batch, Flex), and — for Pro models — by prompt size. That three-dimensional pricing grid is why most comparison tables get Gemini wrong. It's also why Gemini can be either the cheapest frontier option or one of the pricier ones, depending entirely on how you use it.
Five current models, three service tiers, and a prompt-size threshold on two of those models means dozens of price combinations. This guide walks through all of them, explains what most pricing summaries miss, and helps you figure out which combination actually minimizes your bill.
This guide uses the current official Gemini pricing page and Gemini rate-limit page. Use our cost calculator for quick estimates and our token counter to check prompt sizes before you ship.
Here are the current official rates for every Gemini model with API pricing, organized by service tier. For Pro models, note the prompt-size threshold — this is the detail that most comparison sites omit.
Gemini 3.1 Pro Preview is the newest Pro model. It is materially more expensive than 2.5 Pro and uses prompt-size-dependent pricing.
| Tier | Input $/M (<=200K) | Input $/M (>200K) | Output $/M (<=200K) | Output $/M (>200K) |
|---|---|---|---|---|
| Standard | $2.00 | $4.00 | $12.00 | $18.00 |
| Batch | $1.00 | $2.00 | $6.00 | $9.00 |
| Flex | $1.00 | $2.00 | $6.00 | $9.00 |
No free tier listed for this model.
Gemini 2.5 Pro is Google's previous-generation Pro model. It has lower pricing than 3.1 Pro Preview and the same prompt-size threshold structure.
| Tier | Input $/M (<=200K) | Input $/M (>200K) | Output $/M (<=200K) | Output $/M (>200K) |
|---|---|---|---|---|
| Standard | $1.25 | $2.50 | $10.00 | $15.00 |
| Batch | $0.625 | $1.25 | $5.00 | $7.50 |
| Flex | $0.625 | $1.25 | $5.00 | $7.50 |
Free tier listed.
Gemini 3.1 Flash-Lite Preview is the model Google explicitly calls its most cost-efficient. It has no prompt-size threshold, so pricing is flat regardless of context length.
| Tier | Input $/M | Output $/M |
|---|---|---|
| Standard | $0.25 | $1.50 |
| Batch | $0.125 | $0.75 |
| Flex | $0.125 | $0.75 |
Gemini 3 Flash Preview is the mid-range Flash model. It also has flat pricing.
| Tier | Input $/M | Output $/M |
|---|---|---|
| Standard | $0.50 | $3.00 |
| Batch | $0.25 | $1.50 |
| Flex | $0.25 | $1.50 |
Gemini 2.5 Flash is the stable Flash model. It is not in preview, which matters for production reliability.
| Tier | Input $/M | Output $/M |
|---|---|---|
| Standard | $0.30 | $2.50 |
| Batch | $0.15 | $1.25 |
| Flex | $0.15 | $1.25 |
Price tables in $/M tokens are hard to reason about intuitively. Here's what a single standard-tier request costs for a common RAG workload — 2,500 input tokens, 400 output tokens — on each model:
| Model | Cost per request | Monthly at 5K req/day |
|---|---|---|
| 3.1 Flash-Lite Preview | $0.00123 | ~$184 |
| 2.5 Flash | $0.00175 | ~$263 |
| 3 Flash Preview | $0.00245 | ~$368 |
| 2.5 Pro (<=200K) | $0.00713 | ~$1,069 |
| 3.1 Pro Preview (<=200K) | $0.00980 | ~$1,470 |
The spread from cheapest to most expensive is 8x ($0.00123 vs $0.00980 per request). On no other provider is the gap between models this wide.
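The per-request figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the rates are copied from the Standard-tier tables in this guide, while the function names and the 30-day month are assumptions made for illustration.

```python
# Standard-tier rates in USD per million tokens, copied from the tables above.
# Pro rates are for the <=200K prompt bracket.
STANDARD_RATES = {
    "3.1 Flash-Lite Preview": (0.25, 1.50),
    "2.5 Flash": (0.30, 2.50),
    "3 Flash Preview": (0.50, 3.00),
    "2.5 Pro": (1.25, 10.00),
    "3.1 Pro Preview": (2.00, 12.00),
}


def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Cost in USD of one request at the given $/M-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def monthly_cost(per_request, requests_per_day, days=30):
    """Monthly cost in USD; the 30-day month is an assumption."""
    return per_request * requests_per_day * days


for model, (in_rate, out_rate) in STANDARD_RATES.items():
    per_req = request_cost(2_500, 400, in_rate, out_rate)
    print(f"{model}: ${per_req:.6f}/request, "
          f"${monthly_cost(per_req, 5_000):,.2f}/month")
```

Swapping in your own token counts and request volume is usually more informative than reading the $/M rates directly.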
If you're comparing Gemini to other providers using a third-party table, check whether that table makes these mistakes. Most do.
The most common error. Many comparison sites still show Gemini Pro at $1.25 input and $5 output. Those were plausible numbers for an earlier generation. The current Gemini pricing page lists Gemini 3.1 Pro Preview at $2.00/$12.00 on Standard tier for prompts up to 200K tokens, and $4.00/$18.00 above 200K.
That is a completely different pricing profile. If you're budgeting based on $1.25/$5, your actual rates will be 60% higher than planned on input and 140% higher on output. If a comparison table you're reading doesn't distinguish between 3.1 Pro Preview and 2.5 Pro, it's not comparing real prices.
Both Pro models have split pricing based on total prompt size. Prompts of 200K tokens or fewer get the lower rate. Prompts above 200K get the higher rate, and the higher rate applies to the entire prompt, not just the tokens above the threshold.
Most guides show a single number for Pro pricing. That single number is only correct if every request stays under 200K. If you're using Gemini's large context window with Pro models, you need to know about this threshold, and we break it down in detail below.
Google's current rate-limit docs say limits vary by model and usage tier, and that preview models are more restricted. If you've seen blog posts quoting specific RPM numbers — "Flash is 15 RPM" or "Pro is 2 RPM" — don't plan capacity around those. Your actual limits should be checked in AI Studio. They are not stable constants you can bake into infrastructure.
This is the pricing lever that most teams ignore, and it's the simplest way to cut your Gemini bill in half. Every Gemini model offers three service tiers with different latency and cost tradeoffs.
Standard is real-time processing at full price. Requests are processed immediately. This is the tier for user-facing applications where latency matters — chatbots, interactive coding assistants, search-and-summarize UIs.
Batch is asynchronous processing at 50% off. You submit a batch of requests and get results returned later. This is the tier for content generation pipelines, data extraction jobs, nightly analysis runs, and anything where you can tolerate minutes or hours of delay.
Flex is lower-priority processing, also at 50% off. Similar discount to Batch but with potentially more variable latency. Google describes this as a lower-priority queue.
The discount is consistent across all models: Batch and Flex pricing is exactly half of Standard pricing for both input and output tokens.
The question isn't whether Batch/Flex is cheaper — it always is, by exactly 50%. The question is what percentage of your workload can tolerate async processing.
To make the savings concrete, here's what 10,000 daily requests (2,500 input + 400 output tokens each) costs on Standard vs Batch for each model:
| Model | Standard/month | Batch/month | Monthly savings |
|---|---|---|---|
| 3.1 Flash-Lite Preview | $368 | $184 | $184 |
| 2.5 Flash | $525 | $263 | $263 |
| 3 Flash Preview | $735 | $368 | $368 |
| 2.5 Pro (<=200K) | $2,138 | $1,069 | $1,069 |
| 3.1 Pro Preview (<=200K) | $2,940 | $1,470 | $1,470 |
On 2.5 Pro, switching from Standard to Batch saves over $1,000/month for this workload. On 3.1 Pro Preview, the savings exceed $1,400/month. These are not marginal optimizations — tier selection is the single highest-leverage cost decision you can make on Gemini.
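Most teams can move only part of their traffic to Batch or Flex, so the useful number is the blended bill. The sketch below assumes every request has the same token profile, so cost scales linearly with the share of traffic moved; `blended_monthly_cost` and `async_share` are hypothetical names.

```python
def blended_monthly_cost(standard_monthly, async_share, discount=0.5):
    """Monthly cost when `async_share` (0..1) of traffic moves to Batch or Flex.

    `discount` is the 50% Batch/Flex discount described above.
    """
    if not 0.0 <= async_share <= 1.0:
        raise ValueError("async_share must be between 0 and 1")
    realtime = standard_monthly * (1 - async_share)
    deferred = standard_monthly * async_share * (1 - discount)
    return realtime + deferred


# 2.5 Pro at 10,000 requests/day is about $2,138/month on Standard (table above).
for share in (0.0, 0.25, 0.5, 1.0):
    cost = blended_monthly_cost(2_138, share)
    print(f"{share:.0%} async: ${cost:,.2f}/month")
```

Under these assumptions, each 10 percentage points of traffic moved to Batch or Flex cuts the bill by 5%.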
Before you spend time optimizing prompts to shave tokens, ask whether any of your workload can run on Batch. It's a faster path to savings.
Both Gemini 3.1 Pro Preview and Gemini 2.5 Pro have a critical pricing discontinuity at 200K tokens. If your prompt exceeds 200K tokens, the higher rate applies to the entire prompt — not just the tokens above 200K.
This means crossing 200K by even a single token doubles your input cost on the whole request.
Consider a prompt of exactly 200K tokens vs 201K tokens on Gemini 3.1 Pro Preview Standard:
200K prompt (under threshold): 200,000 x $2.00 / 1M = $0.40 in input cost.
201K prompt (over threshold): 201,000 x $4.00 / 1M = $0.804 in input cost.
Adding 1,000 tokens roughly doubled the input cost. The effective per-token rate jumped from $2.00/M to $4.00/M across the board. On the output side, the jump is from $12.00/M to $18.00/M, a 50% increase.
For a 300K prompt on 3.1 Pro Preview Standard, the numbers look worse: 300,000 x $4.00 / 1M = $1.20 in input cost, versus $0.60 if the $2.00/M rate still applied.
Gemini 2.5 Pro has the same structure but at lower absolute prices:
A 250K prompt on 2.5 Pro Standard costs 250,000 x $2.50 / 1M = $0.625. The same content, if you can trim it to 200K, costs 200,000 x $1.25 / 1M = $0.25 — less than half.
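The whole-prompt rule is easy to get wrong in a cost model, because a marginal-rate implementation looks just as plausible. Here is a minimal sketch of the rule as this guide describes it; the rates are the Standard-tier input rates from the tables above, and the function name is an assumption.

```python
THRESHOLD = 200_000  # prompt-size boundary in tokens

# Standard-tier input rates in USD per million tokens: (<=200K, >200K).
PRO_INPUT_RATES = {
    "3.1 Pro Preview": (2.00, 4.00),
    "2.5 Pro": (1.25, 2.50),
}


def pro_input_cost(model, prompt_tokens):
    """Input cost in USD for one request.

    Once the prompt exceeds the threshold, the higher rate applies to
    every token in the prompt, not only to the tokens above 200K.
    """
    low_rate, high_rate = PRO_INPUT_RATES[model]
    rate = high_rate if prompt_tokens > THRESHOLD else low_rate
    return prompt_tokens * rate / 1_000_000


for tokens in (200_000, 200_001, 300_000):
    print(tokens, pro_input_cost("3.1 Pro Preview", tokens))
```

A marginal-rate model would charge the 200,001st token only an extra $0.000004; under the whole-prompt rule that one token adds about $0.40 to the request.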
If your workloads approach the 200K boundary, these strategies keep you on the cheaper side:
Aggressive RAG retrieval. Instead of stuffing the full context window, retrieve only the most relevant chunks. A well-tuned RAG system that retrieves 20K tokens of context instead of 250K saves you the threshold jump and the raw token cost.
Summarize context. Pre-summarize long documents before including them in the prompt. A 300K document summarized to 50K keeps you well under the threshold.
Use Flash models for large-context work. Flash models have flat pricing with no threshold. If you're processing long documents where quality requirements are moderate, Gemini 2.5 Flash at $0.30/$2.50 handles any context length at the same rate. Its $0.30/M input rate is far below either Pro model's >200K bracket ($2.50/M on 2.5 Pro, $4.00/M on 3.1 Pro Preview), despite it being a simpler model.
Monitor prompt sizes in production. Log the token count of every request. If you see requests drifting above 200K, you're paying the premium rate without necessarily realizing it.
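The monitoring advice above can be sketched with the standard `logging` module. This assumes you already have a prompt token count for each request, however your client reports it; `check_prompt_size` and the 180K warning level are hypothetical choices, not values from Google's docs.

```python
import logging

logger = logging.getLogger("prompt_size")

THRESHOLD = 200_000  # Pro-model pricing boundary, in tokens
WARN_AT = 180_000    # hypothetical early-warning level (90% of the threshold)


def check_prompt_size(request_id, prompt_tokens):
    """Log the prompt size and return 'ok', 'near', or 'over'."""
    if prompt_tokens > THRESHOLD:
        logger.warning("request %s: %d tokens, over the 200K threshold",
                       request_id, prompt_tokens)
        return "over"
    if prompt_tokens >= WARN_AT:
        logger.warning("request %s: %d tokens, approaching the 200K threshold",
                       request_id, prompt_tokens)
        return "near"
    logger.info("request %s: %d tokens", request_id, prompt_tokens)
    return "ok"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    for i, tokens in enumerate((12_000, 185_000, 240_000)):
        print(i, check_prompt_size(i, tokens))
```

Counting the "near" and "over" results per day gives a simple signal for alerting before the premium bracket becomes routine.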
Price tables are useful, but they don't answer the question that matters for budgeting: how much quality am I getting per dollar? Here's the comparison that pricing pages don't show — cost efficiency measured against BenchLM's overall score, which weights coding, reasoning, math, knowledge, multimodal, long-context, instruction following, and agentic capabilities:
| Model | BenchLM Score | Input $/M | Output $/M | Cost per point (output) |
|---|---|---|---|---|
| Gemini 2.5 Pro | 65 | $1.25 | $10.00 | $0.154 |
| Gemini 3.1 Pro Preview | 83 | $2.00 | $12.00 | $0.145 |
| GPT-5.4 | 84 | $2.50 | $15.00 | $0.179 |
| Claude Sonnet 4.6 | 76 | $3.00 | $15.00 | $0.197 |
| Claude Opus 4.6 | 80 | $5.00 | $25.00 | $0.313 |
BenchLM overall scores from BenchLM.ai. Gemini Pro pricing is for standard context (<=200K tokens). Cost per point = output $/M divided by BenchLM score. Prices per million tokens.
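The cost-per-point column is a single division, which makes it easy to recompute when prices or scores change. The sketch below uses the scores and output rates exactly as listed in the table; the dictionary and function names are assumptions.

```python
# (BenchLM score, output $/M) as listed in the table above.
MODELS = {
    "Gemini 2.5 Pro": (65, 10.00),
    "Gemini 3.1 Pro Preview": (83, 12.00),
    "GPT-5.4": (84, 15.00),
    "Claude Sonnet 4.6": (76, 15.00),
    "Claude Opus 4.6": (80, 25.00),
}


def cost_per_point(score, output_rate):
    """Output $/M divided by the overall benchmark score."""
    return output_rate / score


# Model names sorted from cheapest to most expensive per benchmark point.
ranked = sorted(MODELS, key=lambda name: cost_per_point(*MODELS[name]))

for name in ranked:
    print(f"{name}: ${cost_per_point(*MODELS[name]):.4f} per point")
```

This metric ignores input pricing, so it favors models with cheap output relative to their score; weighting by your own input/output mix may change the order.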
Two findings stand out:
Gemini 3.1 Pro Preview has the lowest cost per benchmark point in this table ($0.145). Gemini 2.5 Pro is cheaper in absolute terms, but its meaningfully lower BenchLM score (65) leaves it slightly worse per point ($0.154).
Gemini 3.1 Pro Preview is competitive with GPT-5.4 on both quality and price. It trails GPT-5.4 by a point in BenchLM's current overall score (83 vs 84) while carrying a lower output rate ($12 vs $15). On cost-per-point, Gemini 3.1 Pro Preview is still cheaper than GPT-5.4.
On Batch/Flex pricing, the advantage widens further. Gemini 2.5 Pro Batch at $0.625/$5.00 is dramatically cheaper than GPT-5.4's standard rate of $2.50/$15.00 — and there's no GPT-5.4 batch discount that matches this gap.
The aggregate score tells one story. The category breakdowns tell a more actionable one. From our benchmark data, Gemini 3.1 Pro Preview is especially strong in several specific areas:
These category strengths mean Gemini's value proposition is strongest for teams doing multimodal processing, coding tasks, or agentic workloads — and weakest for pure instruction-following or creative writing tasks, where Claude leads.
The Flash models don't compete on raw benchmark quality, but they dominate on cost efficiency for workloads that don't need frontier-tier reasoning:
| Model | Tier | Input $/M | Output $/M |
|---|---|---|---|
| 3.1 Flash-Lite Preview | Standard | $0.25 | $1.50 |
| 3.1 Flash-Lite Preview | Batch/Flex | $0.125 | $0.75 |
| 2.5 Flash | Standard | $0.30 | $2.50 |
| 2.5 Flash | Batch/Flex | $0.15 | $1.25 |
Flash-Lite on Batch/Flex at $0.125/$0.75 is in DeepSeek's pricing territory. For high-volume classification, extraction, and summarization — tasks where a smaller model clears the quality bar — this is the cheapest option from any major Western provider.
Price tables are abstractions. Here's what different Gemini strategies cost on concrete workloads.
This is a standard retrieval-augmented generation workload: 5,000 requests/day at 2,500 input and 400 output tokens each, with prompts well under 200K tokens.
| Strategy | Daily cost | Monthly cost |
|---|---|---|
| 3.1 Flash-Lite Preview — Standard | $6.13 | ~$184 |
| 2.5 Flash — Standard | $8.75 | ~$263 |
| 3 Flash Preview — Standard | $12.25 | ~$368 |
| 2.5 Pro — Standard (<=200K) | $35.63 | ~$1,069 |
| 2.5 Pro — Batch (<=200K) | $17.81 | ~$534 |
| 3.1 Pro Preview — Standard (<=200K) | $49.00 | ~$1,470 |
| 3.1 Pro Preview — Batch (<=200K) | $24.50 | ~$735 |
The key insight: 2.5 Pro on Batch ($534/month) is cheaper than 3.1 Pro Preview on Batch ($735/month) and dramatically cheaper than either on Standard. If you don't need 3.1 Pro Preview's latest capabilities, 2.5 Pro Batch is the sweet spot for quality-sensitive workloads on a budget.
And if the quality bar is moderate — simple Q&A, summarization, extraction — Flash-Lite at $184/month handles 5,000 daily requests for less than many teams spend on a single SaaS subscription.
This is where the 200K threshold creates cost shocks. Assume 500 requests/day, each with 300K input tokens and 1,000 output tokens.
| Strategy | Input rate | Daily input cost | Daily output cost | Monthly cost |
|---|---|---|---|---|
| 3.1 Pro Preview — Standard | $4.00/M (>200K) | $600.00 | $9.00 | ~$18,270 |
| 2.5 Pro — Standard | $2.50/M (>200K) | $375.00 | $7.50 | ~$11,475 |
| 2.5 Flash — Standard | $0.30/M (flat) | $45.00 | $1.25 | ~$1,388 |
The difference is staggering. The same 500 daily requests cost $18,270/month on 3.1 Pro Preview or $1,388/month on 2.5 Flash, a 13x gap. Roughly half of that gap comes from the Pro model's >200K bracket, which doubles its input rate; the rest is the Pro model's higher base price.
For large-context workloads, the right question isn't "which Pro model?" — it's "can a Flash model handle this task?" If the answer is yes, you save an order of magnitude. If quality requirements demand a Pro model, architect your pipeline to keep prompts under 200K through summarization or targeted retrieval.
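To see how much of a large-context bill comes from the bracket rather than the base rate, it helps to price the same workload at different prompt sizes. This sketch uses the Standard-tier rates from the tables above and applies the higher bracket to both input and output once the prompt exceeds 200K; the function name and the 30-day month are assumptions.

```python
# Standard-tier $/M rates: (input <=200K, input >200K, output <=200K, output >200K).
# Flat-priced models repeat the same rate in both brackets.
RATES = {
    "3.1 Pro Preview": (2.00, 4.00, 12.00, 18.00),
    "2.5 Pro": (1.25, 2.50, 10.00, 15.00),
    "2.5 Flash": (0.30, 0.30, 2.50, 2.50),
}


def scenario_monthly_cost(model, prompt_tokens, output_tokens,
                          requests_per_day, days=30):
    """Monthly cost in USD for a uniform workload (30-day month assumed)."""
    in_low, in_high, out_low, out_high = RATES[model]
    over = prompt_tokens > 200_000
    in_rate = in_high if over else in_low
    out_rate = out_high if over else out_low
    per_request = (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return per_request * requests_per_day * days


for model in RATES:
    at_300k = scenario_monthly_cost(model, 300_000, 1_000, 500)
    at_200k = scenario_monthly_cost(model, 200_000, 1_000, 500)
    print(f"{model}: ${at_300k:,.2f} at 300K vs ${at_200k:,.2f} at 200K")
```

Comparing the two columns shows what trimming prompts to the threshold would save on each model before you commit to a summarization or retrieval pipeline.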
The most cost-effective Gemini deployment routes requests based on complexity and latency requirements. Here's a realistic split for a team processing 10,000 daily requests:
Routed monthly total: ~$375 — compared to $2,138/month if everything ran on 2.5 Pro Standard. That's an 82% cost reduction with Pro-tier quality available for the requests that need it.
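A routed deployment can be costed by weighting each route's per-request price by its share of traffic. The sketch below is illustrative only: the 60/30/10 split in `EXAMPLE_ROUTES` is a placeholder, not the split behind the ~$375 figure above, and the `Route` class and function name are hypothetical. Rates come from the tables in this guide.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    share: float        # fraction of daily requests sent to this route
    input_rate: float   # USD per million input tokens
    output_rate: float  # USD per million output tokens


def routed_monthly_cost(routes, requests_per_day, input_tokens,
                        output_tokens, days=30):
    """Blended monthly cost in USD, assuming every request has the same size."""
    if abs(sum(r.share for r in routes) - 1.0) > 1e-9:
        raise ValueError("route shares must sum to 1")
    total = 0.0
    for r in routes:
        per_request = (input_tokens * r.input_rate
                       + output_tokens * r.output_rate) / 1_000_000
        total += per_request * requests_per_day * r.share * days
    return total


# Placeholder split for illustration; substitute your own traffic shares.
EXAMPLE_ROUTES = [
    Route("3.1 Flash-Lite Preview, Standard", 0.60, 0.25, 1.50),
    Route("2.5 Flash, Standard", 0.30, 0.30, 2.50),
    Route("2.5 Pro, Batch", 0.10, 0.625, 5.00),
]

print(f"${routed_monthly_cost(EXAMPLE_ROUTES, 10_000, 2_500, 400):,.2f}/month")
```

In practice, complex requests tend to carry larger prompts than simple ones, so a per-route token profile would give a more accurate estimate than the uniform size assumed here.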
Google offers free-tier access for some Gemini models, and it's genuinely generous for getting started. But it has limitations that matter for production planning.
Not all models have a free tier. Gemini 2.5 Pro has a listed free tier. Gemini 3.1 Pro Preview does not. If you prototype on a free model and then need to switch to a model without free-tier access, your cost jumps from zero to full price with no middle ground.
Limits vary and are not published as fixed numbers. Google's docs say free-tier limits depend on the model and your usage tier. The actual limits should be checked in AI Studio, not in blog posts — including this one. Don't architect around free-tier quotas you found in a third-party comparison; they may already be stale.
The free tier is the best in the industry for experimentation. No other major provider offers free access to a model as capable as Gemini 2.5 Pro. For validating prompt designs, testing integrations, and building proof-of-concept applications, the Gemini free tier is genuinely hard to beat.
When to upgrade: Once you need reliable throughput, SLA guarantees, consistent rate limits, or data handling commitments that go beyond what the free tier provides. Production traffic should run on paid tiers — the free tier is for learning and prototyping, not for serving users.
Gemini's pricing complexity works in your favor for many workloads. But complexity doesn't mean it's the best option everywhere. Here's where other providers have concrete advantages.
Claude Opus 4.6 leads on Arena instruction following (about 1498) and creative writing (about 1468). Gemini 3.1 Pro Preview is close on instruction following (about 1490) but trails on writing. If your workload is primarily about precise instruction adherence, brand-voice consistency, or long-form prose quality, Claude's premium justifies itself in reduced human rework.
GPT-5.4 leads on AIME (99), BRUMO 2025 (97), and MRCRv2 (97). For competition-style problem solving or tasks where mathematical reasoning precision is the bottleneck, GPT-5.4 has the stronger benchmark profile.
DeepSeek V3.2 at $0.028/M on cache hits is in a different cost universe. If you're processing tens of millions of requests on tasks where a smaller model clears the quality bar, DeepSeek's pricing is unbeatable — and its cache-hit discount is more aggressive than any Gemini tier.
Claude has three tiers with a consistent 5x output-to-input ratio and no prompt-size thresholds. OpenAI also has a straightforward tier structure with cached input pricing. If pricing predictability matters more than pricing optimization — for budgeting, for procurement approvals, for cost estimation in proposals — simpler pricing structures reduce operational overhead.
Gemini's complexity is a feature for teams that optimize aggressively. It's a tax for teams that just want a predictable number.
Gemini's pricing complexity is a feature, not a bug — but only if you understand it. Most teams overspend on Gemini not because the rates are high, but because they use one model on one tier without considering the alternatives.
Two strategies work:
Strategy 1: Flash + Pro Batch. Route simple workloads to Flash-Lite or 2.5 Flash on Standard tier (cheap, real-time), and route complex workloads to a Pro model on Batch tier (high quality, 50% off). This captures the quality of Pro models and the cost efficiency of Flash models without paying Standard-tier Pro prices for everything.
Strategy 2: Stay on 2.5 Pro. If you need Pro-tier quality but Gemini 3.1 Pro Preview's improvements don't meaningfully change your output quality, stick with 2.5 Pro. It's 37.5% cheaper on input ($1.25 vs $2.00) and about 17% cheaper on output ($10 vs $12) at Standard rates under 200K. Upgrade to 3.1 Pro Preview only if you've measured a quality improvement on your specific tasks that justifies the price jump.
In both cases: monitor whether your Pro-model prompts stay under 200K tokens. Crossing that threshold doubles input costs on the entire request, and it's easy to drift above it without realizing. Log token counts, set alerts, and architect your retrieval pipeline to stay under the line.
For a broader vendor comparison, see the LLM pricing overview. For provider-specific deep dives: Claude pricing, OpenAI pricing, DeepSeek pricing. Use the cost calculator to model your workload or the token counter to estimate token volumes from your prompts. For background on how token pricing works across all providers, see our token pricing guide.
Pricing from Google's official Gemini pricing page and rate-limit page. Benchmark scores from BenchLM.ai. Current as of April 2026.