Gemini API Pricing: Current Flash, Flash-Lite, and Pro Rates (April 2026)

Current Gemini API pricing from Google's official docs: 3.1 Pro Preview, 3.1 Flash-Lite Preview, 3 Flash Preview, 2.5 Flash, 2.5 Pro, plus Batch and Flex pricing.

Glevd·Published April 13, 2026·14 min read

Gemini API pricing is more complex than any other major provider's. Google splits pricing by model, by service tier (Standard, Batch, Flex), and — for Pro models — by prompt size. That three-dimensional pricing grid is why most comparison tables get Gemini wrong. It's also why Gemini can be either the cheapest frontier option or one of the pricier ones, depending entirely on how you use it.

Five current models, three service tiers, and a prompt-size threshold on two of those models means dozens of price combinations. This guide walks through all of them, explains what most pricing summaries miss, and helps you figure out which combination actually minimizes your bill.

This guide uses the current official Gemini pricing page and Gemini rate-limit page. Use our cost calculator for quick estimates and our token counter to check prompt sizes before you ship.

The pricing you need to know

Here are the current official rates for every Gemini model with API pricing, organized by service tier. For Pro models, note the prompt-size threshold — this is the detail that most comparison sites omit.

Gemini 3.1 Pro Preview

The newest Pro model. Materially more expensive than 2.5 Pro, with prompt-size-dependent pricing.

| Tier | Input $/M (<=200K) | Input $/M (>200K) | Output $/M (<=200K) | Output $/M (>200K) |
| --- | --- | --- | --- | --- |
| Standard | $2.00 | $4.00 | $12.00 | $18.00 |
| Batch | $1.00 | $2.00 | $6.00 | $9.00 |
| Flex | $1.00 | $2.00 | $6.00 | $9.00 |

No free tier listed for this model.

Gemini 2.5 Pro

Google's previous-generation Pro model. Lower pricing than 3.1 Pro Preview, same prompt-size threshold structure.

| Tier | Input $/M (<=200K) | Input $/M (>200K) | Output $/M (<=200K) | Output $/M (>200K) |
| --- | --- | --- | --- | --- |
| Standard | $1.25 | $2.50 | $10.00 | $15.00 |
| Batch | $0.625 | $1.25 | $5.00 | $7.50 |
| Flex | $0.625 | $1.25 | $5.00 | $7.50 |

Free tier listed.

Gemini 3.1 Flash-Lite Preview

Google explicitly calls this its most cost-efficient model. No prompt-size threshold — flat pricing regardless of context length.

| Tier | Input $/M | Output $/M |
| --- | --- | --- |
| Standard | $0.25 | $1.50 |
| Batch | $0.125 | $0.75 |
| Flex | $0.125 | $0.75 |

Gemini 3 Flash Preview

The mid-range Flash model. Also flat pricing.

| Tier | Input $/M | Output $/M |
| --- | --- | --- |
| Standard | $0.50 | $3.00 |
| Batch | $0.25 | $1.50 |
| Flex | $0.25 | $1.50 |

Gemini 2.5 Flash

The stable Flash model — not in preview, which matters for production reliability.

| Tier | Input $/M | Output $/M |
| --- | --- | --- |
| Standard | $0.30 | $2.50 |
| Batch | $0.15 | $1.25 |
| Flex | $0.15 | $1.25 |

What a typical request actually costs

Price tables in $/M tokens are hard to reason about intuitively. Here's what a single standard-tier request costs for a common RAG workload — 2,500 input tokens, 400 output tokens — on each model:

| Model | Cost per request | Monthly at 5K req/day |
| --- | --- | --- |
| 3.1 Flash-Lite Preview | $0.00123 | ~$184 |
| 2.5 Flash | $0.00175 | ~$263 |
| 3 Flash Preview | $0.00245 | ~$368 |
| 2.5 Pro (<=200K) | $0.00713 | ~$1,069 |
| 3.1 Pro Preview (<=200K) | $0.00980 | ~$1,470 |

The spread from cheapest to most expensive is 8x ($0.00980 vs $0.00123 per request). On no other provider is the gap between models this wide.
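The per-request figures in this section come from a two-term formula: input tokens times the input rate plus output tokens times the output rate. Here is a minimal sketch, assuming a flat 30-day month and the Standard rates listed earlier. The function and dictionary names are illustrative and not part of any Google SDK.

```python
# Standard-tier rates in $ per million tokens (input, output), copied from the tables above.
STANDARD_RATES = {
    "3.1 Flash-Lite Preview": (0.25, 1.50),
    "2.5 Flash": (0.30, 2.50),
    "3 Flash Preview": (0.50, 3.00),
    "2.5 Pro (<=200K)": (1.25, 10.00),
    "3.1 Pro Preview (<=200K)": (2.00, 12.00),
}


def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Dollar cost of one request, with rates given in $ per million tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def monthly_cost(per_request, requests_per_day, days=30):
    """Monthly bill for a steady workload; the 30-day month is an assumption."""
    return per_request * requests_per_day * days


for model, (in_rate, out_rate) in STANDARD_RATES.items():
    cost = request_cost(2_500, 400, in_rate, out_rate)
    print(f"{model}: ${cost:.5f}/request, ~${monthly_cost(cost, 5_000):,.0f}/month")
```

Swapping in your own token counts and request volume is usually more informative than the 2,500/400 RAG example, since output-heavy workloads shift the ranking toward models with cheaper output rates.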

The three things most Gemini pricing guides get wrong

If you're comparing Gemini to other providers using a third-party table, check whether that table makes these mistakes. Most do.

1. Gemini 3.1 Pro Preview is NOT $1.25/$5

The most common error. Many comparison sites still show Gemini Pro at $1.25 input and $5 output. Those were plausible numbers for an earlier generation. The current Gemini pricing page lists Gemini 3.1 Pro Preview at $2.00/$12.00 on Standard tier for prompts under 200K tokens, and $4.00/$18.00 above 200K.

That is a completely different pricing profile. If you're budgeting based on $1.25/$5, your actual bill will run 60% over budget on input and 140% over on output. If a comparison table you're reading doesn't distinguish between 3.1 Pro Preview and 2.5 Pro, it's not comparing real prices.

2. The 200K prompt threshold changes your cost by 2x

Both Pro models have split pricing based on total prompt size. Prompts of 200K tokens or fewer get the lower rate. Prompts above 200K get the higher rate, and the higher rate applies to the entire prompt, not just the tokens above the threshold.

Most guides show a single number for Pro pricing. That single number is only correct if every request stays under 200K. If you're using Gemini's large context window with Pro models, you need to know about this threshold, and we break it down in detail below.

3. Rate limits aren't fixed RPM numbers

Google's current rate-limit docs say limits vary by model and usage tier, and that preview models are more restricted. If you've seen blog posts quoting specific RPM numbers — "Flash is 15 RPM" or "Pro is 2 RPM" — don't plan capacity around those. Your actual limits should be checked in AI Studio. They are not stable constants you can bake into infrastructure.

Standard vs Batch vs Flex — the tier that saves you 50%

This is the pricing lever that most teams ignore, and it's the simplest way to cut your Gemini bill in half. Every Gemini model offers three service tiers with different latency and cost tradeoffs.

Standard is real-time processing at full price. Requests are processed immediately. This is the tier for user-facing applications where latency matters — chatbots, interactive coding assistants, search-and-summarize UIs.

Batch is asynchronous processing at 50% off. You submit a batch of requests and get results returned later. This is the tier for content generation pipelines, data extraction jobs, nightly analysis runs, and anything where you can tolerate minutes or hours of delay.

Flex is lower-priority processing, also at 50% off. It carries the same discount as Batch but with potentially more variable latency. Google describes this as a lower-priority queue.

The discount is consistent across all models: Batch and Flex pricing is exactly half of Standard pricing for both input and output tokens.

The decision framework

The question isn't whether Batch/Flex is cheaper — it always is, by exactly 50%. The question is what percentage of your workload can tolerate async processing.

  • If 100% of your traffic is user-facing (chat, interactive tools), you're stuck on Standard. Focus on model selection instead.
  • If 50% or more is async (batch processing, content generation, data pipelines), run a mixed Standard+Batch architecture. The blended savings will be 25-50% depending on the split.
  • If everything is async (offline analysis, bulk extraction, evaluation runs), use Batch or Flex for everything. Your bill drops by half with zero quality difference.
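The 25-50% range in the list above follows from simple proportionality. A minimal sketch, assuming the async share is measured as a fraction of your Standard-priced spend (not request count) and that the discount is the flat 50% described earlier. The function name is illustrative.

```python
def blended_savings(async_fraction, discount=0.5):
    """Fraction of an all-Standard bill saved when `async_fraction` of spend moves to Batch/Flex."""
    if not 0.0 <= async_fraction <= 1.0:
        raise ValueError("async_fraction must be between 0 and 1")
    return async_fraction * discount


for share in (0.0, 0.5, 0.75, 1.0):
    print(f"{share:.0%} async -> {blended_savings(share):.1%} saved")
```

If your async requests are larger or smaller than your interactive ones, weight the fraction by tokens rather than by request count before plugging it in.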

Same workload, Standard vs Batch

To make the savings concrete, here's what 10,000 daily requests (2,500 input + 400 output tokens each) costs on Standard vs Batch for each model:

| Model | Standard/month | Batch/month | Monthly savings |
| --- | --- | --- | --- |
| 3.1 Flash-Lite Preview | $368 | $184 | $184 |
| 2.5 Flash | $525 | $263 | $263 |
| 3 Flash Preview | $735 | $368 | $368 |
| 2.5 Pro (<=200K) | $2,138 | $1,069 | $1,069 |
| 3.1 Pro Preview (<=200K) | $2,940 | $1,470 | $1,470 |

On 2.5 Pro, switching from Standard to Batch saves over $1,000/month for this workload. On 3.1 Pro Preview, the savings exceed $1,400/month. These are not marginal optimizations — tier selection is the single highest-leverage cost decision you can make on Gemini.

Before you spend time optimizing prompts to shave tokens, ask whether any of your workload can run on Batch. It's a faster path to savings.

The 200K prompt threshold — a pricing trap you need to know about

Both Gemini 3.1 Pro Preview and Gemini 2.5 Pro have a critical pricing discontinuity at 200K tokens. If your prompt exceeds 200K tokens, the higher rate applies to the entire prompt — not just the tokens above 200K.

This means crossing 200K by even a single token doubles your input cost on the whole request.

The math that makes this concrete

Consider a prompt of exactly 200K tokens vs 201K tokens on Gemini 3.1 Pro Preview Standard:

200K prompt (under threshold):

  • Input cost: 200,000 x $2.00 / 1,000,000 = $0.40

201K prompt (over threshold):

  • Input cost: 201,000 x $4.00 / 1,000,000 = $0.804

Adding 1,000 tokens (0.5% more input) roughly doubled your input cost, from $0.40 to $0.804, because the per-token rate jumped from $2.00/M to $4.00/M across the whole prompt. On the output side, the jump is from $12.00/M to $18.00/M, a 50% increase.

For a 300K prompt on 3.1 Pro Preview Standard, the numbers look worse:

  • Input: 300,000 x $4.00 / 1,000,000 = $1.20 (vs $0.60 if the threshold didn't exist)
  • Output (say 2,000 tokens): 2,000 x $18.00 / 1,000,000 = $0.036 (vs $0.024 under threshold)
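The arithmetic above can be sketched as a small function that picks the bracket from the prompt size and then applies it to every token in the request. This assumes the whole-request behavior described in this section and uses the Gemini 3.1 Pro Preview Standard rates. The names are illustrative.

```python
THRESHOLD_TOKENS = 200_000

# Gemini 3.1 Pro Preview, Standard tier, $ per million tokens.
RATES_LOW = {"input": 2.00, "output": 12.00}   # prompt <= 200K tokens
RATES_HIGH = {"input": 4.00, "output": 18.00}  # prompt > 200K tokens


def pro_request_cost(prompt_tokens, output_tokens):
    """Return (input_cost, output_cost) in dollars for one request.

    The bracket is chosen by prompt size alone and applies to all tokens,
    not just those above the threshold.
    """
    rates = RATES_HIGH if prompt_tokens > THRESHOLD_TOKENS else RATES_LOW
    input_cost = prompt_tokens * rates["input"] / 1_000_000
    output_cost = output_tokens * rates["output"] / 1_000_000
    return input_cost, output_cost


for prompt in (200_000, 201_000, 300_000):
    input_cost, output_cost = pro_request_cost(prompt, 2_000)
    print(f"{prompt:,} prompt tokens: input ${input_cost:.3f}, output ${output_cost:.3f}")
```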

The same threshold on 2.5 Pro

Gemini 2.5 Pro has the same structure but at lower absolute prices:

  • Under 200K: $1.25 input / $10.00 output per million
  • Over 200K: $2.50 input / $15.00 output per million

A 250K prompt on 2.5 Pro Standard costs 250,000 x $2.50 / 1M = $0.625. The same content, if you can trim it to 200K, costs 200,000 x $1.25 / 1M = $0.25 — less than half.

How to stay under the threshold

If your workloads approach the 200K boundary, these strategies keep you on the cheaper side:

  1. Aggressive RAG retrieval. Instead of stuffing the full context window, retrieve only the most relevant chunks. A well-tuned RAG system that retrieves 20K tokens of context instead of 250K saves you the threshold jump and the raw token cost.

  2. Summarize context. Pre-summarize long documents before including them in the prompt. A 300K document summarized to 50K keeps you well under the threshold.

  3. Use Flash models for large-context work. Flash models have flat pricing with no threshold. If you're processing long documents where quality requirements are moderate, Gemini 2.5 Flash at $0.30/$2.50 bills every prompt length at the same rate, far below either Pro model's >200K input rate ($2.50 or $4.00 per million). The tradeoff is a simpler model.

  4. Monitor prompt sizes in production. Log the token count of every request. If you see requests drifting above 200K, you're paying the premium rate without necessarily realizing it.
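For the monitoring step, a minimal sketch of a per-request check is shown here. It assumes you already have a token count for each prompt from whatever client you use; the function name, the logger name, and the 90% warning ratio are all hypothetical choices, not anything Google prescribes.

```python
import logging

logger = logging.getLogger("prompt_size")

THRESHOLD_TOKENS = 200_000  # Pro-model pricing bracket boundary described above


def check_prompt_size(prompt_tokens, warn_ratio=0.9):
    """Return "ok", "near", or "over" relative to the 200K bracket, logging the last two."""
    if prompt_tokens > THRESHOLD_TOKENS:
        logger.warning("prompt of %d tokens is billed at the >200K rate", prompt_tokens)
        return "over"
    if prompt_tokens >= warn_ratio * THRESHOLD_TOKENS:
        logger.info("prompt of %d tokens is close to the 200K threshold", prompt_tokens)
        return "near"
    return "ok"


# Example with made-up token counts for three requests.
for tokens in (42_000, 185_000, 230_000):
    print(tokens, check_prompt_size(tokens))
```

Counting the "near" results over time gives an early signal that retrieval or context assembly is drifting toward the expensive bracket.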

Benchmark-adjusted value — is Gemini actually the cheapest good option?

Price tables are useful, but they don't answer the question that matters for budgeting: how much quality am I getting per dollar? Here's the comparison that pricing pages don't show — cost efficiency measured against BenchLM's overall score, which weights coding, reasoning, math, knowledge, multimodal, long-context, instruction following, and agentic capabilities:

| Model | BenchLM Score | Input $/M | Output $/M | Cost per point (output) |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Pro | 65 | $1.25 | $10.00 | $0.154 |
| Gemini 3.1 Pro Preview | 83 | $2.00 | $12.00 | $0.145 |
| GPT-5.4 | 84 | $2.50 | $15.00 | $0.179 |
| Claude Sonnet 4.6 | 76 | $3.00 | $15.00 | $0.197 |
| Claude Opus 4.6 | 80 | $5.00 | $25.00 | $0.313 |

BenchLM overall scores from BenchLM.ai. Gemini Pro pricing is for standard context (<=200K tokens). Cost per point = output $/M divided by BenchLM score. Prices per million tokens.
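The cost-per-point column is a single division, so it is easy to recompute when prices or scores change. A minimal sketch using the scores and Standard output rates from the table; the dictionary and function names are illustrative.

```python
# (BenchLM overall score, Standard output $ per million tokens) from the table above.
MODELS = {
    "Gemini 2.5 Pro": (65, 10.00),
    "Gemini 3.1 Pro Preview": (83, 12.00),
    "GPT-5.4": (84, 15.00),
    "Claude Sonnet 4.6": (76, 15.00),
    "Claude Opus 4.6": (80, 25.00),
}


def cost_per_point(score, output_rate):
    """Output $ per million tokens divided by the benchmark score."""
    return output_rate / score


# Cheapest cost per point first.
ranked = sorted(MODELS, key=lambda name: cost_per_point(*MODELS[name]))
for name in ranked:
    print(f"{name}: ${cost_per_point(*MODELS[name]):.3f} per point")
```

This metric only looks at output price. Input-heavy workloads such as RAG may rank the models differently if you divide the input rate, or a blended rate, by the score instead.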

Two findings stand out:

Gemini 3.1 Pro Preview has the lowest cost per benchmark point of any model in this table ($0.145). Gemini 2.5 Pro is cheaper in absolute terms ($10.00 vs $12.00 output), but its lower BenchLM score (65 vs 83) leaves it slightly more expensive per point ($0.154).

Gemini 3.1 Pro Preview is competitive with GPT-5.4 on both quality and price. It trails GPT-5.4 by a point in BenchLM's current overall score (83 vs 84) while carrying a lower output rate ($12 vs $15). On cost-per-point, Gemini 3.1 Pro Preview is still cheaper than GPT-5.4.

On Batch/Flex pricing, the advantage widens further. Gemini 2.5 Pro Batch at $0.625/$5.00 is dramatically cheaper than GPT-5.4's standard rate of $2.50/$15.00 — and there's no GPT-5.4 batch discount that matches this gap.

Where Gemini specifically excels on benchmarks

The aggregate score tells one story. The category breakdowns tell a more actionable one. From our benchmark data, Gemini 3.1 Pro Preview is especially strong in several specific areas:

  • MMMU-Pro: 83.9 — a strong multimodal score for image understanding, document analysis, and visual reasoning.
  • MRCRv2: 90 — strong long-context reasoning relative to most peers.
  • BrowseComp: 86 — a strong agentic browsing score for teams building autonomous agents or web-interaction tools.

These category strengths mean Gemini's value proposition is strongest for teams doing multimodal processing, long-context work, or agentic workloads, and weakest for pure instruction-following or creative writing tasks, where Claude leads.

Adding Flash models to the value picture

The Flash models don't compete on raw benchmark quality, but they dominate on cost efficiency for workloads that don't need frontier-tier reasoning:

| Model | Tier | Input $/M | Output $/M |
| --- | --- | --- | --- |
| 3.1 Flash-Lite Preview | Standard | $0.25 | $1.50 |
| 3.1 Flash-Lite Preview | Batch/Flex | $0.125 | $0.75 |
| 2.5 Flash | Standard | $0.30 | $2.50 |
| 2.5 Flash | Batch/Flex | $0.15 | $1.25 |

Flash-Lite on Batch/Flex at $0.125/$0.75 is in DeepSeek's pricing territory. For high-volume classification, extraction, and summarization — tasks where a smaller model clears the quality bar — this is the cheapest option from any major Western provider.

Real cost examples — same workload, different Gemini strategies

Price tables are abstractions. Here's what different Gemini strategies cost on concrete workloads.

Scenario 1: RAG application — 5,000 requests/day, 2,500 input + 400 output tokens

This is a standard retrieval-augmented generation workload where prompts are well under 200K tokens.

| Strategy | Daily cost | Monthly cost |
| --- | --- | --- |
| 3.1 Flash-Lite Preview, Standard | $6.13 | ~$184 |
| 2.5 Flash, Standard | $8.75 | ~$263 |
| 3 Flash Preview, Standard | $12.25 | ~$368 |
| 2.5 Pro, Standard (<=200K) | $35.63 | ~$1,069 |
| 2.5 Pro, Batch (<=200K) | $17.81 | ~$534 |
| 3.1 Pro Preview, Standard (<=200K) | $49.00 | ~$1,470 |
| 3.1 Pro Preview, Batch (<=200K) | $24.50 | ~$735 |

The key insight: 2.5 Pro on Batch ($534/month) is cheaper than 3.1 Pro Preview on Batch ($735/month) and dramatically cheaper than either on Standard. If you don't need 3.1 Pro Preview's latest capabilities, 2.5 Pro Batch is the sweet spot for quality-sensitive workloads on a budget.

And if the quality bar is moderate — simple Q&A, summarization, extraction — Flash-Lite at $184/month handles 5,000 daily requests for less than many teams spend on a single SaaS subscription.

Scenario 2: Large-context document analysis — prompts averaging 300K tokens

This is where the 200K threshold creates cost shocks. Assume 500 requests/day, each with 300K input tokens and 1,000 output tokens.

| Strategy | Input rate | Daily input cost | Daily output cost | Monthly cost |
| --- | --- | --- | --- | --- |
| 3.1 Pro Preview, Standard | $4.00/M (>200K) | $600.00 | $9.00 | ~$18,270 |
| 2.5 Pro, Standard | $2.50/M (>200K) | $375.00 | $7.50 | ~$11,475 |
| 2.5 Flash, Standard | $0.30/M (flat) | $45.00 | $1.25 | ~$1,388 |

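The difference is staggering. The same 500 daily requests cost $18,270/month on 3.1 Pro Preview or $1,388/month on 2.5 Flash, a 13x gap. Roughly half of that gap comes from the Pro model's >200K bracket: billed at the <=200K rates, the same token volume would cost about $9,180/month, still 6.6x more than Flash.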

For large-context workloads, the right question isn't "which Pro model?" — it's "can a Flash model handle this task?" If the answer is yes, you save an order of magnitude. If quality requirements demand a Pro model, architect your pipeline to keep prompts under 200K through summarization or targeted retrieval.
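To see what trimming prompts is worth in this scenario, the sketch below prices the same 500 requests per day at 300K and at 200K prompt tokens. It assumes a 30-day month, the Standard rates from the tables earlier, and the whole-request bracket behavior described above. The names are illustrative.

```python
# Standard-tier rates in $ per million tokens as (input, output) per bracket.
# Flash is flat, so both brackets carry the same rates.
RATES = {
    "3.1 Pro Preview": {"low": (2.00, 12.00), "high": (4.00, 18.00)},
    "2.5 Pro": {"low": (1.25, 10.00), "high": (2.50, 15.00)},
    "2.5 Flash": {"low": (0.30, 2.50), "high": (0.30, 2.50)},
}


def monthly_cost(model, prompt_tokens, output_tokens, requests_per_day, days=30):
    """Monthly bill for a steady workload, choosing the bracket by prompt size."""
    bracket = "high" if prompt_tokens > 200_000 else "low"
    in_rate, out_rate = RATES[model][bracket]
    per_request = (prompt_tokens * in_rate + output_tokens * out_rate) / 1_000_000
    return per_request * requests_per_day * days


for model in RATES:
    full = monthly_cost(model, 300_000, 1_000, 500)
    trimmed = monthly_cost(model, 200_000, 1_000, 500)
    print(f"{model}: ${full:,.0f}/month at 300K, ${trimmed:,.0f}/month at 200K")
```

On the Pro models the trimmed figure falls by much more than the one-third reduction in tokens, because the lower bracket applies to the whole request.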

Scenario 3: Mixed architecture — routing across models and tiers

The most cost-effective Gemini deployment routes requests based on complexity and latency requirements. Here's a realistic split for a team processing 10,000 daily requests:

  • 60% to Flash-Lite Batch (classification, extraction, simple Q&A): 6,000 req x (2,500 x $0.125/1M + 400 x $0.75/1M) = $3.68/day
  • 30% to 2.5 Flash Standard (interactive features, moderate-quality generation): 3,000 req x (2,500 x $0.30/1M + 400 x $2.50/1M) = $5.25/day
  • 10% to 2.5 Pro Batch (complex reasoning, high-stakes output): 1,000 req x (2,500 x $0.625/1M + 400 x $5.00/1M) = $3.56/day

Routed monthly total: ~$375 — compared to $2,138/month if everything ran on 2.5 Pro Standard. That's an 82% cost reduction with Pro-tier quality available for the requests that need it.
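The routed total can be reproduced, and re-run with a different split, using a short script. This is a sketch under the same assumptions as the scenario: 2,500 input and 400 output tokens per request, a 30-day month, and the rates listed earlier. The route labels and function name are illustrative.

```python
ROUTES = [
    # (label, share of traffic, input $/M, output $/M)
    ("Flash-Lite Batch", 0.60, 0.125, 0.75),
    ("2.5 Flash Standard", 0.30, 0.30, 2.50),
    ("2.5 Pro Batch", 0.10, 0.625, 5.00),
]


def routed_daily_cost(routes, requests_per_day, input_tokens=2_500, output_tokens=400):
    """Sum the daily cost of each route, weighted by its share of traffic."""
    total = 0.0
    for _label, share, in_rate, out_rate in routes:
        per_request = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        total += requests_per_day * share * per_request
    return total


daily = routed_daily_cost(ROUTES, 10_000)
# Baseline: every request on 2.5 Pro Standard (<=200K).
baseline = 10_000 * (2_500 * 1.25 + 400 * 10.00) / 1_000_000
print(f"routed: ${daily * 30:,.0f}/month, baseline: ${baseline * 30:,.0f}/month")
print(f"reduction: {1 - daily / baseline:.0%}")
```

The split itself is the sensitive input: shifting even 10% of traffic from Flash-Lite Batch to a Pro tier changes the total noticeably, so it is worth measuring your real routing mix before budgeting.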

The free tier — useful for prototyping, not for planning

Google offers free-tier access for some Gemini models, and it's genuinely generous for getting started. But it has limitations that matter for production planning.

Not all models have a free tier. Gemini 2.5 Pro has a listed free tier. Gemini 3.1 Pro Preview does not. If you prototype on a free model and then need to switch to a model without free-tier access, your cost jumps from zero to full price with no middle ground.

Limits vary and are not published as fixed numbers. Google's docs say free-tier limits depend on the model and your usage tier. The actual limits should be checked in AI Studio, not in blog posts — including this one. Don't architect around free-tier quotas you found in a third-party comparison; they may already be stale.

The free tier is the best in the industry for experimentation. No other major provider offers free access to a model as capable as Gemini 2.5 Pro. For validating prompt designs, testing integrations, and building proof-of-concept applications, the Gemini free tier is genuinely hard to beat.

When to upgrade: Once you need reliable throughput, SLA guarantees, consistent rate limits, or data handling commitments that go beyond what the free tier provides. Production traffic should run on paid tiers — the free tier is for learning and prototyping, not for serving users.

When Gemini isn't the right choice

Gemini's pricing complexity works in your favor for many workloads. But complexity doesn't mean it's the best option everywhere. Here's where other providers have concrete advantages.

Best instruction following and writing quality

Claude Opus 4.6 leads on Arena instruction following (about 1498) and creative writing (about 1468). Gemini 3.1 Pro Preview is close on instruction following (about 1490) but trails on writing. If your workload is primarily about precise instruction adherence, brand-voice consistency, or long-form prose quality, Claude's premium justifies itself in reduced human rework.

Strongest math and reasoning

GPT-5.4 leads on AIME (99), BRUMO 2025 (97), and MRCRv2 (97). For competition-style problem solving or tasks where mathematical reasoning precision is the bottleneck, GPT-5.4 has the stronger benchmark profile.

Cheapest at extreme scale

DeepSeek V3.2 at $0.028/M on cache hits is in a different cost universe. If you're processing tens of millions of requests on tasks where a smaller model clears the quality bar, DeepSeek's pricing is unbeatable — and its cache-hit discount is more aggressive than any Gemini tier.

Simplest pricing

Claude has three tiers with a consistent 5x output-to-input ratio and no prompt-size thresholds. OpenAI also has a straightforward tier structure with cached input pricing. If pricing predictability matters more than pricing optimization — for budgeting, for procurement approvals, for cost estimation in proposals — simpler pricing structures reduce operational overhead.

Gemini's complexity is a feature for teams that optimize aggressively. It's a tax for teams that just want a predictable number.

The practical takeaway

Gemini's pricing complexity is a feature, not a bug — but only if you understand it. Most teams overspend on Gemini not because the rates are high, but because they use one model on one tier without considering the alternatives.

Two strategies work:

Strategy 1: Flash + Pro Batch. Route simple workloads to Flash-Lite or 2.5 Flash on Standard tier (cheap, real-time), and route complex workloads to a Pro model on Batch tier (high quality, 50% off). This captures the quality of Pro models and the cost efficiency of Flash models without paying Standard-tier Pro prices for everything.

Strategy 2: Stay on 2.5 Pro. If you need Pro-tier quality but Gemini 3.1 Pro Preview's improvements don't meaningfully change your output quality, stick with 2.5 Pro. It's about 38% cheaper on input ($1.25 vs $2.00) and 17% cheaper on output ($10.00 vs $12.00) at Standard rates under 200K. Upgrade to 3.1 Pro Preview only if you've measured a quality improvement on your specific tasks that justifies the price jump.

In both cases: monitor whether your Pro-model prompts stay under 200K tokens. Crossing that threshold doubles input costs on the entire request, and it's easy to drift above it without realizing. Log token counts, set alerts, and architect your retrieval pipeline to stay under the line.

For a broader vendor comparison, see the LLM pricing overview. For provider-specific deep dives: Claude pricing, OpenAI pricing, DeepSeek pricing. Use the cost calculator to model your workload or the token counter to estimate token volumes from your prompts. For background on how token pricing works across all providers, see our token pricing guide.

Pricing from Google's official Gemini pricing page and rate-limit page. Benchmark scores from BenchLM.ai. Current as of April 2026.
