Benchmark analysis of Gemma 4 31B by Google across 8 sourced tests on BenchLM.
According to BenchLM.ai, Gemma 4 31B ranks #22 out of 103 models with an overall score of 73/100. This places it in the mid-tier of AI models, with strengths in specific benchmark categories.
Gemma 4 31B is an open weight model with a 256K-token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and other complex reasoning tasks at the cost of higher latency and token usage.
Gemma 4 31B sits inside the Gemma 4 family alongside Gemma 4 26B A4B, Gemma 4 E4B, and Gemma 4 E2B. This profile currently covers 8 of the 125 benchmarks BenchLM tracks. BenchLM only exposes verified benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.
Its strongest category is Multimodal & Grounded (#27); its weakest is Knowledge (#40). This profile makes it particularly well suited to screenshots, documents, charts, and other grounded multimodal workflows.
Provider: Google
Source Type: Open Weight
Reasoning: Yes
Context Window: 256K
Model Status: Current
Release Date: Apr 2, 2026
Overall Score: 73/100
Pricing: $0.00 / $0.00 (input / output per 1M tokens)
Runtime: N/A (latency unavailable)
Arena Elo: 1452.08
Text Overall
Human-preference results from LM Arena text leaderboards. These are displayed separately from BenchLM benchmark scoring.
Text Overall: 1452.08 (±8.54 · 4,679 votes)
Coding: 1493.88 (±17.66 · 1,055 votes)
Math: 1457.29 (±31.65 · 307 votes)
Instruction Following: 1451.49 (±16.06 · 1,293 votes)
Creative Writing: 1427.14 (±21.69 · 755 votes)
Multi-turn: 1461.39 (±20.58 · 796 votes)
Hard Prompts: 1472.48 (±11.34 · 2,585 votes)
Hard Prompts (English): 1486.28 (±16.95 · 1,162 votes)
Longer Query: 1471.29 (±15.78 · 1,307 votes)
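Elo scores are only meaningful relative to one another: a gap between two ratings maps to an expected head-to-head win rate. The sketch below uses the standard base-10, 400-point Elo formula; whether LM Arena uses exactly this scaling is an assumption here, so treat the numbers as illustrative.

```python
def expected_win_rate(elo_a: float, elo_b: float) -> float:
    """Expected probability that model A is preferred over model B,
    under the standard Elo logistic formula (base 10, 400-point scale).
    Assumes LM Arena ratings follow this conventional scaling."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400))

# Two equally rated models split preferences 50/50.
print(expected_win_rate(1500.0, 1500.0))

# Comparing this model's Coding Elo (1493.88) to its Text Overall Elo
# (1452.08): a ~42-point gap corresponds to a modest preference edge.
print(expected_win_rate(1493.88, 1452.08))
```

A ~40-point Elo gap works out to roughly a 56% expected preference rate, which is why small rating differences within the confidence intervals above should not be over-interpreted.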
HLE w/o tools 2026 · Quarterly refresh · updated April 2, 2026
Gemma 4 31B ranks #22 out of 103 models with an overall score of 73. It is developed by Google and has a 256K-token context window.
Gemma 4 31B ranks #40 out of 103 models in knowledge and understanding benchmarks with an average score of 61.3. There are stronger options in this category.
Gemma 4 31B has visible benchmark coverage in coding and programming, but BenchLM does not currently assign it a global category rank there.
Gemma 4 31B has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.
Gemma 4 31B ranks #27 out of 103 models in multimodal and grounded tasks benchmarks with an average score of 76.9. There are stronger options in this category.
Yes, Gemma 4 31B is an open weight model from Google, meaning its weights can be downloaded and run locally or fine-tuned for specific use cases.
Gemma 4 31B belongs to the Gemma 4 family. Related variants on BenchLM include Gemma 4 26B A4B, Gemma 4 E4B, and Gemma 4 E2B.
Not yet. Gemma 4 31B currently has 8 verified benchmark scores out of the 125 benchmarks BenchLM tracks. BenchLM only exposes verified public benchmark rows, so missing categories stay blank until a sourced evaluation is available.
Gemma 4 31B has a context window of 256K tokens, which determines how much text it can process in a single interaction.
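A context window is measured in tokens, not characters, so fitting a document requires an estimate of its token count. The sketch below uses the common ~4-characters-per-token heuristic for English text and treats 256K as 256,000 tokens; both are assumptions, since the model's actual tokenizer and exact limit may differ.

```python
CONTEXT_WINDOW = 256_000   # assuming "256K" means 256,000 tokens
CHARS_PER_TOKEN = 4        # rough heuristic for English; varies by tokenizer

def estimated_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, reserve_for_output: int = 4_000) -> bool:
    """Check whether the text leaves room in the window for a response."""
    return estimated_tokens(text) <= CONTEXT_WINDOW - reserve_for_output

sample = "word " * 10_000  # ~50,000 characters of input
print(estimated_tokens(sample), fits_in_context(sample))
```

For real budgeting, count tokens with the model's own tokenizer; this heuristic only flags whether a document is plausibly in range.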