An agentic spatial reasoning benchmark reported as a normalized score.
BenchLM mirrors the published score view for Blueprint-Bench 2. Gemini 3.5 Flash leads the public snapshot at 33.6%. BenchLM does not use these results to rank models overall.
Year: 2026
Tasks: Spatial reasoning from blueprints
Format: Normalized score
Difficulty: Agentic spatial reasoning
Google reported Blueprint-Bench 2 in the Gemini 3.5 Flash launch comparison table. BenchLM stores it as a display-only multimodal and spatial-reasoning benchmark until Google publishes the full methodology page.
Version: Blueprint-Bench 2 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
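As a rough illustration of how this page's freshness metadata could be held in one record, here is a minimal Python sketch. The class, field, and function names are hypothetical and are not BenchLM's actual schema or API. The three treatment labels and the field values come from this page. The rule that only display-only benchmarks are excluded from overall ranking is an assumption: the page confirms that exclusion for Blueprint-Bench 2 but does not say how the other two treatments are handled.

```python
from dataclasses import dataclass
from enum import Enum


class Treatment(Enum):
    """The three treatments named on this page."""
    STRONG_DIFFERENTIATOR = "strong differentiator"
    BENCHMARK_TO_WATCH = "benchmark to watch"
    DISPLAY_ONLY = "display-only reference"


@dataclass(frozen=True)
class BenchmarkRecord:
    """Hypothetical container for the freshness metadata shown on this page."""
    name: str
    year: int
    version: str
    refresh_cadence: str
    staleness_state: str
    question_availability: str
    treatment: Treatment


def counts_toward_overall_ranking(record: BenchmarkRecord) -> bool:
    # Assumption: only display-only benchmarks are left out of overall ranking.
    # The page states this for Blueprint-Bench 2; the other cases are a guess.
    return record.treatment is not Treatment.DISPLAY_ONLY


# Values copied from the metadata listed on this page.
blueprint_bench_2 = BenchmarkRecord(
    name="Blueprint-Bench 2",
    year=2026,
    version="Blueprint-Bench 2 2026",
    refresh_cadence="Quarterly",
    staleness_state="Current",
    question_availability="Public benchmark set",
    treatment=Treatment.DISPLAY_ONLY,
)

print(counts_toward_overall_ranking(blueprint_bench_2))
```

The sketch stores the treatment as an explicit field rather than deriving it from cadence or staleness, because the page does not publish the rules that map freshness metadata to a treatment.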
Gemini 3.5 Flash by Google currently leads with a score of 33.6% on Blueprint-Bench 2.
One AI model has been evaluated on Blueprint-Bench 2 on BenchLM.