Head-to-head comparison across 1 benchmark category. Overall scores shown here use BenchLM's provisional ranking lane.
Gemini 3.1 Flash-Lite: 48
GPT-5.5: 91
Verified leaderboard positions: Gemini 3.1 Flash-Lite unranked · GPT-5.5 #3
Pick GPT-5.5 if you want the stronger benchmark profile. Gemini 3.1 Flash-Lite only becomes the better choice if multimodal & grounded is the priority or you want the cheaper token bill.
Multimodal & grounded: Gemini 3.1 Flash-Lite leads by 2.8 points (73.2 vs. 70.4)
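Specs (Gemini 3.1 Flash-Lite vs. GPT-5.5)
Price per 1M tokens (input / output): $0.25 / $1.50 vs. $5.00 / $30.00
Speed: 205 t/s vs. N/A
Latency: 7.50s vs. N/A
Context window: 1M vs. 1M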
GPT-5.5 is clearly ahead on the provisional aggregate, 91 to 48. The gap is large enough that you do not need to squint at the spreadsheet to see the difference.
GPT-5.5 is also the more expensive model: $5.00 input / $30.00 output per 1M tokens, versus $0.25 input / $1.50 output per 1M tokens for Gemini 3.1 Flash-Lite. That is 20x on both input and output price. GPT-5.5 is the reasoning model in the pair; Gemini 3.1 Flash-Lite is not. Reasoning usually helps on harder chain-of-thought-heavy tests, but it can also mean more latency and more token spend in real use.
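To see what the price gap means per request, here is a minimal sketch that applies the per-1M-token prices quoted above. The workload size (10,000 input tokens and 1,000 output tokens) and the helper names `request_cost` and `price_ratio` are assumptions made for illustration; they do not come from BenchLM.

```python
# Prices in USD per 1M tokens, as quoted in this comparison.
PRICES = {
    "Gemini 3.1 Flash-Lite": {"input": 0.25, "output": 1.50},
    "GPT-5.5": {"input": 5.00, "output": 30.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def price_ratio(kind: str) -> float:
    """Return how many times more GPT-5.5 costs for 'input' or 'output' tokens."""
    return PRICES["GPT-5.5"][kind] / PRICES["Gemini 3.1 Flash-Lite"][kind]


if __name__ == "__main__":
    # Hypothetical workload: token counts are assumed, not measured.
    input_tokens, output_tokens = 10_000, 1_000
    for model in PRICES:
        cost = request_cost(model, input_tokens, output_tokens)
        print(f"{model}: ${cost:.4f} per request")
    print(f"Input price ratio: {price_ratio('input'):.0f}x")
    print(f"Output price ratio: {price_ratio('output'):.0f}x")
```

At those assumed volumes the sketch gives $0.004 per request for Gemini 3.1 Flash-Lite and $0.08 for GPT-5.5. The calculation holds token counts equal across both models; a reasoning model may emit more output tokens for the same prompt, so the real gap on a given workload could be wider than 20x.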
GPT-5.5 is ahead on BenchLM's provisional leaderboard, 91 to 48.
Gemini 3.1 Flash-Lite has the edge for multimodal and grounded tasks in this comparison, averaging 73.2 versus 70.4. GPT-5.5 stays close enough that the answer can still flip depending on your workload.