Grok 4.20 Beta Benchmark Scores & Performance

BenchLM is tracking Grok 4.20 Beta by xAI, but no sourced benchmark results are published on the site yet. This page currently shows only the model metadata we can verify, and score-level benchmark coverage will appear once public evaluations land.

Grok 4.20 Beta is a proprietary model with a 2M-token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.

Grok 4.20 Beta sits inside the Grok 4.20 family alongside Grok 4.20 Multi-agent Beta. BenchLM links it directly to Grok 4.1 as the earlier related model in that lineage. This profile currently has 0 sourced benchmarks on BenchLM, so the benchmark sections below are intentionally marked as coming soon.

Creator: xAI
Source Type: Proprietary
Reasoning: Reasoning
Context Window: 2M
Overall Score: Coming soon

Family & Lineage

Family: Grok 4.20
beta · Reasoning
Related Earlier Model: Grok 4.1

Rankings Overview

Category rankings are coming soon. BenchLM will populate this section once sourced benchmark results are available for this model.

Frequently Asked Questions

How does Grok 4.20 Beta perform overall in AI benchmarks?

BenchLM is tracking Grok 4.20 Beta, but sourced benchmark coverage is still coming soon. We currently list its creator, model type, and context window while we wait for public benchmark results.

Which sibling models are related to Grok 4.20 Beta?

Grok 4.20 Beta belongs to the Grok 4.20 family. Related variants on BenchLM include Grok 4.20 Multi-agent Beta.

Does Grok 4.20 Beta have full benchmark coverage on BenchLM?

Not yet. Grok 4.20 Beta currently has 0 sourced benchmark scores out of the 32 benchmarks BenchLM tracks, so its overall score is listed as coming soon until results are added.

What is the context window size of Grok 4.20 Beta?

Grok 4.20 Beta has a context window of 2M tokens, which sets the upper limit on how much text it can process in a single interaction.
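As a rough illustration of what a 2M-token limit means in practice, the sketch below estimates whether a piece of text would fit. The 4-characters-per-token ratio is an assumption (a common rule of thumb for English text, not a figure from xAI or BenchLM), and the function names are hypothetical. Real token counts depend on the model's tokenizer.

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # 2M, as listed on this page
CHARS_PER_TOKEN = 4  # assumption: rough average for English text; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not a real tokenizer)."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, reserved_for_output: int = 0) -> bool:
    """Return True if the estimated prompt plus reserved output tokens fits the window."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS
```

Under this heuristic, 2M tokens corresponds to roughly 8 million characters of English text. Reasoning models also spend tokens on their chain of thought, so passing a nonzero `reserved_for_output` gives a more cautious estimate.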

Last updated: March 12, 2026
