
CAIS AI Dashboard Text Capabilities Index (CAIS Text Leaderboard)

A Center for AI Safety dashboard view summarizing text capabilities across HLE, ARC-AGI-2, SWE-Bench Pro, and TextQuests.

How BenchLM shows the CAIS Text Leaderboard

BenchLM mirrors the CAIS AI Dashboard text-capability view as a simple average of four component scores: hle, arc_agi_2, swebench_pro, and textquests. The source dashboard publishes the component benchmark scores and model metadata used here.
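The composite described above can be sketched as an unweighted mean. This is a minimal sketch that assumes each model's component scores arrive as a dict of percentages keyed by the four component names. The dict layout and the function name text_average are hypothetical and are not the dashboard's or BenchLM's API. Rounding to one decimal is also an assumption, chosen to match how scores are displayed on this page.

```python
from statistics import mean

# Component keys as named on this page; the dict-based input layout is an assumption.
TEXT_COMPONENTS = ("hle", "arc_agi_2", "swebench_pro", "textquests")


def text_average(scores: dict[str, float]) -> float:
    """Unweighted mean of the four text component scores (percent), rounded to 0.1.

    Raises KeyError if a component is missing instead of averaging fewer
    components; this missing-data policy is assumed, not documented.
    """
    missing = [k for k in TEXT_COMPONENTS if k not in scores]
    if missing:
        raise KeyError(f"missing components: {missing}")
    return round(mean(scores[k] for k in TEXT_COMPONENTS), 1)


# Illustrative numbers only, not real dashboard scores.
hypothetical_row = {"hle": 40.0, "arc_agi_2": 30.0, "swebench_pro": 50.0, "textquests": 60.0}
print(text_average(hypothetical_row))  # prints 45.0
```

Raising on a missing component keeps a partial row from silently looking comparable to complete ones. The page does not say how the dashboard handles gaps, so treat that choice as illustrative.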

The CAIS Text Leaderboard is display-only on BenchLM. Because it is a composite dashboard view rather than a single benchmark-native task set, BenchLM keeps it out of its weighted rankings.

25 mirrored rows · 4 text components · CAIS AI Dashboard · Composite score · Display only

Text average on CAIS Text Leaderboard — June 2026 dashboard snapshot

BenchLM mirrors the published text-average view for the CAIS Text Leaderboard. GPT-5.5 leads the public snapshot at 54.1%, followed by Opus 4.8 (53.8%) and Gemini 3.1 Pro (52.9%). BenchLM does not use these results to rank models overall.

25 models · External benchmark mirrors · Current · Display only · Updated June 2026 dashboard snapshot

The published CAIS Text Leaderboard snapshot is tightly clustered at the top. GPT-5.5 sits at 54.1%, and the third-ranked model, Gemini 3.1 Pro, is only 1.2 points behind. The top-10 spread is wider, at 18.5 points, so the composite still separates strong models even when the leaders cluster.
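The two gaps quoted above can be checked directly against the snapshot table. This sketch copies the top-10 text averages from this page. The helper name gap is hypothetical, and ranks are 1-based.

```python
# Top-10 text averages (percent) copied from the June 2026 snapshot table on this page.
TOP10 = [
    ("GPT-5.5", 54.1), ("Opus 4.8", 53.8), ("Gemini 3.1 Pro", 52.9),
    ("GPT-5.4", 49.3), ("Gemini 3.5 Flash", 48.8), ("Opus 4.7", 46.9),
    ("Opus 4.6", 44.0), ("Gemini 3 Pro", 38.4), ("Opus 4.5", 36.6),
    ("Gemini 3 Flash", 35.6),
]


def gap(rows: list[tuple[str, float]], i: int, j: int) -> float:
    """Point difference between 1-based ranks i and j, rounded to one decimal."""
    return round(rows[i - 1][1] - rows[j - 1][1], 1)


leader_to_third = gap(TOP10, 1, 3)  # GPT-5.5 vs Gemini 3.1 Pro
top10_spread = gap(TOP10, 1, 10)    # GPT-5.5 vs Gemini 3 Flash
print(leader_to_third, top10_spread)  # prints 1.2 18.5
```

Rounding to one decimal strips floating-point noise from the subtraction, so the results match the one-decimal scores shown in the table.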

25 models have been evaluated on the CAIS Text Leaderboard. The benchmark falls in the External benchmark mirrors category. BenchLM tracks that category separately from its weighted global scoring system, so these results are best compared within the category's dedicated views. The CAIS Text Leaderboard is displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
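The sketch below illustrates what "excluded from the scoring formula" means in practice: a weighted average that skips display-only entries. BenchLM's actual weights and formula are not given on this page. The BenchmarkResult type, its fields, the weights, and the benchmark names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    # Hypothetical record type; BenchLM's real schema is not documented here.
    name: str
    score: float        # percent
    weight: float       # hypothetical weight in a global formula
    display_only: bool  # True for reference-only entries such as this composite


def weighted_score(results: list[BenchmarkResult]) -> float:
    """Weighted mean over scored benchmarks; display-only entries are ignored."""
    scored = [r for r in results if not r.display_only]
    total_weight = sum(r.weight for r in scored)
    if total_weight == 0:
        raise ValueError("no scored benchmarks with positive weight")
    return sum(r.score * r.weight for r in scored) / total_weight


results = [
    BenchmarkResult("bench_a", 80.0, 1.0, False),
    BenchmarkResult("bench_b", 60.0, 1.0, False),
    BenchmarkResult("cais_text", 10.0, 5.0, True),  # display only: no effect
]
print(weighted_score(results))  # prints 70.0
```

Changing the display-only entry's score or weight leaves the result unchanged. That is the property the paragraph describes: the composite stays visible but does not move overall rankings.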

About CAIS Text Leaderboard

Year

2025

Tasks

HLE, ARC-AGI-2, SWE-Bench Pro, and TextQuests

Format

Average component score

Difficulty

Composite frontier text capability

BenchLM mirrors the text-capability portion of the CAIS AI Dashboard as a display-only composite. The displayed score is the average of the public HLE, ARC-AGI-2, SWE-Bench Pro, and TextQuests component scores.

BenchLM freshness & provenance

Version

CAIS Text Leaderboard 2025

Refresh cadence

Quarterly

Staleness state

Current

Question availability

Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Text average table (25 models)

Rank  Model              Model ID                      Text average
1     GPT-5.5            gpt-5.5-high                  54.1%
2     Opus 4.8           opus-4-8-adaptive-64k         53.8%
3     Gemini 3.1 Pro     gemini-3.1-pro-preview-high   52.9%
4     GPT-5.4            gpt-5.4-high                  49.3%
5     Gemini 3.5 Flash   gemini-3.5-flash              48.8%
6     Opus 4.7           opus-4-7-adaptive-64k         46.9%
7     Opus 4.6           opus-4-6-adaptive-64k         44.0%
8     Gemini 3 Pro       gemini-3-pro-preview-high     38.4%
9     Opus 4.5           opus-4-5-thinking-32k         36.6%
10    Gemini 3 Flash     gemini-3-flash-preview-high   35.6%
11    GPT-5.2            gpt-5.2-high                  33.8%
12    Sonnet 4.6         sonnet-4-6-adaptive-64k       32.6%
13    Grok 4.2           grok-4-2                      32.5%
14    DeepSeek 4 Pro     deepseek-v4-pro               32.1%
15    Kimi K2.6          kimi-k2.6-64k                 31.4%
16    GLM 5.1            glm-5.1                       29.8%
17    GPT-5.1            gpt-5.1-high                  29.0%
18    Kimi K2.5          kimi-k2.5-thinking            26.1%
19    Sonnet 4.5         sonnet-4-5-thinking-32k       25.4%
20    Grok 4.3           grok-4-3                      24.7%
21    GPT-5.4-mini       gpt-5.4-mini-high             24.2%
22    GPT-5              gpt-5-high                    20.9%
23    Grok 4             grok-4                        20.8%
24    o3                 o3-high                       20.5%
25    DeepSeek 3.2       deepseek-v3.2-thinking        20.3%

FAQ

What does CAIS Text Leaderboard measure?

A Center for AI Safety dashboard view summarizing text capabilities across HLE, ARC-AGI-2, SWE-Bench Pro, and TextQuests.

Which model leads the published CAIS Text Leaderboard snapshot?

GPT-5.5 currently leads the published CAIS Text Leaderboard snapshot with a 54.1% text average. BenchLM shows this benchmark for display only and does not use it in overall rankings.

How many models are evaluated on CAIS Text Leaderboard?

25 AI models are included in BenchLM's mirrored CAIS Text Leaderboard snapshot, based on the public leaderboard as captured in the June 2026 dashboard snapshot.

Last updated: June 2026 dashboard snapshot · mirrored from the public benchmark leaderboard
