Artificial Analysis Omniscience Index (AA-Omniscience Index)

Name: Artificial Analysis Omniscience Index
Creator: BenchLM

A display-only Artificial Analysis factual knowledge index.

Benchmark score on AA-Omniscience Index — July 4, 2026

BenchLM mirrors the published score view for AA-Omniscience Index. Gemini 3.1 Pro leads the public snapshot at 32.9%, followed by Claude Opus 4.8 (27.4%) and Claude Opus 4.7 (Adaptive) (26.2%). BenchLM does not use these results to rank models overall.

1. Gemini 3.1 Pro (Google, Closed): 32.9% · Overall 88 · Context 1M

2. Claude Opus 4.8 (Anthropic, Closed): 27.4% · Overall 85 · Context 1M

3. Claude Opus 4.7 (Adaptive) (Anthropic, Closed): 26.2% · Overall 75 · Context 1M

123 models · Knowledge · Current · Display only · Updated July 4, 2026

The published AA-Omniscience Index snapshot is tightly clustered at the top: Gemini 3.1 Pro sits at 32.9%, and the third-ranked model, Claude Opus 4.7 (Adaptive), is only 6.7 points behind at 26.2%. The top-10 spread is wider at 19.4 points (32.9% down to 13.5%), so the benchmark still separates strong models even though the leaders sit close together.
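The spread figures above are plain differences between published scores. A minimal Python sketch, using the top-10 scores listed in the score table on this page; the names `top10` and `spread` are illustrative and not part of BenchLM:

```python
# Top-10 published AA-Omniscience Index scores (percent), copied from this page.
top10 = [32.9, 27.4, 26.2, 22.7, 20.1, 18.3, 15.8, 14.2, 14.1, 13.5]


def spread(scores, k):
    """Gap in points between the leader and the k-th ranked score."""
    ranked = sorted(scores, reverse=True)
    # Round to one decimal to match the precision of the published scores.
    return round(ranked[0] - ranked[k - 1], 1)


top3_gap = spread(top10, 3)    # leader vs. third row
top10_gap = spread(top10, 10)  # leader vs. tenth row
print(top3_gap, top10_gap)
```

With these inputs the two gaps come out at 6.7 and 19.4 points, matching the figures quoted in the paragraph.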

123 models have been evaluated on AA-Omniscience Index. The benchmark falls in the Knowledge category. This category carries a 12% weight in BenchLM.ai's overall scoring system. AA-Omniscience Index is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
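This page does not give BenchLM's scoring formula, so the following is only a sketch of how a display-only benchmark can sit inside a weighted category without affecting the overall score. Only the 12% Knowledge weight comes from the text above; the `Other` category, its weight, the two placeholder benchmarks, their scores, and all function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    category: str
    score: float               # hypothetical 0-100 scale
    display_only: bool = False


# Only the 12% Knowledge weight is from the page; "Other" is a placeholder.
CATEGORY_WEIGHTS = {"Knowledge": 0.12, "Other": 0.88}


def overall_score(results):
    """Weighted mean of per-category averages, skipping display-only rows."""
    total, weight_used = 0.0, 0.0
    for category, weight in CATEGORY_WEIGHTS.items():
        scored = [r.score for r in results
                  if r.category == category and not r.display_only]
        if not scored:
            continue  # no scored benchmark in this category
        total += weight * (sum(scored) / len(scored))
        weight_used += weight
    return total / weight_used if weight_used else None


results = [
    BenchmarkResult("Hypothetical knowledge benchmark", "Knowledge", 80.0),
    BenchmarkResult("AA-Omniscience Index", "Knowledge", 32.9, display_only=True),
    BenchmarkResult("Hypothetical other benchmark", "Other", 90.0),
]
print(overall_score(results))
```

Because the display-only row is filtered out before averaging, changing its score leaves the overall result unchanged, which is the behaviour the paragraph describes.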

About AA-Omniscience Index

Year: 2026
Tasks: Knowledge questions
Format: Index score
Difficulty: Broad factual knowledge

BenchLM stores the AA-Omniscience Index as a display-only factuality signal alongside the accuracy and hallucination-rate rows.
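The negative scores on this page suggest that wrong answers are penalized. This page does not define the metrics, so the sketch below rests on an assumption drawn from Artificial Analysis's public description of AA-Omniscience: the index is the share of correct answers minus the share of incorrect answers (abstentions count as neither), and the hallucination rate is incorrect answers as a share of all non-correct responses. The function name and the example counts are hypothetical.

```python
def omniscience_metrics(correct, incorrect, abstained):
    """Return accuracy, index, and hallucination rate as percentages.

    ASSUMPTION: formulas follow Artificial Analysis's public description,
    not anything stated on this page.
    """
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("no questions graded")
    accuracy = 100.0 * correct / total
    # Correct answers add to the index, incorrect ones subtract, abstentions are neutral.
    index = 100.0 * (correct - incorrect) / total
    not_correct = incorrect + abstained
    hallucination_rate = 100.0 * incorrect / not_correct if not_correct else 0.0
    return {
        "accuracy": accuracy,
        "index": index,
        "hallucination_rate": hallucination_rate,
    }


# Hypothetical counts: a model that answers wrongly more often than correctly.
m = omniscience_metrics(correct=40, incorrect=50, abstained=10)
print(m)
```

Under these assumed formulas, a model can have reasonable accuracy and still land below zero on the index if it guesses wrongly instead of abstaining.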

AA-Omniscience: Knowledge and Hallucination Benchmark

BenchLM freshness & provenance

Version: AA-Omniscience Index 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (123 models)

Rank | Model | Creator | Open/Closed | Score
1 | Gemini 3.1 Pro | Google | Closed | 32.9%
2 | Claude Opus 4.8 | Anthropic | Closed | 27.4%
3 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 26.2%
4 | Gemini 3.5 Flash | Google | Closed | 22.7%
5 | GPT-5.5 | OpenAI | Closed | 20.1%
6 | Grok 4.3 | xAI | Closed | 18.3%
7 | Gemini 3 Pro | Google | Closed | 15.8%
8 | Claude Opus 4.7 | Anthropic | Closed | 14.2%
9 | Qwen3.7 Max | Alibaba | Closed | 14.1%
10 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 13.5%
11 | Claude Opus 4.5 Thinking | Anthropic | Closed | 13.3%
12 | Qwen 3.6 Max (preview) | Alibaba | Closed | 10.2%
13 | GPT-5.3 Codex | OpenAI | Closed | 9.9%
14 | Kimi K2.6 | Moonshot AI | Open | 6.4%
15 | GPT-5.4 | OpenAI | Closed | 5.7%
16 | GPT-5.1 | OpenAI | Closed | 5.6%
17 | MiMo-V2-Pro | Xiaomi | Closed | 4.9%
18 | Muse Spark | Meta | Closed | 4.1%
19 | GLM-5.2 | Z.AI | Open | 4.0%
20 | Grok 4 | xAI | Closed | 3.8%
21 | MiMo-V2.5-Pro | Xiaomi | Closed | 3.6%
22 | Claude Opus 4.6 | Anthropic | Closed | 3.5%
23 | Qwen3.6 Plus | Alibaba | Closed | 2.7%
24 | Qwen3.7 Plus | Alibaba | Closed | 2.4%
25 | GLM-5 | Z.AI | Open | 2.0%
26 | GLM-5.1 | Z.AI | Open | 1.9%
27 | MiniMax M3 | MiniMax | Open | 1.4%
28 | MiniMax M2.7 | MiniMax | Open | 0.7%
29 | Nemotron 3 Ultra | NVIDIA | Open | -0.8%
30 | GPT-5.2 | OpenAI | Closed | -1.0%
31 | GPT-5.2-Codex | OpenAI | Closed | -2.5%
32 | Claude Sonnet 4.6 | Anthropic | Closed | -2.9%
33 | Gemini 3 Flash | Google | Closed | -3.6%
34 | Claude Opus 4.5 | Anthropic | Closed | -3.9%
35 | Command A+ | Cohere | Open | -4.0%
36 | GPT-5.1-Codex-Max | OpenAI | Closed | -6.0%
37 | GPT-5.1-Codex | OpenAI | Closed | -6.0%
38 | GPT-5 (high) | OpenAI | Closed | -8.1%
39 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | -8.1%
40 | Kimi K2.5 | Moonshot AI | Open | -8.1%
41 | Claude 4 Sonnet | Anthropic | Closed | -9.2%
42 | DeepSeek V4 Pro (High) | DeepSeek | Open | -9.7%
43 | DeepSeek V4 Pro (Max) | DeepSeek | Open | -10.0%
44 | GPT-5 (medium) | OpenAI | Closed | -10.1%
45 | (model name missing) | OpenAI | Closed | -10.5%
46 | GPT-4o | OpenAI | Closed | -10.7%
47 | Kimi K2.7 Code | Moonshot AI | Open | -10.7%
48 | Gemini 2.5 Pro | Google | Closed | -14.3%
49 | GLM-5-Turbo | Z.AI | Closed | -15.1%
50 | (model name missing) | OpenAI | Closed | -15.3%
51 | Gemini 3.1 Flash-Lite | Google | Closed | -15.5%
52 | Llama 3.1 405B | Meta | Open | -17.3%
53 | MiMo-V2-Omni | Xiaomi | Closed | -17.4%
54 | GPT-5.4 mini | OpenAI | Closed | -18.7%
55 | GLM-5V-Turbo | Z.AI | Closed | -19.0%
56 | Qwen3.6-27B | Alibaba | Open | -19.8%
57 | Gemma 4 E4B | Google | Open | -20.0%
58 | Qwen3.6-35B-A3B | Alibaba | Open | -21.4%
59 | DeepSeek V4 Flash (High) | DeepSeek | Open | -22.3%
60 | DeepSeek V4 Flash (Max) | DeepSeek | Open | -22.9%
61 | Gemma 4 E2B | Google | Open | -24.0%
62 | DeepSeek-R1 | DeepSeek | Open | -27.1%
63 | Kimi K2 | Moonshot AI | Closed | -27.5%
64 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open | -28.4%
65 | Grok 4 Fast (Reasoning) | xAI | Closed | -28.4%
66 | Grok 4.1 Fast (Reasoning) | xAI | Closed | -28.7%
67 | GPT-5.4 nano | OpenAI | Closed | -29.5%
68 | Qwen3.5 397B (Reasoning) | Alibaba | Open | -29.8%
69 | Mistral Small 4 (Reasoning) | Mistral | Open | -29.9%
70 | Mistral Small 4 | Mistral | Open | -29.9%
71 | Mistral Medium 3 | Mistral | Closed | -31.5%
72 | GLM-4.6 | Z.AI | Open | -31.6%
73 | LFM2.5-8B-A1B | LiquidAI | Open | -33.3%
74 | Mistral Large 2 | Mistral | Closed | -34.0%
75 | GLM-4.7 | Z.AI | Open | -34.6%
76 | Hy3 Preview | Tencent | Open | -34.6%
77 | Grok Code Fast 1 | xAI | Closed | -36.0%
78 | Qwen3.5 397B | Alibaba | Open | -36.1%
79 | GPT-4.1 | OpenAI | Closed | -36.2%
80 | Mistral Medium 3.5 128B | Mistral | Open | -36.3%
81 | Step 3.7 Flash | StepFun | Open | -37.5%
82 | Mistral Large 3 | Mistral | Closed | -39.4%
83 | Qwen3.5-122B-A10B | Alibaba | Open | -39.6%
84 | DeepSeek V3.1 | DeepSeek | Open | -41.1%
85 | DeepSeek V3 | DeepSeek | Open | -41.3%
86 | Llama 4 Maverick | Meta | Open | -41.8%
87 | Qwen3.5-27B | Alibaba | Open | -42.0%
88 | Gemini 2.5 Flash | Google | Closed | -42.0%
89 | Qwen3 Max | Alibaba | Closed | -43.1%
90 | Trinity-Large-Thinking | Arcee AI | Open | -44.2%
91 | Trinity-Large-Preview | Arcee AI | Open | -44.2%
92 | Gemma 4 31B | Google | Open | -45.4%
93 | Nemotron Ultra 253B | NVIDIA | Open | -45.5%
94 | Qwen3.5-35B-A3B | Alibaba | Open | -46.4%
95 | DeepSeek V3.2 | DeepSeek | Open | -46.7%
96 | Claude 3 Haiku | Anthropic | Closed | -47.6%
97 | Nova Pro | Amazon | Closed | -47.6%
98 | Gemma 4 26B A4B | Google | Open | -48.1%
99 | MiMo-V2-Flash | Xiaomi | Open | -48.5%
100 | GPT-OSS 120B | OpenAI | Open | -50.0%
101 | GPT-4.1 mini | OpenAI | Closed | -50.1%
102 | Grok 4.1 Fast | xAI | Closed | -50.9%
103 | Gemma 4 12B | Google | Open | -51.9%
104 | Llama 4 Scout | Meta | Open | -52.4%
105 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | -56.0%
106 | GPT-4.1 nano | OpenAI | Closed | -56.4%
107 | Phi-4 | Microsoft | Open | -56.7%
108 | K-Exaone | LG AI Research | Closed | -57.9%
109 | Sarvam 105B | Sarvam | Open | -59.5%
110 | Solar Pro 2 | Upstage | Closed | -61.7%
111 | Exaone 4.0 32B | LG AI Research | Open | -62.3%
112 | GLM-4.5-Air | Z.AI | Closed | -62.5%
113 | GPT-OSS 20B | OpenAI | Open | -63.9%
114 | Ling 2.6 Flash | InclusionAI | Open | -65.7%
115 | Gemma 3 27B | Google | Open | -65.9%
116 | Nemotron 3 Nano 30B | NVIDIA | Open | -69.2%
117 | Sarvam 30B | Sarvam | Open | -72.0%
118 | Granite-4.0-350M | IBM | Open | -72.1%
119 | Granite-4.0-H-1B | IBM | Open | -73.6%
120 | Granite-4.0-1B | IBM | Open | -81.8%
121 | Exaone 4.0 1.2B | LG AI Research | Open | -82.6%
122 | LFM2.5-VL-1.6B-Extract | LiquidAI | Open | -83.9%
123 | Granite-4.0-H-350M | IBM | Open | -87.2%

FAQ

What does AA-Omniscience Index measure?

A display-only Artificial Analysis factual knowledge index.

Which model scores highest on AA-Omniscience Index?

Gemini 3.1 Pro by Google currently leads with a score of 32.9% on AA-Omniscience Index.

How many models are evaluated on AA-Omniscience Index?

123 AI models have been evaluated on AA-Omniscience Index on BenchLM.


Last updated: July 4, 2026 · BenchLM version AA-Omniscience Index 2026
