Artificial Analysis Agentic Index (AA Agentic Index)

Name: Artificial Analysis Agentic Index
Creator: Artificial Analysis (mirrored by BenchLM)

A display-only mirror of the Artificial Analysis agentic index, an aggregated score across agentic benchmarks.

AA Agentic Index benchmark scores as of July 4, 2026

BenchLM mirrors the published score view for AA Agentic Index. Claude Opus 4.8 leads the public snapshot at 77.8%, followed by Claude Opus 4.7 (Adaptive) at 71.3% and Gemini 3.5 Flash at 70.3%. BenchLM does not use these results to rank models overall.

1 · Claude Opus 4.8 · Anthropic · Closed · 77.8% · Overall 85 · Context 1M

2 · Claude Opus 4.7 (Adaptive) · Anthropic · Closed · 71.3% · Overall 75 · Context 1M

3 · Gemini 3.5 Flash · Google · Closed · 70.3% · Overall 81 · Context 1M

123 models · Agentic · Current · Display only · Updated July 4, 2026

Claude Opus 4.8 leads the published AA Agentic Index snapshot at 77.8%, 6.5 points ahead of second place and 7.5 points ahead of third. Below the leader the field is tightly clustered: ranks 2 through 10 fall between 71.3% and 66.7%. The top-10 spread is about 11 points, so the index still separates strong models even though the chasing pack bunches together.
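The gap and spread figures in the paragraph above are plain differences between displayed scores. The Python sketch below recomputes them from the rounded top-10 percentages in the score table; the helpers `gap` and `spread` are illustrative names, not part of BenchLM. Because the inputs are already rounded to 0.1, a result can differ by 0.1 from one computed on BenchLM's unrounded data.

```python
# Top-10 rows of the AA Agentic Index snapshot, copied from the score table
# (percentages as displayed, rounded to one decimal).
TOP10 = [
    ("Claude Opus 4.8", 77.8),
    ("Claude Opus 4.7 (Adaptive)", 71.3),
    ("Gemini 3.5 Flash", 70.3),
    ("MiniMax M3", 68.6),
    ("GPT-5.4", 68.0),
    ("Claude Opus 4.6 (Adaptive)", 67.6),
    ("MiMo-V2.5-Pro", 67.4),
    ("DeepSeek V4 Pro (Max)", 67.2),
    ("GLM-5.1", 67.0),
    ("DeepSeek V4 Pro (High)", 66.7),
]


def gap(scores: list, i: int, j: int) -> float:
    """Point gap between ranks i and j (1-based), rounded to one decimal."""
    return round(scores[i - 1] - scores[j - 1], 1)


def spread(scores: list) -> float:
    """Highest minus lowest score, rounded to one decimal."""
    return round(max(scores) - min(scores), 1)


scores = [score for _, score in TOP10]
print(gap(scores, 1, 2))   # leader vs second place
print(gap(scores, 1, 3))   # leader vs third place
print(spread(scores[1:]))  # band covered by ranks 2-10
print(spread(scores))      # full top-10 spread
```

With the rounded inputs this gives a 6.5-point lead over second place, 7.5 points over third, a 4.6-point band for ranks 2 through 10, and an 11.1-point top-10 spread.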

AA Agentic Index covers 123 models and falls in BenchLM's Agentic category, which carries a 22% weight in BenchLM.ai's overall scoring system. The index itself is displayed for reference only: it is excluded from the scoring formula, so it does not directly affect overall rankings.
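To make "display-only" concrete, the sketch below assumes BenchLM averages the scored benchmarks inside a category and multiplies that average by the category weight. Only the 22% Agentic weight comes from this page; the plain mean, the `BenchmarkRow` type and the benchmark names are hypothetical placeholders, not BenchLM's published formula. What it illustrates is that a display-only row is filtered out before aggregation, so its value cannot move the result.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BenchmarkRow:
    name: str
    score: float        # percentage, 0-100
    display_only: bool  # True for reference-only rows such as AA Agentic Index


# Agentic category weight stated on this page; everything else here is hypothetical.
AGENTIC_WEIGHT = 0.22


def agentic_category_score(rows: List[BenchmarkRow]) -> Optional[float]:
    """Assumed aggregation: unweighted mean of the rows that are NOT display-only."""
    scored = [row.score for row in rows if not row.display_only]
    if not scored:
        return None  # nothing eligible to score
    return sum(scored) / len(scored)


def agentic_contribution(rows: List[BenchmarkRow]) -> float:
    """Points the Agentic category would add to a hypothetical overall score."""
    category = agentic_category_score(rows)
    return 0.0 if category is None else AGENTIC_WEIGHT * category


# Hypothetical model with two scored agentic benchmarks plus the display-only index.
rows = [
    BenchmarkRow("agentic_benchmark_a", 60.0, display_only=False),
    BenchmarkRow("agentic_benchmark_b", 70.0, display_only=False),
    BenchmarkRow("AA Agentic Index", 77.8, display_only=True),
]
print(agentic_category_score(rows))  # 65.0: the 77.8 row is ignored
print(agentic_contribution(rows))
```

Changing the AA Agentic Index score in `rows` to any other value leaves both outputs unchanged, which is what excluding the index from the scoring formula means in practice.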

About AA Agentic Index

Year: 2026
Tasks: Cross-benchmark agentic index
Format: Aggregated model score
Difficulty: Display-only external reference

BenchLM mirrors this agentic index for comparison, but does not use it as a weighted agentic benchmark row.

Source: Artificial Analysis model leaderboards

BenchLM freshness & provenance

Version: AA Agentic Index 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
Status: Current, display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
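The three-way decision described above can be sketched as a small classifier. This is a hypothetical illustration, not BenchLM's policy: it assumes an explicit display-only flag overrides everything, that a "Current" staleness state marks a strong differentiator, and that any other state downgrades a benchmark to "watch". The `Treatment` enum and `classify` function are invented names for this sketch.

```python
from enum import Enum


class Treatment(Enum):
    STRONG_DIFFERENTIATOR = "strong differentiator"
    WATCH = "benchmark to watch"
    DISPLAY_ONLY = "display-only reference"


def classify(staleness_state: str, display_only: bool) -> Treatment:
    """Hypothetical decision rule, not BenchLM's published methodology."""
    if display_only:
        # Assumption: reference rows are never weighted, whatever their freshness.
        return Treatment.DISPLAY_ONLY
    if staleness_state == "Current":
        return Treatment.STRONG_DIFFERENTIATOR
    # Assumption: any other staleness state downgrades the benchmark.
    return Treatment.WATCH


# AA Agentic Index as listed on this page: staleness state "Current", display only.
print(classify("Current", display_only=True).value)  # display-only reference
```

Checking the display-only flag first means a fresh benchmark such as this one can still be kept out of the weighted score, which matches how the page treats AA Agentic Index.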

Benchmark score table (123 models)

Rank · Model · Provider · Access · Score
1 · Claude Opus 4.8 · Anthropic · Closed · 77.8%
2 · Claude Opus 4.7 (Adaptive) · Anthropic · Closed · 71.3%
3 · Gemini 3.5 Flash · Google · Closed · 70.3%
4 · MiniMax M3 · MiniMax · Open · 68.6%
5 · GPT-5.4 · OpenAI · Closed · 68.0%
6 · Claude Opus 4.6 (Adaptive) · Anthropic · Closed · 67.6%
7 · MiMo-V2.5-Pro · Xiaomi · Closed · 67.4%
8 · DeepSeek V4 Pro (Max) · DeepSeek · Open · 67.2%
9 · GLM-5.1 · Z.AI · Open · 67.0%
10 · DeepSeek V4 Pro (High) · DeepSeek · Open · 66.7%
11 · Qwen3.7 Max · Alibaba · Closed · 66.6%
12 · GLM-5-Turbo · Z.AI · Closed · 66.1%
13 · Kimi K2.6 · Moonshot AI · Open · 66.0%
14 · Qwen3.7 Plus · Alibaba · Closed · 65.1%
15 · Qwen 3.6 Max (preview) · Alibaba · Closed · 64.8%
16 · Claude Opus 4.7 · Anthropic · Closed · 64.6%
17 · Claude Opus 4.6 · Anthropic · Closed · 64.2%
18 · GLM-5 · Z.AI · Open · 63.1%
19 · Qwen3.6-27B · Alibaba · Open · 62.9%
20 · MiMo-V2-Pro · Xiaomi · Closed · 62.8%
21 · DeepSeek V4 Flash (High) · DeepSeek · Open · 62.3%
22 · Muse Spark · Meta · Closed · 62.0%
23 · Kimi K2.7 Code · Moonshot AI · Open · 61.9%
24 · Qwen3.6 Plus · Alibaba · Closed · 61.7%
25 · Claude Sonnet 4.6 · Anthropic · Closed · 61.6%
26 · MiniMax M2.7 · MiniMax · Open · 61.5%
27 · DeepSeek V4 Flash (Max) · DeepSeek · Open · 61.3%
28 · GLM-5V-Turbo · Z.AI · Closed · 61.1%
29 · GPT-5.3 Codex · OpenAI · Closed · 60.5%
30 · GPT-5.2 · OpenAI · Closed · 60.2%
31 · Claude Opus 4.5 Thinking · Anthropic · Closed · 59.6%
32 · Step 3.7 Flash · StepFun · Open · 59.5%
33 · Claude Opus 4.5 · Anthropic · Closed · 59.2%
34 · Kimi K2.5 (Reasoning) · Moonshot AI · Closed · 58.9%
35 · Kimi K2.5 · Moonshot AI · Open · 58.9%
36 · GPT-5.4 mini · OpenAI · Closed · 58.9%
37 · MiMo-V2-Omni · Xiaomi · Closed · 58.6%
38 · Qwen3.6-35B-A3B · Alibaba · Open · 58.3%
39 · Nemotron 3 Ultra · NVIDIA · Open · 57.1%
40 · GPT-5.2-Codex · OpenAI · Closed · 56.5%
41 · Qwen3.5 397B (Reasoning) · Alibaba · Open · 55.8%
42 · Hy3 Preview · Tencent · Open · 55.7%
43 · GLM-4.7 · Z.AI · Open · 55.0%
44 · GPT-5 (high) · OpenAI · Closed · 54.6%
45 · Qwen3.5-27B · Alibaba · Open · 54.6%
46 · Qwen3.5 397B · Alibaba · Open · 53.3%
47 · Mistral Medium 3.5 128B · Mistral · Open · 53.2%
48 · Qwen3.5-122B-A10B · Alibaba · Open · 53.0%
49 · Gemini 3 Pro · Google · Closed · 52.0%
50 · GPT-5.1 · OpenAI · Closed · 51.3%
51 · GPT-5.1-Codex-Max · OpenAI · Closed · 50.7%
52 · GPT-5.1-Codex · OpenAI · Closed · 50.7%
53 · Grok 4.1 Fast (Reasoning) · xAI · Closed · 49.3%
54 · GPT-5.4 nano · OpenAI · Closed · 47.6%
55 · MiMo-V2-Flash · Xiaomi · Open · 47.3%
56 · GPT-5 (medium) · OpenAI · Closed · 45.8%
57 · GPT-5.5 · OpenAI · Closed · 44.9%
58 · Qwen3.5-35B-A3B · Alibaba · Open · 44.1%
59 · GLM-5.2 · Z.AI · Open · 43.1%
60 · Qwen3 Max · Alibaba · Closed · 43.0%
61 · GLM-4.6 · Z.AI · Open · 42.9%
62 · Trinity-Large-Thinking · Arcee AI · Open · 42.6%
63 · Trinity-Large-Preview · Arcee AI · Open · 42.6%
64 · Grok 4 · xAI · Closed · 41.5%
65 · Gemma 4 31B · Google · Open · 40.9%
66 · Command A+ · Cohere · Open · 40.9%
67 · DeepSeek V3.2 · DeepSeek · Open · 39.8%
68 · Grok 4 Fast (Reasoning) · xAI · Closed · 39.5%
69 · Claude 4 Sonnet · Anthropic · Closed · 39.2%
70 · K-Exaone · LG AI Research · Closed · 38.1%
71 · Ling 2.6 Flash · InclusionAI · Open · 38.1%
72 · GPT-OSS 120B · OpenAI · Open · 37.9%
73 · (name missing) · OpenAI · Closed · 36.1%
74 · Grok Code Fast 1 · xAI · Closed · 35.6%
75 · Gemini 3 Flash · Google · Closed · 35.0%
76 · Grok 4.1 Fast · xAI · Closed · 33.0%
77 · Gemini 2.5 Pro · Google · Closed · 32.7%
78 · Gemma 4 26B A4B · Google · Open · 32.1%
79 · DeepSeek V3.1 · DeepSeek · Open · 31.9%
80 · (name missing) · OpenAI · Closed · 31.1%
81 · GPT-OSS 20B · OpenAI · Open · 27.6%
82 · GPT-4.1 · OpenAI · Closed · 27.3%
83 · Mistral Small 4 (Reasoning) · Mistral · Open · 25.9%
84 · Mistral Small 4 · Mistral · Open · 25.9%
85 · Gemini 3.1 Flash-Lite · Google · Closed · 25.7%
86 · GPT-4.1 mini · OpenAI · Closed · 25.1%
87 · Sarvam 105B · Sarvam · Open · 24.7%
88 · Gemma 4 12B · Google · Open · 24.6%
89 · Kimi K2 · Moonshot AI · Closed · 24.3%
90 · Grok 4.3 · xAI · Closed · 24.1%
91 · Nemotron 3 Nano Omni 30B A3B · NVIDIA · Open · 23.9%
92 · Mistral Large 3 · Mistral · Closed · 21.7%
93 · Gemini 3.1 Pro · Google · Closed · 21.4%
94 · GLM-4.5-Air · Z.AI · Closed · 21.0%
95 · DeepSeek-R1 · DeepSeek · Open · 20.8%
96 · DeepSeek V3.1 (Reasoning) · DeepSeek · Open · 18.9%
97 · Gemini 2.5 Flash · Google · Closed · 15.0%
98 · Mistral Medium 3 · Mistral · Closed · 13.7%
99 · Solar Pro 2 · Upstage · Closed · 12.7%
100 · Sarvam 30B · Sarvam · Open · 11.5%
101 · Mistral Large 2 · Mistral · Closed · 10.2%
102 · DeepSeek V3 · DeepSeek · Open · 8.8%
103 · Nemotron 3 Nano 30B · NVIDIA · Open · 8.5%
104 · GPT-4o · OpenAI · Closed · 8.4%
105 · Granite-4.0-1B · IBM · Open · 7.6%
106 · Llama 4 Maverick · Meta · Open · 7.2%
107 · Claude 3 Haiku · Anthropic · Closed · 7.0%
108 · Gemma 4 E4B · Google · Open · 6.9%
109 · Gemma 4 E2B · Google · Open · 6.9%
110 · Exaone 4.0 1.2B · LG AI Research · Open · 6.8%
111 · Granite-4.0-H-1B · IBM · Open · 6.5%
112 · Llama 3.1 405B · Meta · Open · 6.3%
113 · GPT-4.1 nano · OpenAI · Closed · 5.8%
114 · LFM2.5-8B-A1B · LiquidAI · Open · 5.4%
115 · Llama 4 Scout · Meta · Open · 5.2%
116 · Granite-4.0-H-350M · IBM · Open · 4.9%
117 · Nova Pro · Amazon · Closed · 4.7%
118 · Granite-4.0-350M · IBM · Open · 4.4%
119 · Nemotron Ultra 253B · NVIDIA · Open · 3.8%
120 · Gemma 3 27B · Google · Open · 3.5%
121 · LFM2.5-VL-1.6B-Extract · LiquidAI · Open · 2.8%
122 · Exaone 4.0 32B · LG AI Research · Open · 1.4%
123 · Phi-4 · Microsoft · Open · 0.0%

FAQ

What does AA Agentic Index measure?

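It is the Artificial Analysis cross-benchmark agentic index: a single aggregated score summarizing how models perform across agentic benchmarks. BenchLM shows it for reference only and excludes it from its overall scoring.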

Which model scores highest on AA Agentic Index?

Claude Opus 4.8 by Anthropic currently leads with a score of 77.8% on AA Agentic Index.

How many models are evaluated on AA Agentic Index?

BenchLM lists AA Agentic Index scores for 123 AI models.

Compare Top Models on AA Agentic Index

Claude Opus 4.8 vs Claude Opus 4.7 (Adaptive)
Claude Opus 4.7 (Adaptive) vs Gemini 3.5 Flash
Gemini 3.5 Flash vs MiniMax M3
MiniMax M3 vs GPT-5.4

Last updated: July 4, 2026 · BenchLM version AA Agentic Index 2026
