Artificial Analysis Coding Index (AA Coding Index)

Name: Artificial Analysis Coding Index
Creator: BenchLM

A display-only Artificial Analysis coding index.

Benchmark score on AA Coding Index — July 4, 2026

BenchLM mirrors the published score view for AA Coding Index. GPT-5.5 leads the public snapshot at 74.9%, followed by Gemini 3.1 Pro (68.8%) and GLM-5.2 (68.8%). BenchLM does not use these results to rank models overall.

1. GPT-5.5 · OpenAI · Closed · 74.9% · Overall 78 · Context 1M
2. Gemini 3.1 Pro · Google · Closed · 68.8% · Overall 88 · Context 1M
3. GLM-5.2 · Z.AI · Open · 68.8% · Overall 80 · Context 1M

129 models · Coding · Current · Display only · Updated July 4, 2026

The published AA Coding Index snapshot is tightly clustered at the top: GPT-5.5 sits at 74.9%, while the third row is only 6.1 points behind. The broader top-10 spread is 24.8 points, so the benchmark still separates strong models even when the leaders cluster.
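
The two spread figures quoted above can be reproduced from the published scores. The sketch below uses the top ten scores as they appear in the score table on this page; the variable names are illustrative and not part of any BenchLM or Artificial Analysis API.

```python
# Top-10 AA Coding Index scores as listed on this page (percent).
top10 = [74.9, 68.8, 68.8, 57.3, 56.7, 53.1, 53.1, 52.5, 51.5, 50.1]

# Gap between the leader and the third row.
top3_gap = round(top10[0] - top10[2], 1)

# Spread between the first and tenth rows.
top10_spread = round(top10[0] - top10[9], 1)

print(top3_gap)      # 6.1
print(top10_spread)  # 24.8
```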

129 models have been evaluated on AA Coding Index. The benchmark falls in the Coding category. This category carries a 20% weight in BenchLM.ai's overall scoring system. AA Coding Index is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
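
To illustrate what "displayed but excluded from the scoring formula" could mean in practice, here is a minimal sketch. It assumes the Coding category score is a plain mean of the benchmarks that are not display-only, which this page does not confirm; the actual BenchLM formula is described on its methodology page. The `BenchmarkRow` type, the `coding_contribution` function, and the two `hypothetical_benchmark_*` rows with their scores are invented for illustration only.

```python
from dataclasses import dataclass

CODING_WEIGHT = 0.20  # category weight stated on this page


@dataclass
class BenchmarkRow:
    name: str
    score: float        # percent, 0-100
    display_only: bool  # True means shown for reference, not scored


def coding_contribution(rows):
    """Contribution of the Coding category to an overall score.

    Assumption: the category score is a plain mean of the scored rows.
    The real BenchLM formula is not given on this page.
    """
    scored = [r.score for r in rows if not r.display_only]
    if not scored:
        return 0.0
    return CODING_WEIGHT * (sum(scored) / len(scored))


rows = [
    BenchmarkRow("AA Coding Index", 74.9, display_only=True),
    # Hypothetical rows and scores, for illustration only.
    BenchmarkRow("hypothetical_benchmark_a", 80.0, display_only=False),
    BenchmarkRow("hypothetical_benchmark_b", 60.0, display_only=False),
]

contribution = coding_contribution(rows)
print(round(contribution, 2))  # 14.0
```

Under this assumption, changing the AA Coding Index value leaves the contribution unchanged, which is the sense in which a display-only row does not directly affect overall rankings.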

About AA Coding Index

Year: 2026
Tasks: Cross-benchmark coding index
Format: Aggregated model score
Difficulty: Display-only external reference

BenchLM mirrors this coding index for comparison, but does not use it as a weighted coding benchmark row.

Artificial Analysis model leaderboards

BenchLM freshness & provenance

Version: AA Coding Index 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (129 models)

Rank | Model | Creator | Open/Closed | Score
1 | GPT-5.5 | OpenAI | Closed | 74.9%
2 | Gemini 3.1 Pro | Google | Closed | 68.8%
3 | GLM-5.2 | Z.AI | Open | 68.8%
4 | GPT-5.4 | OpenAI | Closed | 57.3%
5 | Claude Opus 4.8 | Anthropic | Closed | 56.7%
6 | GPT-5.3 Codex | OpenAI | Closed | 53.1%
7 | Claude Opus 4.7 | Anthropic | Closed | 53.1%
8 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 52.5%
9 | GPT-5.4 mini | OpenAI | Closed | 51.5%
10 | Qwen3.7 Max | Alibaba | Closed | 50.1%
11 | GPT-5.2 | OpenAI | Closed | 48.7%
12 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 48.1%
13 | Claude Opus 4.5 Thinking | Anthropic | Closed | 47.8%
14 | Claude Opus 4.6 | Anthropic | Closed | 47.6%
15 | DeepSeek V4 Pro (Max) | DeepSeek | Open | 47.5%
16 | Muse Spark | Meta | Closed | 47.5%
17 | Kimi K2.6 | Moonshot AI | Open | 47.1%
18 | Gemini 3 Pro | Google | Closed | 46.5%
19 | Qwen3.7 Plus | Alibaba | Closed | 46.5%
20 | Claude Sonnet 4.6 | Anthropic | Closed | 46.4%
21 | Kimi K2.7 Code | Moonshot AI | Open | 45.6%
22 | MiMo-V2.5-Pro | Xiaomi | Closed | 45.5%
23 | Gemini 3.5 Flash | Google | Closed | 45.0%
24 | Qwen 3.6 Max (preview) | Alibaba | Closed | 44.9%
25 | GPT-5.1 | OpenAI | Closed | 44.7%
26 | GLM-5 | Z.AI | Open | 44.2%
27 | GPT-5.4 nano | OpenAI | Closed | 43.9%
28 | MiniMax M3 | MiniMax | Open | 43.4%
29 | GLM-5.1 | Z.AI | Open | 43.4%
30 | DeepSeek V4 Pro (High) | DeepSeek | Open | 43.3%
31 | GPT-5.2-Codex | OpenAI | Closed | 43.0%
32 | Claude Opus 4.5 | Anthropic | Closed | 42.9%
33 | Qwen3.6 Plus | Alibaba | Closed | 42.9%
34 | Grok 4.3 | xAI | Closed | 42.3%
35 | MiniMax M2.7 | MiniMax | Open | 41.9%
36 | MiMo-V2-Pro | Xiaomi | Closed | 41.4%
37 | Qwen3.5 397B (Reasoning) | Alibaba | Open | 41.3%
38 | Grok 4 | xAI | Closed | 40.5%
39 | DeepSeek V4 Flash (High) | DeepSeek | Open | 39.8%
40 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 39.5%
41 | Kimi K2.5 | Moonshot AI | Open | 39.5%
42 | GPT-5 (medium) | OpenAI | Closed | 39.0%
43 | DeepSeek V4 Flash (Max) | DeepSeek | Open | 38.7%
44 | Gemma 4 31B | Google | Open | 38.7%
45 | (name missing) | OpenAI | Closed | 38.4%
46 | Gemini 3 Flash | Google | Closed | 37.8%
47 | Nemotron 3 Ultra | NVIDIA | Open | 37.5%
48 | Qwen3.5 397B | Alibaba | Open | 37.4%
49 | Step 3.7 Flash | StepFun | Open | 37.1%
50 | GLM-5-Turbo | Z.AI | Closed | 36.8%
51 | GPT-5.1-Codex-Max | OpenAI | Closed | 36.6%
52 | GPT-5.1-Codex | OpenAI | Closed | 36.6%
53 | Claude 4.1 Opus Thinking | Anthropic | Closed | 36.5%
54 | Qwen3.6-27B | Alibaba | Open | 36.5%
55 | Hy3 Preview | Tencent | Open | 36.5%
56 | GLM-4.7 | Z.AI | Open | 36.3%
57 | GLM-5V-Turbo | Z.AI | Closed | 36.2%
58 | GPT-5 (high) | OpenAI | Closed | 36.0%
59 | MiMo-V2-Omni | Xiaomi | Closed | 35.5%
60 | Mistral Medium 3.5 128B | Mistral | Open | 35.4%
61 | Qwen3.6-35B-A3B | Alibaba | Open | 35.1%
62 | Qwen3.5-27B | Alibaba | Open | 34.9%
63 | Qwen3.5-122B-A10B | Alibaba | Open | 34.7%
64 | DeepSeek V3.2 | DeepSeek | Open | 34.6%
65 | o1-preview | OpenAI | Closed | 34.0%
66 | Gemini 2.5 Pro | Google | Closed | 31.9%
67 | Grok 4.1 Fast (Reasoning) | xAI | Closed | 30.9%
68 | Claude 4 Sonnet | Anthropic | Closed | 30.6%
69 | Qwen3.5-35B-A3B | Alibaba | Open | 30.3%
70 | GLM-4.6 | Z.AI | Open | 30.2%
71 | Gemini 3.1 Flash-Lite | Google | Closed | 30.1%
72 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open | 29.7%
73 | Command A+ | Cohere | Open | 29.3%
74 | GPT-OSS 120B | OpenAI | Open | 28.6%
75 | DeepSeek V3.1 | DeepSeek | Open | 28.4%
76 | Grok 4 Fast (Reasoning) | xAI | Closed | 27.4%
77 | Trinity-Large-Thinking | Arcee AI | Open | 27.2%
78 | Trinity-Large-Preview | Arcee AI | Open | 27.2%
79 | K-Exaone | LG AI Research | Closed | 27.0%
80 | Qwen3 Max | Alibaba | Closed | 26.4%
81 | MiMo-V2-Flash | Xiaomi | Open | 25.8%
82 | Gemma 4 12B | Google | Open | 24.9%
83 | Mistral Small 4 (Reasoning) | Mistral | Open | 24.3%
84 | Mistral Small 4 | Mistral | Open | 24.3%
85 | DeepSeek-R1 | DeepSeek | Open | 24.0%
86 | GLM-4.5-Air | Z.AI | Closed | 23.8%
87 | Grok Code Fast 1 | xAI | Closed | 23.7%
88 | Gemini 1.5 Pro | Google | Closed | 23.6%
89 | Ling 2.6 Flash | InclusionAI | Open | 23.2%
90 | Mistral Large 3 | Mistral | Closed | 22.7%
91 | Gemma 4 26B A4B | Google | Open | 22.4%
92 | Kimi K2 | Moonshot AI | Closed | 22.1%
93 | GPT-4.1 | OpenAI | Closed | 21.8%
94 | GPT-4 Turbo | OpenAI | Closed | 21.5%
95 | (name missing) | OpenAI | Closed | 20.5%
96 | Claude 3 Opus | Anthropic | Closed | 19.5%
97 | Grok 4.1 Fast | xAI | Closed | 19.5%
98 | GPT-OSS 20B | OpenAI | Open | 18.5%
99 | GPT-4.1 mini | OpenAI | Closed | 18.5%
100 | o3-mini | OpenAI | Closed | 17.9%
101 | Gemini 2.5 Flash | Google | Closed | 17.8%
102 | GPT-4o | OpenAI | Closed | 16.7%
103 | DeepSeek V3 | DeepSeek | Open | 16.4%
104 | Nemotron 3 Nano 30B | NVIDIA | Open | 15.8%
105 | Llama 4 Maverick | Meta | Open | 15.6%
106 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 14.8%
107 | Llama 3.1 405B | Meta | Open | 14.5%
108 | Mistral Large 2 | Mistral | Closed | 13.8%
109 | Gemma 4 E4B | Google | Open | 13.7%
110 | Mistral Medium 3 | Mistral | Closed | 13.6%
111 | Nemotron Ultra 253B | NVIDIA | Open | 13.1%
112 | Solar Pro 2 | Upstage | Closed | 11.3%
113 | Phi-4 | Microsoft | Open | 11.2%
114 | GPT-4.1 nano | OpenAI | Closed | 11.2%
115 | Nova Pro | Amazon | Closed | 11.0%
116 | Sarvam 105B | Sarvam | Open | 9.8%
117 | Gemma 3 27B | Google | Open | 9.6%
118 | Exaone 4.0 32B | LG AI Research | Open | 9.4%
119 | Gemma 4 E2B | Google | Open | 9.0%
120 | Sarvam 30B | Sarvam | Open | 7.9%
121 | Claude 3 Haiku | Anthropic | Closed | 6.7%
122 | Llama 4 Scout | Meta | Open | 6.7%
123 | LFM2.5-8B-A1B | LiquidAI | Open | 5.6%
124 | Granite-4.0-1B | IBM | Open | 2.9%
125 | Granite-4.0-H-1B | IBM | Open | 2.7%
126 | Exaone 4.0 1.2B | LG AI Research | Open | 2.5%
127 | LFM2.5-VL-1.6B-Extract | LiquidAI | Open | 1.0%
128 | Granite-4.0-H-350M | IBM | Open | 0.6%
129 | Granite-4.0-350M | IBM | Open | 0.3%

FAQ

What does AA Coding Index measure?

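AA Coding Index is an Artificial Analysis coding index that aggregates results across coding benchmarks into a single score per model. BenchLM shows it as a display-only reference and does not count it toward overall rankings.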

Which model scores highest on AA Coding Index?

GPT-5.5 by OpenAI currently leads with a score of 74.9% on AA Coding Index.

How many models are evaluated on AA Coding Index?

129 AI models have been evaluated on AA Coding Index on BenchLM.

Compare Top Models on AA Coding Index

GPT-5.5 vs Gemini 3.1 Pro · Gemini 3.1 Pro vs GLM-5.2 · GLM-5.2 vs GPT-5.4 · GPT-5.4 vs Claude Opus 4.8

Last updated: July 4, 2026 · BenchLM version AA Coding Index 2026
