Coding Benchmarks

Programming and software development: compare AI models on programming benchmarks, including HumanEval and CodeContest.

Coding Benchmark Results

Showing 25 of 52 models

Rank  Model               Creator    Type         Reasoning      Context  Score  Percent
1     GPT-5 (high)        OpenAI     Proprietary  Reasoning      128K     72     85%
2     o1-preview          OpenAI     Proprietary  Reasoning      200K     71     86%
3     GPT-5 (medium)      OpenAI     Proprietary  Reasoning      128K     70     83%
4     Grok 4              xAI        Proprietary  Non-Reasoning  128K     69     79%
5     GPT-5 mini          OpenAI     Proprietary  Reasoning      128K     68     80%
6     o3-pro              OpenAI     Proprietary  Reasoning      200K     68     80%
7     o3                  OpenAI     Proprietary  Reasoning      200K     67     78%
8     Qwen2.5-1M          Alibaba    Open Weight  Non-Reasoning  1M       66     76%
9     Qwen2.5-72B         Alibaba    Open Weight  Non-Reasoning  128K     65     75%
10    o4-mini (high)      OpenAI     Proprietary  Non-Reasoning  200K     65     74%
11    Gemini 2.5 Pro      Google     Proprietary  Non-Reasoning  2M       65     75%
12    DeepSeek Coder 2.0  DeepSeek   Open Weight  Non-Reasoning  128K     64     82%
13    DeepSeek LLM 2.0    DeepSeek   Open Weight  Non-Reasoning  128K     63     73%
14    Claude 4.1 Opus     Anthropic  Proprietary  Non-Reasoning  200K     61     68%
15    Claude 4 Sonnet     Anthropic  Proprietary  Non-Reasoning  200K     59     65%
16    Llama 3.1 405B      Meta       Open Weight  Non-Reasoning  128K     58     62%
17    Mistral Large 2     Mistral    Proprietary  Non-Reasoning  128K     57     60%
18    GPT-4o              OpenAI     Proprietary  Non-Reasoning  128K     56     58%
19    Claude 3.5 Sonnet   Anthropic  Proprietary  Non-Reasoning  200K     55     57%
20    Gemini 1.5 Pro      Google     Proprietary  Non-Reasoning  2M       54     56%
21    Mistral 8x7B        Mistral    Open Weight  Non-Reasoning  32K      52     55%
22    Gemini 1.0 Pro      Google     Proprietary  Non-Reasoning  32K      52     54%
23    Claude 3 Opus       Anthropic  Proprietary  Non-Reasoning  200K     51     53%
24    GPT-4 Turbo         OpenAI     Proprietary  Non-Reasoning  128K     50     52%
25    Llama 3 70B         Meta       Open Weight  Non-Reasoning  128K     48     50%

About Coding Benchmarks

HumanEval

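A set of 164 hand-written Python programming problems. Each problem gives a function signature and a docstring, and a generated solution counts as correct only if it passes that problem's unit tests.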
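
To make the idea of "problems with test cases" concrete, here is a minimal sketch of how a HumanEval-style problem might be scored. The task (running_max), the candidate solution, the tests, and the helper names passes and pass_rate are all hypothetical illustrations and are not taken from the benchmark or from this leaderboard. A real harness would also run the generated code in a sandbox with a timeout, which this sketch omits.

```python
# Hypothetical HumanEval-style task (not an actual benchmark problem):
# the model sees PROMPT and must generate the function body.
PROMPT = '''def running_max(values):
    """Return a list where element i is the maximum of values[:i + 1]."""
'''

# A candidate completion, as a model might produce it.
CANDIDATE = '''    result = []
    current = None
    for v in values:
        current = v if current is None or v > current else current
        result.append(current)
    return result
'''

# Unit tests that decide whether the completion counts as correct.
TESTS = '''def check(candidate):
    assert candidate([1, 3, 2, 5, 4]) == [1, 3, 3, 5, 5]
    assert candidate([]) == []
    assert candidate([-2, -5]) == [-2, -2]
'''


def passes(prompt, completion, tests, entry_point):
    """Return True if prompt + completion passes every assertion in tests.

    Assumption: exec() is used only for illustration. Untrusted model
    output should be run in an isolated process, never in-process like this.
    """
    namespace = {}
    try:
        exec(prompt + completion + "\n" + tests, namespace)
        namespace["check"](namespace[entry_point])
        return True
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as a fail.
        return False


def pass_rate(results):
    """Percentage of problems solved, given one True/False result per problem."""
    return 100.0 * sum(results) / len(results)
```

Calling passes(PROMPT, CANDIDATE, TESTS, "running_max") runs the completed function against the tests, and pass_rate turns a list of per-problem results into a percentage of the kind a leaderboard might report. Whether the percentages on this page were computed this way is not stated in the source.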