Knowledge Benchmarks

General knowledge and factual understanding. Compare AI models across specialized benchmarks including MMLU, ARC-Challenge, HellaSwag, and GPQA.

Knowledge Benchmark Results

Showing 25 of 52 models

Rank  Model               Creator    License      Type           Context  Score  Benchmark results
1     GPT-5 (high)        OpenAI     Proprietary  Reasoning      128K     72     93%  91%  89%  87%
2     o1-preview          OpenAI     Proprietary  Reasoning      200K     71     92%  90%  88%  86%
3     GPT-5 (medium)      OpenAI     Proprietary  Reasoning      128K     70     91%  89%  87%  85%
4     Grok 4              xAI        Proprietary  Non-Reasoning  128K     69     87%  86%  84%  82%
5     GPT-5 mini          OpenAI     Proprietary  Reasoning      128K     68     88%  86%  84%  82%
6     o3-pro              OpenAI     Proprietary  Reasoning      200K     68     88%  89%  87%  85%
7     o3                  OpenAI     Proprietary  Reasoning      200K     67     86%  87%  85%  83%
8     Qwen2.5-1M          Alibaba    Open Weight  Non-Reasoning  1M       66     84%  83%  81%  79%
9     Qwen2.5-72B         Alibaba    Open Weight  Non-Reasoning  128K     65     83%  82%  80%  78%
10    o4-mini (high)      OpenAI     Proprietary  Non-Reasoning  200K     65     82%  82%  80%  78%
11    Gemini 2.5 Pro      Google     Proprietary  Non-Reasoning  2M       65     83%  83%  81%  79%
12    DeepSeek Coder 2.0  DeepSeek   Open Weight  Non-Reasoning  128K     64     80%  79%  77%  75%
13    DeepSeek LLM 2.0    DeepSeek   Open Weight  Non-Reasoning  128K     63     79%  78%  76%  74%
14    Claude 4.1 Opus     Anthropic  Proprietary  Non-Reasoning  200K     61     76%  76%  74%  72%
15    Claude 4 Sonnet     Anthropic  Proprietary  Non-Reasoning  200K     59     73%  73%  71%  69%
16    Llama 3.1 405B      Meta       Open Weight  Non-Reasoning  128K     58     70%  70%  68%  66%
17    Mistral Large 2     Mistral    Proprietary  Non-Reasoning  128K     57     68%  68%  66%  64%
18    GPT-4o              OpenAI     Proprietary  Non-Reasoning  128K     56     66%  66%  64%  62%
19    Claude 3.5 Sonnet   Anthropic  Proprietary  Non-Reasoning  200K     55     65%  65%  63%  61%
20    Gemini 1.5 Pro      Google     Proprietary  Non-Reasoning  2M       54     64%  64%  62%  60%
21    Mixtral 8x7B        Mistral    Open Weight  Non-Reasoning  32K      52     65%  64%  62%  60%
22    Gemini 1.0 Pro      Google     Proprietary  Non-Reasoning  32K      52     62%  62%  60%  58%
23    Claude 3 Opus       Anthropic  Proprietary  Non-Reasoning  200K     51     61%  61%  59%  57%
24    GPT-4 Turbo         OpenAI     Proprietary  Non-Reasoning  128K     50     60%  60%  58%  56%
25    Llama 3 70B         Meta       Open Weight  Non-Reasoning  128K     48     58%  58%  56%  54%

About Knowledge Benchmarks

MMLU

Tests knowledge across 57 academic subjects

GPQA

Expert-level questions in biology, physics, and chemistry

SuperGPQA

A larger GPQA-style benchmark covering 285 graduate-level disciplines

OpenBookQA

Elementary-level science questions that require multi-step reasoning, combining a provided core science fact with broader common knowledge
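The page does not say how its percentages are computed. The sketch below assumes each one is plain accuracy on multiple-choice questions. MMLU, GPQA, and OpenBookQA all use four answer options. The sketch shows two common ways to combine per-subject results into one number, plus the random-guessing baseline to compare against. The helper names `accuracy_report` and `chance_accuracy` are hypothetical and do not come from any benchmark's official evaluation code. The toy data is invented for illustration.

```python
from collections import defaultdict


def accuracy_report(items):
    """Score multiple-choice answers (hypothetical helper, for illustration).

    items: list of (subject, predicted_choice, correct_choice) tuples.
    Returns (micro, macro, per_subject), where
      micro = correct answers / all questions (large subjects weigh more)
      macro = unweighted mean of the per-subject accuracies
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, gold in items:
        total[subject] += 1
        correct[subject] += int(predicted == gold)

    per_subject = {s: correct[s] / total[s] for s in total}
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(per_subject.values()) / len(per_subject)
    return micro, macro, per_subject


def chance_accuracy(num_choices):
    """Expected accuracy of uniform random guessing among num_choices options."""
    return 1.0 / num_choices


if __name__ == "__main__":
    # Toy data, invented for illustration only (not real benchmark items).
    items = [
        ("biology", "A", "A"),
        ("biology", "C", "B"),
        ("physics", "D", "D"),
        ("physics", "B", "B"),
        ("physics", "A", "C"),
        ("chemistry", "B", "B"),
    ]
    micro, macro, per_subject = accuracy_report(items)
    print(f"micro={micro:.1%} macro={macro:.1%}")  # prints micro=66.7% macro=72.2%
    print(f"chance (4 options)={chance_accuracy(4):.0%}")  # prints chance (4 options)=25%
```

The micro and macro averages differ whenever subjects contain different numbers of questions, as MMLU's 57 subjects do. Two sources that both report "MMLU accuracy" may therefore not be directly comparable. On a four-option benchmark, a score near 25% is no better than random guessing.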