Reasoning Benchmarks

Logical reasoning and problem solving. Compare AI models across 2 reasoning benchmarks: SimpleQA and MuSR.

Reasoning Benchmark Results

Showing 15 of 20 models

Rank  Model              Creator    License        Score  SimpleQA  MuSR
1     GPT-5 (high)       OpenAI     Closed-source  69     86%       84%
2     GPT-5 (medium)     OpenAI     Closed-source  68     84%       82%
3     Grok 4             xAI        Closed-source  68     82%       80%
4     o3-pro             OpenAI     Closed-source  68     85%       83%
5     o3                 OpenAI     Closed-source  67     83%       81%
6     o4-mini (high)     OpenAI     Closed-source  65     80%       78%
7     Gemini 2.5 Pro     Google     Closed-source  65     81%       79%
8     GPT-5 mini         OpenAI     Closed-source  64     77%       75%
9     Claude 4.1 Opus    Anthropic  Closed-source  61     74%       72%
10    Claude 4 Sonnet    Anthropic  Closed-source  59     71%       69%
11    Llama 3.1 405B     Meta       Open-source    58     68%       66%
12    Mistral Large 2    Mistral    Open-source    57     66%       64%
13    GPT-4o             OpenAI     Closed-source  56     64%       62%
14    Claude 3.5 Sonnet  Anthropic  Closed-source  55     63%       61%
15    Gemini 1.5 Pro     Google     Closed-source  54     62%       60%
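
To compare models outside the page, the rows can be sorted by either benchmark or narrowed to one creator. The following is only a minimal sketch: the `Result` class, its field names, and the helper functions are hypothetical, and the data is a handful of rows copied by hand from the results above, not the full list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One leaderboard row (hypothetical structure, not the site's own)."""
    model: str
    creator: str
    simpleqa: int  # SimpleQA score, percent
    musr: int      # MuSR score, percent


# A few rows copied by hand from the results; not the complete table.
RESULTS = [
    Result("GPT-5 (high)", "OpenAI", 86, 84),
    Result("Grok 4", "xAI", 82, 80),
    Result("o3-pro", "OpenAI", 85, 83),
    Result("Gemini 2.5 Pro", "Google", 81, 79),
    Result("Claude 4.1 Opus", "Anthropic", 74, 72),
]


def by_creator(results, creator):
    """Keep only the rows from one creator (case-insensitive match)."""
    return [r for r in results if r.creator.lower() == creator.lower()]


def sort_by(results, column, descending=True):
    """Sort rows by a named score column, e.g. "simpleqa" or "musr"."""
    return sorted(results, key=lambda r: getattr(r, column), reverse=descending)


top_simpleqa = sort_by(RESULTS, "simpleqa")
openai_only = by_creator(RESULTS, "openai")

for r in top_simpleqa:
    print(f"{r.model:<18} {r.creator:<10} {r.simpleqa}% {r.musr}%")
```

Sorting by a single benchmark can give a different order from the page's default ranking. For example, o3-pro moves ahead of Grok 4 when ordered by SimpleQA alone.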

About Reasoning Benchmarks

SimpleQA

Factual question answering benchmark

MuSR

Complex multi-step reasoning problems