SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines

An expanded version of GPQA that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines, providing comprehensive coverage of academic domains.

About SuperGPQA

Year: 2025
Tasks: 285 disciplines
Format: Multiple choice questions
Difficulty: Graduate level

SuperGPQA broadens graduate-level evaluation from the 3 subjects covered by GPQA to 285 disciplines. It maintains the same rigorous standards while covering a much wider range of academic knowledge.

Leaderboard (88 models)

#1 GPT-5.4: 95
#2 Gemini 3.1 Pro: 95
#3 Claude Opus 4.6: 95
#4 GPT-5.3 Codex: 95
#5 Grok 4.1: 95
#6 GPT-5.2: 95
#7 GPT-5.2-Codex: 95
#9 Claude Sonnet 4.6: 95
#10 Claude Opus 4.5: 95
#11 Gemini 3 Pro: 95
#13 GPT-5.1: 93
#14 GLM-5 (Reasoning): 92
#15 Claude Sonnet 4.5: 91
#17 GPT-5 (high): 89
#18 o1-preview: 88
#19 Kimi K2.5 (Reasoning): 88
#20 GPT-5 (medium): 87
#21 o3-pro: 87
#23 o3: 85
#24 GPT-5 mini: 84
#25 Grok 4: 84
#26 GLM-5: 84
#28 GLM-4.7: 82
#29 Qwen2.5-1M: 81
#30 Gemini 2.5 Pro: 81
#31 DeepSeek V3.2: 81
#32 Qwen2.5-72B: 80
#33 o4-mini (high): 80
#34 Qwen3.5 397B: 80
#35 DeepSeek Coder 2.0: 77
#36 DeepSeekMath V2: 77
#37 DeepSeek LLM 2.0: 76
#38 MiMo-V2-Flash: 76
#39 Claude 4.1 Opus: 74
#40 Kimi K2.5: 74
#41 Mistral Large 3: 73
#42 Claude 4 Sonnet: 71
#44 MiniMax M2.5: 70
#46 Gemini 3 Flash: 67
#47 Mistral Large 2: 66
#48 Claude Haiku 4.5: 65
#49 GPT-4o: 64
#50 Claude 3.5 Sonnet: 63
#51 GLM-4.7-Flash: 63
#52 Mistral 8x7B: 62
#53 Gemini 1.5 Pro: 62
#56 Gemini 1.0 Pro: 60
#58 Claude 3 Opus: 59
#59 GPT-4 Turbo: 58
#60 Llama 3 70B: 56
#61 Claude 3 Haiku: 54
#63 Nemotron-4 15B: 51
#64 Moonshot v1: 50
#65 Z-1: 49
#66 GPT-OSS 120B: 48
#67 Gemini 2.5 Flash: 47
#70 Llama 4 Scout: 44
#72 Gemma 3 27B: 42
#73 DeepSeek-R1: 41
#74 Qwen2.5-VL-32B: 40
#76 Nova Pro: 38
#78 Qwen3 235B 2507: 36
#80 GLM-4.5: 34
#81 MiniMax M1 80k: 33
#82 GLM-4.5-Air: 32
#84 DeepSeek V3.1: 30
#85 Kimi K2: 29
#86 GPT-OSS 20B: 28
#87 Mistral 7B v0.3: 27
#88 Mistral 8x7B v0.2: 26
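
Several entries in the listing above share the same displayed score, so "the top model" is not always a single name. The following is a minimal sketch of how one might find every model tied at the highest displayed score. The `rows` sample is a handful of entries copied from the listing, and the helper name `top_scorers` is hypothetical, not part of any BenchLM tooling.

```python
# A few (model, displayed score) pairs copied from the listing above.
rows = [
    ("GPT-5.4", 95),
    ("Gemini 3.1 Pro", 95),
    ("Claude Opus 4.6", 95),
    ("GPT-5.1", 93),
    ("GLM-5 (Reasoning)", 92),
]


def top_scorers(rows):
    """Return the highest displayed score and every model that shares it.

    Hypothetical helper for illustration; ties are kept in listing order.
    """
    best = max(score for _, score in rows)
    return best, [name for name, score in rows if score == best]


best, leaders = top_scorers(rows)
print(best, leaders)
```

Reporting all tied names, rather than only the first row, avoids reading more into the listing order than the displayed scores support.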

FAQ

What does SuperGPQA measure?

SuperGPQA measures graduate-level knowledge and reasoning across 285 disciplines. It is an expanded version of GPQA that provides comprehensive coverage of academic domains.

Which model scores highest on SuperGPQA?

GPT-5.4 by OpenAI is currently listed first with a score of 95 on SuperGPQA. Nine other models on the leaderboard show the same displayed score of 95.

How many models are evaluated on SuperGPQA?

BenchLM lists 88 AI models evaluated on SuperGPQA.