An expanded version of GPQA that evaluates graduate-level knowledge and reasoning capabilities across 285 disciplines, providing comprehensive coverage of academic domains.
Year: 2025
Tasks: 285 disciplines
Format: Multiple-choice questions
Difficulty: Graduate level
SuperGPQA significantly expands the scope of graduate-level evaluation, covering 285 disciplines where GPQA focuses on 3 subjects (biology, physics, and chemistry). It maintains the same rigorous standards while covering a much broader range of academic knowledge.
SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines
GPT-5.4 by OpenAI currently leads with a score of 95 on SuperGPQA.
On BenchLM, 88 AI models have been evaluated on SuperGPQA.