A challenging dataset of 448 multiple-choice questions written by domain experts in biology, physics, and chemistry. Designed to be difficult even for skilled non-experts with access to Google.
Year: 2023
Tasks: 448 questions
Format: Multiple-choice questions
Difficulty: Graduate level
GPQA questions are written by PhD-level domain experts and validated so that experts can answer them while non-experts find them challenging even with internet access. This makes the benchmark a test of deep scientific knowledge and reasoning rather than of lookup skill.
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPT-5.4 by OpenAI currently leads with a score of 97 on GPQA.
88 AI models have been evaluated on GPQA on BenchLM.