A science question-answering benchmark that tests whether models can apply a small open-book set of elementary science facts to multi-step reasoning questions.
Year
2018
Tasks
Elementary science questions
Format
4-way multiple choice
Difficulty
Elementary science reasoning
OpenBookQA was designed to test grounded science reasoning rather than pure memorization. Each question is paired with a core science fact, but models still need additional commonsense knowledge to infer the correct answer.
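The format described above can be sketched as a small data record plus an accuracy check. The field names below are illustrative assumptions, not the official release schema.

```python
# Illustrative OpenBookQA-style record: a core science fact, a question,
# four labeled choices, and a gold answer key. Field names are assumptions
# for the sketch, not the dataset's official schema.
sample = {
    "fact": "metal is a thermal conductor",
    "question": "Which object would transfer heat the fastest?",
    "choices": {
        "A": "a wooden spoon",
        "B": "a steel spoon",
        "C": "a plastic spoon",
        "D": "a rubber spoon",
    },
    "answer_key": "B",
}

def score(predictions, records):
    """Accuracy over 4-way multiple-choice records."""
    correct = sum(
        pred == rec["answer_key"]
        for pred, rec in zip(predictions, records)
    )
    return correct / len(records)
```

A model that picks choice "B" for the sample above scores 1.0; any other choice scores 0.0 on that item.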
Version
OpenBookQA 2018
Refresh cadence
Static
Staleness state
Stale
Question availability
Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
No AI models have been evaluated on OpenBookQA on BenchLM yet.