OpenBookQA

A science question-answering benchmark that tests whether models can apply a small open-book set of elementary science facts to multi-step reasoning questions.

About OpenBookQA

Year

2018

Tasks

Elementary science questions

Format

4-way multiple choice

Difficulty

Elementary science reasoning

OpenBookQA was designed to test grounded science reasoning rather than pure memorization. Each question is paired with a core science fact, but models still need additional commonsense knowledge to infer the correct answer.
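To make the format concrete, here is a minimal sketch of an OpenBookQA-style item and a simple accuracy scorer. The question and paired fact come from the original 2018 paper's running example; the field names (`question_stem`, `choices`, `answer_key`, `fact`) are illustrative assumptions, not a guaranteed dataset schema.

```python
# One 4-way multiple-choice item in the OpenBookQA style: a question
# stem, four labeled choices, a gold answer key, and the core science
# fact from the "open book" that grounds the question.
SAMPLE_ITEM = {
    "question_stem": "Which of these would let the most heat travel through?",
    "choices": {
        "A": "a new pair of jeans",
        "B": "a steel spoon in a cafeteria",
        "C": "a cotton candy at a store",
        "D": "a calvin klein cotton hat",
    },
    "answer_key": "B",
    "fact": "Metal is a thermal conductor.",  # open-book fact; commonsense still needed
}


def accuracy(predictions, items):
    """Fraction of predicted choice labels that match the answer keys."""
    correct = sum(
        1 for pred, item in zip(predictions, items)
        if pred == item["answer_key"]
    )
    return correct / len(items)


print(accuracy(["B"], [SAMPLE_ITEM]))  # → 1.0
```

Because every question has exactly four choices, random guessing scores 0.25, which is the usual floor quoted for this benchmark.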

BenchLM freshness & provenance

Version

OpenBookQA 2018

Refresh cadence

Static

Staleness state

Stale

Question availability

Public benchmark set

Stale · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
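A hypothetical sketch of what such a freshness-based tiering rule could look like; the states, cadences, and thresholds here are invented for illustration and are not BenchLM's actual policy (see the methodology page for that).

```python
# Illustrative tiering rule: map a benchmark's staleness state and
# refresh cadence to one of the three tiers named above. All labels
# and branching logic are assumptions for the sake of the example.
def benchmark_tier(staleness_state: str, refresh_cadence: str) -> str:
    if staleness_state == "Fresh":
        return "strong differentiator"
    if staleness_state == "Aging" and refresh_cadence != "Static":
        return "benchmark to watch"
    # Stale benchmarks, or aging ones with no refresh planned,
    # are kept for reference only.
    return "display-only reference"


print(benchmark_tier("Stale", "Static"))  # → display-only reference
```

Under this sketch, OpenBookQA's metadata above (static cadence, stale state) lands it in the display-only tier, consistent with the badge shown on this page.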

Benchmark score table (0 models)

FAQ

What does OpenBookQA measure?

A science question-answering benchmark that tests whether models can apply a small open-book set of elementary science facts to multi-step reasoning questions.

Which model scores highest on OpenBookQA?

No models have been evaluated on OpenBookQA yet.

How many models are evaluated on OpenBookQA?

0 AI models have been evaluated on OpenBookQA on BenchLM.

Last updated: April 20, 2026 · BenchLM version OpenBookQA 2018
