SimpleQA is a benchmark that evaluates how accurately language models answer short, fact-seeking questions. It focuses on factual correctness rather than reasoning complexity.
As of May 13, 2026, DeepSeek V4 Pro (Max) leads the SimpleQA leaderboard with 57.9%, followed by DeepSeek V4 Pro Base (55.2%) and DeepSeek V4 Pro (High) (46.2%).
Top models (model, developer):
DeepSeek V4 Pro (Max), DeepSeek
DeepSeek V4 Pro Base, DeepSeek
DeepSeek V4 Pro (High), DeepSeek
According to BenchLM.ai, even the top three scores span 11.7 percentage points (57.9% down to 46.2%), so the benchmark separates model capabilities clearly.
Eight models have been evaluated on SimpleQA. The benchmark falls in the Knowledge category, which carries a 12% weight in BenchLM.ai's overall scoring system. Within that category, SimpleQA contributes 13% of the category score, so performance here feeds directly into a model's overall ranking.
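To make the weighting concrete, here is a minimal sketch of how the two percentages could combine. It assumes the overall score is a plain weighted sum, so that the category weight and the benchmark's share of the category multiply; BenchLM.ai's actual formula is described on its methodology page and may differ. The function and constant names are hypothetical, chosen for illustration only.

```python
# Weights quoted in the text above.
KNOWLEDGE_CATEGORY_WEIGHT = 0.12    # Knowledge category share of the overall score
SIMPLEQA_SHARE_OF_CATEGORY = 0.13   # SimpleQA share of the Knowledge category


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Share of the overall score attributable to one benchmark.

    Assumption: weights combine multiplicatively (a linear weighted sum).
    """
    return category_weight * benchmark_share


def overall_points(score_pct: float, weight: float) -> float:
    """Points a benchmark score adds to a 0-100 overall score under that assumption."""
    return score_pct * weight


weight = effective_weight(KNOWLEDGE_CATEGORY_WEIGHT, SIMPLEQA_SHARE_OF_CATEGORY)
leader_points = overall_points(57.9, weight)        # DeepSeek V4 Pro (Max)
third_points = overall_points(46.2, weight)         # DeepSeek V4 Pro (High)

print(f"Effective weight: {weight:.4f}")                     # 0.0156, i.e. about 1.56%
print(f"Leader contribution: {leader_points:.2f} points")    # 0.90
print(f"Gap, first vs third: {leader_points - third_points:.2f} points")  # 0.18
```

Under this reading, SimpleQA accounts for roughly 1.56% of the overall score, so the 11.7-point gap between the first and third model translates to about 0.18 overall points.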
Year: 2024
Tasks: Factual questions
Format: Short-form Q&A
Difficulty: Factual accuracy focused
SimpleQA prioritizes two key properties: questions should have short, factual answers that can be easily verified, and questions should be diverse and challenging. It serves as a crucial test of factual knowledge and accuracy.
Version: SimpleQA 2024
Refresh cadence: Annual
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.