A cleaned version of KMMLU drawn from Korean national technical qualification exams, with erroneous questions removed and the set decontaminated and deduplicated.
Tasks: ~3,500 questions
Format: Technical multiple choice
Difficulty: Industrial/technical
Version: KMMLU-Redux
Refresh cadence: Static
Staleness state: Refreshing
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
No models have been evaluated on KMMLU-Redux yet.