A Chinese-language academic and professional benchmark spanning humanities, social science, STEM, and applied subjects.
As of March 2026, Kimi K2.5 leads the C-Eval leaderboard with 94%, followed by Qwen3.6 Plus (93.3%) and Qwen3.5 397B (93%).
Kimi K2.5 (Moonshot AI)
Qwen3.6 Plus (Alibaba)
Qwen3.5 397B (Alibaba)
According to BenchLM.ai, Kimi K2.5 leads the C-Eval benchmark with a score of 94%, followed by Qwen3.6 Plus (93.3%) and Qwen3.5 397B (93%). The top three models sit within 1.0 percentage point of each other, which suggests the benchmark is nearing saturation for frontier models.
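The clustering claim is simple arithmetic on the three quoted scores. The following is a minimal sketch of that calculation; the `spread` helper and the `SATURATION_THRESHOLD` value are illustrative assumptions, not a rule published by BenchLM.ai.

```python
# Top-three C-Eval scores as quoted in the text above (percent).
scores = {
    "Kimi K2.5": 94.0,
    "Qwen3.6 Plus": 93.3,
    "Qwen3.5 397B": 93.0,
}


def spread(values):
    """Gap, in percentage points, between the highest and lowest score."""
    values = list(values)
    return max(values) - min(values)


# Hypothetical cutoff chosen for illustration only; BenchLM.ai does not
# state a numeric saturation threshold in this page.
SATURATION_THRESHOLD = 1.0

leader = max(scores, key=scores.get)
top_spread = round(spread(scores.values()), 1)
near_saturation = top_spread <= SATURATION_THRESHOLD

print(f"{leader} leads; top-three spread is {top_spread} points")
```

A small spread among the leaders means the benchmark separates them poorly, which is the sense in which the page calls it close to saturated.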
Five models have been evaluated on C-Eval. The benchmark belongs to the Knowledge category, which carries a 12% weight in BenchLM.ai's overall scoring system. C-Eval itself is currently displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.
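To make the "displayed but excluded" distinction concrete, here is a rough sketch of how a display-only benchmark can be left out of a weighted category score. Only the 12% Knowledge weight and the 94% C-Eval score come from the text; the averaging rule, the `BenchmarkResult` type, the two placeholder benchmarks, and their scores are assumptions made for illustration and are not BenchLM.ai's actual formula or data.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    score: float  # percent, 0-100
    scored: bool  # False means display-only, ignored by the formula


def category_score(results):
    """Mean of the scored benchmarks only; None if nothing is scored."""
    scored = [r.score for r in results if r.scored]
    if not scored:
        return None
    return sum(scored) / len(scored)


KNOWLEDGE_WEIGHT = 0.12  # category weight stated in the text

# Placeholder benchmarks and scores are invented for the example.
results = [
    BenchmarkResult("C-Eval", 94.0, scored=False),
    BenchmarkResult("hypothetical_benchmark_a", 80.0, scored=True),
    BenchmarkResult("hypothetical_benchmark_b", 90.0, scored=True),
]

knowledge = category_score(results)
contribution = KNOWLEDGE_WEIGHT * knowledge  # share of the overall score
print(f"Knowledge score: {knowledge}, weighted contribution: {contribution:.2f}")
```

Under this assumed scheme, changing the C-Eval entry leaves `knowledge` unchanged, which is what "does not directly affect overall rankings" means in practice.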
Year: 2023
Tasks: Chinese academic and professional exams
Format: Multiple choice questions
Difficulty: High school to professional level
C-Eval is one of the clearest public signals for non-English academic knowledge performance. It tests whether a model can sustain strong factual recall and reasoning under Chinese-language exam conditions across many domains.
C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models
Version: C-Eval 2023
Refresh cadence: Static
Staleness state: Stale
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Kimi K2.5 by Moonshot AI currently leads with a score of 94% on C-Eval.
5 AI models have been evaluated on C-Eval on BenchLM.