C-Eval

A Chinese-language academic and professional benchmark spanning humanities, social science, STEM, and applied subjects.

Top Models on C-Eval — March 2026

As of March 2026, Kimi K2.5 leads the C-Eval leaderboard with 94%, followed by Qwen3.6 Plus (93.3%) and Qwen3.5 397B (93%).

5 models · Knowledge · Stale · Display only · Updated April 2, 2026

According to BenchLM.ai, Kimi K2.5 leads the C-Eval benchmark with a score of 94%, followed by Qwen3.6 Plus (93.3%) and Qwen3.5 397B (93%). The top three models are clustered within 1.0 percentage point, and all five listed models fall within 1.8 points, suggesting this benchmark is nearing saturation for frontier models.
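The clustering claim can be checked directly from the leaderboard figures on this page. The following is a minimal sketch, not BenchLM's own tooling: the `spread` helper is a hypothetical name introduced here, and the scores are copied from the leaderboard section.

```python
# Scores copied from the leaderboard on this page (percent).
scores = {
    "Kimi K2.5": 94.0,
    "Qwen3.6 Plus": 93.3,
    "Qwen3.5 397B": 93.0,
    "GLM-5": 92.8,
    "Claude Opus 4.5": 92.2,
}


def spread(values):
    """Return the gap between the highest and lowest score, to one decimal."""
    values = list(values)
    return round(max(values) - min(values), 1)


# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

top3_spread = spread(score for _, score in ranked[:3])
full_spread = spread(scores.values())

print(f"Top-3 spread: {top3_spread} points")   # Top-3 spread: 1.0 points
print(f"Full spread:  {full_spread} points")   # Full spread:  1.8 points
```

A gap this small between first and fifth place means rank order on this benchmark may say little about real differences between the models.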

Five models have been evaluated on C-Eval. The benchmark falls in the Knowledge category, which carries a 12% weight in BenchLM.ai's overall scoring system. C-Eval itself is currently displayed for reference only and is excluded from the scoring formula, so it does not directly affect overall rankings.
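To illustrate what "display only" means in practice, here is a rough sketch of a category-weighted score that skips display-only benchmarks. BenchLM's actual formula is not given on this page, so everything except the 12% Knowledge weight is an assumption: the `BenchmarkResult` type, the `overall` function, the averaging rule, the "Other" category with its 88% weight, and the two hypothetical benchmarks with made-up scores.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    category: str
    score: float         # percent, 0-100
    display_only: bool   # True: shown on the page but excluded from scoring


def category_average(results, category):
    """Mean of the scored (non display-only) benchmarks in one category.

    Returns None when no benchmark in the category counts toward scoring.
    """
    scored = [r.score for r in results
              if r.category == category and not r.display_only]
    return sum(scored) / len(scored) if scored else None


def overall(results, weights):
    """Weighted mean of category averages (assumed formula, not BenchLM's).

    Categories without any scored benchmark are skipped and the remaining
    weights are renormalized.
    """
    total, weight_sum = 0.0, 0.0
    for category, weight in weights.items():
        avg = category_average(results, category)
        if avg is None:
            continue
        total += weight * avg
        weight_sum += weight
    return total / weight_sum if weight_sum else None


# Only the 12% Knowledge weight comes from the page; the rest is illustrative.
weights = {"Knowledge": 0.12, "Other": 0.88}
results = [
    BenchmarkResult("C-Eval", "Knowledge", 94.0, display_only=True),
    BenchmarkResult("Hypothetical benchmark A", "Knowledge", 80.0, display_only=False),
    BenchmarkResult("Hypothetical benchmark B", "Other", 70.0, display_only=False),
]

with_ceval = overall(results, weights)
without_ceval = overall([r for r in results if r.name != "C-Eval"], weights)
print(with_ceval == without_ceval)  # True: C-Eval does not move the score
```

Under these assumptions, removing C-Eval from the input leaves the overall score unchanged, which matches the page's statement that the benchmark does not directly affect rankings.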

About C-Eval

Year: 2023

Tasks: Chinese academic and professional exams

Format: Multiple choice questions

Difficulty: High school to professional level

C-Eval is one of the clearest public signals for non-English academic knowledge performance. It tests whether a model can sustain strong factual recall and reasoning under Chinese-language exam conditions across many domains.

Source paper: C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

BenchLM freshness & provenance

Version: C-Eval 2023

Refresh cadence: Static

Staleness state: Stale

Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (5 models)

#1 Kimi K2.5: 94%
#2 Qwen3.6 Plus: 93.3%
#3 Qwen3.5 397B: 93%
#4 GLM-5: 92.8%
#5 Claude Opus 4.5: 92.2%

FAQ

What does C-Eval measure?

A Chinese-language academic and professional benchmark spanning humanities, social science, STEM, and applied subjects.

Which model scores highest on C-Eval?

Kimi K2.5 by Moonshot AI currently leads with a score of 94% on C-Eval.

How many models are evaluated on C-Eval?

BenchLM has evaluated 5 AI models on C-Eval.

Last updated: April 2, 2026 · BenchLM version C-Eval 2023