Top AI models from Chinese labs — DeepSeek, Alibaba Qwen, Zhipu GLM, Moonshot Kimi, and more — ranked by benchmark performance.
Unless noted otherwise, the rankings on this page use BenchLM's provisional leaderboard rather than the stricter, sourced-only verified leaderboard.
Chinese AI labs have produced some of the strongest models on our leaderboard, especially in math, reasoning, and agentic workflows. DeepSeek models are notable for being open weight while matching proprietary competitors. Alibaba's Qwen series, Zhipu's GLM line, ByteDance Seed, and StepFun now compete directly with GPT and Claude on a growing share of practical benchmarks.
Bottom line: Chinese AI labs produce some of the strongest models — GLM-5 (Reasoning) scores within striking distance of top proprietary APIs. DeepSeek and Qwen are strong open-weight alternatives.
According to BenchLM.ai, GLM-5.1 leads this ranking with a score of 84, followed by GLM-5 (Reasoning) (also 84) and Kimi 2.6 (83). The top three are separated by a single point, so any of them is a reasonable choice.
The best open-weight option is GLM-5.1 (ranked #1 with a score of 84). Open-weight models are highly competitive in this category — self-hosting is a viable alternative to proprietary APIs.
This ranking is based on provisional overall weighted scores computed with BenchLM.ai's scoring formula. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.
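As a rough illustration of what an overall weighted score means, the sketch below computes a weighted mean over category scores. The category scores are the ones quoted on this page for GLM-5 (Reasoning); the weights, the function name `weighted_score`, and the choice of a plain weighted mean are all assumptions made for illustration, since BenchLM.ai's actual formula is not given here.

```python
# Hypothetical weights: equal weighting is an assumption, not BenchLM.ai's formula.
WEIGHTS = {"math": 1.0, "reasoning": 1.0, "agentic": 1.0}


def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted mean over categories present in both dicts."""
    shared = [category for category in weights if category in scores]
    if not shared:
        raise ValueError("no overlapping categories between scores and weights")
    total_weight = sum(weights[category] for category in shared)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[category] * weights[category] for category in shared)
    return weighted_sum / total_weight


# Category scores quoted on this page for GLM-5 (Reasoning).
glm5_reasoning = {"math": 93, "reasoning": 88, "agentic": 86}

print(round(weighted_score(glm5_reasoning, WEIGHTS), 1))
```

With equal weights over these three categories the result is 89.0, not the published overall score of 84. The real formula presumably covers more categories and uses different weights, so this sketch shows only the shape of the calculation.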
GLM-5.1
Z.AI · 203K
Z.AI's latest. Best instruction following among Chinese models.
GLM-5 (Reasoning)
Z.AI · 200K
Best Chinese model overall. Strong across math, reasoning, and agentic.
Kimi 2.6
Moonshot AI · 256K
GLM-5 (Reasoning): leads Chinese models, with strong math (93), reasoning (88), and agentic (86) scores.
GLM-5.1: Z.AI's latest, with strong instruction following (93) and math (89).
Qwen3.5 397B (Reasoning): Alibaba's flagship, with top math (92) and coding (85) scores.
Best Chinese model overall?
GLM-5 (Reasoning) — highest score among Chinese labs
Open-weight from China?
DeepSeek R1 or Qwen3.5 — both competitive and free
Coding-focused?
Qwen3.5 397B (Reasoning) — best Chinese coding (85)
Compare with Western models?
See the overall leaderboard for cross-provider rankings
The top model is GLM-5.1 by Z.AI with a provisional score of 84.
The best open-weight model is GLM-5.1 at position #1.
33 models are included in this ranking.
Chinese models are ranked by the same overall BenchLM score. This page collects models from Chinese labs for easier comparison within this ecosystem.
Some Chinese models have limited English-language benchmark coverage. Models from smaller labs may have sparse data. Regional API availability varies.