Top AI models from Chinese labs — DeepSeek, Alibaba Qwen, Zhipu GLM, Moonshot Kimi, and more — ranked by benchmark performance.
Unless noted otherwise, ranking surfaces on this page use BenchLM's provisional leaderboard lane rather than the stricter sourced-only verified leaderboard.
Chinese AI labs have produced some of the strongest models on our leaderboard, especially in math, reasoning, and agentic workflows. DeepSeek models are notable for being open weight while matching proprietary competitors. Alibaba's Qwen series, Zhipu's GLM line, ByteDance Seed, and StepFun now compete directly with GPT and Claude on a growing share of practical benchmarks.
Bottom line: Chinese AI labs produce some of the strongest models — GLM-5 (Reasoning) scores within striking distance of top proprietary APIs. DeepSeek and Qwen are strong open-weight alternatives.
According to BenchLM.ai, Qwen3.7 Max leads this ranking with a score of 91, followed by DeepSeek V4 Pro (Max) (87) and Kimi K2.6 (84). The top three are separated by gaps of 4 and 3 points, which suggests genuine performance differences.
The best open-weight option is DeepSeek V4 Pro (Max), ranked #2 with a score of 87. Open-weight models are highly competitive in this category, so self-hosting is a viable alternative to proprietary APIs.
This ranking is based on provisional overall weighted scores computed with BenchLM.ai's scoring formula. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.
Qwen3.7 Max · Alibaba · 1M
DeepSeek V4 Pro (Max) · DeepSeek · 1M
Kimi K2.6 · Moonshot AI · 256K
GLM-5 (Reasoning): leads Chinese models, with strong math (93), reasoning (88), and agentic (86) scores.
GLM-5.1: Z.AI's latest, with strong instruction following (93) and math (89).
Qwen3.5 397B (Reasoning): Alibaba's flagship, with top math (92) and coding (85).
Best Chinese model overall? GLM-5 (Reasoning), the highest score among Chinese labs.
Open-weight from China? DeepSeek R1 or Qwen3.5, both competitive and free.
Coding-focused? Qwen3.5 397B (Reasoning), the best Chinese coding score (85).
Compare with Western models? See the overall leaderboard for cross-provider rankings.
The top model is Qwen3.7 Max by Alibaba with a provisional score of 91.
The best open-weight model is DeepSeek V4 Pro (Max) at position #2.
42 models are included in this ranking.
Chinese models are ranked by the same overall BenchLM score. This page collects models from Chinese labs for easier comparison within this ecosystem.
Some Chinese models have limited English-language benchmark coverage. Models from smaller labs may have sparse data. Regional API availability varies.