Benchmark analysis of Qwen3.6 Plus by Alibaba across 60 sourced tests on BenchLM.
According to BenchLM.ai, Qwen3.6 Plus ranks #27 out of 98 models with an overall score of 69/100. This places it in the mid-tier of AI models, with strengths in specific benchmark categories.
Qwen3.6 Plus is a proprietary model with a 1M token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.
This profile currently includes verified scores for 60 of the 125 benchmarks BenchLM tracks. BenchLM only exposes verified benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.
Its strongest category is Instruction Following (#5), and its weakest is Agentic (#32). With every ranked category falling between #5 and #32, it is a reasonably well-rounded choice across a range of tasks.
Provider: Alibaba
Source Type: Proprietary
Reasoning: Reasoning model
Context Window: 1M tokens
Model Status: Current
Release Date: Apr 2, 2026
Overall Score: 69/100
Pricing: $0.00 / $0.00 (input / output per 1M tokens)
Runtime: N/A (latency unavailable)
MMLU-Redux 2026 · Quarterly refresh · updated April 2, 2026
SWE-bench Verified 2024 · Annual refresh · updated April 2, 2026
SWE Multilingual 2026 · Quarterly refresh · updated April 2, 2026
LiveCodeBench v6 2026 · Quarterly refresh · updated April 2, 2026
HMMT Feb 2025 2025 · Quarterly refresh · updated April 2, 2026
HMMT Nov 2025 2025 · Quarterly refresh · updated April 2, 2026
HMMT Feb 2026 2026 · Quarterly refresh · updated April 2, 2026
MMAnswerBench 2026 · Quarterly refresh · updated April 2, 2026
QwenClawBench 2026 · Quarterly refresh · updated April 2, 2026
QwenWebBench 2026 · Quarterly refresh · updated April 2, 2026
TAU3-Bench 2026 · Quarterly refresh · updated April 2, 2026
VITA-Bench 2025 · Quarterly refresh · updated April 2, 2026
DeepPlanning 2026 · Quarterly refresh · updated April 2, 2026
Toolathlon 2026 · Quarterly refresh · updated April 2, 2026
WideResearch 2026 · Quarterly refresh · updated April 2, 2026
RealWorldQA 2026 · Quarterly refresh · updated April 2, 2026
OmniDocBench 1.5 2026 · Quarterly refresh · updated April 2, 2026
Video-MME (with subtitle) 2026 · Quarterly refresh · updated April 2, 2026
Video-MME (w/o subtitle) 2026 · Quarterly refresh · updated April 2, 2026
MathVision 2026 · Quarterly refresh · updated April 2, 2026
MMLongBench-Doc 2026 · Quarterly refresh · updated April 2, 2026
CountBench 2026 · Quarterly refresh · updated April 2, 2026
RefCOCO (avg) 2026 · Quarterly refresh · updated April 2, 2026
MLVU (M-Avg) 2026 · Quarterly refresh · updated April 2, 2026
ScreenSpot Pro 2025 · Quarterly refresh · updated April 2, 2026
VWT2k-lite 2026 · Quarterly refresh · updated April 2, 2026
Qwen3.6 Plus ranks #27 out of 98 models with an overall score of 69. It was created by Alibaba and has a 1M-token context window.
Qwen3.6 Plus ranks #31 out of 98 models in knowledge and understanding benchmarks with an average score of 66. There are stronger options in this category.
Qwen3.6 Plus ranks #25 out of 98 models in coding and programming benchmarks with an average score of 64.9. There are stronger options in this category.
Qwen3.6 Plus has visible benchmark coverage in mathematics, but BenchLM does not currently assign it a global category rank there.
Qwen3.6 Plus has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.
Qwen3.6 Plus ranks #32 out of 98 models in agentic benchmarks (tool use and computer tasks) with an average score of 62. There are stronger options in this category.
Qwen3.6 Plus ranks #21 out of 98 models in multimodal and grounded-task benchmarks with an average score of 78.8. There are stronger options in this category.
Qwen3.6 Plus ranks #5 out of 98 models in instruction-following benchmarks with an average score of 94.3. It is among the top performers in this category.
Qwen3.6 Plus ranks #22 out of 98 models in multilingual benchmarks with an average score of 84.7. There are stronger options in this category.
Benchmark coverage is not yet complete. Qwen3.6 Plus currently has 60 verified benchmark scores out of the 125 benchmarks BenchLM tracks; because BenchLM only exposes verified public benchmark rows, missing categories stay blank until a sourced evaluation is available.
Qwen3.6 Plus has a context window of 1M tokens, which determines how much text it can process in a single interaction.