Benchmark analysis of Qwen3.5-27B by Alibaba across 13 sourced tests on BenchLM.
According to BenchLM.ai, Qwen3.5-27B ranks #25 out of 97 models with an overall score of 71/100. This places it in the mid-tier of AI models, with strengths in specific benchmark categories.
Qwen3.5-27B is an open-weight model with a 262K-token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.
This profile currently covers 13 of the 62 benchmarks BenchLM tracks. BenchLM only exposes verified benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.
Its strongest category is Instruction Following (#4), while its weakest is Agentic (#35). This profile makes it a strong choice for instruction-heavy workloads, though stronger options exist for agentic tool-use tasks.
Provider: Alibaba
Source Type: Open Weight
Reasoning: Yes (explicit chain-of-thought)
Context Window: 262K
Model Status: Current
Release Date: Mar 4, 2026
Overall Score: 71/100
Pricing: $0.00 input / $0.00 output per 1M tokens
Runtime: N/A (latency unavailable)
External frontier-model reference data from Artificial Analysis, updated 2026-03-31. BenchLM uses these signals as a bounded calibration input for coding, agentic, and final display ordering.
Intelligence Index: 42.07
Coding Index: 34.87
Agentic Index: 54.61
Qwen3.5-27B ranks #25 out of 97 models with an overall score of 71. It is created by Alibaba and features a 262K context window.
Qwen3.5-27B ranks #6 out of 97 models in knowledge and understanding benchmarks with an average score of 80.6. It is among the top performers in this category.
Qwen3.5-27B ranks #7 out of 97 models in coding and programming benchmarks with an average score of 74.3. It is among the top performers in this category.
Qwen3.5-27B has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.
Qwen3.5-27B ranks #35 out of 97 models in agentic tool use and computer tasks benchmarks with an average score of 59.8. There are stronger options in this category.
Qwen3.5-27B ranks #33 out of 97 models in multimodal and grounded tasks benchmarks with an average score of 75.0. There are stronger options in this category.
Qwen3.5-27B ranks #4 out of 97 models in instruction following benchmarks with an average score of 95. It is among the top performers in this category.
Qwen3.5-27B ranks #26 out of 97 models in multilingual tasks benchmarks with an average score of 82.2. There are stronger options in this category.
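As an illustration only (BenchLM's actual category weighting is not published on this page), the category averages listed above can be combined with a naive unweighted mean; note that the result need not match the displayed overall score of 71, since BenchLM also applies external calibration signals:

```python
# Illustrative only: BenchLM's real aggregation and weighting are not published here.
# Category averages exactly as listed on this profile page.
category_scores = {
    "Knowledge & Understanding": 80.6,
    "Coding & Programming": 74.3,
    "Agentic Tool Use": 59.8,
    "Multimodal & Grounded": 75.0,
    "Instruction Following": 95.0,
    "Multilingual": 82.2,
}

# A naive unweighted mean across the covered categories.
unweighted_mean = sum(category_scores.values()) / len(category_scores)
print(round(unweighted_mean, 1))  # 77.8 -- higher than the weighted overall score of 71
```

The gap between 77.8 and the official 71 suggests the missing categories and calibration indices pull the overall score down, but the exact weighting is BenchLM's own.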
Yes, Qwen3.5-27B is an open weight model created by Alibaba, meaning it can be downloaded and run locally or fine-tuned for specific use cases.
Not yet. Qwen3.5-27B currently has 13 verified benchmark scores out of the 62 benchmarks BenchLM tracks. BenchLM only exposes verified public benchmark rows, so missing categories stay blank until a sourced evaluation is available.
Qwen3.5-27B has a context window of 262K tokens, which determines how much text it can process in a single interaction.
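Because the window caps the combined input and output of a request, it is useful to pre-check whether a document will fit. A minimal sketch, assuming the common rough heuristic of ~4 characters per English token (actual counts depend on the model's tokenizer and the language):

```python
# Rough heuristic: ~4 characters per token for English text.
# Real counts depend on the tokenizer; treat this as a pre-flight estimate only.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` using the chars-per-token heuristic."""
    return max(1, len(text) // CHARS_PER_TOKEN)

def fits_in_context(text: str, context_tokens: int = 262_000,
                    reserved_for_output: int = 4_000) -> bool:
    """Return True if `text` likely fits, leaving headroom for the model's reply."""
    return estimate_tokens(text) <= context_tokens - reserved_for_output

doc = "example " * 50_000            # ~400,000 characters
print(estimate_tokens(doc))          # 100000
print(fits_in_context(doc))          # True: ~100K tokens < 258K available
```

For production use, replace the heuristic with the model's actual tokenizer so the estimate matches what the model will see.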