Qwen3.5-35B-A3B Benchmark Scores & Performance

Benchmark analysis of Alibaba's Qwen3.5-35B-A3B across 13 sourced tests on BenchLM.

According to BenchLM.ai, Qwen3.5-35B-A3B ranks #32 out of 97 models with an overall score of 67/100. It is not a frontier model, and how well it fits a task depends heavily on the category.

Qwen3.5-35B-A3B is an open-weight model with a 262K-token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.

This profile currently has 13 of 62 tracked benchmarks. BenchLM only exposes verified benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.

Its strongest category is Knowledge (#8), while its weakest is Agentic (#49). This profile suggests it is best suited to knowledge-intensive tasks such as research, analysis, and factual Q&A.

Provider: Alibaba
Source Type: Open Weight
Reasoning: Yes (chain-of-thought)
Context Window: 262K
Model Status: Current
Release Date: Mar 4, 2026
Overall Score: 67 (#32 of 97)
Pricing: $0.00 / $0.00 (input / output per 1M tokens)
Runtime: N/A (latency unavailable)

Artificial Analysis Snapshot

External frontier-model reference data from Artificial Analysis, updated 2026-03-31. BenchLM uses these signals as a bounded calibration input for its coding and agentic scores and for final display ordering.

Intelligence Index: 37.12
Coding Index: 30.25
Agentic Index: 44.11

Knowledge Benchmarks

MMLU-Pro: 85.3% (status: Refreshing)
MMLU-Pro · Static refresh · updated March 31, 2026

SuperGPQA: 63.4% (status: Current)
SuperGPQA 2025 · Quarterly refresh · updated March 31, 2026

GPQA: 84.2% (status: Refreshing)
GPQA Diamond · Static refresh · updated March 31, 2026

Coding Benchmarks

SWE-bench Verified: 69.2% (status: Refreshing)
SWE-bench Verified 2024 · Annual refresh · updated March 31, 2026

LiveCodeBench: 74.6% (status: Current)
Rolling 2026 set · Rolling refresh · updated March 31, 2026

Reasoning Benchmarks

LongBench v2: 59% (status: Current)
LongBench v2 2025 · Quarterly refresh · updated March 31, 2026

Agentic Benchmarks

Terminal-Bench 2.0: 40.5% (status: Current)
Terminal-Bench 2 · Quarterly refresh · updated March 31, 2026

BrowseComp: 61% (status: Current)
BrowseComp 2026 · Quarterly refresh · updated March 31, 2026

OSWorld-Verified: 54.5% (status: Current)
OSWorld Verified · Quarterly refresh · updated March 31, 2026

tau2-bench: 81.2% (status: Current; display only)
τ²-Bench 2026 · Quarterly refresh · updated March 31, 2026

Multimodal & Grounded Benchmarks

MMMU-Pro: 75.1% (status: Refreshing)
MMMU-Pro 2024 · Annual refresh · updated March 31, 2026

Instruction Following Benchmarks

IFEval: 91.9% (status: Stale)
IFEval 2023 · Static refresh · updated March 31, 2026

Multilingual Benchmarks

MMLU-ProX: 81% (status: Current)
MMLU-ProX 2025 · Static refresh · updated March 31, 2026
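The FAQ below quotes a per-category average for each area. As a rough cross-check, here is a minimal Python sketch that computes plain unweighted means of the scores listed above. BenchLM's own averaging method is not documented on this page, so this is an illustration under stated assumptions, not a reproduction of its numbers; in particular, skipping the "display only" tau2-bench row by default is an assumption, and `category_means` is a hypothetical helper.

```python
"""Unweighted per-category means of the BenchLM scores listed above.

Assumption: BenchLM's real averaging method is not documented here,
so this is only a rough cross-check, not a reproduction of its numbers.
"""
from statistics import mean

# (benchmark, score in %, display_only flag), grouped by BenchLM category.
SCORES: dict[str, list[tuple[str, float, bool]]] = {
    "Knowledge": [("MMLU-Pro", 85.3, False), ("SuperGPQA", 63.4, False), ("GPQA", 84.2, False)],
    "Coding": [("SWE-bench Verified", 69.2, False), ("LiveCodeBench", 74.6, False)],
    "Reasoning": [("LongBench v2", 59.0, False)],
    "Agentic": [
        ("Terminal-Bench 2.0", 40.5, False),
        ("BrowseComp", 61.0, False),
        ("OSWorld-Verified", 54.5, False),
        ("tau2-bench", 81.2, True),  # marked "display only" on the page
    ],
    "Multimodal & Grounded": [("MMMU-Pro", 75.1, False)],
    "Instruction Following": [("IFEval", 91.9, False)],
    "Multilingual": [("MMLU-ProX", 81.0, False)],
}


def category_means(scores, include_display_only=False):
    """Return {category: unweighted mean rounded to 1 decimal}.

    By default, rows flagged display_only are skipped (an assumption).
    """
    result = {}
    for category, rows in scores.items():
        values = [s for _, s, display_only in rows if include_display_only or not display_only]
        result[category] = round(mean(values), 1)
    return result


if __name__ == "__main__":
    for category, avg in category_means(SCORES).items():
        print(f"{category}: {avg}")
```

For single-benchmark categories, the unweighted mean equals the FAQ average (multimodal 75.1, instruction following 91.9, multilingual 81). For multi-benchmark categories it does not (knowledge 77.6 vs. 79.3, coding 71.9 vs. 68.3, agentic 52.0 vs. 53.8). This suggests BenchLM applies its own weighting or calibration, possibly including the Artificial Analysis inputs described above.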

Frequently Asked Questions

How does Qwen3.5-35B-A3B perform overall in AI benchmarks?

Qwen3.5-35B-A3B ranks #32 out of 97 models with an overall score of 67. It was created by Alibaba and has a 262K context window.

Is Qwen3.5-35B-A3B good for knowledge and understanding?

Qwen3.5-35B-A3B ranks #8 out of 97 models in knowledge and understanding benchmarks with an average score of 79.3. It is among the top performers in this category.

Is Qwen3.5-35B-A3B good for coding and programming?

Qwen3.5-35B-A3B ranks #17 out of 97 models in coding and programming benchmarks with an average score of 68.3. There are stronger options in this category.

Is Qwen3.5-35B-A3B good for reasoning and logic?

Qwen3.5-35B-A3B has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.

Is Qwen3.5-35B-A3B good for agentic tool use and computer tasks?

Qwen3.5-35B-A3B ranks #49 out of 97 models on agentic tool-use and computer-task benchmarks, with an average score of 53.8. There are stronger options in this category.

Is Qwen3.5-35B-A3B good for multimodal and grounded tasks?

Qwen3.5-35B-A3B ranks #32 out of 97 models on multimodal and grounded-task benchmarks, with an average score of 75.1. There are stronger options in this category.

Is Qwen3.5-35B-A3B good for instruction following?

Qwen3.5-35B-A3B ranks #14 out of 97 models on instruction-following benchmarks, with an average score of 91.9. There are stronger options in this category.

Is Qwen3.5-35B-A3B good for multilingual tasks?

Qwen3.5-35B-A3B ranks #34 out of 97 models on multilingual benchmarks, with an average score of 81. There are stronger options in this category.

Is Qwen3.5-35B-A3B open source?

Qwen3.5-35B-A3B is an open-weight model from Alibaba, meaning its weights can be downloaded, run locally, or fine-tuned for specific use cases.

Does Qwen3.5-35B-A3B have full benchmark coverage on BenchLM?

Not yet. Qwen3.5-35B-A3B currently has 13 verified benchmark scores out of the 62 benchmarks BenchLM tracks. BenchLM only exposes verified public benchmark rows, so missing categories stay blank until a sourced evaluation is available.

What is the context window size of Qwen3.5-35B-A3B?

Qwen3.5-35B-A3B has a context window of 262K tokens, which limits how much text (input plus output) it can handle in a single interaction.

Last updated: March 31, 2026 · Runtime metrics stay blank until BenchLM has a sourced snapshot.
