Model profile

Qwen3.6-35B-A3B

Name: Qwen3.6-35B-A3B
Author: Alibaba

Alibaba · Current · Released Apr 15, 2026

Data verified July 15, 2026

Overall Score

59 · Provisional #45 of 78 · Verified #31 of 32

Arena Elo

Not listed

Categories Ranked

3 of 8

Price (1M tokens)

Not listed (API pricing)

Speed

Not listed

Context

262K

Evidence coverage

58 of 300 tracked benchmarks are published. 41 are verified and 17 provisional. 7 of 8 categories are measured.

Updated July 15, 2026 · Methodology

Published / tracked: 58 / 300
Verified: 41
Provisional: 17
Categories measured: 7 / 8

Agentic: 15 benchmarks · Mixed evidence
Coding: 9 benchmarks · Mixed evidence
Reasoning: 2 benchmarks · Reported
Knowledge: 11 benchmarks · Mixed evidence
Math: 5 benchmarks · Verified
Multilingual: 0 benchmarks · Not measured
Multimodal: 15 benchmarks · Mixed evidence
Inst. Following: 1 benchmark · Reported
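
The coverage figures above can be cross-checked against each other. The following is a minimal sketch in Python, using only numbers stated in this profile; the function name `coverage_summary` and the dictionary layout are illustrative choices, not part of BenchLM.

```python
# Per-category benchmark counts as listed in this profile.
category_counts = {
    "Agentic": 15,
    "Coding": 9,
    "Reasoning": 2,
    "Knowledge": 11,
    "Math": 5,
    "Multilingual": 0,
    "Multimodal": 15,
    "Inst. Following": 1,
}

PUBLISHED, TRACKED = 58, 300
VERIFIED, PROVISIONAL = 41, 17


def coverage_summary(counts, published, tracked, verified, provisional):
    """Check that the stated coverage numbers agree with each other."""
    return {
        # The per-category counts should add up to the published total.
        "counts_match_published": sum(counts.values()) == published,
        # Verified plus provisional rows should also equal the published total.
        "evidence_split_matches": verified + provisional == published,
        # A category counts as measured when it has at least one benchmark.
        "categories_measured": sum(1 for n in counts.values() if n > 0),
        # Share of tracked benchmarks that are published, in percent.
        "coverage_pct": round(100 * published / tracked, 1),
    }


summary = coverage_summary(
    category_counts, PUBLISHED, TRACKED, VERIFIED, PROVISIONAL
)
print(summary)
```

Under these inputs the per-category counts add up to the 58 published rows, and 7 of the 8 categories have at least one benchmark, which agrees with the summary line.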

Open Weight · Self-host · Reasoning

Confidence: High

base

According to BenchLM.ai, Qwen3.6-35B-A3B ranks #45 out of 78 models on the provisional leaderboard with an overall score of 59/100. It also ranks #31 out of 32 on the verified leaderboard. While not a frontier model, it is open-weight and self-hostable, and its Coding score places it in the 72nd percentile of its category cohort.

Qwen3.6-35B-A3B is an open-weight model with a 262K-token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.

This profile currently has 58 of 300 tracked benchmarks. BenchLM only exposes non-generated benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.

Among the three categories in which it is ranked, its strongest is Coding (#27 of 93, 72nd percentile) and its weakest is Knowledge (#49 of 107, 55th percentile). This profile suggests it is better suited to software development and code generation than to knowledge-heavy tasks.

Peer position

Exact provisional scores and ranks for the closest listed peers.

Model	Author	Provisional rank	Score
Claude Sonnet 4.5	Anthropic	#43	59.0
Qwen3.5 397B	Alibaba	#44	59.0
Qwen3.6-35B-A3B (current model)	Alibaba	#45	59.0
Qwen3.5-122B-A10B	Alibaba	#46	59.0
Gemini 2.5 Pro	Google	#47	59.0
Qwen3.5-27B	Alibaba	#48	59.0
GLM-5V-Turbo	Z.AI	Unranked	59.0

Category percentile

Relative position among models eligible for each sourced category. A higher percentile means a stronger position within that category's ranked cohort; 100 is highest.

Coding: 72% · Eligible cohort rank #27 of 93 · Category score 71.8
Multimodal: 56% · Eligible cohort rank #49 of 111 · Category score 61.8
Knowledge: 55% · Eligible cohort rank #49 of 107 · Category score 59.5
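
The page does not state how a cohort rank is converted to a percentile. As a hedged illustration, the sketch below uses one common convention, `100 * (N - rank) / (N - 1)` rounded to the nearest integer, which maps rank 1 to 100. This formula is an assumption; it happens to reproduce the three percentiles listed above, but other conventions could do so as well.

```python
def percentile_from_rank(rank: int, cohort_size: int) -> int:
    """Convert a 1-based rank into a 0-100 percentile (assumed convention).

    Rank 1 maps to 100 and the last rank maps to 0.
    """
    if cohort_size < 2 or not 1 <= rank <= cohort_size:
        raise ValueError("rank must lie within a cohort of at least 2 models")
    return round(100 * (cohort_size - rank) / (cohort_size - 1))


# (rank, cohort size, percentile shown on the page) for each ranked category.
listed = {
    "Coding": (27, 93, 72),
    "Multimodal": (49, 111, 56),
    "Knowledge": (49, 107, 55),
}

for category, (rank, cohort, shown) in listed.items():
    computed = percentile_from_rank(rank, cohort)
    print(f"{category}: computed {computed}, shown {shown}")
```

Because the percentile depends on cohort size, the same rank (#49) yields 56 in the 111-model Multimodal cohort and 55 in the 107-model Knowledge cohort.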

Category evidence

Scores and ranks appear only where this model has published benchmark evidence. Categories without displayable source records remain not measured.

Category scores, ranks, weighting, benchmark coverage, and evidence status
Category	Score	Rank	Percentile	Weight	Benchmarks	Evidence
Agentic	44.7	Not ranked	Not available	22%	15 benchmarks	Mixed sources
Coding	71.8	#27 of 93	72nd	20%	9 benchmarks	Mixed sources
Reasoning	0.0	Not ranked	Not available	17%	2 benchmarks	Reported
Knowledge	59.5	#49 of 107	55th	12%	11 benchmarks	Mixed sources
Math	71.8	Not ranked	Not available	5%	5 benchmarks	Verified
Multilingual	Not measured	Not ranked	Not available	7%	0 benchmarks	Not measured
Multimodal	61.8	#49 of 111	56th	12%	15 benchmarks	Mixed sources
Inst. Following	0.0	Not ranked	Not available	5%	1 benchmark	Reported
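
The category weights in the table can be summarized with a little arithmetic. The sketch below only adds up the listed weights; it does not attempt to reproduce BenchLM's overall score, whose formula is not given on this page. The names `weights`, `benchmarks`, `ranked`, and `weight_share` are illustrative.

```python
# Category weights (percent) and benchmark counts, copied from the table.
weights = {
    "Agentic": 22, "Coding": 20, "Reasoning": 17, "Knowledge": 12,
    "Math": 5, "Multilingual": 7, "Multimodal": 12, "Inst. Following": 5,
}
benchmarks = {
    "Agentic": 15, "Coding": 9, "Reasoning": 2, "Knowledge": 11,
    "Math": 5, "Multilingual": 0, "Multimodal": 15, "Inst. Following": 1,
}
# Categories that carry a cohort rank in the table.
ranked = {"Coding", "Knowledge", "Multimodal"}


def weight_share(categories):
    """Sum the listed weights (in percent) over the given categories."""
    return sum(weights[c] for c in categories)


total_weight = weight_share(weights)
measured_weight = weight_share(c for c, n in benchmarks.items() if n > 0)
ranked_weight = weight_share(ranked)

print(f"all categories: {total_weight}%")
print(f"measured categories: {measured_weight}%")
print(f"ranked categories: {ranked_weight}%")
```

Read this way, the three ranked categories account for less than half of the total category weight, which is worth keeping in mind when interpreting the overall score.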

Benchmark Details

Rows below have a displayable published verification record. Each source link and provenance note remains in the page HTML while its category is closed. Source-unverified manual rows and generated rows stay hidden.

Agentic: 15 benchmarks

Benchmark	Score	Weighting	Evidence	Source
Terminal-Bench 2.0	51.5%	Weighted 38%	Provider exact	Qwen3.6-35B-A3B model card
Claw-Eval	68.7%	Display only	Provider exact	Qwen3.6-35B-A3B model card
QwenClawBench	52.6%	Display only	Provider exact	Qwen3.6-35B-A3B model card
QwenWebBench	1397	Display only	Provider exact	Qwen3.6-35B-A3B model card
τ³-bench results (τ³-Bench Tool-Agent-User Evaluation)	67.2%	Display only	Provider exact	Qwen3.6-35B-A3B model card
VITA-Bench	35.6%	Display only	Provider exact	Qwen3.6-35B-A3B model card
DeepPlanning	25.9%	Display only	Provider exact	Qwen3.6-35B-A3B model card
Toolathlon	26.9%	Display only	Provider exact	Qwen3.6-35B-A3B model card
MCP Atlas	62.8%	Display only	Provider exact	Qwen3.6-35B-A3B model card
WideResearch	60.1%	Display only	Provider exact	Qwen3.6-35B-A3B model card
AA Agentic Index (Artificial Analysis Agentic Index)	21.4%	Display only	Reported	Artificial Analysis model benchmarks
τ²-bench results (τ²-Bench Tool-Agent-User Evaluation)	95.3%	Display only	Reported	Artificial Analysis model benchmarks
GDPval-AA (GDPval-AA normalized)	27.4%	Display only	Reported	Artificial Analysis model benchmarks
GDPval-AA	1049	Display only	Reported	Artificial Analysis model benchmarks
Gert Labs (Gert Labs Composite Game Benchmark)	42.65%	Display only	Benchmark exact	Gert Labs rankings

Provenance: Rows from the Qwen3.6-35B-A3B model card are Provider exact. Rows from Artificial Analysis model benchmarks are display-only rows synced from the current Artificial Analysis model payload and are excluded from BenchLM weighted scoring. Gert Labs reports its composite leaderboard score in the public rankings API. BenchLM scales the source gscore from 0-1 to 0-100 and stores it as a display-only agentic benchmark.

Coding: 9 benchmarks

Benchmark	Score	Weighting	Evidence	Source
LiveCodeBench (LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code)	80.4%	Weighted 38%	Provider exact	Qwen3.6-35B-A3B model card
SWE-bench Verified (Software Engineering Benchmark Verified)	73.4%	Weighted 16%	Provider exact	Qwen3.6-35B-A3B model card
SWE-bench Pro	49.5%	Weighted 10%	Provider exact	Qwen3.6-35B-A3B model card
SWE Multilingual	67.2%	Display only	Provider exact	Qwen3.6-35B-A3B model card
Terminal-Bench 2.0	51.5%	Display only	Provider exact	Qwen3.6-35B-A3B model card
NL2Repo	29.4%	Display only	Provider exact	Qwen3.6-35B-A3B model card
AA Coding Index (Artificial Analysis Coding Index)	41.9%	Display only	Reported	Artificial Analysis model benchmarks
Terminal-Bench Hard	34.8%	Display only	Reported	Artificial Analysis model benchmarks
AA-SciCode (Artificial Analysis SciCode)	35.8%	Display only	Reported	Artificial Analysis model benchmarks

Provenance: Rows from the Qwen3.6-35B-A3B model card are Provider exact. Rows from Artificial Analysis model benchmarks are display-only rows synced from the current Artificial Analysis model payload and are excluded from BenchLM weighted scoring.

Reasoning: 2 benchmarks

Benchmark	Score	Weighting	Evidence	Source
AA-LCR (Artificial Analysis Long Context Reasoning)	63.7%	Display only	Reported	Artificial Analysis model benchmarks
CritPt (Critical Physics Tasks)	0.3%	Display only	Reported	Artificial Analysis model benchmarks

Provenance: Both rows are display-only rows synced from the current Artificial Analysis model payload and are excluded from BenchLM weighted scoring.

Knowledge: 11 benchmarks

Benchmark	Score	Weighting	Evidence	Source
HLE (Humanity's Last Exam)	21.4%	Weighted 32%	Provider exact	Qwen3.6-35B-A3B model card
MMLU-Pro (Massive Multitask Language Understanding Professional)	85.2%	Weighted 22%	Provider exact	Qwen3.6-35B-A3B model card
SuperGPQA (SuperGPQA: Scaling LLM Evaluation Across 285 Graduate Disciplines)	64.7%	Weighted 5%	Provider exact	Qwen3.6-35B-A3B model card
GPQA (Graduate-Level Google-Proof Q&A)	86%	Weighted 5%	Provider exact	Qwen3.6-35B-A3B model card
C-Eval	90%	Display only	Provider exact	Qwen3.6-35B-A3B model card
Artificial Analysis Intelligence Index	31.6%	Display only	Reported	Artificial Analysis model benchmarks
AA-GPQA Diamond (Artificial Analysis GPQA Diamond)	84.1%	Display only	Reported	Artificial Analysis model benchmarks
AA-HLE (Artificial Analysis Humanity's Last Exam)	20.2%	Display only	Reported	Artificial Analysis model benchmarks
AA-Omniscience Index (Artificial Analysis Omniscience Index)	-21.4%	Display only	Reported	Artificial Analysis model benchmarks
AA-Omniscience Accuracy (Artificial Analysis Omniscience Accuracy)	18.9%	Display only	Reported	Artificial Analysis model benchmarks
AA-Omniscience Hallucination Rate (Artificial Analysis Omniscience Hallucination Rate)	49.7%	Display only	Reported	Artificial Analysis model benchmarks

Provenance: Rows from the Qwen3.6-35B-A3B model card are Provider exact. Rows from Artificial Analysis model benchmarks are display-only rows synced from the current Artificial Analysis model payload and are excluded from BenchLM weighted scoring.

Math: 5 benchmarks

Benchmark	Score	Weighting	Evidence	Source
HMMT Feb 2026 (Harvard-MIT Mathematics Tournament February 2026)	83.6%	Weighted 25%	Provider exact	Qwen3.6-35B-A3B model card
AIME26 (AIME 2026)	92.7%	Weighted 25%	Provider exact	Qwen3.6-35B-A3B model card
HMMT Feb 2025 (Harvard-MIT Mathematics Tournament February 2025)	90.7%	Display only	Provider exact	Qwen3.6-35B-A3B model card
HMMT Nov 2025 (Harvard-MIT Mathematics Tournament November 2025)	89.1%	Display only	Provider exact	Qwen3.6-35B-A3B model card
MMAnswerBench	78.9%	Display only	Provider exact	Qwen3.6-35B-A3B model card

Provenance: All rows are Provider exact.

Multimodal: 15 benchmarks

Benchmark	Score	Weighting	Evidence	Source
MMMU-Pro (Massive Multi-discipline Multimodal Understanding Pro)	75.3%	Weighted 45%	Provider exact	Qwen3.6-35B-A3B model card
CharXiv (CharXiv Reasoning)	78%	Weighted 25%	Provider exact	Qwen3.6-35B-A3B model card
MMMU (Massive Multi-discipline Multimodal Understanding)	81.7%	Display only	Provider exact	Qwen3.6-35B-A3B model card
RealWorldQA	85.3%	Display only	Provider exact	Qwen3.6-35B-A3B model card
OmniDocBench 1.5	89.9%	Display only	Provider exact	Qwen3.6-35B-A3B model card
SimpleVQA	58.9%	Display only	Provider exact	Qwen3.6-35B-A3B model card
CC-OCR	81.9%	Display only	Provider exact	Qwen3.6-35B-A3B model card
AI2D_TEST (AI2D test split)	92.7%	Display only	Provider exact	Qwen3.6-35B-A3B model card
RefCOCO (avg; RefCOCO average)	92.0%	Display only	Provider exact	Qwen3.6-35B-A3B model card
ODINW13	50.8%	Display only	Provider exact	Qwen3.6-35B-A3B model card
Video-MME (with subtitle)	86.6%	Display only	Provider exact	Qwen3.6-35B-A3B model card
Video-MME (w/o subtitle; without subtitle)	82.5%	Display only	Provider exact	Qwen3.6-35B-A3B model card
VideoMMMU	83.7%	Display only	Provider exact	Qwen3.6-35B-A3B model card
MLVU (M-Avg; MLVU mean average)	86.2%	Display only	Provider exact	Qwen3.6-35B-A3B model card
AA-MMMU-Pro (Artificial Analysis MMMU-Pro)	75.0%	Display only	Reported	Artificial Analysis model benchmarks

Provenance: Rows from the Qwen3.6-35B-A3B model card are Provider exact. Rows from Artificial Analysis model benchmarks are display-only rows synced from the current Artificial Analysis model payload and are excluded from BenchLM weighted scoring.

Inst. Following: 1 benchmark

Benchmark	Score	Weighting	Evidence	Source
AA-IFBench (Artificial Analysis IFBench)	64.4%	Display only	Reported	Artificial Analysis model benchmarks

Provenance: This is a display-only row synced from the current Artificial Analysis model payload and is excluded from BenchLM weighted scoring.
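
Three benchmarks in the details above appear both as a provider-reported row and as an Artificial Analysis row. The sketch below compares the two figures for each pair. It assumes the paired rows measure the same or a closely related benchmark, which the page does not confirm (for example, GPQA and AA-GPQA Diamond may use different subsets or settings). The pairings and variable names are illustrative.

```python
# (provider row, provider score, Artificial Analysis row, AA score), in percent.
# Scores are copied from the Knowledge and Multimodal rows of this profile.
paired_rows = [
    ("HLE", 21.4, "AA-HLE", 20.2),
    ("GPQA", 86.0, "AA-GPQA Diamond", 84.1),
    ("MMMU-Pro", 75.3, "AA-MMMU-Pro", 75.0),
]


def score_gaps(rows):
    """Return provider-minus-AA gaps in percentage points, keyed by provider row."""
    return {
        provider: round(provider_score - aa_score, 1)
        for provider, provider_score, _aa_name, aa_score in rows
    }


gaps = score_gaps(paired_rows)
for name, gap in gaps.items():
    print(f"{name}: provider figure is {gap:+.1f} points relative to the AA row")
```

Under that pairing assumption, the provider-reported figure is higher in all three cases, by at most about two percentage points.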

Frequently Asked Questions

How does Qwen3.6-35B-A3B perform overall in AI benchmarks?

Qwen3.6-35B-A3B currently ranks #45 out of 78 models on BenchLM's provisional leaderboard with an overall score of 59. It also ranks #31 out of 32 on the verified leaderboard. It is created by Alibaba and features a 262K context window.

Is Qwen3.6-35B-A3B good for knowledge and understanding?

Qwen3.6-35B-A3B ranks #49 out of 107 eligible models in knowledge and understanding benchmarks with a category score of 59.5. There are stronger options in this category.

Is Qwen3.6-35B-A3B good for coding and programming?

Qwen3.6-35B-A3B ranks #27 out of 93 eligible models in coding and programming benchmarks with a category score of 71.8. There are stronger options in this category.

Is Qwen3.6-35B-A3B good for mathematics?

Qwen3.6-35B-A3B has visible benchmark coverage in mathematics, but BenchLM does not currently assign it a global category rank there.

Is Qwen3.6-35B-A3B good for reasoning and logic?

Qwen3.6-35B-A3B has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.

Is Qwen3.6-35B-A3B good for agentic tool use and computer tasks?

Qwen3.6-35B-A3B has visible benchmark coverage in agentic tool use and computer tasks, but BenchLM does not currently assign it a global category rank there.

Is Qwen3.6-35B-A3B good for multimodal and grounded tasks?

Qwen3.6-35B-A3B ranks #49 out of 111 eligible models in multimodal and grounded task benchmarks with a category score of 61.8. There are stronger options in this category.

Is Qwen3.6-35B-A3B good for instruction following?

Qwen3.6-35B-A3B has visible benchmark coverage in instruction following, but BenchLM does not currently assign it a global category rank there.

Is Qwen3.6-35B-A3B open source?

Yes, in the open-weight sense: Qwen3.6-35B-A3B is an open-weight model created by Alibaba, meaning its weights can be downloaded and run locally or fine-tuned for specific use cases.

Does Qwen3.6-35B-A3B have full benchmark coverage on BenchLM?

Not yet. Qwen3.6-35B-A3B currently has 58 published benchmark scores out of the 300 benchmarks BenchLM tracks. BenchLM only exposes non-generated public benchmark rows, so missing categories stay blank until a sourced evaluation is available.

What is the context window size of Qwen3.6-35B-A3B?

Qwen3.6-35B-A3B has a context window of 262K tokens, which determines how much text it can process in a single interaction.


Last updated: July 15, 2026 · Runtime metrics stay blank until BenchLM has a sourced snapshot.
