Model comparison

LFM2.5-8B-A1B vs Qwen3.6 Plus

Data verified July 16, 2026

Head-to-head evidence from 15 shared benchmark results across 6 categories. Overall scores shown here use the public BenchAlign v5 ranking lane.

LFM2.5-8B-A1B (LiquidAI): 41.25/100

Qwen3.6 Plus (Alibaba): 65.05/100

Margin: 23.8 pts, Qwen3.6 Plus winning

Category wins: LFM2.5-8B-A1B 0, Qwen3.6 Plus 2

Verified leaderboard positions: LFM2.5-8B-A1B unranked; Qwen3.6 Plus #14

BenchAlign evidence: LFM2.5-8B-A1B estimated; Qwen3.6 Plus supported. Intervals and evidence labels describe ranking uncertainty, not a guarantee for a specific workload.

Evidence parity. LFM2.5-8B-A1B and Qwen3.6 Plus share 15 comparable benchmark results. 2 of 8 categories are comparable. 3 results are unique to LFM2.5-8B-A1B; 46 to Qwen3.6 Plus.

Shared results: 15
LFM2.5-8B-A1B only: 3
Qwen3.6 Plus only: 46
Comparable categories: 2 / 8
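
The shared and model-only counts above come down to set arithmetic over which benchmarks each model has a reported score for. The following is a minimal Python sketch, not BenchLM's actual pipeline; the function name `evidence_parity` and the dictionary layout are assumptions made for illustration, and the sample data is only the Math rows listed on this page.

```python
def evidence_parity(results_a, results_b):
    """Count shared and model-only benchmarks.

    Each argument maps benchmark name -> score, with None meaning
    no reported result. (Hypothetical layout, for illustration only.)
    """
    measured_a = {name for name, score in results_a.items() if score is not None}
    measured_b = {name for name, score in results_b.items() if score is not None}
    return {
        "shared": len(measured_a & measured_b),
        "only_a": len(measured_a - measured_b),
        "only_b": len(measured_b - measured_a),
    }


# Math rows as listed on this page (A = LFM2.5-8B-A1B, B = Qwen3.6 Plus).
lfm_math = {"MATH-500": 88.8, "AIME 2025": 42.5, "AIME26": 50.0}
qwen_math = {
    "AIME26": 95.3,
    "HMMT Feb 2025": 96.7,
    "HMMT Nov 2025": 94.6,
    "HMMT Feb 2026": 87.8,
    "MMAnswerBench": 83.8,
    "FrontierMath v2 (Tiers 1-3)": 26.207,
    "FrontierMath v2 (Tier 4)": 8.333,
}

math_parity = evidence_parity(lfm_math, qwen_math)
print(math_parity)
```

For Math alone this gives one shared result (AIME26), two unique to LFM2.5-8B-A1B, and six unique to Qwen3.6 Plus. Applying the same counting to every row in the deep-dive tables is consistent with the page totals of 15 shared, 3, and 46.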

Pick Qwen3.6 Plus if you want the stronger benchmark profile. LFM2.5-8B-A1B only becomes the better choice if its workflow or ecosystem matters more than the raw scoreboard.

Confidence note. This is a partial-evidence comparison with 15 shared benchmark results across 6 evidence categories; 2 of 8 categories currently have scoreable aggregates for both models. Treat the verdict as directional until coverage is more balanced.

Why this result

Qwen3.6 Plus is clearly ahead on the provisional aggregate, 63 to 37. The gap is large enough that you do not need to squint at the spreadsheet to see the difference.

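Qwen3.6 Plus's sharpest category advantage is in instruction following, where it averages 82.3 against 68.8. Within the two scoreable categories, the single biggest benchmark swing is AIME26, 50.0% to 95.3%; wider gaps such as τ²-bench (16.1% to 97.7%) sit in categories that have no scoreable aggregate for both models.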

Qwen3.6 Plus gives you the larger context window at 1M, compared with 128K for LFM2.5-8B-A1B.

Category breakdown

Exact category averages are shown below. Not measured means BenchLM does not have enough sourced public coverage for that model and category.

Category scores and score margins for LFM2.5-8B-A1B and Qwen3.6 Plus
Category	LFM2.5-8B-A1B	Δ	Qwen3.6 Plus
Inst. Following	68.8	→ 13.5	82.3
Math	50.0	→ 10.5	60.5
Agentic	Not measured	No overlap	61.6
Coding	Not measured	No overlap	70.3
Reasoning	Not measured	No overlap	62.0
Knowledge	Not measured	No overlap	57.5
Multilingual	Not measured	No overlap	84.7
Multimodal	Not measured	No overlap	79.8
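
The margin column can be reproduced with a subtraction that is skipped whenever either model is not measured. This is a small sketch under assumptions: `category_margin` and the `None` convention for "Not measured" are illustrative choices, not the site's implementation, and it does not attempt to reproduce how the category averages themselves are derived.

```python
def category_margin(score_a, score_b):
    """Return B minus A rounded to one decimal, or None when there is no overlap."""
    if score_a is None or score_b is None:
        return None
    return round(score_b - score_a, 1)


# (LFM2.5-8B-A1B, Qwen3.6 Plus) category averages from the table above;
# None stands in for "Not measured".
category_scores = {
    "Inst. Following": (68.8, 82.3),
    "Math": (50.0, 60.5),
    "Agentic": (None, 61.6),
    "Coding": (None, 70.3),
    "Reasoning": (None, 62.0),
    "Knowledge": (None, 57.5),
    "Multilingual": (None, 84.7),
    "Multimodal": (None, 79.8),
}

margins = {cat: category_margin(a, b) for cat, (a, b) in category_scores.items()}
comparable = [cat for cat, margin in margins.items() if margin is not None]

print(margins)
print(f"Comparable categories: {len(comparable)} / {len(category_scores)}")
```

Only the two categories with a value for both models produce a margin, which matches the "2 / 8" comparable-category count reported earlier.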

Decisive benchmark drivers

The largest measured benchmark gaps in this matchup, with exact reported values.

Benchmark	Category	LFM2.5-8B-A1B	Qwen3.6 Plus	Δ	Winner
AIME26	Math	50.0%	95.3%	45.3	Qwen3.6 Plus
IFBench	Inst. Following	56.5%	75.8%	19.3	Qwen3.6 Plus
IFEval	Inst. Following	91.8%	94.3%	2.5	Qwen3.6 Plus
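
The ordering of these drivers follows from sorting the shared benchmarks by absolute score gap. Below is a short sketch of that ranking; the helper `rank_by_gap` and the tuple layout are hypothetical, and the input is limited to the three benchmarks reported in this section.

```python
MODEL_A = "LFM2.5-8B-A1B"
MODEL_B = "Qwen3.6 Plus"


def rank_by_gap(rows):
    """Sort (benchmark, score_a, score_b) rows by absolute gap, largest first."""
    ranked = []
    for name, score_a, score_b in rows:
        gap = round(abs(score_b - score_a), 1)
        if score_b > score_a:
            winner = MODEL_B
        elif score_a > score_b:
            winner = MODEL_A
        else:
            winner = "tie"
        ranked.append((name, gap, winner))
    return sorted(ranked, key=lambda row: row[1], reverse=True)


# Scores in percent, as reported in this section (deliberately unsorted).
drivers = [
    ("IFEval", 91.8, 94.3),
    ("AIME26", 50.0, 95.3),
    ("IFBench", 56.5, 75.8),
]

ranked = rank_by_gap(drivers)
for name, gap, winner in ranked:
    print(f"{name}: gap {gap} pts, winner {winner}")
```

Rounding to one decimal keeps floating-point subtraction from showing values like 13.499999 in place of the reported figures.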

Operational comparison

Runtime and commercial metrics are compared only when both models have a complete sourced value.

Metric	LFM2.5-8B-A1B	Qwen3.6 Plus	Comparison
Input / output price (USD per 1M tokens)	$0 input / $0 output	Not available	A complete price comparison is not available.
Generation speed (tokens per second)	Not available	Not available	A complete speed comparison is not available.
First-answer latency (seconds to first token)	Not available	Not available	A complete latency comparison is not available.
Context window (maximum listed tokens)	128K	1M	Qwen3.6 Plus lists the larger context window.

Benchmark Deep Dive

Agentic

17 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
BFCL v4	49.7%	—	Not comparable
τ²-bench results	16.1%	97.7%	Qwen3.6 Plus leads
Terminal-Bench 2.0	—	61.6%	Not comparable
Claw-Eval	—	58.8%	Not comparable
QwenClawBench	—	57.2%	Not comparable
τ³-bench results	—	70.7%	Not comparable
VITA-Bench	—	44.3%	Not comparable
DeepPlanning	—	41.5%	Not comparable
Toolathlon	—	39.8%	Not comparable
MCP Atlas	—	48.2%	Not comparable
MCP-Tasks	—	74.1%	Not comparable
WideResearch	—	74.3%	Not comparable
AA Agentic Index	—	27.6%	Not comparable
GDPval-AA	—	31.8%	Not comparable
GDPval-AA	—	1135	Not comparable
Gert Labs	—	50.60%	Not comparable
ResearchClawBench	—	18.0%	Not comparable

Coding

8 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
Terminal-Bench Hard	4.5%	43.9%	Qwen3.6 Plus leads
AA-SciCode	7.8%	40.7%	Qwen3.6 Plus leads
SWE-bench Verified	—	78.8%	Not comparable
SWE-bench Pro	—	56.6%	Not comparable
SWE Multilingual	—	73.8%	Not comparable
LiveCodeBench v6	—	87.1%	Not comparable
Vibe Code Bench	—	25.56%	Not comparable
AA Coding Index	—	54.5%	Not comparable

Reasoning

4 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
AA-LCR	0.0%	69.7%	Qwen3.6 Plus leads
CritPt	0.0%	2.9%	Qwen3.6 Plus leads
AI-Needle	—	68.3%	Not comparable
LongBench v2	—	62%	Not comparable

Knowledge

12 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
AA-GPQA Diamond	51.3%	88.2%	Qwen3.6 Plus leads
AA-HLE	6.9%	25.7%	Qwen3.6 Plus leads
AA-Omniscience Index	-33.3%	2.7%	Qwen3.6 Plus leads
AA-Omniscience Accuracy	9.4%	26.2%	Qwen3.6 Plus leads
AA-Omniscience Hallucination Rate	47.0%	32.0%	Qwen3.6 Plus leads
Artificial Analysis Intelligence Index	8.3%	39.6%	Qwen3.6 Plus leads
GPQA	—	90.4%	Not comparable
SuperGPQA	—	71.6%	Not comparable
MMLU-Pro	—	88.5%	Not comparable
MMLU-Redux	—	94.5%	Not comparable
C-Eval	—	93.3%	Not comparable
HLE	—	28.8%	Not comparable

Math: Qwen3.6 Plus wins

9 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
MATH-500	88.8%	—	Not comparable
AIME 2025	42.5%	—	Not comparable
AIME26	50.0%	95.3%	Qwen3.6 Plus leads
HMMT Feb 2025	—	96.7%	Not comparable
HMMT Nov 2025	—	94.6%	Not comparable
HMMT Feb 2026	—	87.8%	Not comparable
MMAnswerBench	—	83.8%	Not comparable
FrontierMath v2 (Tiers 1-3)	—	26.207%	Not comparable
FrontierMath v2 (Tier 4)	—	8.333%	Not comparable

Multilingual

2 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
MMLU-ProX	—	84.7%	Not comparable
NOVA-63	—	57.9%	Not comparable

Multimodal

9 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
MMMU	—	86.0%	Not comparable
MMMU-Pro	—	78.8%	Not comparable
MathVision	—	88.0%	Not comparable
VideoMMMU	—	84.0%	Not comparable
ScreenSpot Pro	—	68.2%	Not comparable
CharXiv	—	81.5%	Not comparable
V*	—	96.9%	Not comparable
AA-MMMU-Pro	—	78.0%	Not comparable
Design Arena Website	—	1254	Not comparable

Inst. Following: Qwen3.6 Plus wins

3 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.6 Plus	Result
IFEval	91.8%	94.3%	Qwen3.6 Plus leads
IFBench	56.5%	75.8%	Qwen3.6 Plus leads
AA-IFBench	55.6%	75.2%	Qwen3.6 Plus leads

Frequently Asked Questions (3)

Which is better, LFM2.5-8B-A1B or Qwen3.6 Plus?

Qwen3.6 Plus is ahead on BenchLM's provisional leaderboard, 63 to 37. The biggest single separator within the two scoreable categories is AIME26, where LFM2.5-8B-A1B scores 50.0% and Qwen3.6 Plus scores 95.3%.

Which is better for math, LFM2.5-8B-A1B or Qwen3.6 Plus?

Qwen3.6 Plus has the edge for math in this comparison, averaging 60.5 versus 50.0. AIME26 is the only math benchmark both models report, so it alone creates the daylight between them in this category.

Which is better for instruction following, LFM2.5-8B-A1B or Qwen3.6 Plus?

Qwen3.6 Plus has the edge for instruction following in this comparison, averaging 82.3 versus 68.8. Inside this category, AA-IFBench is the benchmark that creates the most daylight between them.
