Model comparison

LFM2.5-8B-A1B vs Qwen3.7 Max

Data verified July 16, 2026

Head-to-head evidence from 15 shared benchmark results across 5 categories. Overall scores shown here use the public BenchAlign v5 ranking lane.

LFM2.5-8B-A1B (LiquidAI): 41.25/100, 0 category wins

Qwen3.7 Max (Alibaba): 60.99/100, 2 category wins

Margin: 19.7 pts in favor of Qwen3.7 Max

Verified leaderboard positions: LFM2.5-8B-A1B unranked; Qwen3.7 Max #2

BenchAlign evidence: LFM2.5-8B-A1B estimated; Qwen3.7 Max estimated. Intervals and evidence labels describe ranking uncertainty, not a guarantee for a specific workload.

Evidence parity. LFM2.5-8B-A1B and Qwen3.7 Max share 15 comparable benchmark results, spread across 5 categories, but only 2 of 8 categories have scoreable aggregates for both models. Three results are unique to LFM2.5-8B-A1B and 42 to Qwen3.7 Max.

Updated July 16, 2026

Shared results: 15
LFM2.5-8B-A1B only: 3
Qwen3.7 Max only: 42
Comparable categories: 2 / 8
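The parity counts can be reproduced with a simple per-row rule. The sketch below assumes a result counts as shared when both models report a value in the same table row, and as unique when only one does. The `evidence_parity` helper and the row layout are hypothetical, and the example uses only the Reasoning and Math rows. Applied to every row of the deep-dive tables (where Terminal-Bench 2.0 and GDPval-AA each appear twice for Qwen3.7 Max), the same rule gives the 15 / 3 / 42 split listed above.

```python
# Hypothetical rows: (category, benchmark, LFM2.5-8B-A1B score, Qwen3.7 Max score).
# None marks a result the page lists as "—" (not reported for that model).
ROWS = [
    ("Reasoning", "AA-LCR", 0.0, 69.0),
    ("Reasoning", "CritPt", 0.0, 13.4),
    ("Reasoning", "MRCRv2", None, 90.4),
    ("Math", "MATH-500", 88.8, None),
    ("Math", "AIME 2025", 42.5, None),
    ("Math", "AIME26", 50.0, None),
    ("Math", "HMMT Feb 2026", None, 97.1),
    ("Math", "IMOAnswerBench", None, 90.0),
    ("Math", "Apex", None, 44.5),
]


def evidence_parity(rows):
    """Count shared and model-unique results, plus categories that have shared rows."""
    shared = a_only = b_only = 0
    shared_categories = set()
    for category, _name, a, b in rows:
        if a is not None and b is not None:
            shared += 1
            shared_categories.add(category)
        elif a is not None:
            a_only += 1
        elif b is not None:
            b_only += 1
    return {
        "shared": shared,
        "a_only": a_only,
        "b_only": b_only,
        "shared_categories": sorted(shared_categories),
    }


print(evidence_parity(ROWS))
```

The Math rows show why the page separates shared results from comparable categories: math has category averages for both models, yet no math benchmark has results for both.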

Pick Qwen3.7 Max if you want the stronger benchmark profile. LFM2.5-8B-A1B becomes the better choice only if factors outside the scoreboard, such as its listed $0 input and output price or its workflow and ecosystem fit, matter more than raw scores.

Confidence note. This is a partial-evidence comparison with 15 shared benchmark results across 5 evidence categories; 2 of 8 categories currently have scoreable aggregates for both models. Treat the verdict as directional until coverage is more balanced.

Why this result

Qwen3.7 Max is clearly ahead on BenchLM's provisional aggregate, 83 to 37 (a separate measure from the BenchAlign v5 overall scores of 60.99 and 41.25). The gap is large enough that you do not need to squint at the spreadsheet to see the difference.

Qwen3.7 Max's sharpest category advantage is in mathematics, where it averages 97.1 against 50.0, although the two math averages are built from different benchmarks. The single biggest swing on a shared benchmark is τ²-bench, 16.1% to 94.7% (a 78.6-point gap).
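To check which shared benchmark separates the two models most, a minimal sketch can rank the 15 shared rows from the deep-dive tables by absolute score difference. The `largest_gap` helper is hypothetical, and the absolute difference deliberately ignores metric direction (for the hallucination rate, lower is better), since the question is only how far apart the scores are.

```python
# Shared results from the deep-dive tables:
# (benchmark, LFM2.5-8B-A1B, Qwen3.7 Max), in the units the page reports.
SHARED = [
    ("BFCL v4", 49.7, 75.0),
    ("τ²-bench", 16.1, 94.7),
    ("Terminal-Bench Hard", 4.5, 50.8),
    ("AA-SciCode", 7.8, 48.8),
    ("AA-LCR", 0.0, 69.0),
    ("CritPt", 0.0, 13.4),
    ("AA-GPQA Diamond", 51.3, 92.3),
    ("AA-HLE", 6.9, 38.1),
    ("AA-Omniscience Index", -33.3, 14.1),
    ("AA-Omniscience Accuracy", 9.4, 30.1),
    ("AA-Omniscience Hallucination Rate", 47.0, 22.9),
    ("Artificial Analysis Intelligence Index", 8.3, 46.0),
    ("IFEval", 91.8, 94.3),
    ("IFBench", 56.5, 79.1),
    ("AA-IFBench", 55.6, 80.5),
]


def largest_gap(rows):
    """Return (benchmark, absolute gap) for the row with the widest score difference."""
    name, a, b = max(rows, key=lambda row: abs(row[2] - row[1]))
    return name, round(abs(b - a), 1)


print(largest_gap(SHARED))  # → ('τ²-bench', 78.6)
```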

Qwen3.7 Max gives you the larger context window at 1M, compared with 128K for LFM2.5-8B-A1B.

Category breakdown

Exact category averages are shown below. Not measured means BenchLM does not have enough sourced public coverage for that model and category.

Category scores and score margins for LFM2.5-8B-A1B and Qwen3.7 Max
Category	LFM2.5-8B-A1B	Δ (Qwen3.7 Max − LFM2.5-8B-A1B)	Qwen3.7 Max
Math	50.0	+47.1	97.1
Inst. Following	68.8	+15.6	84.4
Agentic	Not measured	No overlap	69.7
Coding	Not measured	No overlap	77.9
Reasoning	Not measured	No overlap	90.4
Knowledge	Not measured	No overlap	64.5
Multilingual	Not measured	No overlap	87.0
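The Δ column and the category-win counts follow from a simple rule, sketched below under the assumption that a margin is only computed when both category averages exist and that each nonzero margin counts as one category win for the leader. The `category_margins` helper is hypothetical; with the averages above it reproduces the 47.1 and 15.6 margins and the 0-versus-2 category-win tally.

```python
# Category averages from the table above; None means "Not measured".
CATEGORIES = {
    "Math": (50.0, 97.1),
    "Inst. Following": (68.8, 84.4),
    "Agentic": (None, 69.7),
    "Coding": (None, 77.9),
    "Reasoning": (None, 90.4),
    "Knowledge": (None, 64.5),
    "Multilingual": (None, 87.0),
}


def category_margins(categories):
    """Return per-category margins (B minus A) and category-win counts.

    A margin is only computed when both averages exist; otherwise "No overlap".
    """
    margins, wins = {}, {"A": 0, "B": 0}
    for name, (a, b) in categories.items():
        if a is None or b is None:
            margins[name] = "No overlap"
            continue
        margins[name] = round(b - a, 1)
        if b > a:
            wins["B"] += 1
        elif a > b:
            wins["A"] += 1
    return margins, wins


margins, wins = category_margins(CATEGORIES)
print(margins["Math"], margins["Inst. Following"], wins)  # → 47.1 15.6 {'A': 0, 'B': 2}
```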

Decisive benchmark drivers

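Exact reported values for two instruction-following benchmarks. These are not the largest gaps on the page: on shared results, τ²-bench (16.1% vs 94.7%) and AA-LCR (0.0% vs 69.0%) separate the models far more.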

A = LFM2.5-8B-A1B · B = Qwen3.7 Max

IFBench (Inst. Following)
A 56.5% · B 79.1% · Δ 22.6 · Winner: Qwen3.7 Max

IFEval (Inst. Following)
A 91.8% · B 94.3% · Δ 2.5 · Winner: Qwen3.7 Max

Operational comparison

Runtime and commercial metrics are compared only when both models have a complete sourced value.

Metric	LFM2.5-8B-A1B	Qwen3.7 Max	Comparison
Input / output price (USD per 1M tokens)	$0 input / $0 output	Not available	A complete price comparison is not available.
Generation speed (tokens per second)	Not available	Not available	A complete speed comparison is not available.
First-answer latency (seconds to first token)	Not available	Not available	A complete latency comparison is not available.
Context window (maximum listed tokens)	128K	1M	Qwen3.7 Max lists the larger context window.
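The rule that a metric is compared only when both models have a complete sourced value can be sketched as a completeness check followed by a comparison. The helpers are hypothetical, and parsing `128K` and `1M` with decimal multipliers (10^3 and 10^6) is an assumption; a binary reading of `K` would not change which window is larger.

```python
# Operational values as listed on this page; None means "Not available".
METRICS = {
    "price_usd_per_1m": ({"input": 0.0, "output": 0.0}, None),
    "tokens_per_second": (None, None),
    "seconds_to_first_token": (None, None),
    "context_window": ("128K", "1M"),
}

# Assumption: K and M are decimal multipliers.
SUFFIX = {"K": 10**3, "M": 10**6}


def is_complete(pair):
    """A metric is compared only when both models have a sourced value."""
    return all(value is not None for value in pair)


def parse_tokens(label):
    """Convert a label such as '128K' or '1M' into an integer token count."""
    return int(float(label[:-1]) * SUFFIX[label[-1]])


def compare_context(a_label, b_label, names=("LFM2.5-8B-A1B", "Qwen3.7 Max")):
    """Name the model with the larger window, or None if either value is missing."""
    if a_label is None or b_label is None:
        return None
    a, b = parse_tokens(a_label), parse_tokens(b_label)
    if a == b:
        return "tie"
    return names[0] if a > b else names[1]


for metric, pair in METRICS.items():
    print(metric, "comparable" if is_complete(pair) else "not available")
print(compare_context(*METRICS["context_window"]))  # → Qwen3.7 Max
```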

Benchmark Deep Dive

Agentic

18 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
BFCL v4	49.7%	75.0%	Qwen3.7 Max leads
τ²-bench	16.1%	94.7%	Qwen3.7 Max leads
Terminal-Bench 2.0	—	69.7%	Not comparable
QwenClawBench	—	64.3%	Not comparable
QwenWebBench	—	1568	Not comparable
Claw-Eval	—	65.2%	Not comparable
MCP Atlas	—	76.4%	Not comparable
VITA-Bench	—	47.9%	Not comparable
HLE w/ tools	—	53.5%	Not comparable
AA Agentic Index	—	30.6%	Not comparable
GDPval-AA	—	38.7%	Not comparable
GDPval-AA	—	1273	Not comparable
Gert Labs	—	64.27%	Not comparable
ResearchClawBench	—	18.7%	Not comparable
AA Briefcase	—	908	Not comparable
AA AutomationBench	—	25.6%	Not comparable
AA EnterpriseOps-Gym	—	45.0%	Not comparable
AA ITBench	—	42.5%	Not comparable

Coding

11 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
Terminal-Bench Hard	4.5%	50.8%	Qwen3.7 Max leads
AA-SciCode	7.8%	48.8%	Qwen3.7 Max leads
SWE-bench Verified	—	80.4%	Not comparable
SWE-bench Pro	—	60.6%	Not comparable
SWE Multilingual	—	78.3%	Not comparable
NL2Repo	—	47.2%	Not comparable
SciCode	—	53.5%	Not comparable
LiveCodeBench	—	91.6%	Not comparable
Terminal-Bench 2.0	—	69.7%	Not comparable
AA Coding Index	—	66.0%	Not comparable
AA Terminal-Bench 2.1	—	74.5%	Not comparable

Reasoning

3 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
AA-LCR	0.0%	69.0%	Qwen3.7 Max leads
CritPt	0.0%	13.4%	Qwen3.7 Max leads
MRCRv2	—	90.4%	Not comparable

Knowledge

13 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
AA-GPQA Diamond	51.3%	92.3%	Qwen3.7 Max leads
AA-HLE	6.9%	38.1%	Qwen3.7 Max leads
AA-Omniscience Index	-33.3%	14.1%	Qwen3.7 Max leads
AA-Omniscience Accuracy	9.4%	30.1%	Qwen3.7 Max leads
AA-Omniscience Hallucination Rate (lower is better)	47.0%	22.9%	Qwen3.7 Max leads
Artificial Analysis Intelligence Index	8.3%	46.0%	Qwen3.7 Max leads
GPQA	—	92.4%	Not comparable
GPQA-D	—	92.4%	Not comparable
HLE	—	41.4%	Not comparable
MMLU-Pro	—	89.6%	Not comparable
MMLU-Redux	—	95%	Not comparable
SuperGPQA	—	73.6%	Not comparable
MMMLU	—	90.3%	Not comparable
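The hallucination-rate row is the one place where the leading model has the lower number, because a lower hallucination rate is better. A direction-aware comparison makes this explicit; the `higher_is_better` flags and the `leader` helper are assumptions for illustration, since the page reports only the resulting leader.

```python
# Knowledge rows shared by both models:
# (benchmark, LFM2.5-8B-A1B, Qwen3.7 Max, higher_is_better).
# The direction flags are illustrative assumptions; the page shows only the leader.
KNOWLEDGE = [
    ("AA-GPQA Diamond", 51.3, 92.3, True),
    ("AA-HLE", 6.9, 38.1, True),
    ("AA-Omniscience Index", -33.3, 14.1, True),
    ("AA-Omniscience Accuracy", 9.4, 30.1, True),
    ("AA-Omniscience Hallucination Rate", 47.0, 22.9, False),
    ("Artificial Analysis Intelligence Index", 8.3, 46.0, True),
]


def leader(a, b, higher_is_better, names=("LFM2.5-8B-A1B", "Qwen3.7 Max")):
    """Return the leading model's name, respecting the metric's direction."""
    if a == b:
        return "tie"
    a_wins = a > b if higher_is_better else a < b
    return names[0] if a_wins else names[1]


for name, a, b, higher in KNOWLEDGE:
    print(f"{name}: {leader(a, b, higher)} leads")
```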

Math (Qwen3.7 Max wins)

6 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
MATH-500	88.8%	—	Not comparable
AIME 2025	42.5%	—	Not comparable
AIME26	50.0%	—	Not comparable
HMMT Feb 2026	—	97.1%	Not comparable
IMOAnswerBench	—	90.0%	Not comparable
Apex	—	44.5%	Not comparable

Multilingual

5 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
MMLU-ProX	—	87%	Not comparable
NOVA-63	—	59.0%	Not comparable
INCLUDE	—	86.2%	Not comparable
MAXIFE	—	89.2%	Not comparable
PolyMath	—	86.5%	Not comparable

Multimodal

1 benchmark

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
Design Arena Website	—	1296	Not comparable

Inst. Following (Qwen3.7 Max wins)

3 benchmarks

Benchmark	LFM2.5-8B-A1B	Qwen3.7 Max	Result
IFEval	91.8%	94.3%	Qwen3.7 Max leads
IFBench	56.5%	79.1%	Qwen3.7 Max leads
AA-IFBench	55.6%	80.5%	Qwen3.7 Max leads

Frequently asked questions

Which is better, LFM2.5-8B-A1B or Qwen3.7 Max?

Qwen3.7 Max is ahead on BenchLM's provisional leaderboard, 83 to 37. The biggest single separator among shared benchmarks is τ²-bench, where the scores are 16.1% and 94.7%.

Which is better for math, LFM2.5-8B-A1B or Qwen3.7 Max?

Qwen3.7 Max has the edge for math in this comparison, averaging 97.1 versus 50.0. The two averages come from different benchmarks, though (no math benchmark has results for both models), so treat the 47-point gap as indicative rather than a head-to-head result.

Which is better for instruction following, LFM2.5-8B-A1B or Qwen3.7 Max?

Qwen3.7 Max has the edge for instruction following in this comparison, averaging 84.4 versus 68.8. Inside this category, AA-IFBench is the benchmark that creates the most daylight between them.


