Model profile

LFM2.5-8B-A1B

Name: LFM2.5-8B-A1B
Author: LiquidAI

LiquidAI · Current · Released May 28, 2026

Data verified July 16, 2026

Overall Score

Unranked

Arena Elo

Not listed

Categories Ranked

1 of 8

Price (1M tokens)

$0 in / $0 out

API pricing

Speed

Not listed

Context

128K

Evidence coverage

18 of 313 tracked benchmarks are published. 9 are verified and 9 provisional. 6 of 8 categories are measured.
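The coverage figures reduce to simple ratios. Here is a minimal Python sketch that uses only the counts on this page. The variable names are illustrative.

```python
# Evidence-coverage counts copied from this profile.
published, tracked = 18, 313
verified, provisional = 9, 9
measured_categories, total_categories = 6, 8

# Every published row is either verified or provisional.
assert verified + provisional == published

# Share of tracked benchmarks that have a published score.
benchmark_coverage = published / tracked
# Share of categories with at least one published benchmark.
category_coverage = measured_categories / total_categories

print(f"Benchmark coverage: {benchmark_coverage:.1%}")  # 5.8%
print(f"Category coverage: {category_coverage:.0%}")    # 75%
```

Put another way, roughly one tracked benchmark in seventeen has a published score, even though three quarters of the categories have at least one.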

Updated July 16, 2026


Agentic: 2 benchmarks (Verified)
Coding: 2 benchmarks (Reported)
Reasoning: 2 benchmarks (Reported)
Knowledge: 6 benchmarks (Mixed evidence)
Math: 3 benchmarks (Verified)
Multilingual: 0 benchmarks (Not measured)
Multimodal: 0 benchmarks (Not measured)
Inst. Following: 3 benchmarks (Mixed evidence)

Open Weight · Self-host · Reasoning

Confidence: Low

BenchLM is tracking LFM2.5-8B-A1B, but this profile is currently excluded from the public leaderboard because it still lacks enough non-generated benchmark coverage to rank safely. Only non-generated public benchmark rows appear below.

LFM2.5-8B-A1B is an open-weight model with a 128K-token context window. It uses explicit chain-of-thought reasoning. This typically improves performance on math and complex reasoning tasks, at the cost of higher latency and token usage.

This profile currently has 18 of 313 tracked benchmarks. BenchLM only exposes non-generated benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.

Its only ranked category is Instruction Following, where it places #110 of 135 (19th percentile). Its other categories are not ranked, so the published evidence is too thin to judge how well-rounded it is.

Peer position

Exact provisional scores and ranks for the closest listed peers.

Range 36.0–38.0

DeepSeek V3 (DeepSeek): #70, score 38.0
Gemini 2.5 Flash (Google): #71, score 37.0
1-bit Bonsai 4B (Prism ML): Unranked, score 38.0
LFM2.5-8B-A1B (LiquidAI, current model): Unranked, score 37.0
Ling 2.6 Flash (InclusionAI): Unranked, score 36.0
MiniCPM5-1B (OpenBMB): Unranked, score 36.0
Qwen2.5-VL-32B (Alibaba): Unranked, score 36.0

Category percentile

Relative position among models eligible for each sourced category. A higher percentile means a stronger position within that category's ranked cohort; 100 is highest.

Inst. Following: 19th percentile (eligible cohort rank #110 of 135; category score 40.4)
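BenchLM does not state how it converts a cohort rank into a percentile. One formula reproduces the displayed figure: count the share of the cohort ranked at or below the model. This is an assumption, not BenchLM's documented method, and the function name below is hypothetical.

```python
def percentile_from_rank(rank: int, cohort_size: int) -> int:
    """Map a 1-based rank within a cohort to a 0-100 percentile (100 = best).

    Assumed formula, not BenchLM's documented one: the share of the cohort
    ranked at or below this position, rounded to the nearest whole number.
    """
    if not 1 <= rank <= cohort_size:
        raise ValueError("rank must lie between 1 and cohort_size")
    return round(100 * (cohort_size - rank + 1) / cohort_size)


print(percentile_from_rank(110, 135))  # 19, matching the displayed 19th percentile
print(percentile_from_rank(1, 135))    # 100, the top of the cohort
```

Other plausible formulas give the same 19 for rank #110 of 135, such as (N − rank) / N. The displayed value alone cannot distinguish between them.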

Category evidence

Scores and ranks appear only where this model has published benchmark evidence. Categories without displayable source records remain not measured.

Category scores, ranks, weighting, benchmark coverage, and evidence status
Category	Score	Rank	Percentile	Weight	Benchmarks	Evidence
Agentic	0.0	Not ranked	Not available	22%	2 benchmarks	Verified
Coding	0.0	Not ranked	Not available	20%	2 benchmarks	Reported
Reasoning	0.0	Not ranked	Not available	17%	2 benchmarks	Reported
Knowledge	0.0	Not ranked	Not available	12%	6 benchmarks	Mixed sources
Math	28.8	Not ranked	Not available	5%	3 benchmarks	Verified
Multilingual	Not measured	Not ranked	Not available	7%	0 benchmarks	Not measured
Multimodal	Not measured	Not ranked	Not available	12%	0 benchmarks	Not measured
Inst. Following	40.4	#110 of 135	19th	5%	3 benchmarks	Mixed sources
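The category weights in the table add up to 100%. The Python sketch below tallies how that weight splits between measured and unmeasured categories. It also sums the weight held by the two categories with a non-zero score. The dictionary layout is illustrative; the numbers come straight from the table.

A category can be measured yet still show 0.0 when every one of its rows is display-only. That appears to explain the Agentic, Coding, Reasoning, and Knowledge rows.

```python
# Category weight (percent) and whether the category is measured,
# copied from the category-evidence table on this page.
CATEGORY_WEIGHTS = {
    "Agentic": (22, True),
    "Coding": (20, True),
    "Reasoning": (17, True),
    "Knowledge": (12, True),
    "Math": (5, True),
    "Multilingual": (7, False),
    "Multimodal": (12, False),
    "Inst. Following": (5, True),
}

# Categories that currently show a non-zero category score.
SCORED_CATEGORIES = {"Math": 28.8, "Inst. Following": 40.4}

total = sum(weight for weight, _ in CATEGORY_WEIGHTS.values())
measured = sum(weight for weight, is_measured in CATEGORY_WEIGHTS.values() if is_measured)
scored_weight = sum(CATEGORY_WEIGHTS[name][0] for name in SCORED_CATEGORIES)

print(f"Total weight: {total}%")                               # 100%
print(f"Weight in measured categories: {measured}%")           # 81%
print(f"Weight not yet measured: {total - measured}%")         # 19%
print(f"Weight with a non-zero score: {scored_weight}%")       # 10%
```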

Benchmark Details

Rows below have a displayable published verification record. Source-unverified manual rows and generated rows stay hidden.

Agentic (2 benchmarks)

BFCL v4 (Provider exact)

Berkeley Function Calling Leaderboard v4

49.7% · Display only

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 49.73 on BFCLv4.

τ²-bench results (Provider exact)

τ²-Bench Tool-Agent-User Evaluation

16.1% · Display only

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 88.07 on Tau² Telecom. BenchLM stores it on the existing tau2Bench key.

Coding (2 benchmarks)

Terminal-Bench Hard (Reported)

4.5% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-SciCode (Reported)

Artificial Analysis SciCode

7.8% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Reasoning (2 benchmarks)

AA-LCR (Reported)

Artificial Analysis Long Context Reasoning

0.0% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

CritPt (Reported)

Critical Physics Tasks

0.0% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Knowledge (6 benchmarks)

AA-GPQA Diamond (Reported)

Artificial Analysis GPQA Diamond

51.3% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-HLE (Reported)

Artificial Analysis Humanity's Last Exam

6.9% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-Omniscience Index (Provider exact)

Artificial Analysis Omniscience Index

-33.3% · Display only

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at -24.70 on AA-Omniscience Index.

AA-Omniscience Accuracy (Provider exact)

Artificial Analysis Omniscience Accuracy

9.4% · Display only

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 8.67 on AA-Omniscience Accuracy.

AA-Omniscience Hallucination Rate (Reported)

Artificial Analysis Omniscience Hallucination Rate

47.0% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Artificial Analysis Intelligence Index (Reported)

8.3% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Math (3 benchmarks)

AIME26 (Provider exact)

AIME 2026

50.0% · Weighted 25%

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 50.00 on AIME26.

MATH-500 (Provider exact)

MATH-500 Problem Set

88.8% · Display only

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 88.76 on MATH500.

AIME 2025 (Provider exact)

American Invitational Mathematics Examination 2025

42.5% · Display only

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 42.53 on AIME25.

Inst. Following (3 benchmarks)

IFBench (Provider exact)

Instruction Following Benchmark

56.5% · Weighted 65%

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 56.47 on IFBench.

IFEval (Provider exact)

Instruction-Following Eval

91.8% · Weighted 35%

Source: Liquid AI: LFM2.5-8B-A1B launch post
Provenance: Liquid reports LFM2.5-8B-A1B at 91.84 on IFEval.

AA-IFBench (Reported)

Artificial Analysis IFBench

55.6% · Display only

Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.
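Only rows that carry a weight feed BenchLM's category scores; display-only rows are excluded. The Python sketch below tallies the published weights per category. It uses the rows from the Math and Inst. Following sections, because every row in the other sections is display-only. The data layout is illustrative.

The category scores are not plain weighted means of these percentages. For example, 0.65 × 56.5 + 0.35 × 91.8 ≈ 68.9, not the 40.4 shown for Inst. Following, so some further step not described on this page applies. For that reason the sketch only tallies weights.

```python
from collections import defaultdict

# (benchmark, category, displayed score in %, weight in % or None if display-only),
# copied from the Math and Inst. Following sections above.
ROWS = [
    ("AIME26", "Math", 50.0, 25),
    ("MATH-500", "Math", 88.8, None),
    ("AIME 2025", "Math", 42.5, None),
    ("IFBench", "Inst. Following", 56.5, 65),
    ("IFEval", "Inst. Following", 91.8, 35),
    ("AA-IFBench", "Inst. Following", 55.6, None),
]

weight_by_category = defaultdict(int)
for _name, category, _score, weight in ROWS:
    if weight is not None:  # display-only rows are excluded from weighted scoring
        weight_by_category[category] += weight

print(dict(weight_by_category))  # {'Math': 25, 'Inst. Following': 100}
```

On these rows, the published Inst. Following weights cover the whole category. The single weighted Math row accounts for only 25% of that category's weight.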

Frequently Asked Questions

How does LFM2.5-8B-A1B perform overall in AI benchmarks?

LFM2.5-8B-A1B has 18 published benchmark scores on BenchLM, but it does not yet have enough non-generated coverage to receive a global overall rank.

Is LFM2.5-8B-A1B good for knowledge and understanding?

LFM2.5-8B-A1B has visible benchmark coverage in knowledge and understanding, but BenchLM does not currently assign it a global category rank there.

Is LFM2.5-8B-A1B good for coding and programming?

LFM2.5-8B-A1B has visible benchmark coverage in coding and programming, but BenchLM does not currently assign it a global category rank there.

Is LFM2.5-8B-A1B good for mathematics?

LFM2.5-8B-A1B has published mathematics results, all provider-reported: 50.0% on AIME 2026, 42.5% on AIME 2025, and 88.8% on MATH-500. However, BenchLM does not currently assign it a global category rank there.

Is LFM2.5-8B-A1B good for reasoning and logic?

LFM2.5-8B-A1B has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.

Is LFM2.5-8B-A1B good for agentic tool use and computer tasks?

LFM2.5-8B-A1B has visible benchmark coverage in agentic tool use and computer tasks, but BenchLM does not currently assign it a global category rank there.

Is LFM2.5-8B-A1B good for instruction following?

LFM2.5-8B-A1B ranks #110 of 135 eligible models in instruction following, with a category score of 40.4 (19th percentile). There are stronger options in this category.

Is LFM2.5-8B-A1B open source?

LFM2.5-8B-A1B is an open-weight model created by LiquidAI, so its weights can be downloaded and run locally or fine-tuned for specific use cases. Open weights alone do not make a model open source; its license terms govern what is permitted.

Does LFM2.5-8B-A1B have full benchmark coverage on BenchLM?

Not yet. LFM2.5-8B-A1B currently has 18 published benchmark scores out of the 313 benchmarks BenchLM tracks. BenchLM only exposes non-generated public benchmark rows, so missing categories stay blank until a sourced evaluation is available.

What is the context window size of LFM2.5-8B-A1B?

LFM2.5-8B-A1B has a context window of 128K tokens. This determines how much text it can process in a single interaction.


Last updated: July 16, 2026 · Runtime metrics stay blank until BenchLM has a sourced snapshot.

