Model comparison

Kimi K2.5 vs Ministral 3 14B

Updated July 29, 2026. Public scores include evidence status and uncertainty. They are not guarantees for a specific workload.

Kimi K2.5

Moonshot AI

58.8/100

Supported · Public rank #60

90% interval 51.3–66.3

Ministral 3 14B

Mistral

—

Evidence status unavailable

90% interval unavailable

The public evidence has no benchmark result shared by both models, so it does not support a quality verdict. Use the documented cost, context, and runtime rows instead.

No benchmark results are shared. Category rows based on different benchmark sets are marked directional and do not name a winner.

Which one for your work

Recommendations appear only when a shared evidence basis or an explicit operating constraint supports the call. Secondary and unsupported use cases stay disclosed below the initial list.

Chat turn cost
1K fresh input + 500 output tokens
Ministral 3 14B
Ministral 3 14B has the lower estimated token cost for this stated workload. Costs use the listed standard API rates.
Confidence: listed-rates
Cache-heavy agent loop cost
200K cached + 20K fresh input + 10K output tokens
Ministral 3 14B
Ministral 3 14B has the lower estimated token cost for this stated workload. Neither model publishes a cached-input rate, so cached tokens are priced at each model's listed input rate.
Confidence: rate-fallback

Secondary and unsupported calls

Repository review cost
50K fresh input + 3K output tokens
Ministral 3 14B
Ministral 3 14B has the lower estimated token cost for this stated workload. Costs use the listed standard API rates.
Confidence: listed-rates
Coding work
Code generation, repair, and software-engineering tasks
Not enough matched evidence
No shared weighted benchmark basis supports a winner.
Confidence: limited
Agentic work
Tool use, computer use, and multi-step task completion
Not enough matched evidence
No shared weighted benchmark basis supports a winner.
Confidence: limited
Long documents
Prompts that approach the documented context limit
No clear pick
The documented context windows are equal.
Confidence: documented

What is actually comparable

Shared results can support a head-to-head reading. Results present for only one model describe coverage, not superiority.


Shared results: 0
Kimi K2.5 only: 45
Ministral 3 14B only: 0
Like-for-like categories: 0 / 8
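The counts above come from simple set arithmetic over benchmark names. A minimal Python sketch, assuming each model's results are a dict of benchmark name to score; `coverage` is a hypothetical helper, and the three Kimi K2.5 entries are a small illustrative subset of its 45 results, not the full ledger.

```python
# Hypothetical sketch: split benchmark coverage into shared and single-model results.
def coverage(results_a: dict[str, float], results_b: dict[str, float]) -> dict[str, int]:
    """Count benchmarks reported for both models versus only one of them."""
    a, b = set(results_a), set(results_b)
    return {
        "shared": len(a & b),  # only these support a head-to-head reading
        "a_only": len(a - b),  # describes coverage, not superiority
        "b_only": len(b - a),
    }


# Illustrative subset of Kimi K2.5 scores taken from the evidence ledger.
kimi = {"Terminal-Bench 2.0": 50.8, "SWE-bench Verified": 76.8, "IFEval": 93.9}
ministral: dict[str, float] = {}  # no public results in this comparison

print(coverage(kimi, ministral))  # {'shared': 0, 'a_only': 3, 'b_only': 0}
```

With an empty result set on one side, every benchmark lands in a single-model bucket, which is why the page reports zero shared results.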

Category results, on a stated basis

Each row states whether both averages use the same weighted benchmark set. Directional and not-comparable rows remain visible, but they never receive a winner.
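The row-reading rule can be sketched as a small classifier. This is an interpretation, not BenchLM's implementation: it assumes a row is "not comparable" when either side has no weighted rows, "directional" when both sides are measured on different sets, and like-for-like only when the weighted sets are identical. `Reading`, `category_reading`, and `may_name_winner` are hypothetical names, and the benchmark names are placeholders because the page does not say which rows are weighted.

```python
from enum import Enum


class Reading(Enum):
    LIKE_FOR_LIKE = "like-for-like"
    DIRECTIONAL = "directional"
    NOT_COMPARABLE = "not comparable"


def category_reading(basis_a: frozenset[str], basis_b: frozenset[str]) -> Reading:
    """Classify a category row from the weighted benchmark set behind each average."""
    if not basis_a or not basis_b:
        # One side has no weighted rows (e.g. "2 vs 0 rows"): nothing to compare.
        return Reading.NOT_COMPARABLE
    if basis_a == basis_b:
        # Identical weighted sets: the only case where a winner could be named.
        return Reading.LIKE_FOR_LIKE
    # Both sides measured, but on different sets: shown, never a winner.
    return Reading.DIRECTIONAL


def may_name_winner(reading: Reading) -> bool:
    return reading is Reading.LIKE_FOR_LIKE


# Placeholder benchmark names (assumption).
kimi_agentic = frozenset({"benchmark-1", "benchmark-2"})
ministral_agentic: frozenset[str] = frozenset()

row = category_reading(kimi_agentic, ministral_agentic)
print(row.value, may_name_winner(row))  # not comparable False
```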


Category averages and weighted evidence basis for Kimi K2.5 and Ministral 3 14B
Category	Kimi K2.5	Ministral 3 14B	Weighted basis	Reading
Agentic	55.0	Not measured	2 vs 0 rows	Not comparable
Coding	59.4	Not measured	4 vs 0 rows	Not comparable
Reasoning	61.0	Not measured	1 vs 0 rows	Not comparable
Knowledge	56.9	Not measured	4 vs 0 rows	Not comparable
Math	60.6	Not measured	4 vs 0 rows	Not comparable
Multilingual	82.3	Not measured	1 vs 0 rows	Not comparable
Multimodal	78.5	Not measured	1 vs 0 rows	Not comparable
Instruction following	93.9	Not measured	1 vs 0 rows	Not comparable

Shape of the matched evidence

Only shared public evidence is shown. Sparse evidence stays a ruled list rather than being closed into a radar shape.

A shared-evidence shape is not available.

BenchLM does not draw a radar or infer missing axes when the matched evidence is too sparse.

What each workload costs

Three fixed token mixes turn per-token rates into comparable cost estimates. Each scenario states context fit and whether cached input had to fall back to the published list-input rate.

Chat turn

1K fresh input + 500 output tokens

Kimi K2.5: $0.0021; Fits in one request
Ministral 3 14B: $0.0003; Fits in one request

Ministral 3 14B has the lower modeled cost

Costs use the listed standard API rates.

Repository review

50K fresh input + 3K output tokens

Kimi K2.5: $0.039; Fits in one request
Ministral 3 14B: $0.0106; Fits in one request

Ministral 3 14B has the lower modeled cost

Costs use the listed standard API rates.

Cache-heavy agent loop

200K cached + 20K fresh input + 10K output tokens

Kimi K2.5: $0.162; Fits in one request; Cached input priced at the published list-input rate
Ministral 3 14B: $0.046; Fits in one request; Cached input priced at the published list-input rate

Ministral 3 14B has the lower modeled cost

Neither model publishes a cached-input rate, so cached tokens are priced at each model's listed input rate.
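The three workload estimates follow from per-token arithmetic plus the cached-input fallback. A minimal Python sketch, assuming per-million-token rates back-solved from this page's own figures ($0.60 input / $3.00 output for Kimi K2.5, $0.20 / $0.20 for Ministral 3 14B); the page does not list these rates directly, and `Rates` and `workload_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rates:
    """Per-million-token USD rates; cached_input is None when unpublished."""
    input: float
    output: float
    cached_input: Optional[float] = None


def workload_cost(r: Rates, fresh_in: int, out: int, cached_in: int = 0) -> float:
    """Estimated USD cost of one request with the given token mix."""
    # Fall back to the listed input rate when no cached-input rate is published.
    cached_rate = r.cached_input if r.cached_input is not None else r.input
    return (fresh_in * r.input + cached_in * cached_rate + out * r.output) / 1_000_000


# Rates back-solved from this page's estimates (assumption, not sourced).
kimi = Rates(input=0.60, output=3.00)
ministral = Rates(input=0.20, output=0.20)

for name, r in [("Kimi K2.5", kimi), ("Ministral 3 14B", ministral)]:
    chat = workload_cost(r, fresh_in=1_000, out=500)
    repo = workload_cost(r, fresh_in=50_000, out=3_000)
    agent = workload_cost(r, fresh_in=20_000, out=10_000, cached_in=200_000)
    print(f"{name}: chat ${chat:.4f}, repo ${repo:.4f}, agent loop ${agent:.3f}")
```

Under these assumed rates the sketch reproduces the page's estimates. Passing a real `cached_input` value would lower the cache-heavy figure for whichever model publishes one.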

Specification differences

Sourced differences are shown directly. Missing facts stay explicit instead of being inferred from a model name or family.

Specification	Kimi K2.5	Ministral 3 14B
Context window	256K	256K
API model ID	Not sourced	Not sourced
Cached-input rate	Not published	Not published
Documented inputs	Not sourced	Not sourced
Documented outputs	Not sourced	Not sourced
Provider availability	Not sourced	Not sourced
Reasoning profile	Non-Reasoning	Non-Reasoning
Weight access	Open Weight	Open Weight
License	Open Weight	Open Weight
Release date	2026-02-01	2025-12-02

Context window: maximum documented context; output-token limits may be lower.

Cached-input rate: a missing cached-input rate falls back to the listed input rate only in the stated workload estimate.

If you already use one of these models

Deployment change: The models come from different vendors (Moonshot AI and Mistral), so authentication, endpoint behavior, limits, and feature support may change.
Quality signal: The public evidence has no benchmark result shared by both models, so it does not support a quality verdict.
Workload cost: Repository review costs $0.039 on Kimi K2.5 vs $0.0106 on Ministral 3 14B; the cache-heavy agent loop costs $0.162 vs $0.046.
Context tradeoff: Both models list 256K.

Run the same representative tasks against both endpoints before changing production traffic.
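One way to run the same representative tasks against both endpoints is a small side-by-side harness. A minimal Python sketch, assuming each model is wrapped as a plain callable from prompt to completion and each task carries its own pass/fail check; `side_by_side`, `stub_a`, and `stub_b` are hypothetical, and the stubs stand in for real endpoints because this page does not source API model IDs.

```python
from typing import Callable

# A "model" is any callable mapping a prompt to a completion (assumption).
Model = Callable[[str], str]
Check = Callable[[str], bool]


def side_by_side(tasks: list[tuple[str, Check]], model_a: Model, model_b: Model) -> dict[str, int]:
    """Send identical prompts to both models and count how many pass each task's check."""
    passes = {"a": 0, "b": 0}
    for prompt, check in tasks:
        if check(model_a(prompt)):
            passes["a"] += 1
        if check(model_b(prompt)):
            passes["b"] += 1
    return passes


# Stub models stand in for real endpoints in this example.
def stub_a(prompt: str) -> str:
    return "4" if prompt == "2+2=" else "Paris"


def stub_b(prompt: str) -> str:
    return "5"


tasks = [
    ("2+2=", lambda out: out.strip() == "4"),
    ("Capital of France?", lambda out: "Paris" in out),
]
print(side_by_side(tasks, stub_a, stub_b))  # {'a': 2, 'b': 0}
```

Keeping the checks task-specific lets the harness reflect your own workload rather than a public benchmark.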

Self-host vs API cost

Estimates at 50,000 req/day · 1,000 tokens/req average.

Kimi K2.5

API / mo: $2,700

Self-host / mo: $5,221

Break-even: 132M/day

Ministral 3 14B

API / mo: $300

Self-host / mo: Not listed

Break-even: —

No self-hosting cost is listed for Ministral 3 14B, so no break-even is shown.
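The monthly API figures are consistent with a flat-rate calculation: 50,000 requests × 1,000 tokens × 30 days = 1.5B tokens per month. The sketch below assumes a 30-day month and a blended rate equal to the mean of the back-solved input and output rates ($1.80/M for Kimi K2.5, $0.20/M for Ministral 3 14B); under those assumptions it yields $2,700 and $300. The break-even helper treats self-hosting as a flat monthly cost, which is also an assumption; the page does not document how its own break-even figure is modeled, so the helper is illustrative and not expected to reproduce it. Both function names are hypothetical.

```python
def api_monthly_cost(req_per_day: int, tokens_per_req: int,
                     blended_rate_per_m: float, days: int = 30) -> float:
    """Monthly API spend in USD at a flat blended per-million-token rate."""
    tokens_per_month = req_per_day * tokens_per_req * days
    return tokens_per_month * blended_rate_per_m / 1_000_000


def break_even_tokens_per_day(self_host_monthly: float,
                              blended_rate_per_m: float, days: int = 30) -> float:
    """Daily tokens at which a flat monthly self-host cost equals API spend."""
    return self_host_monthly / days / blended_rate_per_m * 1_000_000


# Assumed blended rates: mean of back-solved input and output rates.
print(round(api_monthly_cost(50_000, 1_000, 1.80)))  # Kimi K2.5
print(round(api_monthly_cost(50_000, 1_000, 0.20)))  # Ministral 3 14B
```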


Benchmark evidence

The full public result ledger is listed below for audit.

Raw public benchmark evidence (45 rows)

Agentic

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
Terminal-Bench 2.0	50.8%	—	Not directly comparable
BrowseComp	60.6%	—	Not directly comparable
Claw-Eval	52.3%	—	Not directly comparable
QwenClawBench	54.3%	—	Not directly comparable
τ³-bench results	65.7%	—	Not directly comparable
DeepSearchQA	77.1%	—	Not directly comparable
DeepPlanning	14.4%	—	Not directly comparable
Toolathlon	27.8%	—	Not directly comparable
MCP Atlas	29.5%	—	Not directly comparable
MCP-Tasks	59.1%	—	Not directly comparable
WideResearch	72.7%	—	Not directly comparable
Gert Labs	45.88%	—	Not directly comparable
ResearchClawBench	14.0%	—	Not directly comparable
JobBench	8.7%	—	Not directly comparable

Coding

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
SWE-bench Verified	76.8%	—	Not directly comparable
SWE-bench Verified*	70.8%	—	Not directly comparable
LiveCodeBench v6	85.0%	—	Not directly comparable
SWE-bench Pro	50.7%	—	Not directly comparable
SWE Multilingual	73%	—	Not directly comparable
SWE-Rebench	58.5%	—	Not directly comparable
React Native Evals	77.2%	—	Not directly comparable
SciCode	48.7%	—	Not directly comparable

Reasoning

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
LongBench v2	61%	—	Not directly comparable

Knowledge

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
GPQA	87.6%	—	Not directly comparable
GPQA-D	87.6%	—	Not directly comparable
SuperGPQA	69.2%	—	Not directly comparable
MMLU-Pro	87.1%	—	Not directly comparable
MMLU-Pro (Arcee)	87.1%	—	Not directly comparable
HLE	30.1%	—	Not directly comparable

Math

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
AIME 2025	96.1%	—	Not directly comparable
AIME26	95.8%	—	Not directly comparable
AIME25 (Arcee)	96.3%	—	Not directly comparable
HMMT Feb 2025	95.4%	—	Not directly comparable
HMMT Nov 2025	91.1%	—	Not directly comparable
HMMT Feb 2026	87.1%	—	Not directly comparable
MMAnswerBench	81.8%	—	Not directly comparable
FrontierMath v2 (Tiers 1-3)	27.9%	—	Not directly comparable
FrontierMath v2 (Tier 4)	4.2%	—	Not directly comparable

Multilingual

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
MMLU-ProX	82.3%	—	Not directly comparable
NOVA-63	56.0%	—	Not directly comparable

Multimodal

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
MMMU-Pro	78.5%	—	Not directly comparable
Video-MME	87.4%	—	Not directly comparable
MMVU	80.4%	—	Not directly comparable
VideoMMMU	86.6%	—	Not directly comparable

Instruction following

Benchmark	Kimi K2.5	Ministral 3 14B	Reading
IFEval	93.9%	—	Not directly comparable

Frequently asked questions

Which is better, Kimi K2.5 or Ministral 3 14B?

The public evidence has no benchmark result shared by both models, so it does not support a quality verdict. The decision therefore rests on the documented cost, context, and runtime differences for your workload.

Which is better for coding, Kimi K2.5 or Ministral 3 14B?

The published evidence does not provide a shared weighted coding basis for both models, so BenchLM does not name a coding winner.

Which is better for agentic tasks, Kimi K2.5 or Ministral 3 14B?

The published evidence does not provide a shared weighted agentic basis for both models, so BenchLM does not name an agentic winner.

Which costs less, Kimi K2.5 or Ministral 3 14B?

For the stated presets, chat costs $0.0021 on Kimi K2.5 and $0.0003 on Ministral 3 14B; repository review costs $0.039 and $0.0106; the cache-heavy agent loop costs $0.162 and $0.046. Neither model publishes a cached-input rate, so cached tokens are priced at each model's listed input rate.

Which has the larger context window, Kimi K2.5 or Ministral 3 14B?

Both models list the same context window, 256K.



