Model comparison

Ministral 3 3B (Reasoning) vs Mistral Small 4 (Reasoning)

Updated July 30, 2026. Public scores include evidence status and uncertainty. They are not guarantees for a specific workload.

Ministral 3 3B (Reasoning)
Provider: Mistral
Score: unavailable
Evidence status unavailable
90% interval unavailable

Mistral Small 4 (Reasoning)
Provider: Mistral
Score: unavailable
Evidence status unavailable
90% interval unavailable

The public evidence has no benchmark result shared by both models, so it does not support a quality verdict. Use the documented cost, context, and runtime rows instead.

0 results are shared. Category rows based on different benchmark sets are marked directional and do not name a winner.

Which one for your work

Recommendations appear only when a shared evidence basis or an explicit operating constraint supports the call. Secondary and unsupported use cases stay disclosed below the initial list.

Chat turn cost
Workload: 1K fresh input + 500 output tokens
Pick: Ministral 3 3B (Reasoning)
Ministral 3 3B (Reasoning) has the lower estimated token cost for this stated workload. Costs use the listed standard API rates.
Confidence: listed-rates

Cache-heavy agent loop cost
Workload: 200K cached + 20K fresh input + 10K output tokens
Pick: Ministral 3 3B (Reasoning)
Ministral 3 3B (Reasoning) has the lower estimated token cost for this stated workload. Neither model has a published cached-input rate, so cached tokens are priced at each model's listed input rate.
Confidence: rate-fallback

Repository review cost
Workload: 50K fresh input + 3K output tokens
Pick: Ministral 3 3B (Reasoning)
Ministral 3 3B (Reasoning) has the lower estimated token cost for this stated workload. Costs use the listed standard API rates.
Confidence: listed-rates

Coding work
Workload: Code generation, repair, and software-engineering tasks
Pick: Not enough matched evidence
No shared weighted benchmark basis supports a winner.
Confidence: limited

Agentic work
Workload: Tool use, computer use, and multi-step task completion
Pick: Not enough matched evidence
No shared weighted benchmark basis supports a winner.
Confidence: limited

Long documents
Workload: Prompts that approach the documented context limit
Pick: No clear pick
The documented context windows are equal.
Confidence: documented

What is actually comparable

Shared results can support a head-to-head reading. Results present for only one model describe coverage, not superiority.

Evidence parity totals are not available.

Shared results: 0
Ministral 3 3B (Reasoning) only: 0
Mistral Small 4 (Reasoning) only: 0
Like-for-like categories: 0 / 8
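
The three totals above are set counts over each model's published benchmark results. The sketch below shows one way to compute them, assuming each model's results are identified by benchmark name. The function name `parity_totals` and the benchmark names in the usage note are hypothetical and do not come from BenchLM.

```python
from typing import Dict, Iterable


def parity_totals(results_a: Iterable[str], results_b: Iterable[str]) -> Dict[str, int]:
    """Count shared and one-sided benchmark results for two models.

    Each argument is an iterable of benchmark names with a published
    result for that model (an assumed representation).
    """
    a, b = set(results_a), set(results_b)
    return {
        "shared": len(a & b),   # supports a head-to-head reading
        "only_a": len(a - b),   # describes coverage, not superiority
        "only_b": len(b - a),
    }
```

For this matchup both inputs are empty, so `parity_totals([], [])` returns zero for all three counts. With placeholder names, `parity_totals(["bench_x", "bench_y"], ["bench_y", "bench_z"])` would report one shared result and one result unique to each model.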

Category results, on a stated basis

Each row states whether both averages use the same weighted benchmark set. Directional and not-comparable rows remain visible, but they never receive a winner in this template.

Agentic

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Coding

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Reasoning

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Knowledge

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Math

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Multilingual

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Multimodal

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Instruction following

Not comparable

Ministral 3 3B (Reasoning): Not measured
Mistral Small 4 (Reasoning): Not measured
Weighted basis: 0 vs 0 rows
Reading: Not comparable

Category averages with the server-provided evidence basis for Ministral 3 3B (Reasoning) and Mistral Small 4 (Reasoning)
Category	Ministral 3 3B (Reasoning)	Mistral Small 4 (Reasoning)	Weighted basis	Reading
Agentic	Not measured	Not measured	0 vs 0 rows	Not comparable
Coding	Not measured	Not measured	0 vs 0 rows	Not comparable
Reasoning	Not measured	Not measured	0 vs 0 rows	Not comparable
Knowledge	Not measured	Not measured	0 vs 0 rows	Not comparable
Math	Not measured	Not measured	0 vs 0 rows	Not comparable
Multilingual	Not measured	Not measured	0 vs 0 rows	Not comparable
Multimodal	Not measured	Not measured	0 vs 0 rows	Not comparable
Instruction following	Not measured	Not measured	0 vs 0 rows	Not comparable
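
The reading assigned to each category row can be sketched as a small rule over the two benchmark sets. This is an illustration of the behavior described on this page, not BenchLM's code. It assumes an empty basis on either side yields "Not comparable", identical sets yield "Like-for-like", and differing non-empty sets yield "Directional". It also ignores benchmark weights, which the page mentions but does not specify. The function names and benchmark names are hypothetical.

```python
from typing import FrozenSet


def reading(basis_a: FrozenSet[str], basis_b: FrozenSet[str]) -> str:
    """Classify a category row from the benchmark sets behind each average."""
    if not basis_a or not basis_b:
        # Nothing measured on at least one side (assumed rule).
        return "Not comparable"
    if basis_a == basis_b:
        # Both averages use the same benchmark set.
        return "Like-for-like"
    # Different benchmark sets: shown, but never given a winner.
    return "Directional"


def may_name_winner(row_reading: str) -> bool:
    """Only like-for-like rows are eligible for a winner."""
    return row_reading == "Like-for-like"
```

Every row in this comparison has a 0 vs 0 basis, so `reading(frozenset(), frozenset())` returns "Not comparable" and `may_name_winner` is false for all eight categories.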

Shape of the matched evidence

Only shared public evidence is shown. Sparse evidence stays a ruled list rather than being closed into a radar shape.

A shared-evidence shape is not available.

BenchLM does not draw a radar or infer missing axes when the matched evidence is too sparse.

What each workload costs

Three fixed token mixes turn per-token rates into comparable decisions. Each scenario states context fit and whether cached input had to fall back to the published list-input rate.

Chat turn

1K fresh input + 500 output tokens

Ministral 3 3B (Reasoning): $0.00015; Fits in one request
Mistral Small 4 (Reasoning): $0.00045; Fits in one request

Ministral 3 3B (Reasoning) has the lower modeled cost

Costs use the listed standard API rates.

Repository review

50K fresh input + 3K output tokens

Ministral 3 3B (Reasoning): $0.0053; Fits in one request
Mistral Small 4 (Reasoning): $0.0093; Fits in one request

Ministral 3 3B (Reasoning) has the lower modeled cost

Costs use the listed standard API rates.

Cache-heavy agent loop

200K cached + 20K fresh input + 10K output tokens

Ministral 3 3B (Reasoning): $0.023; Fits in one request; Cached input priced at the published list-input rate
Mistral Small 4 (Reasoning): $0.039; Fits in one request; Cached input priced at the published list-input rate

Ministral 3 3B (Reasoning) has the lower modeled cost

Neither model has a published cached-input rate, so cached tokens are priced at each model's listed input rate.
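
The scenario totals above can be reproduced with a few lines of arithmetic. The sketch below is an illustration, not BenchLM's implementation. The page does not list per-token rates. The values used here are back-solved from the chat-turn and repository-review totals, which gives $0.10 per million input and output tokens for Ministral 3 3B (Reasoning) and $0.15 input and $0.60 output per million for Mistral Small 4 (Reasoning). Treat them as assumptions and check them against the provider's price list. The names `Rates`, `Workload`, and `estimate` are hypothetical, as is the choice to count cached, fresh, and output tokens together against a 256,000-token limit when deciding whether a workload fits in one request.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Rates:
    """USD per million tokens. Values below are back-solved assumptions."""
    input_per_m: float
    output_per_m: float
    cached_input_per_m: Optional[float] = None  # None means "not published"


@dataclass(frozen=True)
class Workload:
    name: str
    cached_input: int
    fresh_input: int
    output: int


def estimate(rates: Rates, w: Workload, context_limit: int = 256_000) -> Tuple[float, bool, bool]:
    """Return (cost_usd, fits_in_one_request, used_cached_rate_fallback)."""
    used_fallback = w.cached_input > 0 and rates.cached_input_per_m is None
    # Fall back to the listed input rate when no cached-input rate is published.
    cached_rate = rates.input_per_m if rates.cached_input_per_m is None else rates.cached_input_per_m
    cost = (
        w.cached_input * cached_rate
        + w.fresh_input * rates.input_per_m
        + w.output * rates.output_per_m
    ) / 1_000_000
    # Assumption: all tokens in the mix count against the context window.
    fits = w.cached_input + w.fresh_input + w.output <= context_limit
    return cost, fits, used_fallback


# Token mixes as stated on the page.
CHAT = Workload("Chat turn", 0, 1_000, 500)
REVIEW = Workload("Repository review", 0, 50_000, 3_000)
AGENT = Workload("Cache-heavy agent loop", 200_000, 20_000, 10_000)

# Assumed rates, back-solved from the listed totals (not stated on the page).
MINISTRAL = Rates(input_per_m=0.10, output_per_m=0.10)
SMALL4 = Rates(input_per_m=0.15, output_per_m=0.60)

if __name__ == "__main__":
    for w in (CHAT, REVIEW, AGENT):
        for label, rates in (("Ministral 3 3B", MINISTRAL), ("Mistral Small 4", SMALL4)):
            cost, fits, fallback = estimate(rates, w)
            print(f"{w.name} | {label}: ${cost:.5f} fits={fits} fallback={fallback}")
```

Under these assumed rates the sketch reproduces all six listed totals, including the agent-loop scenario that was not used to derive the rates. If a cached-input rate is published later, setting `cached_input_per_m` changes only the agent-loop estimate.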

Specification differences

Sourced differences are shown directly. Missing facts stay explicit instead of being inferred from a model name or family.

Specification	Ministral 3 3B (Reasoning)	Mistral Small 4 (Reasoning)
Context window	256K	256K
API model ID	Not sourced	Not sourced
Cached-input rate	Not published	Not published
Documented inputs	Not sourced	Not sourced
Documented outputs	Not sourced	Not sourced
Provider availability	Not sourced	Not sourced
Reasoning profile	Reasoning	Reasoning
Weight access	Open Weight	Open Weight
License	Open Weight	Open Weight
Release date	2025-12-02	2026-02-20

Context window: Maximum documented context; output-token limits may be lower.
Cached-input rate: A missing cached-input rate falls back to the listed input rate only in the stated workload estimate.

If you already use one of these models

Deployment change: Both entries list Mistral as the provider. Confirm endpoint, model ID, limits, and feature support before switching.
Quality signal: The public evidence has no benchmark result shared by both models, so it does not support a quality verdict.
Workload cost: Repository review: $0.0053 vs $0.0093. Cache-heavy agent loop: $0.023 vs $0.039.
Context tradeoff: Both models list 256K.

Run the same representative tasks against both endpoints before changing production traffic.
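
A side-by-side run can be as simple as sending each task to both endpoints and recording output, errors, and latency. The sketch below is one possible harness, written under assumptions: it takes plain Python callables rather than a real API client, because this page does not source model IDs or endpoints. The function name `run_side_by_side` and the stub endpoints in the usage note are hypothetical.

```python
import time
from typing import Callable, Dict, List


def run_side_by_side(
    tasks: List[str],
    endpoints: Dict[str, Callable[[str], str]],
) -> List[dict]:
    """Send every task to every endpoint and collect comparable records."""
    rows = []
    for task in tasks:
        row = {"task": task}
        for name, call in endpoints.items():
            start = time.perf_counter()
            try:
                output, error = call(task), None
            except Exception as exc:  # keep going so one failure does not hide the rest
                output, error = None, repr(exc)
            row[name] = {
                "output": output,
                "error": error,
                "seconds": time.perf_counter() - start,
            }
        rows.append(row)
    return rows
```

In practice each callable would wrap a request to one model with identical prompts and settings. For a dry run, stubs such as `{"model_a": lambda p: p.upper(), "model_b": lambda p: p.lower()}` are enough to check the record format before real traffic is involved.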

Frequently asked questions

Which is better, Ministral 3 3B (Reasoning) or Mistral Small 4 (Reasoning)?

The public evidence has no benchmark result shared by both models, so it does not support a quality verdict. The page therefore keeps the decision tied to the specific documented workload.

Which is better for coding, Ministral 3 3B (Reasoning) or Mistral Small 4 (Reasoning)?

The published evidence does not provide a shared weighted coding basis for both models, so BenchLM does not name a coding winner.

Which is better for agentic tasks, Ministral 3 3B (Reasoning) or Mistral Small 4 (Reasoning)?

The published evidence does not provide a shared weighted agentic basis for both models, so BenchLM does not name an agentic winner.

Which costs less, Ministral 3 3B (Reasoning) or Mistral Small 4 (Reasoning)?

For the stated presets, a chat turn costs $0.00015 on Ministral 3 3B (Reasoning) and $0.00045 on Mistral Small 4 (Reasoning); repository review costs $0.0053 and $0.0093; the cache-heavy agent loop costs $0.023 and $0.039. Neither model has a published cached-input rate, so cached tokens are priced at each model's listed input rate.

Which has the larger context window, Ministral 3 3B (Reasoning) or Mistral Small 4 (Reasoning)?

Both models list the same context window, 256K.
