Model profile

MiniMax M3

Name: MiniMax M3
Author: MiniMax

MiniMax · Current · Released Jun 1, 2026

Data verified July 16, 2026

Overall Score

70 · Prov. #28 of 78 · Verified #18 of 32

Arena Elo

1445

Categories Ranked

2 of 8

Price (1M tokens)

$0.3 in / $1.2 out

API pricing
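
The listed rates translate into a per-request cost with simple arithmetic. The sketch below is only an illustration: the function name and the example token counts are hypothetical, and it assumes the listed prices apply uniformly with no caching discounts or tiered pricing.

```python
# Listed prices in USD per 1M tokens, taken from the profile above.
PRICE_IN_PER_M = 0.3
PRICE_OUT_PER_M = 1.2


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    total = input_tokens * PRICE_IN_PER_M + output_tokens * PRICE_OUT_PER_M
    return total / 1_000_000


# Hypothetical workload: 200k prompt tokens and 10k completion tokens.
cost = request_cost(200_000, 10_000)
print(f"${cost:.4f}")
```

Output tokens cost four times as much as input tokens at these rates, so long completions dominate the bill faster than long prompts.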

Speed

Not listed

Context

1M

Evidence coverage

45 of 313 tracked benchmarks are published. 22 are verified and 23 provisional. 7 of 8 categories are measured.

Updated July 16, 2026 · Methodology

Published / tracked: 45 / 313
Verified: 22
Provisional: 23
Categories measured: 7 / 8
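
As a quick consistency check on the coverage figures above, the following sketch recomputes the shares. The variable names are illustrative only; the numbers are copied from this profile.

```python
# Coverage counts copied from the evidence summary.
published, tracked = 45, 313
verified, provisional = 22, 23

# Verified and provisional rows should account for every published row.
counts_consistent = (verified + provisional == published)

coverage_pct = 100 * published / tracked        # share of tracked benchmarks published
verified_share_pct = 100 * verified / published  # share of published rows that are verified

print(counts_consistent, f"{coverage_pct:.1f}%", f"{verified_share_pct:.1f}%")
```

Roughly one in seven tracked benchmarks has a published score, and just under half of those are verified.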

Agentic · 16 benchmarks · Mixed evidence
Coding · 11 benchmarks · Mixed evidence
Reasoning · 2 benchmarks · Reported
Knowledge · 7 benchmarks · Reported
Math · 1 benchmark · Verified
Multilingual · 0 benchmarks · Not measured
Multimodal · 7 benchmarks · Mixed evidence
Inst. Following · 1 benchmark · Reported

Open Weight · Self-host · Non-Reasoning

Confidence: Medium

base

According to BenchLM.ai, MiniMax M3 ranks #28 out of 78 models on the provisional leaderboard with an overall score of 70/100. It also ranks #18 out of 32 on the verified leaderboard. That places it mid-table overall; its strongest ranked category is Agentic (#20 of 121).

MiniMax M3 is an open weight model with a 1M token context window. It processes queries without explicit chain-of-thought reasoning, offering faster response times and lower token usage.

BenchLM links it directly to MiniMax M2.7 as the earlier related model in that lineage. This profile currently has 45 of 313 tracked benchmarks. BenchLM only exposes non-generated benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.

Its strongest category is Agentic (#20), while its weakest is Multimodal & Grounded (#72). This performance profile makes it particularly useful for coding agents, browser research, and computer-use workflows.

Peer position

Exact provisional scores and ranks for the closest listed peers.

Range 70.0–71.0

Model	Provider	Rank	Score
GPT-5.1-Codex-Max	OpenAI	#23	71.0
GPT-5.1	OpenAI	#25	71.0
Claude Opus 4.5	Anthropic	#26	71.0
GPT-5 (high)	OpenAI	#27	71.0
MiniMax M3 (current model)	MiniMax	#28	70.0
Kimi K2.5 (Reasoning)	Moonshot AI	#29	70.0
DeepSeek V4 Flash (Max)	DeepSeek	#30	70.0

Category percentile

Relative position among models eligible for each sourced category. A higher percentile means a stronger position within that category's ranked cohort; 100 is highest.

Agentic: 84% · Eligible cohort rank #20 of 121 · Category score 77.9
Multimodal: 36% · Eligible cohort rank #72 of 112 · Category score 49.8
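
The page does not state how a percentile is derived from a cohort rank. One common convention, shown below as an assumption rather than BenchLM's documented method, maps rank 1 to 100 and the last rank to 0. It reproduces both displayed values after rounding.

```python
def percentile_from_rank(rank: int, cohort_size: int) -> float:
    """Percentile where rank 1 maps to 100 and the last rank maps to 0.

    Assumed convention, not confirmed by the source.
    """
    return 100 * (cohort_size - rank) / (cohort_size - 1)


agentic = percentile_from_rank(20, 121)      # Agentic: rank #20 of 121
multimodal = percentile_from_rank(72, 112)   # Multimodal: rank #72 of 112

print(round(agentic), round(multimodal))
```

Other conventions, such as dividing by the cohort size instead of the cohort size minus one, give 83 for Agentic, so the choice of formula matters at the rounding boundary.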

Category evidence

Scores and ranks appear only where this model has published benchmark evidence. Categories without displayable source records remain not measured.

Category scores, ranks, weighting, benchmark coverage, and evidence status
Category	Score	Rank	Percentile	Weight	Benchmarks	Evidence
Agentic	77.9	#20 of 121	84th	22%	16 benchmarks	Mixed sources
Coding	72.9	Not ranked	Not available	20%	11 benchmarks	Mixed sources
Reasoning	0.0	Not ranked	Not available	17%	2 benchmarks	Reported
Knowledge	0.0	Not ranked	Not available	12%	7 benchmarks	Reported
Math	68.8	Not ranked	Not available	5%	1 benchmark	Verified
Multilingual	Not measured	Not ranked	Not available	7%	0 benchmarks	Not measured
Multimodal	49.8	#72 of 112	36th	12%	7 benchmarks	Mixed sources
Inst. Following	0.0	Not ranked	Not available	5%	1 benchmark	Reported
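
The category weights in the table can be checked, and a weighted mean computed from them, with the sketch below. It rests on an assumption that is not stated on this page: that only categories with a nonzero score contribute to the overall score and that their weights are renormalized. The categories showing 0.0 contain only display-only rows, which is why they are left out here.

```python
# (category, score, weight in %) copied from the category evidence table.
# None marks "Not measured"; the 0.0 rows hold only display-only benchmarks.
rows = [
    ("Agentic", 77.9, 22),
    ("Coding", 72.9, 20),
    ("Reasoning", 0.0, 17),
    ("Knowledge", 0.0, 12),
    ("Math", 68.8, 5),
    ("Multilingual", None, 7),
    ("Multimodal", 49.8, 12),
    ("Inst. Following", 0.0, 5),
]

total_weight = sum(weight for _, _, weight in rows)

# Assumption: keep only categories with a nonzero score, then renormalize.
scored = [(score, weight) for _, score, weight in rows if score]
scored_weight = sum(weight for _, weight in scored)
weighted_mean = sum(score * weight for score, weight in scored) / scored_weight

print(total_weight, scored_weight, round(weighted_mean, 1))
```

Under this assumption the weights sum to 100, the scored categories carry 59 of those points, and the renormalized mean comes out near 69.7. That is consistent with the displayed overall score of 70, though the agreement does not confirm that BenchLM computes it this way.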

Chatbot Arena performance


Chatbot Arena Elo, confidence interval, and vote count by evaluation view
View	Elo	Confidence interval	Votes
Text Overall	1445	±5.5	23,823
Coding	1499	±8.3	6,933
Math	1444	±18.0	1,125
Instruction Following	1439	±7.9	7,909
Creative Writing	1405	±10.4	3,792
Multi-turn	1453	±9.9	4,426
Hard Prompts	1466	±6.5	15,762
Hard Prompts (English)	1476	±8.0	7,646
Longer Query	1455	±7.5	10,419
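
To read the confidence intervals in this table, it can help to turn each row into explicit bounds and check whether two views overlap. The sketch below assumes the ± value is a symmetric half-width; the page does not state the confidence level, and the helper names are illustrative.

```python
# (Elo, half-width) for three views, copied from the table above.
views = {
    "Text Overall": (1445, 5.5),
    "Coding": (1499, 8.3),
    "Math": (1444, 18.0),
}


def interval(elo: float, half_width: float) -> tuple[float, float]:
    """Lower and upper bound, assuming a symmetric interval."""
    return (elo - half_width, elo + half_width)


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


text = interval(*views["Text Overall"])
coding = interval(*views["Coding"])
math_view = interval(*views["Math"])

print(text, overlaps(text, math_view), overlaps(text, coding))
```

The Math interval is the widest in the table, which matches its vote count being the smallest (1,125), and it overlaps the Text Overall interval. The Coding interval sits entirely above Text Overall.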

Benchmark Details

Rows below have a displayable published verification record. Each source link and provenance note remains in the page HTML while its category is closed. Source-unverified manual rows and generated rows stay hidden.

Agentic · 16 benchmarks

Terminal-Bench 2.0 (Provider exact)
66% · Weighted 38%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 66.0% on Terminal-Bench 2.1.

OSWorld-Verified (Provider exact)
70.1% · Weighted 34%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 70.06% on OSWorld-Verified.

BrowseComp (Provider exact)
83.5% · Weighted 28%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 83.52% on BrowseComp.

MCP Atlas (Provider exact)
74.2% · Display only
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 74.2% on MCP Atlas.

Claw-Eval (Provider exact)
74.5% · Display only
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 74.5% on Claw-Eval.

AA Agentic Index (Reported)
Artificial Analysis Agentic Index
35.4% · Display only
Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

τ²-bench results (Reported)
τ²-Bench Tool-Agent-User Evaluation
88.9% · Display only
Source: Artificial Analysis: tau2-bench leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

GDPval-AA (Reported)
GDPval-AA normalized
44.7% · Display only
Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

GDPval-AA (Reported)
1395 · Display only
Source: Artificial Analysis: gdpval-aa leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

GDPval rubrics (Provider exact)
74.7% · Display only
Source: MiniMaxAI/MiniMax-M3 model card
Provenance: MiniMax reports MiniMax M3 at 74.7% on GDPval rubrics in the model-card comparison chart. BenchLM stores this separately from AA GDPval Elo and normalized GDPval rows.

BankerToolBench (Provider exact)
76.1% · Display only
Source: MiniMaxAI/MiniMax-M3 model card
Provenance: MiniMax reports MiniMax M3 at 76.1% on BankerToolBench in the model-card comparison chart.

ResearchClawBench (Benchmark exact)
19.8% · Display only
Source: ResearchClawBench leaderboard
Provenance: ResearchClawBench reports this model as ResearchHarness (MiniMax-M3) in the official Pass@1 leaderboard. BenchLM stores the one-decimal RADS average on the local ResearchClawBench display key and excludes it from weighted rankings.

OSWorld 2.0 (Benchmark exact)
4.6% · Display only
Source: OSWorld 2.0 paper
Provenance: OSWorld 2.0 reports MiniMax M3 single action on its 500-step main table. BenchLM stores the binary completion score and notes the corresponding partial score was 22.3%.

AA Briefcase (Reported)
Artificial Analysis Briefcase
1110 · Display only
Source: Artificial Analysis: aa-briefcase leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

AA EnterpriseOps-Gym (Reported)
Artificial Analysis EnterpriseOps-Gym
32.1% · Display only
Source: Artificial Analysis: enterprise-ops-gym-aa leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

AA Harvey LAB (Reported)
Artificial Analysis Harvey LAB-AA
6.7% · Display only
Source: Artificial Analysis: harvey-lab-aa leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

Coding · 11 benchmarks

SWE-bench Verified (Provider exact)
Software Engineering Benchmark Verified
80.5% · Weighted 16%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 80.5% on SWE-Bench Verified.

SWE-bench Pro (Provider exact)
59% · Weighted 10%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 59.0% on SWE-Bench Pro.

Terminal-Bench 2.0 (Provider exact)
66.0% · Display only
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 66.0% on Terminal-Bench 2.1. BenchLM stores this on the existing Terminal-Bench display key in both coding and agentic views.

NL2Repo (Provider exact)
42.1% · Display only
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 42.13 on NL2Repo.

AA Coding Index (Reported)
Artificial Analysis Coding Index
58.6% · Display only
Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Terminal-Bench Hard (Reported)
42.4% · Display only
Source: Artificial Analysis: terminalbench-hard leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

AA-SciCode (Reported)
Artificial Analysis SciCode
45.4% · Display only
Source: Artificial Analysis: scicode leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

VIBE V2 (Provider exact)
50.1% · Display only
Source: MiniMaxAI/MiniMax-M3 model card
Provenance: MiniMax reports MiniMax M3 at 50.1% on VIBE V2 in the model-card comparison chart.

SVG-Bench (Provider exact)
63.7% · Display only
Source: MiniMaxAI/MiniMax-M3 model card
Provenance: MiniMax reports MiniMax M3 at 63.7% on SVG-Bench in the model-card comparison chart.

KernelBench Hard (Provider exact)
28.8% · Display only
Source: MiniMaxAI/MiniMax-M3 model card
Provenance: MiniMax reports MiniMax M3 at 28.8% on KernelBench Hard in the model-card comparison chart.

AA Terminal-Bench 2.1 (Reported)
Artificial Analysis Terminal-Bench v2.1
65.2% · Display only
Source: Artificial Analysis: terminalbench-v2-1 leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

Reasoning · 2 benchmarks

AA-LCR (Reported)
Artificial Analysis Long Context Reasoning
74.0% · Display only
Source: Artificial Analysis: artificial-analysis-long-context-reasoning leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

CritPt (Reported)
Critical Physics Tasks
3.7% · Display only
Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

Knowledge · 7 benchmarks

Artificial Analysis Intelligence Index (Reported)
44.4% · Display only
Source: Artificial Analysis: artificial-analysis-intelligence-index leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

AA-GPQA Diamond (Reported)
Artificial Analysis GPQA Diamond
92.9% · Display only
Source: Artificial Analysis: gpqa-diamond leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

AA-HLE (Reported)
Artificial Analysis Humanity's Last Exam
37.1% · Display only
Source: Artificial Analysis: humanitys-last-exam leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

AA-Omniscience Index (Reported)
Artificial Analysis Omniscience Index
1.4% · Display only
Source: Artificial Analysis: omniscience leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

AA-Omniscience Accuracy (Reported)
Artificial Analysis Omniscience Accuracy
15.0% · Display only
Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA-Omniscience Hallucination Rate (Reported)
Artificial Analysis Omniscience Hallucination Rate
16.1% · Display only
Source: Artificial Analysis model benchmarks
Provenance: Display-only row synced from the current Artificial Analysis model payload. It is excluded from BenchLM weighted scoring.

AA Openness Index (Reported)
Artificial Analysis Openness Index
33.3% · Display only
Source: Artificial Analysis: artificial-analysis-openness-index leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

Math · 1 benchmark

USAMO 2026 (Provider exact)
United States of America Mathematical Olympiad 2026
85.7% · Weighted 10%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 36/42 on USAMO 2026; BenchLM stores the equivalent 85.71% score.
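
The conversion from the reported 36/42 to the stored percentage is plain division, sketched here for reference. The variable names are illustrative.

```python
# Points reported by MiniMax for USAMO 2026, per the provenance note.
points_earned, points_possible = 36, 42

score_pct = 100 * points_earned / points_possible
print(f"{score_pct:.2f}%")  # prints "85.71%"
```

The profile displays this value rounded to one decimal place as 85.7%.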

Multimodal · 7 benchmarks

MMMU-Pro (Provider exact)
Massive Multi-discipline Multimodal Understanding Pro
78.1% · Weighted 45%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 78.1% on MMMU-Pro.

OfficeQA Pro (Provider exact)
45.1% · Weighted 30%
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 45.1% on OfficeQA Pro.

OmniDocBench 1.5 (Provider exact)
91.6% · Display only
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 91.6% on OmniDocBench.

VideoMMMU (Provider exact)
84.6% · Display only
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 84.6% on Video-MMMU.

Video-MME (with subtitle) (Provider exact)
Video-MME with subtitle
85.4% · Display only
Source: MiniMax: MiniMax M3
Provenance: MiniMax reports MiniMax M3 at 85.4% on VideoMME with subtitles.

Design Arena Website (Reported)
Design Arena Website Elo
1294 · Display only
Source: OpenRouter model benchmarks
Provenance: Display-only Design Arena Website Elo synced from OpenRouter model benchmark metadata. It is excluded from BenchLM weighted scoring.

AA-MMMU-Pro (Reported)
Artificial Analysis MMMU-Pro
78.6% · Display only
Source: Artificial Analysis: mmmu-pro leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

Inst. Following · 1 benchmark

AA-IFBench (Reported)
Artificial Analysis IFBench
82.9% · Display only
Source: Artificial Analysis: ifbench leaderboard
Provenance: Display-only row synced from the current Artificial Analysis evaluation leaderboard. It is excluded from BenchLM weighted scoring.

MiniMax M3 Family

Base entry

Related Earlier Model

MiniMax M2.7

Frequently Asked Questions

How does MiniMax M3 perform overall in AI benchmarks?

MiniMax M3 currently ranks #28 out of 78 models on BenchLM's provisional leaderboard with an overall score of 70. It also ranks #18 out of 32 on the verified leaderboard. It is created by MiniMax and features a 1M context window.

Is MiniMax M3 good for knowledge and understanding?

MiniMax M3 has visible benchmark coverage in knowledge and understanding, but BenchLM does not currently assign it a global category rank there.

Is MiniMax M3 good for coding and programming?

MiniMax M3 has visible benchmark coverage in coding and programming, but BenchLM does not currently assign it a global category rank there.

Is MiniMax M3 good for mathematics?

MiniMax M3 has visible benchmark coverage in mathematics, but BenchLM does not currently assign it a global category rank there.

Is MiniMax M3 good for reasoning and logic?

MiniMax M3 has visible benchmark coverage in reasoning and logic, but BenchLM does not currently assign it a global category rank there.

Is MiniMax M3 good for agentic tool use and computer tasks?

MiniMax M3 ranks #20 out of 121 eligible models in agentic tool use and computer tasks benchmarks with a category score of 77.9. There are stronger options in this category.

Is MiniMax M3 good for multimodal and grounded tasks?

MiniMax M3 ranks #72 out of 112 eligible models in multimodal and grounded tasks benchmarks with a category score of 49.8. There are stronger options in this category.

Is MiniMax M3 good for instruction following?

MiniMax M3 has visible benchmark coverage in instruction following, but BenchLM does not currently assign it a global category rank there.

Is MiniMax M3 open source?

Yes, MiniMax M3 is an open weight model created by MiniMax, meaning it can be downloaded and run locally or fine-tuned for specific use cases.

Does MiniMax M3 have full benchmark coverage on BenchLM?

Not yet. MiniMax M3 currently has 45 published benchmark scores out of the 313 benchmarks BenchLM tracks. BenchLM only exposes non-generated public benchmark rows, so missing categories stay blank until a sourced evaluation is available.

What is the context window size of MiniMax M3?

MiniMax M3 has a context window of 1M, which determines how much text it can process in a single interaction.


Last updated: July 16, 2026 · Runtime metrics stay blank until BenchLM has a sourced snapshot.
