GPT-4.1 mini Benchmark Scores & Performance

Benchmark analysis of GPT-4.1 mini by OpenAI across 5 tests.

According to BenchLM.ai, GPT-4.1 mini ranks #91 out of 100 models with an overall score of 35/100. It is not a frontier model overall, but it ranks #22 in instruction following and offers a 1M token context window.

GPT-4.1 mini is a proprietary model with a 1M token context window. It processes queries without explicit chain-of-thought reasoning, offering faster response times and lower token usage.

GPT-4.1 mini sits inside the GPT-4.1 family alongside GPT-4.1 and GPT-4.1 nano. BenchLM links it directly to GPT-4o mini as the earlier related model in that lineage. This profile currently has 5 of 22 tracked benchmarks, so the overall score is conservative until the rest of the suite is filled in.

Its strongest category is Instruction Following (#22), while its weakest is Mathematics (#95). The wide gap between these ranks points to an uneven profile: the model is better suited to instruction-driven tasks than to math-heavy ones.

Creator

OpenAI

Source Type

Proprietary

Reasoning

Non-Reasoning

Context Window

1M

Overall Score

35 (#91 of 100)

Family & Lineage

Family

GPT-4.1

Mini

Canonical Entry

GPT-4.1

Related Earlier Model

GPT-4o mini

Knowledge Benchmarks

MMLU
87.5
GPQA
64.2

Coding Benchmarks

SWE-bench Verified
23.6

Mathematics Benchmarks

AIME 2024
23.1

Instruction Following Benchmarks

IFEval
88.5
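
The category averages quoted in the FAQ can be reproduced from the five scores listed above. The following is a minimal Python sketch, assuming each category average is a plain arithmetic mean rounded half-up to one decimal place. BenchLM's actual aggregation method is not stated on this page, and the `SCORES` layout and `category_average` helper are illustrative names, not part of any BenchLM API.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores as listed on this page, grouped by category.
SCORES = {
    "Knowledge": {"MMLU": "87.5", "GPQA": "64.2"},
    "Coding": {"SWE-bench Verified": "23.6"},
    "Mathematics": {"AIME 2024": "23.1"},
    "Instruction Following": {"IFEval": "88.5"},
}


def category_average(benchmarks):
    """Mean of a category's scores, rounded half-up to one decimal.

    The averaging and rounding rule is an assumption, not BenchLM's
    documented method.
    """
    values = [Decimal(v) for v in benchmarks.values()]
    mean = sum(values) / len(values)
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


averages = {name: category_average(b) for name, b in SCORES.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```

Under these assumptions the Knowledge average comes out to 75.9, matching the figure reported in the FAQ; the other three categories each contain a single benchmark, so their averages equal that benchmark's score.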

Frequently Asked Questions

How does GPT-4.1 mini perform overall in AI benchmarks?

GPT-4.1 mini ranks #91 out of 100 models with an overall score of 35. It was created by OpenAI and features a 1M token context window.

Is GPT-4.1 mini good for knowledge and understanding?

GPT-4.1 mini ranks #28 out of 100 models in knowledge and understanding benchmarks with an average score of 75.9. There are stronger options in this category.

Is GPT-4.1 mini good for coding and programming?

GPT-4.1 mini ranks #78 out of 100 models in coding and programming benchmarks with an average score of 23.6. There are stronger options in this category.

Is GPT-4.1 mini good for mathematics?

GPT-4.1 mini ranks #95 out of 100 models in mathematics benchmarks with an average score of 23.1. There are stronger options in this category.

Is GPT-4.1 mini good for instruction following?

GPT-4.1 mini ranks #22 out of 100 models in instruction following benchmarks with an average score of 88.5. This is its strongest category, though it does not place in the top 20.

Which sibling models are related to GPT-4.1 mini?

GPT-4.1 mini belongs to the GPT-4.1 family. Related variants on BenchLM include GPT-4.1 and GPT-4.1 nano.

Does GPT-4.1 mini have full benchmark coverage on BenchLM?

Not yet. GPT-4.1 mini currently has 5 sourced benchmark scores out of the 22 benchmarks BenchLM tracks, so its overall score is intentionally conservative until more results are added.
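
To put the coverage gap in numbers, here is a short Python sketch using only the two counts stated above. The variable names are illustrative, and it says nothing about how BenchLM weights missing benchmarks in the overall score.

```python
SOURCED = 5   # benchmarks with sourced scores on this profile
TRACKED = 22  # benchmarks BenchLM tracks in total

coverage = SOURCED / TRACKED
missing = TRACKED - SOURCED

print(f"Coverage: {coverage:.1%} ({missing} benchmarks still missing)")
# prints "Coverage: 22.7% (17 benchmarks still missing)"
```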

What is the context window size of GPT-4.1 mini?

GPT-4.1 mini has a context window of 1M tokens, which determines how much text it can process in a single interaction.

Last updated: March 9, 2026
