Benchmark analysis of GPT-4.1 mini by OpenAI across 5 benchmarks.
According to BenchLM.ai, GPT-4.1 mini ranks #91 out of 100 models with an overall score of 35/100. While not a frontier model, it offers specific advantages depending on the use case.
GPT-4.1 mini is a proprietary model with a 1M token context window. It processes queries without explicit chain-of-thought reasoning, offering faster response times and lower token usage.
GPT-4.1 mini sits inside the GPT-4.1 family alongside GPT-4.1 and GPT-4.1 nano. BenchLM links it directly to GPT-4o mini as the earlier related model in that lineage. This profile currently covers 5 of the 22 tracked benchmarks, so the overall score is conservative until the rest of the suite is filled in.
Its strongest category is Instruction Following (#22), while its weakest is Mathematics (#95). This uneven profile suggests it suits instruction-heavy tasks far better than math-heavy ones.
Creator: OpenAI
Source Type: Proprietary
Reasoning: Non-Reasoning
Context Window: 1M tokens
Overall Score: 35/100
Sibling Models: GPT-4.1, GPT-4.1 nano
GPT-4.1 mini ranks #91 out of 100 models with an overall score of 35. It was created by OpenAI and features a 1M-token context window.
GPT-4.1 mini ranks #28 out of 100 models in knowledge and understanding benchmarks with an average score of 75.9. There are stronger options in this category.
GPT-4.1 mini ranks #78 out of 100 models in coding and programming benchmarks with an average score of 23.6. There are stronger options in this category.
GPT-4.1 mini ranks #95 out of 100 models in mathematics benchmarks with an average score of 23.1. There are stronger options in this category.
GPT-4.1 mini ranks #22 out of 100 models in instruction following benchmarks with an average score of 88.5. This is its strongest category.
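The category figures above are per-category averages of individual benchmark scores. A minimal sketch of how such an average could be derived (the benchmark names and per-benchmark values below are hypothetical, not BenchLM's actual underlying data):

```python
# Hypothetical per-benchmark scores grouped by category; names and
# values are illustrative only, chosen so the Knowledge average lands
# near the 75.9 reported above.
scores = {
    "Knowledge": {"bench_a": 80.0, "bench_b": 71.8},
    "Coding": {"bench_c": 23.6},
}

def category_average(benchmarks: dict[str, float]) -> float:
    """Average the sourced benchmark scores within one category."""
    return sum(benchmarks.values()) / len(benchmarks)

for category, benchmarks in scores.items():
    print(f"{category}: {category_average(benchmarks):.1f}")
```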
GPT-4.1 mini belongs to the GPT-4.1 family. Related variants on BenchLM include GPT-4.1 and GPT-4.1 nano.
Not yet. GPT-4.1 mini currently has 5 sourced benchmark scores out of the 22 benchmarks BenchLM tracks, so its overall score is intentionally conservative until more results are added.
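One simple way a score could be kept "intentionally conservative" under partial coverage is to average over all tracked benchmark slots, treating missing results as zero. This is an assumed formula for illustration only, not BenchLM's documented methodology, and the scores below are hypothetical:

```python
# Conservative overall score: divide by the number of TRACKED benchmarks,
# not just the sourced ones, so missing results implicitly count as 0.
# This is an assumption about how such a score might work, not BenchLM's
# published formula.
def conservative_overall(sourced: list[float], tracked: int) -> float:
    return sum(sourced) / tracked

# 5 sourced scores out of 22 tracked benchmarks (values hypothetical)
print(conservative_overall([70.0, 65.0, 80.0, 55.0, 60.0], tracked=22))
```

Under this scheme, each newly added benchmark result can only raise the overall score, which matches the page's note that the score will firm up as more results are added.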
GPT-4.1 mini has a context window of 1M tokens, which determines how much text it can process in a single interaction.
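In practice the context window acts as a token budget shared by the prompt and the response. A rough feasibility check, assuming the common ~4-characters-per-token heuristic for English text (an approximation, not an exact tokenizer):

```python
# Rough check of whether a prompt fits the model's context window.
# The 4-chars-per-token ratio is a coarse English-text heuristic,
# not a real tokenizer; use an actual tokenizer for precise counts.
CONTEXT_WINDOW = 1_000_000  # tokens, per the profile above

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fits_in_window(prompt: str, reserved_for_output: int = 4096) -> bool:
    """True if the prompt plus a reserved output budget fits the window."""
    return estimate_tokens(prompt) + reserved_for_output <= CONTEXT_WINDOW

print(fits_in_window("Summarize this document. " * 1000))  # → True
```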