GPT-4o mini Benchmark Scores & Performance

Benchmark analysis of GPT-4o mini by OpenAI across 3 tests.

According to BenchLM.ai, GPT-4o mini ranks #75 out of 100 models with an overall score of 43/100. It is not a frontier model overall, though its category ranks vary widely.

GPT-4o mini is a proprietary model with a 128K token context window. It is a non-reasoning model: it answers without explicit chain-of-thought reasoning, which generally means faster responses and lower token usage than reasoning models.

GPT-4o mini belongs to the GPT-4o family alongside GPT-4o. This profile currently has sourced scores for 3 of the 22 tracked benchmarks, so the overall score is conservative until the rest of the suite is filled in.
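As a rough illustration of how thin the coverage is, the sketch below computes the share of tracked benchmarks that have a sourced score, plus a plain mean of the three scores listed on this page. The function names are hypothetical, and the plain mean is only a point of comparison: BenchLM's actual scoring formula is not described here.

```python
# Scores copied from this profile. The coverage ratio and plain mean are
# illustrative only; they are NOT BenchLM's scoring formula, which this
# page does not describe.
sourced_scores = {"MMLU": 82.0, "HumanEval": 87.2, "MGSM": 87.0}
TRACKED_BENCHMARKS = 22  # total benchmarks BenchLM tracks, per this page


def coverage(sourced: int, tracked: int) -> float:
    """Fraction of tracked benchmarks that have a sourced score."""
    if tracked <= 0:
        raise ValueError("tracked must be positive")
    return sourced / tracked


def plain_mean(scores: dict[str, float]) -> float:
    """Unweighted mean of the sourced scores (hypothetical comparison metric)."""
    return sum(scores.values()) / len(scores)


cov = coverage(len(sourced_scores), TRACKED_BENCHMARKS)
mean = plain_mean(sourced_scores)
print(f"coverage: {cov:.1%}")
print(f"mean of sourced scores: {mean:.1f}")
```

The unweighted mean of the three sourced scores is far above the published overall score of 43, which is consistent with the overall score being held down at low coverage. How BenchLM applies that adjustment is not stated on this page.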

Its strongest category is Coding (#4), while its weakest is Mathematics (#97). This profile suggests it is best suited to software development and code generation tasks, though the coding rank currently rests on a single benchmark (HumanEval).

Creator: OpenAI
Source Type: Proprietary
Reasoning: Non-Reasoning
Context Window: 128K

Overall Score: 43 (#75 of 100)

Family & Lineage

Family: GPT-4o (Mini)
Canonical Entry: GPT-4o
Sibling Models: GPT-4o

Knowledge Benchmarks

MMLU: 82

Coding Benchmarks

HumanEval: 87.2

Multilingual Benchmarks

MGSM: 87

Frequently Asked Questions

How does GPT-4o mini perform overall in AI benchmarks?

GPT-4o mini ranks #75 out of 100 models with an overall score of 43. It is created by OpenAI and features a 128K context window.

Is GPT-4o mini good for knowledge and understanding?

GPT-4o mini ranks #11 out of 100 models in knowledge and understanding, with a score of 82 on MMLU, the only knowledge benchmark currently sourced for it. There are stronger options in this category.

Is GPT-4o mini good for coding and programming?

GPT-4o mini ranks #4 out of 100 models in coding and programming, with a score of 87.2 on HumanEval, the only coding benchmark currently sourced for it. It is among the top performers in this category.

Is GPT-4o mini good for multilingual tasks?

GPT-4o mini ranks #24 out of 100 models in multilingual tasks, with a score of 87 on MGSM, the only multilingual benchmark currently sourced for it. There are stronger options in this category.

Which sibling models are related to GPT-4o mini?

GPT-4o mini belongs to the GPT-4o family. Related variants on BenchLM include GPT-4o.

Does GPT-4o mini have full benchmark coverage on BenchLM?

Not yet. GPT-4o mini currently has 3 sourced benchmark scores out of the 22 benchmarks BenchLM tracks, so its overall score is intentionally conservative until more results are added.

What is the context window size of GPT-4o mini?

GPT-4o mini has a context window of 128K tokens, which determines how much text it can process in a single interaction.

Last updated: March 9, 2026
