GPT-5.4 Pro Benchmark Scores & Performance

Benchmark analysis of GPT-5.4 Pro by OpenAI across 22 tests.

According to BenchLM.ai, GPT-5.4 Pro ranks #1 out of 100 models with an overall score of 94/100. This puts it at the top of BenchLM's leaderboard of AI models available in 2026, ahead of the strongest models from the other major AI labs.

GPT-5.4 Pro is a proprietary model with a 1.05M token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.

GPT-5.4 Pro sits inside the GPT-5.4 family alongside GPT-5.4.

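GPT-5.4 Pro ranks #1 in every benchmark category listed here, so rank alone does not separate its strongest category from its weakest. By average score, Mathematics is its highest category (98) and Knowledge its lowest (88.8), with the Knowledge average pulled down by HLE (50). Its #1 Knowledge ranking still makes it a strong fit for knowledge-intensive tasks like research, analysis, and factual Q&A.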

Creator: OpenAI
Source Type: Proprietary
Reasoning: Reasoning model
Context Window: 1.05M tokens
Overall Score: 94 (#1 of 100)
Arena Elo: 1472

Family & Lineage

Family: GPT-5.4 (Pro variant)
Canonical Entry: GPT-5.4
Sibling Models: GPT-5.4

Knowledge Benchmarks

MMLU: 99
GPQA: 99
SuperGPQA: 97
OpenBookQA: 94
MMLU-Pro: 94
HLE: 50

Coding Benchmarks

HumanEval: 95
SWE-bench Verified: 86
LiveCodeBench: 86

Mathematics Benchmarks

AIME 2023: 99
AIME 2024: 99
AIME 2025: 99
HMMT Feb 2023: 96
HMMT Feb 2024: 98
HMMT Feb 2025: 97
BRUMO 2025: 97
MATH-500: 99

Reasoning Benchmarks

SimpleQA: 97
MuSR: 95
BBH: 98

Instruction Following Benchmarks

IFEval: 97

Multilingual Benchmarks

MGSM: 97
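
The per-category averages quoted in the FAQ below can be reproduced from the scores listed above. The following is a minimal sketch, assuming each category average is an unweighted arithmetic mean rounded to one decimal place; BenchLM's actual aggregation method is not stated on this page, and the names `SCORES` and `category_averages` are illustrative only.

```python
from statistics import mean

# Scores copied from the benchmark lists on this page.
SCORES = {
    "Knowledge": {
        "MMLU": 99, "GPQA": 99, "SuperGPQA": 97,
        "OpenBookQA": 94, "MMLU-Pro": 94, "HLE": 50,
    },
    "Coding": {"HumanEval": 95, "SWE-bench Verified": 86, "LiveCodeBench": 86},
    "Mathematics": {
        "AIME 2023": 99, "AIME 2024": 99, "AIME 2025": 99,
        "HMMT Feb 2023": 96, "HMMT Feb 2024": 98, "HMMT Feb 2025": 97,
        "BRUMO 2025": 97, "MATH-500": 99,
    },
    "Reasoning": {"SimpleQA": 97, "MuSR": 95, "BBH": 98},
    "Instruction Following": {"IFEval": 97},
    "Multilingual": {"MGSM": 97},
}


def category_averages(scores):
    """Return the unweighted mean score per category, rounded to one decimal.

    Assumption: a plain arithmetic mean; the site may weight tests differently.
    """
    return {
        category: round(mean(tests.values()), 1)
        for category, tests in scores.items()
    }


averages = category_averages(SCORES)
total_tests = sum(len(tests) for tests in SCORES.values())

for category, average in averages.items():
    print(f"{category}: {average}")
print(f"Total tests: {total_tests}")
```

Under that assumption the results agree with the figures on this page: 88.8 for Knowledge, 89 for Coding, 98 for Mathematics, 96.7 for Reasoning, 97 for Instruction Following, 97 for Multilingual, and 22 tests in total.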

Frequently Asked Questions

How does GPT-5.4 Pro perform overall in AI benchmarks?

GPT-5.4 Pro ranks #1 out of 100 models with an overall score of 94. It was created by OpenAI and has a 1.05M token context window.

Is GPT-5.4 Pro good for knowledge and understanding?

GPT-5.4 Pro ranks #1 out of 100 models in knowledge and understanding benchmarks with an average score of 88.8. It is among the top performers in this category.

Is GPT-5.4 Pro good for coding and programming?

GPT-5.4 Pro ranks #1 out of 100 models in coding and programming benchmarks with an average score of 89. It is among the top performers in this category.

Is GPT-5.4 Pro good for mathematics?

GPT-5.4 Pro ranks #1 out of 100 models in mathematics benchmarks with an average score of 98. It is among the top performers in this category.

Is GPT-5.4 Pro good for reasoning and logic?

GPT-5.4 Pro ranks #1 out of 100 models in reasoning and logic benchmarks with an average score of 96.7. It is among the top performers in this category.

Is GPT-5.4 Pro good for instruction following?

GPT-5.4 Pro ranks #1 out of 100 models in instruction following benchmarks with an average score of 97. It is among the top performers in this category.

Is GPT-5.4 Pro good for multilingual tasks?

GPT-5.4 Pro ranks #1 out of 100 models in multilingual benchmarks with an average score of 97. It is among the top performers in this category.

Which sibling models are related to GPT-5.4 Pro?

GPT-5.4 Pro belongs to the GPT-5.4 family. Related variants on BenchLM include GPT-5.4.

What is the context window size of GPT-5.4 Pro?

GPT-5.4 Pro has a context window of 1.05M tokens, which determines how much text it can process in a single interaction.
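
As a rough illustration of what that limit means in practice, the sketch below estimates whether a piece of text would fit in a 1.05M token window. It assumes about 4 characters per token, which is a common rule of thumb for English text and not the model's real tokenizer, so the result is an approximation only. The function names and the `reserved_for_output` parameter are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 1_050_000  # 1.05M tokens, as stated on this page
ASSUMED_CHARS_PER_TOKEN = 4        # assumption: rough heuristic, not a real tokenizer


def estimated_tokens(text: str) -> int:
    """Estimate the token count from character length (rounded up)."""
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)  # ceiling division


def fits_in_context(text: str, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated prompt size plus reserved output fits."""
    return estimated_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


document = "word " * 200_000  # 1,000,000 characters of sample text
print(estimated_tokens(document))
print(fits_in_context(document, reserved_for_output=50_000))
```

For an exact count, the text would need to be run through the tokenizer the model actually uses.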

Last updated: March 9, 2026
