o3-pro Benchmark Scores & Performance

Benchmark analysis of o3-pro by OpenAI across 14 tests.

Creator: OpenAI
Source Type: Proprietary
Reasoning: Reasoning
Context Window: 200K

Overall Score: 68 (#25 of 88)

Knowledge Benchmarks

MMLU: 88
GPQA: 89
SuperGPQA: 87
OpenBookQA: 85

Coding Benchmarks

HumanEval: 80

Mathematics Benchmarks

AIME 2023: 90
AIME 2024: 92
AIME 2025: 91
HMMT Feb 2023: 86
HMMT Feb 2024: 88
HMMT Feb 2025: 87
BRUMO 2025: 89

Reasoning Benchmarks

SimpleQA: 86
MuSR: 84

Frequently Asked Questions

How does o3-pro perform overall in AI benchmarks?

o3-pro ranks #25 out of 88 models with an overall score of 68. It was developed by OpenAI and has a 200K-token context window.

Is o3-pro good for knowledge and understanding?

o3-pro ranks #22 out of 88 models in knowledge and understanding benchmarks, with an average score of 87.3 across MMLU, GPQA, SuperGPQA, and OpenBookQA. 21 models rank higher in this category.

Is o3-pro good for coding and programming?

o3-pro ranks #24 out of 88 models in coding and programming benchmarks, with a score of 80 on HumanEval, the only coding benchmark listed. 23 models rank higher in this category.

Is o3-pro good for mathematics?

o3-pro ranks #23 out of 88 models in mathematics benchmarks, with an average score of 89 across the seven AIME, HMMT, and BRUMO sets listed. 22 models rank higher in this category.

Is o3-pro good for reasoning and logic?

o3-pro ranks #22 out of 88 models in reasoning and logic benchmarks, with an average score of 85 across SimpleQA and MuSR. 21 models rank higher in this category.
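
The category averages quoted in the answers above can be checked against the individual scores listed on this page. The following is a minimal Python sketch, assuming each category average is an unweighted mean rounded half-up to one decimal place. The page does not state how the averages or the overall score of 68 are computed, so the function name and the rounding rule are illustrative assumptions.

```python
from decimal import Decimal, ROUND_HALF_UP

# Scores copied from the benchmark lists on this page.
SCORES = {
    "knowledge": {"MMLU": 88, "GPQA": 89, "SuperGPQA": 87, "OpenBookQA": 85},
    "coding": {"HumanEval": 80},
    "mathematics": {
        "AIME 2023": 90, "AIME 2024": 92, "AIME 2025": 91,
        "HMMT Feb 2023": 86, "HMMT Feb 2024": 88, "HMMT Feb 2025": 87,
        "BRUMO 2025": 89,
    },
    "reasoning": {"SimpleQA": 86, "MuSR": 84},
}


def category_averages(scores):
    """Return the unweighted mean per category, rounded half-up to 0.1.

    Assumption: the page uses a plain arithmetic mean; this is not confirmed.
    """
    result = {}
    for category, benchmarks in scores.items():
        values = list(benchmarks.values())
        exact = Decimal(sum(values)) / Decimal(len(values))
        result[category] = exact.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
    return result


averages = category_averages(SCORES)
for category, average in averages.items():
    print(f"{category}: {average}")
```

Under these assumptions the results agree with the averages quoted above (87.3, 80, 89, and 85). The sketch uses Decimal with ROUND_HALF_UP because Python's built-in round() rounds ties to even, which would not round the exact knowledge mean of 87.25 up to 87.3.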

What is the context window size of o3-pro?

o3-pro has a context window of 200K tokens, which determines how much text it can process in a single interaction.
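
To make the 200K figure concrete, here is a rough Python sketch of checking whether a piece of text is likely to fit in the window. It assumes that 200K means 200,000 tokens and uses a heuristic of about four characters per token for English text. Both values, and the helper names, are assumptions for illustration; a real tokenizer would give different counts.

```python
CONTEXT_WINDOW_TOKENS = 200_000  # assumption: "200K" means 200,000 tokens
CHARS_PER_TOKEN = 4  # assumption: rough heuristic for English, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length (rounded up)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated input plus reserved output tokens fit."""
    return estimate_tokens(text) + reserved_for_output <= CONTEXT_WINDOW_TOKENS


sample = "word " * 1_000  # 5,000 characters of placeholder text
print(estimate_tokens(sample), fits_in_context(sample))
```

Reserving part of the budget for the model's output is a design choice in this sketch: the window covers everything processed in a single interaction, so a prompt that fills it entirely would leave no room for a response.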