
o3-pro

OpenAI · Established · Released Apr 16, 2025
Overall Score: Est. 57 · Prov. #55 of 119
Arena Elo: 1242
Categories Ranked: 8 of 8
Price (1M tokens): $20 in / $80 out
Speed: 27 tok/s
Context: 200K
Proprietary · Reasoning
Confidence: pro

According to BenchLM.ai, o3-pro ranks #55 out of 119 models on the provisional leaderboard with an estimated overall score of 57/100. It does not yet have enough sourced coverage for BenchLM's verified leaderboard. It is not a frontier model overall, but its category ranks range from #17 to #74, so its fit depends on the use case.

o3-pro is a proprietary model with a 200K token context window. It uses explicit chain-of-thought reasoning, which typically improves performance on math and complex reasoning tasks at the cost of higher latency and token usage.
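To make the latency and token-usage trade-off concrete, the listed price ($20 in / $80 out per 1M tokens) and speed (27 tok/s) can be turned into a rough per-request estimate. This is a minimal sketch, not a BenchLM or OpenAI tool: the function name and the request sizes are hypothetical, and it assumes reasoning tokens are billed as output tokens and generated at the listed speed.

```python
# Price and speed come from this profile; the request sizes are hypothetical.
INPUT_PRICE_PER_M = 20.0    # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 80.0   # USD per 1M output tokens
SPEED_TOK_PER_S = 27.0      # listed output speed in tokens per second


def estimate_request(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, generation time in seconds) for one request.

    Assumption: reasoning tokens count as output tokens for both
    billing and generation time.
    """
    cost = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    seconds = output_tokens / SPEED_TOK_PER_S
    return cost, seconds


# Example: a 10K-token prompt that yields 5K output tokens (reasoning included).
cost, seconds = estimate_request(input_tokens=10_000, output_tokens=5_000)
print(f"${cost:.2f}, about {seconds:.0f} s")  # prints "$0.60, about 185 s"
```

Because output tokens cost four times as much as input tokens here, long reasoning traces dominate both the bill and the wait time.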

o3-pro sits inside the o3 family alongside o3 and o3-mini. This profile currently has published scores for 2 of the 225 benchmarks BenchLM tracks. BenchLM only exposes non-generated benchmark rows publicly, so missing categories stay blank until a sourced evaluation is available.

Its strongest category is Mathematics (#17), while its weakest is Instruction Following (#74). This profile suggests it is best suited to mathematical reasoning, scientific computing, and quantitative analysis, although the Math category currently lists 0 published benchmarks for this model.

[Chart: Ranking Distribution. Category rank across 8 benchmark categories, sorted by best rank]

[Chart: Category Performance. Scores across all benchmark categories (0-100 scale)]

Category Breakdown

Agentic

#28
62.9 / 100
Weight: 22% · 0 benchmarks
Terminal-Bench 2.0, BrowseComp, OSWorld-Verified, GAIA, TAU-bench, WebArena

Coding

#49
55.1 / 100
Weight: 20% · 0 benchmarks
SWE-bench Verified, LiveCodeBench, SWE-bench Pro, SWE-Rebench, SciCode

Reasoning

#23
70.4 / 100
Weight: 17% · 0 benchmarks
MuSR, LongBench v2, MRCRv2, ARC-AGI-2

Knowledge

#39
66.8 / 100
Weight: 12% · 2 benchmarks
GPQA, SuperGPQA, MMLU-Pro, HLE, FrontierScience, SimpleQA

Math

#17
86.4 / 100
Weight: 5% · 0 benchmarks
AIME 2025, BRUMO 2025, MATH-500, FrontierMath

Multilingual

#41
66.3 / 100
Weight: 7% · 0 benchmarks
MGSM, MMLU-ProX

Multimodal

#43
64.1 / 100
Weight: 12% · 0 benchmarks
MMMU-Pro, OfficeQA Pro, CharXiv, CharXiv w/o tools

Instruction Following

#74
54.9 / 100
Weight: 5% · 0 benchmarks
IFEval, IFBench
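As a sanity check on the breakdown above, the category weights can be summed and combined with the category scores. The sketch below computes a plain weighted mean; treating the overall score as a weighted mean is an assumption made here for illustration, since the page does not describe BenchLM's formula.

```python
# (score, weight in percent) per category, copied from the category breakdown.
categories = {
    "Agentic": (62.9, 22),
    "Coding": (55.1, 20),
    "Reasoning": (70.4, 17),
    "Knowledge": (66.8, 12),
    "Math": (86.4, 5),
    "Multilingual": (66.3, 7),
    "Multimodal": (64.1, 12),
    "Instruction Following": (54.9, 5),
}

# The weights should cover the whole score, i.e. sum to 100%.
total_weight = sum(weight for _, weight in categories.values())

# Plain weighted mean of the category scores.
weighted_mean = (
    sum(score * weight for score, weight in categories.values()) / total_weight
)

print(total_weight, round(weighted_mean, 2))
```

The weights sum to 100, and the weighted mean comes out at about 64.2, which is higher than the listed overall score of 57. The published figure therefore is not this simple weighted mean of the displayed category scores; it presumably reflects some further adjustment that the page does not document.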

Chatbot Arena Performance

Text Overall: 1242

Benchmark Details

Only benchmark rows with an attached exact-source record are shown here. Source-unverified manual rows and generated rows are hidden from model pages.

o3 Family

Pro

Canonical Entry

o3

Frequently Asked Questions

How does o3-pro perform overall in AI benchmarks?

o3-pro currently ranks #55 out of 119 models on BenchLM's provisional leaderboard with an overall score of 57 (estimated). It was created by OpenAI and has a 200K token context window.

Is o3-pro good for knowledge and understanding?

o3-pro ranks #39 out of 119 models in knowledge and understanding benchmarks with an average score of 66.8. There are stronger options in this category.

Which sibling models are related to o3-pro?

o3-pro belongs to the o3 family. Related variants on BenchLM include o3 and o3-mini.

Does o3-pro have full benchmark coverage on BenchLM?

Not yet. o3-pro currently has 2 published benchmark scores out of the 225 benchmarks BenchLM tracks. BenchLM only exposes non-generated public benchmark rows, so missing categories stay blank until a sourced evaluation is available.

What is the context window size of o3-pro?

o3-pro has a context window of 200K tokens, which limits how much text it can process in a single interaction.

Last updated: June 2, 2026 · Runtime metrics stay blank until BenchLM has a sourced snapshot.
