
BrowseComp

A benchmark for web-browsing agents that must search, inspect sources, gather evidence, and return the correct answer to research-oriented questions.

Top models on BrowseComp — June 13, 2026

As of June 13, 2026, GPT-5.5 Pro leads the BrowseComp leaderboard with 90.1%, followed by GPT-5.4 Pro (89.3%) and Claude Mythos 5 (88%).

26 models · Agentic · 18% of category score · Current · Updated June 13, 2026

According to BenchLM.ai, the top three models sit within 2.1 points of each other (90.1% down to 88%), which suggests the benchmark is nearing saturation for frontier models.

26 models have been evaluated on BrowseComp, which falls in the Agentic category. That category carries a 22% weight in BenchLM.ai's overall scoring system, and within it BrowseComp contributes 18% of the category score, so BrowseComp results feed directly into a model's overall ranking.
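To get a rough sense of how much BrowseComp counts toward the overall score, the sketch below multiplies the two weights stated above. It assumes, hypothetically, that BenchLM computes a plain weighted average at both levels (benchmarks into the category, categories into the overall score). The real methodology may normalize or aggregate differently, and the function names are illustrative only.

```python
# Hypothetical sketch: assumes the overall score is a linear weighted average of
# category scores, and each category score is a linear weighted average of its
# benchmarks. Only the two weights below come from this page.
AGENTIC_CATEGORY_WEIGHT = 0.22  # Agentic category's share of the overall score
BROWSECOMP_IN_CATEGORY = 0.18   # BrowseComp's share of the Agentic category score


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Fraction of the overall score attributable to a single benchmark."""
    return category_weight * benchmark_share


def overall_shift(benchmark_delta_points: float) -> float:
    """Change in overall score (points) caused by a change in BrowseComp score,
    holding every other benchmark fixed."""
    return benchmark_delta_points * effective_weight(
        AGENTIC_CATEGORY_WEIGHT, BROWSECOMP_IN_CATEGORY
    )


if __name__ == "__main__":
    w = effective_weight(AGENTIC_CATEGORY_WEIGHT, BROWSECOMP_IN_CATEGORY)
    print(f"Effective weight: {w:.2%}")                                # Effective weight: 3.96%
    print(f"10-point gain -> {overall_shift(10):.3f} overall points")  # 10-point gain -> 0.396 overall points
```

Under this assumption, BrowseComp accounts for about 4% of the overall score, so the 45.7-point gap between the first and last models on this leaderboard would translate into roughly 1.8 overall points.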

About BrowseComp

Year: 2025
Tasks: Research questions requiring browsing
Format: Web search and evidence synthesis
Difficulty: Hard web research

BrowseComp is designed to measure real web research behavior, not just latent world knowledge. It rewards models that can plan searches, inspect multiple pages, and avoid shallow answer synthesis.
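A BrowseComp score such as 90.1% reads as the share of questions for which the model returned the correct answer. The sketch below computes that percentage from predicted and reference answers. The normalization rule and exact-match comparison are assumptions chosen for illustration; the official BrowseComp grading procedure may match answers differently.

```python
import re
import unicodedata


def normalize(answer: str) -> str:
    """Hypothetical normalization: NFKC, lowercase, drop punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKC", answer).lower()
    text = re.sub(r"[^\w\s]", "", text)  # remove punctuation characters
    return " ".join(text.split())         # collapse runs of whitespace


def accuracy_percent(predictions: list[str], references: list[str]) -> float:
    """Percentage of questions where the normalized prediction equals the reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one question")
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


if __name__ == "__main__":
    # Toy answers, not BrowseComp data.
    preds = ["Paris", "1887", "Marie Curie", "unknown"]
    refs = ["paris", "1887", "Marie Curie.", "Ada Lovelace"]
    print(accuracy_percent(preds, refs))  # 75.0
```

Exact matching is deliberately strict: a model that gathers the right evidence but phrases the final answer differently would be marked wrong, which is one reason benchmark graders often use more forgiving comparisons.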

BenchLM freshness & provenance

Version: BrowseComp 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Leaderboard (26 models)

1. 90.1% (GPT-5.5 Pro)
2. 89.3% (GPT-5.4 Pro)
3. 88% (Claude Mythos 5)
4. 86.9%
5. 84.4%
6. 84.3%
7. 83.7%
8. 83.5%
9. 83.4%
10. 83.2%
11. 82.7%
12. 80.4%
13. 79.3%
14. 75.8%
15. 73.2%
16. 68%
17. 65.8%
18. 63.8%
19. 62%
20. 61%
21. 61%
22. 60.6%
23. 60.6%
24. 53.5%
25. 52%
26. 44.4%
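The clustering claim in the overview can be checked against the leaderboard itself. The sketch below transcribes the 26 scores listed above and computes the top-3 spread along with a few descriptive statistics. The helper names are hypothetical, and these statistics are not part of BenchLM's scoring.

```python
from statistics import median

# Scores transcribed from the leaderboard above, in rank order (percent).
SCORES = [
    90.1, 89.3, 88.0, 86.9, 84.4, 84.3, 83.7, 83.5, 83.4, 83.2,
    82.7, 80.4, 79.3, 75.8, 73.2, 68.0, 65.8, 63.8, 62.0, 61.0,
    61.0, 60.6, 60.6, 53.5, 52.0, 44.4,
]


def top_k_spread(scores: list[float], k: int) -> float:
    """Gap in points between the best score and the k-th best score."""
    ranked = sorted(scores, reverse=True)
    return ranked[0] - ranked[k - 1]


def summarize(scores: list[float]) -> dict[str, float]:
    """Descriptive statistics, rounded to hide floating-point noise."""
    return {
        "count": len(scores),
        "top3_spread": round(top_k_spread(scores, 3), 1),
        "median": round(median(scores), 2),
        "range": round(max(scores) - min(scores), 1),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
    # {'count': 26, 'top3_spread': 2.1, 'median': 77.55, 'range': 45.7}
```

The 2.1-point top-3 spread matches the figure quoted in the overview, while the 45.7-point range across all 26 models shows that the benchmark still separates the field well below the frontier.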

FAQ

What does BrowseComp measure?

A benchmark for web-browsing agents that must search, inspect sources, gather evidence, and return the correct answer to research-oriented questions.

Which model scores highest on BrowseComp?

GPT-5.5 Pro by OpenAI currently leads with a score of 90.1% on BrowseComp.

How many models are evaluated on BrowseComp?

BenchLM has evaluated 26 AI models on BrowseComp.

Last updated: June 13, 2026 · BenchLM version BrowseComp 2026
