OpenAI MRCR v2 8-needle 64K-128K (MRCR v2 64K-128K)

A slice of OpenAI's MRCR v2 benchmark that tests 8-needle long-context retrieval at context lengths of 64K to 128K tokens.

Top Models on MRCR v2 64K-128K — March 2026

As of March 2026, GPT-5.4 leads the MRCR v2 64K-128K leaderboard with 86%, followed by GPT-5.4 mini (47.7%) and GPT-5.4 nano (44.2%).

4 models · Reasoning · Updated March 17, 2026

Scores show significant spread across the leaderboard, making this benchmark effective at differentiating model capabilities.

4 models have been evaluated on MRCR v2 64K-128K. The benchmark falls in the Reasoning category, which carries a 17% weight in BenchLM.ai's overall scoring system. MRCR v2 64K-128K is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.
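To make the weighting concrete, here is a minimal sketch of how a category-weighted overall score could be computed, assuming per-category averaging of included benchmarks; the actual BenchLM.ai formula is not published here, and the function name, data shapes, and non-MRCR scores below are invented for illustration. The key point from the text is that an excluded benchmark like MRCR v2 64K-128K contributes nothing, regardless of its category's 17% weight.

```python
# Hypothetical sketch: category-weighted overall scoring with excluded
# benchmarks skipped. All names and example scores are assumptions.

def overall_score(results, category_weights):
    """results: {category: [(score, excluded), ...]}; weights sum to 1.0
    across all categories (only one category is shown here)."""
    total = 0.0
    for category, entries in results.items():
        # Benchmarks flagged as excluded never enter the average.
        included = [score for score, excluded in entries if not excluded]
        if included:
            category_avg = sum(included) / len(included)
            total += category_weights[category] * category_avg
    return total

# MRCR v2 64K-128K (86.0) is excluded, so only the other (invented)
# Reasoning scores are averaged: (72.0 + 68.0) / 2 = 70.0.
results = {"Reasoning": [(86.0, True), (72.0, False), (68.0, False)]}
weights = {"Reasoning": 0.17}
print(round(overall_score(results, weights), 2))  # 0.17 * 70.0 = 11.9
```

Dropping the excluded flag on the MRCR entry would change the Reasoning average, which is exactly why reference-only benchmarks must be filtered out before aggregation.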

About MRCR v2 64K-128K

Year: 2026
Tasks: 8-needle retrieval tasks
Format: Long-context retrieval
Difficulty: Long-context reasoning

Measures whether models can recover the right details when multiple relevant items are buried in long contexts.
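In spirit, an 8-needle task plants several near-identical items in a long distractor context and asks for one specific occurrence, so a model cannot succeed by matching on surface keywords alone. The following is an illustrative sketch only, not the actual MRCR v2 harness; the filler sentence, "access code" framing, and function name are all invented:

```python
# Illustrative sketch of an 8-needle retrieval prompt (not OpenAI's
# actual MRCR v2 construction; all strings here are invented).
import random

def build_eight_needle_prompt(n_needles=8, haystack_paragraphs=2000, seed=0):
    """Bury n_needles nearly identical 'needle' sentences in filler text
    and ask for one specific occurrence by its position in the document."""
    rng = random.Random(seed)
    filler = "The committee reviewed routine correspondence without incident."
    paragraphs = [filler] * haystack_paragraphs
    # Each needle differs only in its 6-digit code, so the model must
    # track which occurrence is which, not just find a keyword.
    codes = [f"{rng.randrange(10**6):06d}" for _ in range(n_needles)]
    positions = sorted(rng.sample(range(haystack_paragraphs), n_needles))
    for pos, code in zip(positions, codes):
        paragraphs[pos] = f"Reminder: the access code is {code}."
    target = rng.randrange(n_needles)
    question = (f"Of the reminders above, what access code did "
                f"reminder number {target + 1} give?")
    return "\n\n".join(paragraphs) + "\n\n" + question, codes[target]

prompt, answer = build_eight_needle_prompt()
```

Because the positions are sorted before the codes are assigned, `codes[target]` is always the code of the (target+1)-th reminder in document order, which is what the question asks for.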


Leaderboard (4 models)

#1 GPT-5.4: 86%
#2 GPT-5.4 mini: 47.7%
#3 GPT-5.4 nano: 44.2%
#4 GPT-5 mini: 35.1%
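The spread claimed earlier is easy to quantify from the four published scores. A quick sketch using only the leaderboard numbers above:

```python
# Spread statistics for the four leaderboard scores (percent).
from statistics import pstdev

scores = {"GPT-5.4": 86.0, "GPT-5.4 mini": 47.7,
          "GPT-5.4 nano": 44.2, "GPT-5 mini": 35.1}

values = sorted(scores.values(), reverse=True)
spread = values[0] - values[-1]          # top-to-bottom range: 50.9 points
print(f"range: {spread:.1f} points")
print(f"stddev: {pstdev(values):.1f}")   # population standard deviation
```

A 50.9-point gap between first and last place is what makes the benchmark discriminative: the three smaller models cluster in the mid-30s to high-40s while the flagship model sits far above them.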

FAQ

What does MRCR v2 64K-128K measure?

It measures whether a model can retrieve the correct one of 8 similar "needles" planted in contexts of 64K to 128K tokens, a slice of OpenAI's MRCR v2 benchmark.

Which model scores highest on MRCR v2 64K-128K?

GPT-5.4 by OpenAI currently leads with a score of 86% on MRCR v2 64K-128K.

How many models are evaluated on MRCR v2 64K-128K?

4 AI models have been evaluated on MRCR v2 64K-128K on BenchLM.

Last updated: March 17, 2026
