OpenAI MRCR v2 8-needle 128K-256K (MRCR v2 128K-256K)

The MRCR v2 slice focused on very long contexts, with inputs from 128K to 256K tokens.

Top Models on MRCR v2 128K-256K — March 2026

As of March 2026, GPT-5.4 leads the MRCR v2 128K-256K leaderboard with 79.3%, followed by GPT-5.4 mini (33.6%) and GPT-5.4 nano (33.1%).

4 models · Reasoning · Updated March 17, 2026

GPT-5.4's 45.7-point lead over GPT-5.4 mini (79.3% vs. 33.6%) gives the leaderboard significant spread, making this benchmark effective at differentiating model capabilities.

4 models have been evaluated on MRCR v2 128K-256K. The benchmark falls in the Reasoning category, which carries a 17% weight in BenchLM.ai's overall scoring system. MRCR v2 128K-256K is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.

About MRCR v2 128K-256K

Year: 2026
Tasks: 8-needle retrieval tasks
Format: Very-long-context retrieval
Difficulty: Very long-context reasoning

A harder MRCR setting that stresses memory discipline and retrieval deeper into long contexts.
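To make the task shape concrete, here is a toy sketch of a multi-needle retrieval harness in the spirit of the description above. This is not OpenAI's actual MRCR v2 code; the function names, the filler-based context builder, and the exact-match scoring rule are all illustrative assumptions.

```python
import random


def build_8_needle_context(needles, filler_lines, n_filler=1000, seed=0):
    """Scatter 8 needle sentences at random depths inside a long filler
    context. Returns (context_text, expected_answers_in_needle_order).

    Illustrative only: the real benchmark's needle placement and
    distractor text are not specified here.
    """
    assert len(needles) == 8, "this slice uses 8-needle tasks"
    rng = random.Random(seed)
    lines = [rng.choice(filler_lines) for _ in range(n_filler)]
    # Distinct, sorted positions so needles appear in a fixed order.
    positions = sorted(rng.sample(range(n_filler), len(needles)))
    for pos, needle in zip(positions, needles):
        lines[pos] = needle
    return "\n".join(lines), list(needles)


def score_retrieval(predicted, expected):
    """Fraction of needles reproduced exactly and in order (0.0 to 1.0)."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return hits / len(expected)
```

A run would build a context near the target token length, prompt the model to recall all 8 needles, and score the response with `score_retrieval`; recovering 7 of 8 needles would score 0.875.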


Leaderboard (4 models)

#1 GPT-5.4: 79.3%
#2 GPT-5.4 mini: 33.6%
#3 GPT-5.4 nano: 33.1%
#4 GPT-5 mini: 19.4%

FAQ

What does MRCR v2 128K-256K measure?

It is the MRCR v2 slice focused on very long contexts, measuring 8-needle retrieval over inputs from 128K to 256K tokens.

Which model scores highest on MRCR v2 128K-256K?

GPT-5.4 by OpenAI currently leads with a score of 79.3% on MRCR v2 128K-256K.

How many models are evaluated on MRCR v2 128K-256K?

4 AI models have been evaluated on MRCR v2 128K-256K on BenchLM.

Last updated: March 17, 2026
