
Best Large Context Window LLMs in 2026

AI models with the largest context windows (200K+ tokens), ranked by benchmark performance.

Unless noted otherwise, the rankings on this page use BenchLM's provisional leaderboard rather than its stricter verified leaderboard, which includes only sourced results.

Bottom line: A large context window means nothing if the model can't actually use it. Claude Mythos Preview and Gemini 3.1 Pro both offer 1M-token context windows and the benchmark scores to back them up.

According to BenchLM.ai, Claude Mythos Preview leads this ranking with a score of 99, followed by Claude Opus 4.8 (95) and Gemini 3.1 Pro (92). The gaps between them (4 points, then 3 points) are large enough to suggest genuine performance differences.

The best open-weight option is DeepSeek V4 Pro (Max), ranked #12 with a score of 87. All 11 models above it are proprietary, so proprietary models hold a clear advantage in this category, though open-weight options may suffice for less demanding use cases.

This ranking uses BenchLM.ai's provisional overall score, a weighted combination of benchmark results under BenchLM.ai's scoring formula.
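This page does not spell out BenchLM.ai's categories or weights, so the following is only a generic weighted mean showing how one overall number can be derived from several benchmark scores. It is not BenchLM.ai's formula; the function name, category names, and weights are all hypothetical.

```python
def weighted_overall(category_scores: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Weighted mean of per-category scores; weights need not sum to 1."""
    missing = set(weights) - set(category_scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(w * category_scores[c] for c, w in weights.items()) / total_weight


# Hypothetical categories and weights, for illustration only.
scores = {"reasoning": 90, "coding": 80, "long_context": 70}
weights = {"reasoning": 2, "coding": 1, "long_context": 1}
print(weighted_overall(scores, weights))  # 82.5
```

Dividing by the sum of the weights means they can be given as simple relative importances (here, reasoning counts twice as much as each other category).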

What changed

Claude Mythos Preview leads large-context models with 1M context and the highest overall score.

Gemini 3.1 Pro: 1M context with strong reasoning (97); the best non-reasoning large-context model.

GPT-5.4: 1.05M context, slightly larger than the 1M windows of the top 3 overall models.


Full Rankings (71 models)

1
Claude Mythos Preview
Anthropic·Proprietary·1M

99

prov. overall

2
Claude Opus 4.8
Anthropic·Proprietary·1M

95

prov. overall

3
Gemini 3.1 Pro
Google·Proprietary·1M

92

prov. overall

4
GPT-5.5
OpenAI·Proprietary·1M

91

prov. overall

5
Qwen3.7 Max
Alibaba·Proprietary·1M

91

prov. overall

6
GPT-5.4 Pro
OpenAI·Proprietary·1.05M

91

prov. overall

7
Gemini 3 Pro Deep Think
Google·Proprietary·2M

90

prov. overall

8
Grok 4.1
xAI·Proprietary·1M

90

prov. overall

9
GPT-5.4
OpenAI·Proprietary·1.05M

89

prov. overall

10
Claude Opus 4.6
Anthropic·Proprietary·1M

87

prov. overall

11
Gemini 3.5 Flash
Google·Proprietary·1M

87

prov. overall

12
DeepSeek V4 Pro (Max)
DeepSeek·Open Weight·1M

87

prov. overall

13
GPT-5.3 Codex
OpenAI·Proprietary·400K

86

prov. overall

14
Claude Opus 4.7 (Adaptive)
Anthropic·Proprietary·1M

85

prov. overall

15
Kimi K2.6
Moonshot AI·Open Weight·256K

84

prov. overall

16
Claude Sonnet 4.6
Anthropic·Proprietary·200K

83

prov. overall

17
DeepSeek V4 Pro (High)
DeepSeek·Open Weight·1M

83

prov. overall

18
o1-preview
OpenAI·Proprietary·200K

83

prov. overall

19
GLM-5.1
Z.AI·Open Weight·203K

82

prov. overall

20
Gemini 3 Pro
Google·Proprietary·2M

81

prov. overall

21
GLM-5 (Reasoning)
Z.AI·Open Weight·200K

80

prov. overall

22
GPT-5.2
OpenAI·Proprietary·400K

79

prov. overall

23
GPT-5.1
OpenAI·Proprietary·200K

78

prov. overall

24
Claude Opus 4.5
Anthropic·Proprietary·200K

76

prov. overall

25
MiniMax M3
MiniMax·Open Weight·1M

76

prov. overall

26
GPT-5.2-Codex
OpenAI·Proprietary·400K

76

prov. overall

27
DeepSeek V4 Flash (Max)
DeepSeek·Open Weight·1M

75

prov. overall

28
GPT-5.1-Codex-Max
OpenAI·Proprietary·400K

75

prov. overall

29
Qwen3.6 Plus
Alibaba·Proprietary·1M

73

prov. overall

30
Qwen3.6-27B
Alibaba·Open Weight·262K

73

prov. overall

31
Grok 4.20
xAI·Proprietary·2M

72

prov. overall

32
DeepSeek V4 Flash (High)
DeepSeek·Open Weight·1M

71

prov. overall

33
DeepSeek V4 Pro
DeepSeek·Open Weight·1M

69

prov. overall

34
Grok 4.1 Fast
xAI·Proprietary·1M

69

prov. overall

35
GLM-4.7
Z.AI·Open Weight·200K

68

prov. overall

36
GLM-5
Z.AI·Open Weight·200K

67

prov. overall

37
Qwen3.6-35B-A3B
Alibaba·Open Weight·262K

66

prov. overall

38
Claude Sonnet 4.5
Anthropic·Proprietary·200K

65

prov. overall

39
Kimi K2.5
Moonshot AI·Open Weight·256K

64

prov. overall

40
Qwen3.5-122B-A10B
Alibaba·Open Weight·262K

64

prov. overall

41
Gemini 2.5 Pro
Google·Proprietary·1M

64

prov. overall

42
Qwen3.5-27B
Alibaba·Open Weight·262K

62

prov. overall

43
MiMo-V2-Flash
Xiaomi·Open Weight·256K

59

prov. overall

44
DeepSeek V4 Flash
DeepSeek·Open Weight·1M

57

prov. overall

45
GPT-4.1
OpenAI·Proprietary·1M

57

prov. overall

46
o1
OpenAI·Proprietary·200K

57

prov. overall

47
o3
OpenAI·Proprietary·200K

57

prov. overall

48
o3-pro
OpenAI·Proprietary·200K

57

prov. overall

49
Qwen3.5-35B-A3B
Alibaba·Open Weight·262K

56

prov. overall

50
Claude Haiku 4.5
Anthropic·Proprietary·200K

56

prov. overall

51
Gemini 3 Flash
Google·Proprietary·1M

56

prov. overall

52
o3-mini
OpenAI·Proprietary·200K

55

prov. overall

53
MiniMax M2.7
MiniMax·Open Weight·200K

54

prov. overall

54
Claude 4.1 Opus
Anthropic·Proprietary·200K

51

prov. overall

55
Qwen2.5-1M
Alibaba·Open Weight·1M

51

prov. overall

56
Claude 4 Sonnet
Anthropic·Proprietary·200K

50

prov. overall

57
Gemini 3.1 Flash-Lite
Google·Proprietary·1M

48

prov. overall

58
GPT-4.1 mini
OpenAI·Proprietary·1M

45

prov. overall

59
o4-mini (high)
OpenAI·Proprietary·200K

44

prov. overall

60
Claude 4.1 Opus Thinking
Anthropic·Proprietary·200K

43

prov. overall

61
Nemotron 3 Super 100B
NVIDIA·Open Weight·1M

43

prov. overall

62
Claude 3.5 Sonnet
Anthropic·Proprietary·200K

40

prov. overall

63
Grok Code Fast 1
xAI·Proprietary·256K

39

prov. overall

64
Gemini 2.5 Flash
Google·Proprietary·1M

37

prov. overall

65
Gemini 1.5 Pro
Google·Proprietary·2M

35

prov. overall

66
Claude 3 Opus
Anthropic·Proprietary·200K

34

prov. overall

67
o1-pro
OpenAI·Proprietary·200K

29

prov. overall

68
GPT-4.1 nano
OpenAI·Proprietary·1M

27

prov. overall

69
Claude 3 Haiku
Anthropic·Proprietary·200K

23

prov. overall

70
Llama 4 Scout
Meta·Open Weight·10M

22

prov. overall

71
Llama 4 Maverick
Meta·Open Weight·1M

17

prov. overall

These rankings update weekly


Key Takeaways

The top model is Claude Mythos Preview by Anthropic with a provisional score of 99.

The best open-weight model is DeepSeek V4 Pro (Max) at position #12.

71 models are included in this ranking.

Score in Context

What these scores mean

Models are filtered by context window (200K+ tokens) and ranked by provisional overall BenchLM score. A large context window alone is not enough: check long-context benchmark scores for actual retrieval and reasoning quality.
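The filter-and-rank step can be sketched in a few lines of Python. This is a minimal illustration, not BenchLM's code: the `Entry` type and the `context_tokens` and `rank` helpers are hypothetical names, the sample rows are copied from the table on this page (deliberately shuffled), and the 128K entry is an invented model included only to show the 200K cutoff at work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    provider: str
    open_weight: bool
    context: str  # context window as displayed, e.g. "1M", "1.05M", "203K"
    score: int    # provisional overall score


def context_tokens(label: str) -> int:
    """Convert a display label such as '1.05M' or '203K' into a token count."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    number, suffix = label[:-1], label[-1].upper()
    return round(float(number) * multipliers[suffix])


def rank(entries: list[Entry], min_context: int = 200_000) -> list[Entry]:
    """Keep models at or above the context cutoff, highest score first."""
    eligible = [e for e in entries if context_tokens(e.context) >= min_context]
    return sorted(eligible, key=lambda e: e.score, reverse=True)


ENTRIES = [
    Entry("GLM-5.1", "Z.AI", True, "203K", 82),
    Entry("Claude Mythos Preview", "Anthropic", False, "1M", 99),
    Entry("DeepSeek V4 Pro (Max)", "DeepSeek", True, "1M", 87),
    Entry("GPT-5.4", "OpenAI", False, "1.05M", 89),
    Entry("Llama 4 Scout", "Meta", True, "10M", 22),
    # Hypothetical model, not on this page: excluded by the 200K cutoff.
    Entry("Hypothetical 128K model", "Example", True, "128K", 90),
]

ranked = rank(ENTRIES)
best_open = next(e for e in ranked if e.open_weight)
for position, e in enumerate(ranked, start=1):
    print(position, e.name, e.context, e.score)
print("Best open-weight:", best_open.name)
```

Without the cutoff, the invented 128K model would rank second and be reported as the best open-weight option; with it, DeepSeek V4 Pro (Max) comes out on top among open-weight models, matching the ranking above. Python's `sorted` is stable, so equal scores keep their input order; the page itself does not say how ties are broken.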

Known limitations

Context window sizes are self-reported by providers. Actual usable context may be smaller, because retrieval and reasoning quality can degrade well before the advertised limit. Long-context benchmarks test specific patterns, so real workloads may behave differently.
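One common way to check usable context yourself is a needle-in-a-haystack probe: hide a fact at several depths in long filler text and see whether the model retrieves it. The sketch below assumes a model is simply a function from prompt text to reply text. `make_truncating_model` is a hypothetical stand-in for a real API call that only sees the last `visible_chars` characters, a deliberately crude simulation of a window that is smaller in practice than on paper. Real models often fail in other ways (for example, losing facts in the middle of the input), so swap in a real client before drawing conclusions.

```python
from typing import Callable

Model = Callable[[str], str]  # any function from prompt text to reply text


def build_haystack_prompt(needle: str, question: str, filler: str,
                          n_filler: int, depth: float) -> str:
    """Hide `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [filler] * n_filler
    sentences.insert(round(depth * n_filler), needle)
    return " ".join(sentences) + "\n\n" + question


def probe(model: Model, answer: str, n_filler: int,
          depths=(0.0, 0.25, 0.5, 0.75, 1.0)) -> dict[float, bool]:
    """Return, for each depth, whether the model's reply contains the answer."""
    needle = f"The secret code is {answer}."
    question = "What is the secret code?"
    filler = "The sky was a pale shade of grey that afternoon."
    results = {}
    for depth in depths:
        prompt = build_haystack_prompt(needle, question, filler, n_filler, depth)
        results[depth] = answer in model(prompt)
    return results


def make_truncating_model(visible_chars: int) -> Model:
    """Hypothetical stand-in for a real model: it only 'reads' the last
    `visible_chars` characters of the prompt."""
    def model(prompt: str) -> str:
        window = prompt[-visible_chars:]
        marker = "The secret code is "
        idx = window.find(marker)
        if idx == -1:
            return "I don't know."
        return window[idx + len(marker):].split(".")[0]
    return model


results = probe(make_truncating_model(visible_chars=20_000), answer="7319", n_filler=1000)
print(results)  # {0.0: False, 0.25: False, 0.5: False, 0.75: True, 1.0: True}
```

With roughly 49,000 characters of filler, the simulated model only recovers the needle when it sits in the last quarter of the prompt. A model that truly used its full window would return `True` at every depth.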

Last updated: June 2, 2026
