Best Multimodal & Grounded AI Models in 2026

Multimodal and grounded benchmarks test whether a model can reason over visual content — images, charts, documents, screenshots, and spreadsheets — not just process plain text. This category carries a 12% weight in BenchLM.ai's overall score. MMMU-Pro tests frontier-difficulty visual reasoning, while OfficeQA Pro focuses on enterprise document workflows. For products where users upload images, share PDFs, or need models to read dashboards and data tables, scores here are a better predictor of real performance than chat-only benchmarks. Most top proprietary models are competitive; open-weight models show wider spread.

According to BenchLM.ai, GPT-5.2 Pro leads this ranking with a score of 96, followed by GPT-5.4 (95.5) and GPT-5.2 (95). The top three are separated by just one point, and three more models (GPT-5.3 Instant, Gemini 3.1 Pro, and Gemini 3 Pro Deep Think) also score 95, so any of them would perform well for this use case.

The best open-weight option is GLM-5 (Reasoning) (ranked #30 with a score of 78.5). Proprietary models hold a clear advantage in this category, though open-weight options may suffice for less demanding use cases.

This ranking is based on each model's average score across all multimodal and grounded benchmarks tracked by BenchLM.ai.
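As a rough illustration of how a category average and the resulting ranking could be computed, here is a minimal Python sketch. It is not BenchLM.ai's actual method: the model names and per-benchmark scores are hypothetical, the average is assumed to be an unweighted mean over whichever benchmarks a model has results for, and the 12% category weight mentioned earlier is assumed to apply as a simple multiplier on that average.

```python
from statistics import mean

# Hypothetical per-benchmark scores (0-100); NOT real BenchLM.ai data.
# The benchmark names come from the text; the model names are placeholders.
scores = {
    "model-a": {"MMMU-Pro": 90.0, "OfficeQA Pro": 94.0},
    "model-b": {"MMMU-Pro": 88.0, "OfficeQA Pro": 91.0},
    "model-c": {"MMMU-Pro": 93.0},  # only one benchmark reported
}

# Assumption: the category's 12% weight is applied as a plain multiplier.
CATEGORY_WEIGHT = 0.12


def category_average(benchmarks):
    """Unweighted mean over the benchmarks a model has scores for (assumed)."""
    return round(mean(benchmarks.values()), 1)


def rank_models(all_scores):
    """Return (model, average) pairs sorted from highest to lowest average."""
    averages = {name: category_average(b) for name, b in all_scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


ranking = rank_models(scores)
for position, (name, avg) in enumerate(ranking, start=1):
    contribution = round(avg * CATEGORY_WEIGHT, 2)
    print(f"{position}. {name}: {avg} avg, {contribution} points toward overall")
```

Note that under this assumed scheme a model with a single strong benchmark result (model-c here) can outrank models with fuller coverage, which is one reason to check how many benchmarks sit behind any given average.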

Rank | Model | Provider | License | Context window | Average score

1 | GPT-5.2 Pro | OpenAI | Proprietary | 400K | 96
2 | GPT-5.4 | OpenAI | Proprietary | 1.05M | 95.5
3 | GPT-5.2 | OpenAI | Proprietary | 400K | 95
4 | GPT-5.3 Instant | OpenAI | Proprietary | 128K | 95
5 | Gemini 3.1 Pro | Google | Proprietary | 1M | 95
6 | Gemini 3 Pro Deep Think | Google | Proprietary | 2M | 95
7 | GPT-5.4 Pro | OpenAI | Proprietary | 1.05M | 94.9
8 | Claude Opus 4.6 | Anthropic | Proprietary | 1M | 94.6
9 | Grok 4.1 | xAI | Proprietary | 1M | 93.2
10 | GPT-5.2 Instant | OpenAI | Proprietary | 128K | 93.1
11 | Gemini 3 Pro | Google | Proprietary | 2M | 93.1
12 | Claude Sonnet 4.6 | Anthropic | Proprietary | 200K | 91.9
13 | GPT-5.1 | OpenAI | Proprietary | 200K | 91.8
14 | Claude Sonnet 4.5 | Anthropic | Proprietary | 200K | 91.4
15 | GPT-5.3 Codex | OpenAI | Proprietary | 400K | 91.3
16 | Claude Opus 4.5 | Anthropic | Proprietary | 200K | 90.9
17 | GPT-5 (high) | OpenAI | Proprietary | 128K | 89.4
18 | GPT-5.3-Codex-Spark | OpenAI | Proprietary | 256K | 88.3
19 | GPT-5.1-Codex-Max | OpenAI | Proprietary | 400K | 88.2
20 | GPT-5 (medium) | OpenAI | Proprietary | 128K | 88.1
21 | GPT-5.2-Codex | OpenAI | Proprietary | 400K | 87.6
22 | Grok 4.1 Fast | xAI | Proprietary | 1M | 87.4
23 | Gemini 2.5 Pro | Google | Proprietary | 1M | 85.1
24 | GPT-5 mini | OpenAI | Proprietary | 128K | 83.8
25 | Claude 4.1 Opus | Anthropic | Proprietary | 200K | 80.7
26 | Claude 4 Sonnet | Anthropic | Proprietary | 200K | 79.7
27 | Seed 1.6 | ByteDance | Proprietary | 256K | 79.6
28 | Seed-2.0-Lite | ByteDance | Proprietary | 256K | 79.6
29 | Gemini 3 Flash | Google | Proprietary | 1M | 79.6
30 | GLM-5 (Reasoning) | Zhipu AI | Open Weight | 200K | 78.5
31 | Claude Haiku 4.5 | Anthropic | Proprietary | 200K | 78.4
32 | Grok 4 | xAI | Proprietary | 128K | 78.2
33 | MiMo-V2-Flash | Xiaomi | Open Weight | 128K | 75.8
34 | o1-preview | OpenAI | Proprietary | 200K | 75.6
35 | Mistral Large 3 | Mistral | Proprietary | 128K | 75.5
36 | Claude 3.5 Sonnet | Anthropic | Proprietary | 200K | 74.8
37 | o3-mini | OpenAI | Proprietary | 200K | 74.4
38 | Kimi K2.5 (Reasoning) | Moonshot AI | Proprietary | 128K | 74.3
39 | o3-pro | OpenAI | Proprietary | 200K | 74.1
40 | Gemini 1.5 Pro | Google | Proprietary | 2M | 74.1
41 | GPT-4.1 | OpenAI | Proprietary | 1M | 73.6
42 | Seed 1.6 Flash | ByteDance | Proprietary | 256K | 73.1
43 | Gemini 3.1 Flash-Lite | Google | Proprietary | 1M | 73.1
44 | Seed-2.0-Mini | ByteDance | Proprietary | 256K | 73.1
45 | o3 | OpenAI | Proprietary | 200K | 72.3
46 | GPT-4o | OpenAI | Proprietary | 128K | 72.2
47 | Ministral 3 14B (Reasoning) | Mistral | Open Weight | 128K | 71.5
48 | DeepSeek V3.2 (Thinking) | DeepSeek | Open Weight | 128K | 71
49 | Qwen3.5 397B (Reasoning) | Alibaba | Open Weight | 128K | 70.8
50 | o1 | OpenAI | Proprietary | 200K | 70.7
51 | GLM-4.7 | Zhipu AI | Open Weight | 200K | 70.5
52 | Ministral 3 14B | Mistral | Open Weight | 128K | 70.5
53 | Claude 3 Opus | Anthropic | Proprietary | 200K | 70.3
54 | GPT-4.1 mini | OpenAI | Proprietary | 1M | 69.6
55 | GLM-5 | Zhipu AI | Open Weight | 200K | 69.2
56 | Claude 3 Haiku | Anthropic | Proprietary | 200K | 68.7
57 | Qwen2.5-1M | Alibaba | Open Weight | 1M | 68.4
58 | Mercury 2 | Inception | Proprietary | 128K | 68.3
59 | o4-mini (high) | OpenAI | Proprietary | 200K | 68.3
60 | DeepSeekMath V2 | DeepSeek | Open Weight | 128K | 68.1
61 | Gemini 1.0 Pro | Google | Proprietary | 32K | 68.1
62 | Gemini 2.5 Flash | Google | Proprietary | 1M | 67.7
63 | Nemotron 3 Ultra 500B | NVIDIA | Open Weight | 10M | 66.9
64 | Step 3.5 Flash | StepFun | Open Weight | 256K | 66.7
65 | Qwen2.5-72B | Alibaba | Open Weight | 128K | 66.7
66 | DeepSeek V3.2 | DeepSeek | Open Weight | 128K | 66
67 | Aion-2.0 | Aion Labs | Proprietary | 128K | 66
68 | Kimi K2.5 | Moonshot AI | Open Weight | 128K | 64.6
69 | DeepSeek LLM 2.0 | DeepSeek | Open Weight | 128K | 64.5
70 | GLM-4.7-Flash | Zhipu AI | Open Weight | 200K | 62.5
71 | Llama 3.1 405B | Meta | Open Weight | 128K | 62.3
72 | MiniMax M2.5 | MiniMax | Proprietary | 128K | 62
73 | Qwen3.5 397B | Alibaba | Open Weight | 128K | 61.4
74 | Mistral Large 2 | Mistral | Proprietary | 128K | 61
75 | Nemotron 3 Super 120B A12B | NVIDIA | Open Weight | 256K | 60.4
76 | Nemotron 3 Super 100B | NVIDIA | Open Weight | 1M | 60.4
77 | GPT-4o mini | OpenAI | Proprietary | 128K | 60.2
78 | GPT-4.1 nano | OpenAI | Proprietary | 1M | 59.3
79 | Claude 4.1 Opus Thinking | Anthropic | Proprietary | 200K | 59.3
80 | DeepSeek Coder 2.0 | DeepSeek | Open Weight | 128K | 58.6
81 | Llama 4 Scout | Meta | Open Weight | 10M | 57.8
82 | Llama 4 Maverick | Meta | Open Weight | 1M | 56.8
83 | GPT-5 nano | OpenAI | Proprietary | 400K | 56.7
84 | GPT-4 Turbo | OpenAI | Proprietary | 128K | 55.3
85 | Llama 4 Behemoth | Meta | Open Weight | 32K | 55.1
86 | Moonshot v1 | Moonshot AI | Proprietary | 128K | 52.6
87 | Llama 3 70B | Meta | Open Weight | 128K | 52.3
88 | Qwen2.5-VL-32B | Alibaba | Open Weight | 32K | 52.2
89 | Z-1 | Z | Proprietary | 128K | 50.5
90 | Grok Code Fast 1 | xAI | Proprietary | 256K | 50.4
91 | Nemotron-4 15B | NVIDIA | Open Weight | 32K | 49.6
92 | GPT-OSS 120B | OpenAI | Open Weight | 128K | 48.8
93 | o1-pro | OpenAI | Proprietary | 200K | 48.5
94 | Mistral 8x7B | Mistral | Open Weight | 32K | 48.3
95 | DeepSeek-R1 | DeepSeek | Open Weight | 128K | 47.5
96 | Phi-4 | Microsoft | Open Weight | 16K | 46.8
97 | Nemotron 3 Nano 30B | NVIDIA | Open Weight | 32K | 45.2
98 | Nemotron Ultra 253B | NVIDIA | Open Weight | 32K | 44.7
99 | Grok 3 [Beta] | xAI | Proprietary | 128K | 43.2
100 | Qwen3 235B 2507 (Reasoning) | Alibaba | Open Weight | 128K | 42.1
101 | Gemma 3 27B | Google | Open Weight | 32K | 41.7
102 | LFM2-24B-A2B | LiquidAI | Proprietary | 32K | 41.7
103 | Qwen3 235B 2507 | Alibaba | Open Weight | 128K | 41.6
104 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open Weight | 128K | 41.5
105 | Nova Pro | Nova AI | Proprietary | 128K | 41.1
106 | GLM-4.5 | Tsinghua | Proprietary | 128K | 41
107 | GLM-4.5-Air | Tsinghua | Proprietary | 128K | 39.6
108 | DeepSeek V3.1 | DeepSeek | Open Weight | 128K | 39.5
109 | Kimi K2 | Moonshot AI | Proprietary | 128K | 39.5
110 | MiniMax M1 80k | MiniMax | Proprietary | 80K | 39
111 | GPT-OSS 20B | OpenAI | Open Weight | 128K | 36
112 | DBRX Instruct | Databricks | Open Weight | 32K | 35.6
113 | Mixtral 8x22B Instruct v0.1 | Mistral | Open Weight | 64K | 35.5
114 | Ministral 3 8B (Reasoning) | Mistral | Open Weight | 128K | 33.4
115 | LFM2.5-1.2B-Thinking | LiquidAI | Proprietary | 32K | 32.4
116 | Ministral 3 8B | Mistral | Open Weight | 128K | 32.4
117 | Mistral 7B v0.3 | Mistral | Open Weight | 32K | 32.4
118 | LFM2.5-1.2B-Instruct | LiquidAI | Proprietary | 32K | 32.4
119 | Mistral 8x7B v0.2 | Mistral | Open Weight | 32K | 32.3
120 | Ministral 3 3B (Reasoning) | Mistral | Open Weight | 128K | 30.4
121 | Ministral 3 3B | Mistral | Open Weight | 128K | 30.4

Key Takeaways

  • According to BenchLM.ai, the top model is GPT-5.2 Pro by OpenAI with a score of 96.
  • The best open-weight model in this ranking is GLM-5 (Reasoning) at position #30.
  • 121 models are included in this ranking.
Last updated: March 12, 2026
