Artificial Analysis MMMU-Pro (AA-MMMU-Pro)

A display-only Artificial Analysis MMMU-Pro score.

Benchmark score on AA-MMMU-Pro — July 4, 2026

BenchLM mirrors the published score view for AA-MMMU-Pro. Gemini 3.5 Flash leads the public snapshot at 84.3%, followed by Gemini 3.1 Pro (82.4%) and Muse Spark (80.5%). BenchLM does not use these results to rank models overall.

Gemini 3.5 Flash

Google

Overall: 81 · Context: 1M

Gemini 3.1 Pro

Google

Overall: 88 · Context: 1M

Muse Spark

Meta

Overall: n/a · Context: 262K

72 models · Multimodal & Grounded · Current · Display only · Updated July 4, 2026

The published AA-MMMU-Pro snapshot is tightly clustered at the top: Gemini 3.5 Flash sits at 84.3%, while the third row is only 3.8 points behind. The broader top-10 spread is 5.9 points, so many of the published scores sit in a relatively narrow band.
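The two gaps quoted above can be checked directly from the published scores. The following is a minimal sketch, assuming the ten values are copied by hand from the top of the score table; the `gap` helper is a hypothetical name used only for illustration, not part of any BenchLM tooling.

```python
# Top-10 published AA-MMMU-Pro scores (percent), copied from the score table.
top10 = [84.3, 82.4, 80.5, 80.2, 79.9, 79.4, 78.8, 78.6, 78.5, 78.4]


def gap(scores, i, j):
    """Percentage-point difference between rank i and rank j (1-indexed)."""
    # Round to one decimal place to match the precision of the published scores.
    return round(scores[i - 1] - scores[j - 1], 1)


first_to_third = gap(top10, 1, 3)   # leader vs. third row
top10_spread = gap(top10, 1, 10)    # leader vs. tenth row

print(f"1st to 3rd: {first_to_third} points")
print(f"Top-10 spread: {top10_spread} points")
```

Rounding to one decimal keeps floating-point noise out of the comparison, since the published scores carry only one decimal place.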

72 models have been evaluated on AA-MMMU-Pro. The benchmark falls in the Multimodal & Grounded category. This category carries a 12% weight in BenchLM.ai's overall scoring system. AA-MMMU-Pro is currently displayed for reference but excluded from the scoring formula, so it does not directly affect overall rankings.

About AA-MMMU-Pro

Year: 2026

Tasks: Multimodal academic reasoning

Format: Image + text question answering

Difficulty: Frontier multimodal

BenchLM stores the Artificial Analysis MMMU-Pro result separately from the weighted MMMU-Pro lane so AA refreshes remain display-only.

Artificial Analysis MMMU-Pro Benchmark Leaderboard

BenchLM freshness & provenance

Version: AA-MMMU-Pro 2026

Refresh cadence: Quarterly

Staleness state: Current

Question availability: Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (72 models)

| Rank | Model | Developer | Open/Closed | Score |
|---|---|---|---|---|
| 1 | Gemini 3.5 Flash | Google | Closed | 84.3% |
| 2 | Gemini 3.1 Pro | Google | Closed | 82.4% |
| 3 | Muse Spark | Meta | Closed | 80.5% |
| 4 | Gemini 3 Pro | Google | Closed | 80.2% |
| 5 | GPT-5.5 | OpenAI | Closed | 79.9% |
| 6 | | Moonshot AI | Open | 79.4% |
| 7 | Claude Opus 4.7 (Adaptive) | Anthropic | Closed | 78.8% |
| 8 | | Google | Closed | 78.6% |
| 9 | | OpenAI | Closed | 78.5% |
| 10 | | OpenAI | Closed | 78.4% |
| 11 | | xAI | Closed | 78.1% |
| 12 | | Alibaba | Closed | 78.0% |
| 13 | Qwen3.5 397B (Reasoning) | Alibaba | Open | 77.3% |
| 14 | Claude Opus 4.7 | Anthropic | Closed | 76.4% |
| 15 | | OpenAI | Closed | 76.3% |
| 16 | | OpenAI | Closed | 75.5% |
| 17 | Gemini 3.1 Flash-Lite | Google | Closed | 75.5% |
| 18 | Kimi K2.5 (Reasoning) | Moonshot AI | Closed | 75.4% |
| 19 | | Moonshot AI | Open | 75.4% |
| 20 | Claude Opus 4.6 (Adaptive) | Anthropic | Closed | 75.4% |
| 21 | | StepFun | Open | 75.3% |
| 22 | Qwen3.6-35B-A3B | Alibaba | Open | 75.0% |
| 23 | Qwen3.5-122B-A10B | Alibaba | Open | 75.0% |
| 24 | | Alibaba | Open | 75.0% |
| 25 | | Google | Closed | 74.9% |
| 26 | | Alibaba | Open | 74.6% |
| 27 | | OpenAI | Closed | 74.3% |
| 28 | | OpenAI | Closed | 74.2% |
| 29 | Claude Opus 4.5 Thinking | Anthropic | Closed | 74.0% |
| 30 | | Google | Open | 73.4% |
| 31 | | OpenAI | Closed | 73.3% |
| 32 | | Z.AI | Closed | 72.8% |
| 33 | Qwen3.5-35B-A3B | Alibaba | Open | 72.7% |
| 34 | Claude Opus 4.6 | Anthropic | Closed | 72.5% |
| 35 | GPT-5.1-Codex-Max | OpenAI | Closed | 72.5% |
| 36 | | OpenAI | Closed | 72.5% |
| 37 | Claude Opus 4.5 | Anthropic | Closed | 71.2% |
| 38 | Claude Sonnet 4.6 | Anthropic | Closed | 70.6% |
| 39 | | OpenAI | Closed | 70.1% |
| 40 | | Xiaomi | Closed | 69.9% |
| 41 | | Google | Open | 69.7% |
| 42 | Gemma 4 26B A4B | Google | Open | 69.2% |
| 43 | | xAI | Closed | 68.8% |
| 44 | Claude 4.1 Opus Thinking | Anthropic | Closed | 67.9% |
| 45 | Gemini 2.5 Flash | Google | Closed | 65.5% |
| 46 | | OpenAI | Closed | 65.4% |
| 47 | Mistral Medium 3.5 128B | Mistral | Open | 64.9% |
| 48 | Grok 4.1 Fast (Reasoning) | xAI | Closed | 63.3% |
| 49 | | Cohere | Open | 63.2% |
| 50 | Claude 4 Sonnet | Anthropic | Closed | 62.4% |
| 51 | Llama 4 Maverick | Meta | Open | 62.1% |
| 52 | Grok 4 Fast (Reasoning) | xAI | Closed | 61.8% |
| 53 | | OpenAI | Closed | 61.2% |
| 54 | | OpenAI | Closed | 58.7% |
| 55 | Mistral Small 4 (Reasoning) | Mistral | Open | 56.8% |
| 56 | Mistral Small 4 | Mistral | Open | 56.8% |
| 57 | Mistral Large 3 | Mistral | Closed | 55.7% |
| 58 | | Google | Closed | 55.0% |
| 59 | Nemotron 3 Nano Omni 30B A3B | NVIDIA | Open | 53.2% |
| 60 | Mistral Medium 3 | Mistral | Closed | 53.0% |
| 61 | | Meta | Open | 52.9% |
| 62 | | Alibaba | Open | 52.7% |
| 63 | | Google | Open | 51.4% |
| 64 | | xAI | Closed | 48.4% |
| 65 | | Google | Open | 48.0% |
| 66 | | Alibaba | Closed | 44.8% |
| 67 | | Google | Open | 44.6% |
| 68 | | Amazon | Closed | 44.3% |
| 69 | | OpenAI | Closed | 41.5% |
| 70 | | OpenAI | Closed | 40.1% |
| 71 | | Anthropic | Closed | 30.8% |
| 72 | LFM2.5-VL-1.6B-Extract | LiquidAI | Open | 26.5% |

FAQ

What does AA-MMMU-Pro measure?

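AA-MMMU-Pro measures multimodal academic reasoning through image-and-text question answering at frontier difficulty. BenchLM shows the Artificial Analysis MMMU-Pro score for display only and does not include it in overall rankings.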

Which model scores highest on AA-MMMU-Pro?

Gemini 3.5 Flash by Google currently leads with a score of 84.3% on AA-MMMU-Pro.

How many models are evaluated on AA-MMMU-Pro?

BenchLM lists AA-MMMU-Pro scores for 72 AI models.

Compare Top Models on AA-MMMU-Pro

Gemini 3.5 Flash vs Gemini 3.1 Pro

Gemini 3.1 Pro vs Muse Spark

Muse Spark vs Gemini 3 Pro

Gemini 3 Pro vs GPT-5.5

Last updated: July 4, 2026 · BenchLM version AA-MMMU-Pro 2026
