Qwen2.5-VL-32B Benchmark Scores & Performance

Benchmark analysis of Qwen2.5-VL-32B by Alibaba across 14 tests.

Creator

Alibaba

Source Type

Open Weight

Reasoning

Non-Reasoning

Context Window

32K

Overall Score

34 (#74 of 88)

Knowledge Benchmarks

MMLU: 43
GPQA: 42
SuperGPQA: 40
OpenBookQA: 38

Coding Benchmarks

HumanEval: 35

Mathematics Benchmarks

AIME 2023: 43
AIME 2024: 45
AIME 2025: 44
HMMT Feb 2023: 39
HMMT Feb 2024: 41
HMMT Feb 2025: 40
BRUMO 2025: 42

Reasoning Benchmarks

SimpleQA: 41
MuSR: 39

Frequently Asked Questions

How does Qwen2.5-VL-32B perform overall in AI benchmarks?

Qwen2.5-VL-32B ranks #74 out of 88 models with an overall score of 34. It was created by Alibaba and has a 32K-token context window.

Is Qwen2.5-VL-32B good for knowledge and understanding?

Qwen2.5-VL-32B ranks #74 out of 88 models in knowledge and understanding benchmarks with an average score of 40.8. There are stronger options in this category.

Is Qwen2.5-VL-32B good for coding and programming?

Qwen2.5-VL-32B ranks #74 out of 88 models in coding and programming benchmarks with an average score of 35. There are stronger options in this category.

Is Qwen2.5-VL-32B good for mathematics?

Qwen2.5-VL-32B ranks #74 out of 88 models in mathematics benchmarks with an average score of 42. There are stronger options in this category.

Is Qwen2.5-VL-32B good for reasoning and logic?

Qwen2.5-VL-32B ranks #74 out of 88 models in reasoning and logic benchmarks with an average score of 40. There are stronger options in this category.
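
The category averages quoted in the answers above can be reproduced from the individual scores listed on this page. The sketch below assumes each average is a plain unweighted mean of the tests in that category; the page does not state how the averages, or the overall score of 34, are actually computed. The names `SCORES` and `category_averages` are illustrative only.

```python
from statistics import mean

# Scores copied from the benchmark listings on this page.
SCORES = {
    "knowledge": {"MMLU": 43, "GPQA": 42, "SuperGPQA": 40, "OpenBookQA": 38},
    "coding": {"HumanEval": 35},
    "mathematics": {
        "AIME 2023": 43,
        "AIME 2024": 45,
        "AIME 2025": 44,
        "HMMT Feb 2023": 39,
        "HMMT Feb 2024": 41,
        "HMMT Feb 2025": 40,
        "BRUMO 2025": 42,
    },
    "reasoning": {"SimpleQA": 41, "MuSR": 39},
}


def category_averages(scores):
    """Return the unweighted mean score for each category (an assumption)."""
    return {category: mean(tests.values()) for category, tests in scores.items()}


averages = category_averages(SCORES)
for category, avg in averages.items():
    print(f"{category}: {avg:.2f}")
```

Under that assumption the knowledge mean comes out at 40.75, which matches the 40.8 quoted above after rounding to one decimal place, and the other three categories match exactly. The overall score of 34 is not the mean of the 14 listed scores, so it presumably uses a different method that this page does not describe.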

Is Qwen2.5-VL-32B open source?

Qwen2.5-VL-32B is an open weight model created by Alibaba, meaning its weights can be downloaded and run locally or fine-tuned for specific use cases. Open weight is not the same as fully open source, so check the model's license before use.

What is the context window size of Qwen2.5-VL-32B?

Qwen2.5-VL-32B has a context window of 32K tokens, which determines how much text it can process in a single interaction.
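
As a rough illustration of what that limit means in practice, the sketch below checks whether a request fits in the window. It assumes that "32K" means 32 × 1024 = 32,768 tokens and that the prompt and the generated output share the same window; neither detail is stated on this page. The function name `fits_in_context` is hypothetical, and token counts must come from the model's own tokenizer.

```python
# Assumption: "32K" is read as 32 * 1024 = 32,768 tokens. The page does not
# say whether it means 32,768 or 32,000.
CONTEXT_WINDOW_TOKENS = 32 * 1024


def fits_in_context(prompt_tokens, max_output_tokens, window=CONTEXT_WINDOW_TOKENS):
    """Return True if the prompt plus the reserved output budget fits in the window."""
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_output_tokens <= window


print(fits_in_context(30_000, 2_000))
print(fits_in_context(32_000, 1_000))
```

The first call fits because 32,000 tokens is within the assumed limit; the second does not because 33,000 tokens exceeds it.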