GPT-OSS 120B vs Moonshot v1

Side-by-side benchmark comparison across knowledge, coding, math, and reasoning.

Quick Verdict

Moonshot v1 wins overall with a score of 44 vs 42 (a 2-point difference). It also wins 4 out of 4 categories.

Knowledge

Category winner: Moonshot v1

Benchmark     GPT-OSS 120B   Moonshot v1
Average       48.8           50.8
MMLU          51             53
GPQA          50             52
SuperGPQA     48             50
OpenBookQA    46             48

Coding

Category winner: Moonshot v1

Benchmark     GPT-OSS 120B   Moonshot v1
Average       43             45
HumanEval     43             45

Mathematics

Category winner: Moonshot v1

Benchmark        GPT-OSS 120B   Moonshot v1
Average          50             52
AIME 2023        51             53
AIME 2024        53             55
AIME 2025        52             54
HMMT Feb 2023    47             49
HMMT Feb 2024    49             51
HMMT Feb 2025    48             50
BRUMO 2025       50             52

Reasoning

Category winner: Moonshot v1

Benchmark     GPT-OSS 120B   Moonshot v1
Average       48             50
SimpleQA      49             51
MuSR          47             49
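
The category averages quoted on this page appear to match a plain unweighted mean of the per-benchmark scores listed above, rounded to one decimal place. The sketch below checks that arithmetic. The rounding rule and the names SCORES, category_average, and summarize are assumptions made for illustration, not something the page states. The overall scores (44 vs 42) are not reproduced here because the page does not say how they are weighted.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-benchmark scores copied from this page: (GPT-OSS 120B, Moonshot v1).
SCORES = {
    "Knowledge": {
        "MMLU": (51, 53),
        "GPQA": (50, 52),
        "SuperGPQA": (48, 50),
        "OpenBookQA": (46, 48),
    },
    "Coding": {
        "HumanEval": (43, 45),
    },
    "Mathematics": {
        "AIME 2023": (51, 53),
        "AIME 2024": (53, 55),
        "AIME 2025": (52, 54),
        "HMMT Feb 2023": (47, 49),
        "HMMT Feb 2024": (49, 51),
        "HMMT Feb 2025": (48, 50),
        "BRUMO 2025": (50, 52),
    },
    "Reasoning": {
        "SimpleQA": (49, 51),
        "MuSR": (47, 49),
    },
}


def category_average(values):
    """Unweighted mean, rounded half-up to one decimal (assumed rule)."""
    mean = Decimal(sum(values)) / Decimal(len(values))
    return mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def summarize(scores):
    """Return {category: (GPT-OSS 120B average, Moonshot v1 average)}."""
    summary = {}
    for category, benchmarks in scores.items():
        gpt = category_average([pair[0] for pair in benchmarks.values()])
        moonshot = category_average([pair[1] for pair in benchmarks.values()])
        summary[category] = (gpt, moonshot)
    return summary


summary = summarize(SCORES)
moonshot_wins = sum(1 for gpt, moonshot in summary.values() if moonshot > gpt)

for category, (gpt, moonshot) in summary.items():
    print(f"{category}: GPT-OSS 120B {gpt} vs Moonshot v1 {moonshot}")
print(f"Moonshot v1 wins {moonshot_wins} of {len(summary)} categories")
```

Under these assumptions the computed averages agree with the published category figures, and Moonshot v1 leads by 2 points on every individual benchmark.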

Frequently Asked Questions

Which is better, GPT-OSS 120B or Moonshot v1?

Moonshot v1 scores higher overall with 44 vs 42, a difference of 2 points across all benchmarks.

Which is better for knowledge tasks, GPT-OSS 120B or Moonshot v1?

Moonshot v1 leads in knowledge tasks with an average score of 50.8 vs 48.8.

Which is better for coding, GPT-OSS 120B or Moonshot v1?

Moonshot v1 leads in coding with an average score of 45 vs 43.

Which is better for math, GPT-OSS 120B or Moonshot v1?

Moonshot v1 leads in math with an average score of 52 vs 50.

Which is better for reasoning, GPT-OSS 120B or Moonshot v1?

Moonshot v1 leads in reasoning with an average score of 50 vs 48.