PolyMath

Name: PolyMath
Creator: BenchLM

A multilingual mathematical reasoning benchmark that tests whether math performance transfers across languages rather than only in English.

Benchmark score on PolyMath — July 4, 2026

BenchLM mirrors the published score view for PolyMath. Qwen3.7 Max leads the public snapshot at 86.5%, followed by Qwen3.7 Plus at 84.0%. BenchLM does not use these results to rank models overall.

1. Qwen3.7 Max (Alibaba, Closed): 86.5% · Overall 84 · Context 1M

2. Qwen3.7 Plus (Alibaba, Closed): 84.0% · Overall 80 · Context 1M

2 models · Multilingual · Current · Display only · Updated July 4, 2026

About PolyMath

Year: 2026
Tasks: Multilingual math problems
Format: Cross-lingual mathematical reasoning
Difficulty: Advanced multilingual reasoning

PolyMath isolates cross-lingual math transfer rather than general chat quality. It is useful for spotting models that keep surface fluency in other languages but lose structured reasoning quality.

Qwen3.6 launch benchmarks

BenchLM freshness & provenance

Version: PolyMath 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set

Current · Display only

BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (2 models)

Qwen3.7 Max (Alibaba, Closed): 86.5%

Qwen3.7 Plus (Alibaba, Closed): 84.0%
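
For readers who want to work with these two results programmatically, the following is a minimal Python sketch that ranks the models and computes the leader's margin. The scores are copied from the table above. The names `scores`, `rank_models`, and `lead_margin` are hypothetical helpers chosen for illustration and are not part of any BenchLM tooling or API.

```python
# Scores copied from the BenchLM PolyMath snapshot (July 4, 2026).
# The helper names below are illustrative assumptions, not a BenchLM API.
scores = {
    "Qwen3.7 Plus": 84.0,
    "Qwen3.7 Max": 86.5,
}


def rank_models(scores):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def lead_margin(scores):
    """Return the gap, in percentage points, between the top two models."""
    ranked = rank_models(scores)
    return round(ranked[0][1] - ranked[1][1], 1)


ranking = rank_models(scores)
print(ranking[0][0], lead_margin(scores))  # prints "Qwen3.7 Max 2.5"
```

With only two models in the snapshot, the margin is a single number (2.5 percentage points). The same helpers would work unchanged if more entries were added to `scores`.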

FAQ

What does PolyMath measure?

A multilingual mathematical reasoning benchmark that tests whether math performance transfers across languages rather than only in English.

Which model scores highest on PolyMath?

Qwen3.7 Max by Alibaba currently leads with a score of 86.5% on PolyMath.

How many models are evaluated on PolyMath?

Two AI models have been evaluated on PolyMath on BenchLM.


Last updated: July 4, 2026 · BenchLM version PolyMath 2026
