Release, provider, and benchmark trends
BenchLM trend pages are derived from the same normalized model and benchmark dataset as the leaderboard. This page focuses on release cadence, provider depth, and whether the benchmark mix is staying fresh enough to separate current models.
Recent release momentum
| Month | Releases | Top score | Top model |
| --- | --- | --- | --- |
| Mar 2024 | 2 | 49 | Claude 3 Haiku |
| May 2024 | 1 | 52 | GPT-4o |
| Jun 2024 | 1 | 60 | Claude 3.5 Sonnet |
| Jul 2024 | 1 | 59 | Mistral Large 2 |
| Dec 2024 | 3 | 67 | o1 |
| Jan 2025 | 3 | 70 | o3-mini |
| Feb 2025 | 1 | 43 | Grok 3 [Beta] |
| Apr 2025 | 5 | 68 | o3 |
| May 2025 | 1 | 65 | Claude 4 Sonnet |
| Jul 2025 | 1 | 67 | Grok 4 |
| Aug 2025 | 3 | 49 | GPT-OSS 120B |
| Oct 2025 | 1 | 62 | Claude Haiku 4.5 |
| Dec 2025 | 4 | 77 | GPT-5.2 |
| Feb 2026 | 5 | 83 | Gemini 3.1 Pro |
| Mar 2026 | 2 | 87 | GPT-5.4 Pro |
Provider progression snapshot
| Provider | Flagship model | Avg. top 3 | Ranked models |
| --- | --- | --- | --- |
| OpenAI | GPT-5.4 | 80.3 | 9 |
| Anthropic | Claude Opus 4.6 | 73.7 | 7 |
| Google | Gemini 3.1 Pro | 73.5 | 2 |
| DeepSeek | DeepSeek Coder 2.0 | 60.7 | 5 |
| Mistral | Mistral Large 2 | 59 | 1 |
| Alibaba | Qwen2.5-1M | 58 | 3 |
| NVIDIA | Nemotron 3 Ultra 500B | 57 | 4 |
| Zhipu AI | GLM-4.7 | 56.3 | 3 |
| Moonshot AI | Kimi K2 | 56 | 2 |
| xAI | Grok 4 | 55 | 2 |
| Meta | Llama 3.1 405B | 49.5 | 2 |
| Z | Z-1 | 47 | 1 |
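The "Avg. top 3" column above is presumably the mean of each provider's three highest benchmark scores among its ranked models. A minimal sketch of that aggregation, under that assumption (the function name and sample scores are illustrative, not BenchLM's actual data or code):

```python
def avg_top3(scores):
    """Mean of a provider's three highest scores.

    Falls back to averaging whatever is available when a provider
    has fewer than three ranked models (e.g. Mistral's single entry).
    """
    top = sorted(scores, reverse=True)[:3]
    return round(sum(top) / len(top), 1)

# Hypothetical per-model scores for one provider (not BenchLM data):
print(avg_top3([83, 80, 78, 71]))  # -> 80.3
print(avg_top3([59]))              # -> 59.0 (single ranked model)
```

Rounding to one decimal matches the precision shown in the snapshot; providers with a single ranked model simply report that model's score.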
Benchmark freshness snapshot
Tracked benchmark categories: Agentic, Coding, Reasoning, Multimodal, Knowledge, Multilingual, Instruction Following, Math.