Top standard AI models (no chain-of-thought reasoning) ranked by benchmark performance. Faster and cheaper than reasoning models.
Unless noted otherwise, ranking surfaces on this page use BenchLM's provisional leaderboard lane rather than the stricter sourced-only verified leaderboard.
Bottom line: Non-reasoning models are faster and cheaper than chain-of-thought alternatives. Claude Opus 4.7 leads this tier, showing that strong benchmark scores are possible without dedicated thinking tokens.
According to BenchLM.ai, Claude Opus 4.7 leads this ranking with a score of 97, followed by Gemini 3.1 Pro (93) and Claude Opus 4.6 (91). There is meaningful separation between the top models, suggesting genuine performance differences.
The best open-weight option is GLM-5 (ranked #8 with a score of 77). While proprietary models lead, open-weight options are within striking distance for teams willing to trade a few points of performance for full model control.
This ranking is based on provisional overall weighted scores computed with BenchLM.ai's scoring formula. For detailed model profiles, click any model name below. To compare two specific models head-to-head, use the "vs #" links.
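The page does not publish the scoring formula itself, but a weighted overall score of the kind described can be sketched as below. The category names and weights here are hypothetical placeholders, not BenchLM.ai's actual formula:

```python
# Hypothetical sketch of a weighted overall score.
# CATEGORY_WEIGHTS is an assumption for illustration; the real
# BenchLM.ai categories and weights are not given in this excerpt.
CATEGORY_WEIGHTS = {
    "reasoning": 0.20,
    "knowledge": 0.15,
    "coding": 0.15,
    "instruction_following": 0.15,
    "multilingual": 0.10,
    "multimodal": 0.10,
    "long_context": 0.10,
    "safety": 0.05,
}

def overall_score(category_scores: dict[str, float]) -> float:
    """Weighted average of per-category scores (each on a 0-100 scale).

    Missing categories are ignored and the remaining weights are
    renormalized, so a model scored on a subset of categories still
    gets a comparable 0-100 overall number.
    """
    total = sum(
        CATEGORY_WEIGHTS[cat] * score
        for cat, score in category_scores.items()
        if cat in CATEGORY_WEIGHTS
    )
    used_weight = sum(
        w for cat, w in CATEGORY_WEIGHTS.items() if cat in category_scores
    )
    return round(total / used_weight, 1) if used_weight else 0.0
```

With this shape, a model scoring 90 in every category gets an overall score of exactly 90.0 regardless of the weights, which is a useful sanity check for any weighted-average leaderboard formula.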
Claude Opus 4.7
Anthropic · 1M
Gemini 3.1 Pro
Google · 1M
Leads the reasoning, knowledge, and multilingual categories in this tier.
Claude Opus 4.6
Anthropic · 1M
Most balanced non-reasoning model. Top instruction following (95).
Gemini 3.1 Pro leads non-reasoning models in reasoning (97), knowledge (96), and multilingual (100).
Claude Opus 4.6 is the most consistent non-reasoning model across all 8 categories.
Claude Sonnet 4.6 is a strong mid-tier option, with the best multimodal score (95) in this tier.
Best non-reasoning model?
Claude Opus 4.7, with the highest overall score (97) in this tier
Production reliability?
Claude Opus 4.6 — most consistent in this tier
Lower latency and cost?
Non-reasoning models skip chain-of-thought — all are faster than reasoning alternatives
Compare with reasoning models?
See reasoning models to evaluate the accuracy-speed trade-off
Get notified when models move. One email a week with what changed and why.
Free. No spam. Unsubscribe anytime.
The top model is Claude Opus 4.7 by Anthropic with a provisional score of 97.
The best open-weight model is GLM-5 at position #8.
67 models are included in this ranking.
Non-reasoning models are standard completion/chat models without dedicated chain-of-thought. They are ranked by the same overall BenchLM score and are typically faster and cheaper per token.
The "non-reasoning" label excludes models with explicit chain-of-thought (like o3, DeepSeek R1). Some non-reasoning models still reason internally — the distinction is about architecture and pricing, not capability.
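The per-token cost advantage can be made concrete with a quick sketch. The prices and token counts below are illustrative assumptions only, not any vendor's actual rates; the key point is that thinking tokens are commonly billed at the output-token rate, so a reasoning model pays for tokens the user never sees:

```python
def request_cost(input_toks: int, output_toks: int, thinking_toks: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one request, given per-million-token prices.

    Thinking tokens are treated as billed output tokens, which is the
    common pricing model for chain-of-thought APIs.
    """
    billed_output = output_toks + thinking_toks
    return (input_toks * in_price_per_m
            + billed_output * out_price_per_m) / 1_000_000

# Illustrative numbers: same prompt and answer length, but the
# reasoning model also emits 4,000 hidden thinking tokens.
non_reasoning = request_cost(2_000, 500, 0, 3.0, 15.0)
reasoning = request_cost(2_000, 500, 4_000, 3.0, 15.0)
```

Under these assumed prices the reasoning request costs several times more than the non-reasoning one for an identical visible answer, which is the trade-off this tier avoids.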