Head-to-head comparison across 2 benchmark categories. Overall scores shown here use BenchLM's provisional ranking lane.
Overall (provisional): Ornith-1.0-9B 52 · Qwen3.5-35B-A3B 55
Verified leaderboard positions: Ornith-1.0-9B unranked · Qwen3.5-35B-A3B #27
Pick Qwen3.5-35B-A3B if you want the stronger benchmark profile. Ornith-1.0-9B only becomes the better choice if coding is the priority.
Agentic: +7.5 difference, Qwen3.5-35B-A3B ahead
Coding: +11.0 difference, Ornith-1.0-9B ahead
Ornith-1.0-9B vs Qwen3.5-35B-A3B
Price (input / output): $0 / $0 vs $0 / $0
Other listed metrics: N/A for both models
Context window: 262K vs 262K
Qwen3.5-35B-A3B has the cleaner provisional overall profile here, landing at 55 versus 52. That 3-point lead is real, but it is close enough that category-level strengths matter more than the headline number.
Qwen3.5-35B-A3B's sharpest advantage is in agentic tasks, where it averages 50.6 against 43.1. The largest single-benchmark gap on the page is on Terminal-Bench 2.0, at 43.1% versus 40.5%. Ornith-1.0-9B hits back in coding, so the answer changes if that is the part of the workload you care about most.
Ornith-1.0-9B has the edge for coding in this comparison, averaging 69.4 versus 58.4. Inside this category, SWE-bench Verified is the benchmark that creates the most daylight between them.
Qwen3.5-35B-A3B has the edge for agentic tasks in this comparison, averaging 50.6 versus 43.1. Inside this category, Terminal-Bench 2.0 is the benchmark that creates the most daylight between them.
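As a quick check on the category figures, the sketch below recomputes each category's leader and margin from the averages reported on this page. It is a minimal illustration, assuming both models are compared on the same category averages; the dictionary layout and the `category_gap` helper are hypothetical and not part of BenchLM.

```python
# Category averages as reported on this page (higher is better).
# The data layout is a hypothetical illustration, not a BenchLM format.
category_averages = {
    "agentic": {"Ornith-1.0-9B": 43.1, "Qwen3.5-35B-A3B": 50.6},
    "coding": {"Ornith-1.0-9B": 69.4, "Qwen3.5-35B-A3B": 58.4},
}


def category_gap(scores: dict[str, float]) -> tuple[str, float]:
    """Return the leading model and its margin over the other model.

    Hypothetical helper: assumes exactly two models per category and
    rounds the margin to one decimal, matching the page's precision.
    """
    leader = max(scores, key=scores.get)
    runner_up = min(scores, key=scores.get)
    return leader, round(scores[leader] - scores[runner_up], 1)


for category, scores in category_averages.items():
    leader, margin = category_gap(scores)
    print(f"{category}: {leader} +{margin}")
```

Rounding to one decimal keeps the output at the page's precision and guards against floating-point noise in the subtraction.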