A verified subset of OSWorld focused on computer-use tasks in desktop-like environments, including navigation, editing, and workflow completion.
As of April 29, 2026, Holo3-35B-A3B leads the OSWorld-Verified leaderboard with 82.6%, followed by Claude Mythos Preview (79.6%) and Holo3-122B-A10B (78.8%).
Holo3-35B-A3B (H Company)
Claude Mythos Preview (Anthropic)
Holo3-122B-A10B (H Company)
According to BenchLM.ai, Holo3-35B-A3B leads the OSWorld-Verified benchmark with a score of 82.6%, followed by Claude Mythos Preview (79.6%) and Holo3-122B-A10B (78.8%). The top three are separated by 3.8 percentage points. Across the full field the scores show moderate spread, with meaningful differences between the top-tier and mid-tier models.
18 models have been evaluated on OSWorld-Verified. The benchmark falls in the Agentic category. This category carries a 22% weight in BenchLM.ai's overall scoring system. Within that category, OSWorld-Verified contributes 24% of the category score, so strong performance here directly affects a model's overall ranking.
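The two weights above can be combined into a single effective weight. The sketch below assumes the overall score is a plain linear weighted sum, so nested weights multiply; that is an assumption for illustration, not something the page states, and the function names are hypothetical.

```python
# Assumption: the overall score is a linear weighted sum, so the category
# weight and the within-category weight multiply. The real scoring policy
# may differ (for example, if missing scores are renormalized).

AGENTIC_CATEGORY_WEIGHT = 0.22      # Agentic category share of the overall score (from the text)
OSWORLD_WEIGHT_IN_CATEGORY = 0.24   # OSWorld-Verified share of the Agentic category (from the text)


def effective_weight(category_weight: float, benchmark_weight: float) -> float:
    """Share of the overall score attributable to one benchmark."""
    return category_weight * benchmark_weight


def overall_points_from_gap(score_gap: float, weight: float) -> float:
    """Overall-score points implied by a gap (in percentage points) on the benchmark."""
    return score_gap * weight


weight = effective_weight(AGENTIC_CATEGORY_WEIGHT, OSWORLD_WEIGHT_IN_CATEGORY)
gap = 82.6 - 79.6  # Holo3-35B-A3B vs. Claude Mythos Preview, scores from the text

print(f"Effective weight: {weight:.2%}")
print(f"Overall points from a {gap:.1f}-point gap: {overall_points_from_gap(gap, weight):.2f}")
```

Under this assumption OSWorld-Verified accounts for about 5.3% of the overall score, so the 3.0-point gap between the top two models would translate to roughly 0.16 points overall.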
Year: 2025
Tasks: Desktop and GUI tasks
Format: Interactive computer-use evaluation
Difficulty: Complex multi-step workflows
OSWorld-Verified measures whether models can operate software interfaces, keep state across steps, and complete practical GUI workflows. It is one of the clearest public signals for computer-use capability.
Version: OSWorld Verified
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Holo3-35B-A3B by H Company currently leads with a score of 82.6% on OSWorld-Verified.
18 AI models have been evaluated on OSWorld-Verified on BenchLM.