A benchmark that evaluates language models' ability to follow verifiable instructions such as formatting constraints, keyword inclusion/exclusion, length limits, and structural requirements.
As of April 21, 2026, Qwen3.5-27B leads the IFEval leaderboard with 95%, followed by Qwen3.6 Plus (94.3%) and Kimi K2.5 (93.9%).
Qwen3.5-27B (Alibaba)
Qwen3.6 Plus (Alibaba)
Kimi K2.5 (Moonshot AI)
According to BenchLM.ai, Qwen3.5-27B leads the IFEval benchmark with a score of 95%, followed by Qwen3.6 Plus (94.3%) and Kimi K2.5 (93.9%). The top three models are clustered within 1.1 percentage points, suggesting this benchmark is nearing saturation for frontier models.
15 models have been evaluated on IFEval. The benchmark falls in the Instruction Following category. This category carries a 5% weight in BenchLM.ai's overall scoring system. Within that category, IFEval contributes 65% of the category score, so strong performance here directly affects a model's overall ranking.
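As a rough illustration of the weighting described above, the sketch below assumes the category weight and the within-category share simply multiply. The source does not spell out BenchLM.ai's exact formula, so treat that as an assumption; the constant and function names are hypothetical.

```python
# Weights quoted in the text; combining them by multiplication is an
# assumption of this sketch, not BenchLM.ai's documented formula.
CATEGORY_WEIGHT = 0.05  # Instruction Following share of the overall score
IFEVAL_SHARE = 0.65     # IFEval share of the category score


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Share of the overall score attributable to one benchmark."""
    return category_weight * benchmark_share


def overall_delta(score_gap: float) -> float:
    """Overall-score change implied by a gap (in points) on IFEval alone."""
    return score_gap * effective_weight(CATEGORY_WEIGHT, IFEVAL_SHARE)


if __name__ == "__main__":
    weight = effective_weight(CATEGORY_WEIGHT, IFEVAL_SHARE)
    print(f"Effective IFEval weight: {weight:.2%}")
    print(f"Overall impact of a 1.1-point gap: {overall_delta(1.1):.4f} points")
```

Under that assumption, IFEval accounts for about 3.25% of the overall score, so the 1.1-point spread between the top three models would shift the overall score by roughly 0.04 points.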
Year: 2023
Tasks: 500+ instructions
Format: Constrained generation
Difficulty: Instruction precision
IFEval uses verifiable instructions to objectively measure instruction-following ability. Instructions include requirements like 'write in all caps', 'include exactly 3 bullet points', or 'respond in JSON format', making evaluation automated and reproducible.
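To make "verifiable" concrete, here is a minimal sketch of how the three example instructions above could be checked programmatically. It is not the official IFEval implementation: the function names, the instruction identifiers, and the choice of `-` or `*` as bullet markers are all assumptions made for illustration.

```python
import json
import re


def is_all_caps(response: str) -> bool:
    """True if the response has at least one cased letter and none are lowercase."""
    return response.isupper()


def has_exact_bullets(response: str, count: int = 3) -> bool:
    """True if exactly `count` lines start with a '-' or '*' bullet marker."""
    bullets = re.findall(r"^[ \t]*[*-][ \t]+\S", response, flags=re.MULTILINE)
    return len(bullets) == count


def is_valid_json(response: str) -> bool:
    """True if the whole response parses as JSON."""
    try:
        json.loads(response)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Hypothetical instruction identifiers mapped to their checkers.
CHECKS = {
    "all_caps": is_all_caps,
    "three_bullets": has_exact_bullets,
    "json_format": is_valid_json,
}


def follows_all(response: str, instruction_ids: list[str]) -> bool:
    """Strict scoring: the response passes only if every instruction is satisfied."""
    return all(CHECKS[name](response) for name in instruction_ids)


if __name__ == "__main__":
    reply = "- FIRST POINT\n- SECOND POINT\n- THIRD POINT"
    print(follows_all(reply, ["all_caps", "three_bullets"]))
```

Because each check is a deterministic function of the response text, the same model output always receives the same verdict, which is what makes this style of evaluation automated and reproducible.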
Version: IFEval 2023
Refresh cadence: Static
Staleness state: Stale
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.