IFBench evaluates precise instruction-following generalization on 58 challenging, verifiable out-of-domain constraints. Unlike IFEval, which tests familiar constraint types, IFBench measures how well models follow novel instructions they haven't been optimized for, exposing overfitting to common instruction patterns.
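To make "verifiable" concrete, here is a minimal sketch of how a constraint can be checked programmatically rather than judged by a human or another model. The word-range constraint, the function names, and the sample responses are all hypothetical illustrations; they are not taken from IFBench itself, and the benchmark's real checkers may work differently.

```python
def check_word_range(response: str, low: int, high: int) -> bool:
    """Hypothetical constraint: the response must contain between
    `low` and `high` whitespace-separated words, inclusive."""
    return low <= len(response.split()) <= high


def accuracy(results: list[bool]) -> float:
    """Percentage of checks that passed (0.0 for an empty list)."""
    return 100.0 * sum(results) / len(results) if results else 0.0


# Made-up responses, checked against a 2-to-4-word constraint.
responses = ["one two three", "one two three four five six", "one"]
results = [check_word_range(r, 2, 4) for r in responses]

print(results)
print(round(accuracy(results), 1))
```

Because each check is a deterministic function of the response text, a score such as a percentage of constraints satisfied can be recomputed by anyone with the same outputs.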
As of May 13, 2026, Grok 4.3 leads the IFBench leaderboard with 81.3%, followed by Qwen3.6 Plus (75.8%) and Nemotron 3 Nano Omni 30B A3B (74.2%).
Grok 4.3 (xAI)
Qwen3.6 Plus (Alibaba)
Nemotron 3 Nano Omni 30B A3B (NVIDIA)
According to BenchLM.ai, scores are widely spread across the leaderboard: the top three alone span 7.1 percentage points (81.3% to 74.2%), which makes this benchmark effective at differentiating model capabilities.
Seven models have been evaluated on IFBench. The benchmark falls in the Instruction Following category, which carries a 5% weight in BenchLM.ai's overall scoring system. Within that category, IFBench contributes 35% of the category score, so it accounts for about 1.75% of a model's overall score (5% × 35%).
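The weighting above can be worked through as a short calculation. This is a sketch that assumes the overall score is a simple linear weighted average on a 0 to 100 scale; BenchLM's actual aggregation may differ, and the constant and function names here are illustrative only.

```python
# Weights stated in the text above.
CATEGORY_WEIGHT = 0.05   # Instruction Following share of the overall score
BENCHMARK_SHARE = 0.35   # IFBench share of the category score


def effective_weight(category_weight: float, benchmark_share: float) -> float:
    """Share of the overall score attributable to one benchmark,
    assuming the two weights multiply (linear weighted average)."""
    return category_weight * benchmark_share


def overall_points(score_pct: float) -> float:
    """Overall-score points contributed by an IFBench score (assumed linear)."""
    return score_pct * effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)


weight = effective_weight(CATEGORY_WEIGHT, BENCHMARK_SHARE)
print(f"Effective weight: {weight:.2%}")

# Gap between the first- and third-place scores quoted on the page.
gap = overall_points(81.3) - overall_points(74.2)
print(f"Overall-score gap from a 7.1-point IFBench lead: {gap:.3f} points")
```

Under that assumption, the 7.1-point gap between first and third place translates to roughly 0.12 points of overall score, so IFBench alone shifts the overall ranking only when models are otherwise closely matched.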
Year: 2025
Tasks: 58
Version: IFBench 2025
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set
BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.
Grok 4.3 by xAI currently leads with a score of 81.3% on IFBench.
BenchLM has evaluated 7 AI models on IFBench.