A benchmark that evaluates language models' ability to follow verifiable instructions such as formatting constraints, keyword inclusion/exclusion, length limits, and structural requirements.
Year
2023
Tasks
500+ prompts (25 verifiable instruction types)
Format
Constrained generation
Difficulty
Instruction precision
IFEval uses verifiable instructions to objectively measure instruction-following ability. Instructions include requirements like 'write in all caps', 'include exactly 3 bullet points', or 'respond in JSON format', making evaluation automated and reproducible.
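Because each instruction is verifiable by a deterministic rule, a checker needs no model or human judge. A minimal sketch of what such checks might look like (these helper names and the exact rules are illustrative assumptions, not IFEval's actual implementation):

```python
import json

def check_all_caps(response: str) -> bool:
    # Passes if every alphabetic character is uppercase
    # (and there is at least one letter to check).
    return response == response.upper() and any(c.isalpha() for c in response)

def check_bullet_count(response: str, n: int = 3) -> bool:
    # Passes if the response contains exactly n markdown bullet lines.
    bullets = [line for line in response.splitlines()
               if line.strip().startswith(("- ", "* "))]
    return len(bullets) == n

def check_json_format(response: str) -> bool:
    # Passes if the entire response parses as valid JSON.
    try:
        json.loads(response)
        return True
    except json.JSONDecodeError:
        return False
```

Since each check returns a plain boolean, scoring reduces to counting how many instructions a response satisfies, which is what makes the evaluation automated and reproducible.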
Instruction-Following Evaluation for Large Language Models
GPT-5.4 by OpenAI currently leads with a score of 95 on IFEval.
88 AI models have been evaluated on IFEval on BenchLM.