Instruction Following Benchmarks
Measures a model's ability to follow precise instructions and constraints.
IFEval
88 models
1 | GPT-5.3 Codex | OpenAI | Closed | Reasoning | 400K | 92 | 93%
2 | GPT-5.4 | OpenAI | Closed | Reasoning | 1M | 91 | 95%
3 | GPT-5.2 | OpenAI | Closed | Reasoning | 400K | 91 | 94%
4 | Claude Opus 4.6 | Anthropic | Closed | Standard | 1M | 90 | 95%
5 | Gemini 3.1 Pro | Google | Closed | Standard | 1M | 89 | 95%
6 | Grok 4.1 | xAI | Closed | Standard | 128K | 89 | 93%
7 | GPT-5.2-Codex | OpenAI | Closed | Reasoning | 400K | 88 | 92%
8 | GPT-5.1-Codex-Max | OpenAI | Closed | Reasoning | 400K | 87 | 91%
9 | Claude Sonnet 4.6 | Anthropic | Closed | Standard | 1M | 86 | 91%
10 | Gemini 3 Pro Deep Think | Google | Closed | Reasoning | 2M | 85 | 89%
11 | Claude Opus 4.5 | Anthropic | Closed | Standard | 200K | 85 | 90%
12 | GPT-5.1 | OpenAI | Closed | Reasoning | 400K | 85 | 89%
13 | GPT-5 (high) | OpenAI | Closed | Reasoning | 128K | 84 | 91%
14 | Gemini 3 Pro | Google | Closed | Standard | 2M | 84 | 88%
15 | GLM-5 (Reasoning) | Zhipu AI | Open | Reasoning | 200K | 84 | 92%
16 | o1-preview | OpenAI | Closed | Reasoning | 200K | 83 | 88%
17 | Claude Sonnet 4.5 | Anthropic | Closed | Standard | 1M | 83 | 90%
18 | Grok 4.1 Fast | xAI | Closed | Standard | 2M | 83 | 90%
19 | GPT-5 (medium) | OpenAI | Closed | Reasoning | 128K | 82 | 88%
20 | Kimi K2.5 (Reasoning) | Moonshot AI | Open | Reasoning | 128K | 82 | 91%
21 | Qwen3.5 397B (Reasoning) | Alibaba | Open | Reasoning | 128K | 82 | 89%
22 | o3-pro | OpenAI | Closed | Reasoning | 200K | 77 | 82%
23 | o3 | OpenAI | Closed | Reasoning | 200K | 76 | 85%
24 | DeepSeek V3.2 (Thinking) | DeepSeek | Open | Reasoning | 128K | 75 | 85%
25 | GPT-5 mini | OpenAI | Closed | Reasoning | 128K | 74 | 82%
Showing 25 of 88
About Instruction Following Benchmarks
Tests the ability to follow verifiable instructions, such as format constraints and content requirements.
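To make "verifiable instructions" concrete, here is a minimal Python sketch of how such checks could be scored programmatically. It is not the official IFEval implementation: the checker names (`max_words`, `includes_keyword`, `no_commas`), the sample responses, and the two accuracy helpers are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]

# Hypothetical checkers: each returns a function mapping a response to pass/fail.
def max_words(limit: int) -> Check:
    """Format constraint: the response has at most `limit` words."""
    return lambda text: len(text.split()) <= limit

def includes_keyword(keyword: str) -> Check:
    """Content requirement: the response mentions `keyword` (case-insensitive)."""
    return lambda text: keyword.lower() in text.lower()

def no_commas() -> Check:
    """Format constraint: the response contains no commas."""
    return lambda text: "," not in text

def check_response(text: str, checks: Dict[str, Check]) -> Dict[str, bool]:
    """Run every named check against one response."""
    return {name: check(text) for name, check in checks.items()}

def instruction_accuracy(results: List[Dict[str, bool]]) -> float:
    """Fraction of individual instructions satisfied across all responses."""
    flat = [ok for result in results for ok in result.values()]
    return sum(flat) / len(flat) if flat else 0.0

def prompt_accuracy(results: List[Dict[str, bool]]) -> float:
    """Fraction of responses that satisfy all of their instructions."""
    if not results:
        return 0.0
    return sum(all(result.values()) for result in results) / len(results)

# Illustrative data only; these are not benchmark prompts or model outputs.
checks = {
    "max_20_words": max_words(20),
    "mentions_python": includes_keyword("python"),
    "no_commas": no_commas(),
}
responses = [
    "Python is a readable language for scripting and data work.",
    "Python, Go, and Rust are popular languages.",
]
results = [check_response(r, checks) for r in responses]

print(f"instruction-level accuracy: {instruction_accuracy(results):.2f}")  # 0.83
print(f"prompt-level accuracy: {prompt_accuracy(results):.2f}")            # 0.50
```

Because each check is a deterministic function of the response text, scoring needs no human or model judge, which is what makes the instructions verifiable. The first sample response passes all three checks and the second fails only the comma rule, so 5 of 6 instructions and 1 of 2 responses pass.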