Instruction Following Benchmarks

Measures a model's ability to follow precise instructions and constraints.

IFEval

88 models
Rank  Model                     Provider     Access  Type       Context  Score  IFEval
1     GPT-5.3 Codex             OpenAI       Closed  Reasoning  400K     92     93%
2     GPT-5.4                   OpenAI       Closed  Reasoning  1M       91     95%
3     GPT-5.2                   OpenAI       Closed  Reasoning  400K     91     94%
4     Claude Opus 4.6           Anthropic    Closed  Standard   1M       90     95%
5     Gemini 3.1 Pro            Google       Closed  Standard   1M       89     95%
6     Grok 4.1                  xAI          Closed  Standard   128K     89     93%
7     GPT-5.2-Codex             OpenAI       Closed  Reasoning  400K     88     92%
8     GPT-5.1-Codex-Max         OpenAI       Closed  Reasoning  400K     87     91%
9     Claude Sonnet 4.6         Anthropic    Closed  Standard   1M       86     91%
10    Gemini 3 Pro Deep Think   Google       Closed  Reasoning  2M       85     89%
11    Claude Opus 4.5           Anthropic    Closed  Standard   200K     85     90%
12    GPT-5.1                   OpenAI       Closed  Reasoning  400K     85     89%
13    GPT-5 (high)              OpenAI       Closed  Reasoning  128K     84     91%
14    Gemini 3 Pro              Google       Closed  Standard   2M       84     88%
15    GLM-5 (Reasoning)         Zhipu AI     Open    Reasoning  200K     84     92%
16    o1-preview                OpenAI       Closed  Reasoning  200K     83     88%
17    Claude Sonnet 4.5         Anthropic    Closed  Standard   1M       83     90%
18    Grok 4.1 Fast             xAI          Closed  Standard   2M       83     90%
19    GPT-5 (medium)            OpenAI       Closed  Reasoning  128K     82     88%
20    Kimi K2.5 (Reasoning)     Moonshot AI  Open    Reasoning  128K     82     91%
21    Qwen3.5 397B (Reasoning)  Alibaba      Open    Reasoning  128K     82     89%
22    o3-pro                    OpenAI       Closed  Reasoning  200K     77     82%
23    o3                        OpenAI       Closed  Reasoning  200K     76     85%
24    DeepSeek V3.2 (Thinking)  DeepSeek     Open    Reasoning  128K     75     85%
25    GPT-5 mini                OpenAI       Closed  Reasoning  128K     74     82%
Showing 25 of 88

About Instruction Following Benchmarks

These benchmarks test a model's ability to follow verifiable instructions, such as format constraints and content requirements.
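
To illustrate what "verifiable" means here, the following is a minimal Python sketch of programmatic checks for format and content constraints. It is not the IFEval implementation: the function names, the example constraints, and the equal-weight scoring are all assumptions made for illustration.

```python
import json
from typing import Callable, List


# Hypothetical checkers: each returns True when the response satisfies
# one instruction that can be verified without human judgment.
def has_at_least_n_words(text: str, n: int) -> bool:
    """Format constraint: the response contains at least n words."""
    return len(text.split()) >= n


def contains_no_commas(text: str) -> bool:
    """Format constraint: the response avoids commas entirely."""
    return "," not in text


def is_valid_json(text: str) -> bool:
    """Format constraint: the whole response parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def includes_keyword(text: str, keyword: str) -> bool:
    """Content requirement: the response mentions a keyword (case-insensitive)."""
    return keyword.lower() in text.lower()


def score_response(text: str, checks: List[Callable[[str], bool]]) -> float:
    """Fraction of checks passed (assumed scoring rule, equal weights)."""
    if not checks:
        raise ValueError("at least one check is required")
    passed = sum(1 for check in checks if check(text))
    return passed / len(checks)


# Usage: one response scored against three constraints.
response = "Benchmarks reward models that obey every stated constraint"
checks = [
    lambda t: has_at_least_n_words(t, 5),
    contains_no_commas,
    lambda t: includes_keyword(t, "constraint"),
]
print(score_response(response, checks))
```

Because each check is a deterministic function of the response text, results are reproducible and need no human grader, which is the property that makes such instructions suitable for automated benchmarks.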