Best LLMs for Instruction Following in 2026

Top AI models ranked by instruction-following benchmark performance, including IFEval.

Rank | Model | Provider | License | Context window | Score (avg)
1 | GPT-5.4 | OpenAI | Proprietary | 1M | 95
2 | Claude Opus 4.6 | Anthropic | Proprietary | 1M | 95
3 | Gemini 3.1 Pro | Google | Proprietary | 1M | 95
4 | GPT-5.2 | OpenAI | Proprietary | 400K | 94
5 | GPT-5.3 Codex | OpenAI | Proprietary | 400K | 93
6 | Grok 4.1 | xAI | Proprietary | 128K | 93
7 | GPT-5.2-Codex | OpenAI | Proprietary | 400K | 92
8 | GLM-5 (Reasoning) | Zhipu AI | Open Weight | 200K | 92
9 | GPT-5.1-Codex-Max | OpenAI | Proprietary | 400K | 91
10 | Claude Sonnet 4.6 | Anthropic | Proprietary | 1M | 91
11 | GPT-5 (high) | OpenAI | Proprietary | 128K | 91
12 | Kimi K2.5 (Reasoning) | Moonshot AI | Open Weight | 128K | 91
13 | Claude Opus 4.5 | Anthropic | Proprietary | 200K | 90
14 | Claude Sonnet 4.5 | Anthropic | Proprietary | 1M | 90
15 | Grok 4.1 Fast | xAI | Proprietary | 2M | 90
16 | Gemini 3 Pro Deep Think | Google | Proprietary | 2M | 89
17 | GPT-5.1 | OpenAI | Proprietary | 400K | 89
18 | Qwen3.5 397B (Reasoning) | Alibaba | Open Weight | 128K | 89
19 | Gemini 3 Pro | Google | Proprietary | 2M | 88
20 | o1-preview | OpenAI | Proprietary | 200K | 88
21 | GPT-5 (medium) | OpenAI | Proprietary | 128K | 88
22 | DeepSeek Coder 2.0 | DeepSeek | Open Weight | 128K | 86
23 | Llama 3.1 405B | Meta | Open Weight | 128K | 86
24 | Claude Haiku 4.5 | Anthropic | Proprietary | 200K | 86
25 | o3 | OpenAI | Proprietary | 200K | 85
26 | DeepSeek V3.2 (Thinking) | DeepSeek | Open Weight | 128K | 85
27 | GLM-5 | Zhipu AI | Open Weight | 200K | 85
28 | GLM-4.7 | Zhipu AI | Open Weight | 200K | 85
29 | Qwen2.5-72B | Alibaba | Open Weight | 128K | 85
30 | DeepSeek V3.2 | DeepSeek | Open Weight | 128K | 85
31 | DeepSeek LLM 2.0 | DeepSeek | Open Weight | 128K | 85
32 | Kimi K2.5 | Moonshot AI | Open Weight | 128K | 85
33 | MiniMax M2.5 | MiniMax | Proprietary | 128K | 85
34 | Gemini 3 Flash | Google | Proprietary | 1M | 85
35 | Qwen2.5-1M | Alibaba | Open Weight | 1M | 84
36 | MiMo-V2-Flash | Xiaomi | Open Weight | 128K | 84
37 | Nemotron 3 Ultra 500B | NVIDIA | Open Weight | 32K | 84
38 | GLM-4.7-Flash | Zhipu AI | Open Weight | 200K | 84
39 | Nemotron 3 Super 100B | NVIDIA | Open Weight | 32K | 84
40 | Gemini 2.5 Pro | Google | Proprietary | 2M | 83
41 | o4-mini (high) | OpenAI | Proprietary | 200K | 83
42 | DeepSeekMath V2 | DeepSeek | Open Weight | 128K | 83
43 | Claude 4.1 Opus | Anthropic | Proprietary | 200K | 83
44 | Mistral Large 3 | Mistral | Open Weight | 128K | 83
45 | Claude 4 Sonnet | Anthropic | Proprietary | 200K | 83
46 | Mistral Large 2 | Mistral | Proprietary | 128K | 83
47 | Claude 3.5 Sonnet | Anthropic | Proprietary | 200K | 83
48 | o3-pro | OpenAI | Proprietary | 200K | 82
49 | GPT-5 mini | OpenAI | Proprietary | 128K | 82
50 | Grok 4 | xAI | Proprietary | 128K | 82
51 | Qwen3.5 397B | Alibaba | Open Weight | 128K | 82
52 | GPT-4o | OpenAI | Proprietary | 128K | 82
53 | GPT-4 Turbo | OpenAI | Proprietary | 128K | 80
54 | Z-1 | Z | Proprietary | 128K | 80
55 | Grok Code Fast 1 | xAI | Proprietary | 256K | 79
56 | Gemini 3.1 Flash-Lite | Google | Proprietary | 1M | 79
57 | Nemotron-4 15B | NVIDIA | Open Weight | 32K | 79
58 | GPT-OSS 120B | OpenAI | Open Weight | 128K | 79
59 | Gemini 2.5 Flash | Google | Proprietary | 1M | 79
60 | Mistral 8x7B | Mistral | Open Weight | 32K | 78
61 | Nemotron 3 Nano 30B | NVIDIA | Open Weight | 32K | 78
62 | Nemotron Ultra 253B | NVIDIA | Open Weight | 32K | 78
63 | Gemini 1.5 Pro | Google | Proprietary | 2M | 77
64 | Claude 3 Opus | Anthropic | Proprietary | 200K | 77
65 | Gemini 1.0 Pro | Google | Proprietary | 32K | 77
66 | Llama 3 70B | Meta | Open Weight | 128K | 77
67 | Moonshot v1 | Moonshot AI | Proprietary | 128K | 77
68 | Claude 3 Haiku | Anthropic | Proprietary | 200K | 76
69 | DeepSeek V3.1 (Reasoning) | DeepSeek | Open Weight | 128K | 70
70 | DeepSeek-R1 | DeepSeek | Open Weight | 128K | 69
71 | Qwen3 235B 2507 | Alibaba | Open Weight | 128K | 69
72 | Llama 4 Behemoth | Meta | Open Weight | 32K | 68
73 | Llama 4 Scout | Meta | Open Weight | 32K | 68
74 | Llama 4 Maverick | Meta | Open Weight | 32K | 68
75 | Qwen3 235B 2507 (Reasoning) | Alibaba | Open Weight | 128K | 68
76 | GLM-4.5 | Tsinghua | Proprietary | 128K | 68
77 | MiniMax M1 80k | MiniMax | Proprietary | 80K | 68
78 | GLM-4.5-Air | Tsinghua | Proprietary | 128K | 68
79 | Mistral 7B v0.3 | Mistral | Open Weight | 32K | 68
80 | Gemma 3 27B | Google | Open Weight | 32K | 67
81 | Qwen2.5-VL-32B | Alibaba | Open Weight | 32K | 67
82 | Grok 3 [Beta] | xAI | Proprietary | 128K | 67
83 | DeepSeek V3.1 | DeepSeek | Open Weight | 128K | 67
84 | Kimi K2 | Moonshot AI | Proprietary | 128K | 67
85 | GPT-OSS 20B | OpenAI | Open Weight | 128K | 67
86 | Mistral 8x7B v0.2 | Mistral | Open Weight | 32K | 67
87 | Nova Pro | Nova AI | Proprietary | 128K | 66
88 | Claude 4.1 Opus Thinking | Anthropic | Proprietary | 200K | 66

Key Takeaways

  • GPT-5.4 by OpenAI is listed first with a score of 95, a score it shares with Claude Opus 4.6 and Gemini 3.1 Pro.
  • The best open-weight model in this ranking is GLM-5 (Reasoning) by Zhipu AI, at position #8 with a score of 92.
  • 88 models are included in this ranking.
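
The takeaways above can be checked against the ranking with a few lines of Python. The following is only a minimal sketch: the `Entry` type and the helper names `top_scorers` and `best_open_weight` are hypothetical, and `ROWS` holds just a handful of entries copied from the ranking, not the full list of 88.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure for illustration)."""
    rank: int
    model: str
    provider: str
    license: str
    score: int


# A small sample of rows copied from the ranking; not the full list.
ROWS = [
    Entry(1, "GPT-5.4", "OpenAI", "Proprietary", 95),
    Entry(2, "Claude Opus 4.6", "Anthropic", "Proprietary", 95),
    Entry(3, "Gemini 3.1 Pro", "Google", "Proprietary", 95),
    Entry(4, "GPT-5.2", "OpenAI", "Proprietary", 94),
    Entry(8, "GLM-5 (Reasoning)", "Zhipu AI", "Open Weight", 92),
    Entry(12, "Kimi K2.5 (Reasoning)", "Moonshot AI", "Open Weight", 91),
]


def top_scorers(rows):
    """Return every model that shares the highest score, in listed order."""
    best = max(r.score for r in rows)
    return [r.model for r in rows if r.score == best]


def best_open_weight(rows):
    """Return the open-weight entry with the lowest (best) rank."""
    open_rows = [r for r in rows if r.license == "Open Weight"]
    return min(open_rows, key=lambda r: r.rank)


if __name__ == "__main__":
    print("Top score shared by:", ", ".join(top_scorers(ROWS)))
    leader = best_open_weight(ROWS)
    print(f"Best open-weight: {leader.model} at #{leader.rank} ({leader.score})")
```

Returning every model at the top score, rather than a single winner, makes ties visible, which matters here because the first three entries all score 95.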