
Artificial Analysis IFBench (AA-IFBench)

A display-only Artificial Analysis IFBench score.

Benchmark score on AA-IFBench — July 4, 2026

BenchLM mirrors the published score view for AA-IFBench. MiniMax M3 leads the public snapshot at 82.9%, followed by Nemotron 3 Ultra (81.4%) and Grok 4.3 (81.3%). BenchLM does not use these results to rank models overall.

126 models · Instruction Following · Current · Display only · Updated July 4, 2026

The top of the published AA-IFBench snapshot is tightly clustered: MiniMax M3 leads at 82.9%, and the third-place model (Grok 4.3, 81.3%) is only 1.6 points behind. The top-10 spread is 5.7 points (82.9% down to 77.2%), so many of the published scores sit in a relatively narrow band.
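The gap and spread quoted above are simple differences over the ten highest published scores. The sketch below reproduces them from the score table; the helper names `gap_to_leader` and `spread` are illustrative only and are not part of any BenchLM tool.

```python
# Ten highest AA-IFBench scores from the published snapshot, in percent.
top10 = [82.9, 81.4, 81.3, 80.5, 79.9, 79.2, 78.8, 78.0, 77.6, 77.2]


def gap_to_leader(scores, rank):
    """Points between the leader and the model at the given 1-based rank."""
    # Round to one decimal to match the table's precision and hide float noise.
    return round(scores[0] - scores[rank - 1], 1)


def spread(scores):
    """Difference between the highest and lowest score in the list."""
    return round(max(scores) - min(scores), 1)


print(gap_to_leader(top10, 3))  # 1.6
print(spread(top10))            # 5.7
```

Rounding to one decimal keeps the results in the same precision the table uses; without it, binary floating point would print values such as 1.6000000000000085.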

126 models have been evaluated on AA-IFBench. The benchmark falls in the Instruction Following category, which carries a 5% weight in BenchLM.ai's overall scoring system. AA-IFBench itself, however, is displayed for reference only and excluded from the scoring formula, so it does not directly affect overall rankings.
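To illustrate what "displayed but excluded from the scoring formula" means in practice, here is a minimal sketch of a weighted overall score that skips display-only rows. It is not BenchLM's actual formula (see the methodology page for that): the `BenchmarkResult` type, the per-category averaging, the renormalisation step, and every weight except the 5% Instruction Following weight are assumptions made for illustration, and the non-AA scores in the example are placeholders.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    name: str
    category: str
    score: float        # percent, 0-100
    display_only: bool  # True: shown on the page but never weighted


# Hypothetical category weights. Only the 5% Instruction Following weight
# comes from the page; "Other" is a placeholder so the weights sum to 1.0.
CATEGORY_WEIGHTS = {"Instruction Following": 0.05, "Other": 0.95}


def overall_score(results):
    """Weighted mean of per-category averages, skipping display-only rows."""
    by_category = {}
    for r in results:
        if r.display_only:
            continue  # e.g. AA-IFBench: displayed, but excluded from the formula
        by_category.setdefault(r.category, []).append(r.score)

    total = 0.0
    weight_used = 0.0
    for category, scores in by_category.items():
        w = CATEGORY_WEIGHTS.get(category, 0.0)
        total += w * sum(scores) / len(scores)
        weight_used += w
    # Renormalise over the categories that actually have weighted results.
    return total / weight_used if weight_used else 0.0


# Placeholder scores for one model; only the AA-IFBench value is real data.
results = [
    BenchmarkResult("IFBench", "Instruction Following", 70.0, False),
    BenchmarkResult("AA-IFBench", "Instruction Following", 82.9, True),
    BenchmarkResult("Other benchmark", "Other", 60.0, False),
]
print(overall_score(results))
```

Because the AA-IFBench row is flagged `display_only`, changing its score (for example, after a quarterly refresh) leaves the overall value unchanged, which is the property the page describes.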

About AA-IFBench

Year: 2026
Tasks: Verifiable instruction constraints
Format: Constraint satisfaction accuracy
Difficulty: Instruction precision

BenchLM stores the Artificial Analysis IFBench result separately from the weighted IFBench lane, so refreshes of the AA data remain display-only.

BenchLM freshness & provenance

Version: AA-IFBench 2026
Refresh cadence: Quarterly
Staleness state: Current
Question availability: Public benchmark set


BenchLM uses freshness metadata to decide whether a benchmark should still be treated as a strong differentiator, a benchmark to watch, or a display-only reference. For the full scoring policy, see the BenchLM methodology page.

Benchmark score table (126 models)

Rank  Score
1     82.9%
2     81.4%
3     81.3%
4     80.5%
5     79.9%
6     79.2%
7     78.8%
8     78.0%
9     77.6%
10    77.2%
11    77.1%
12    76.6%
13    76.5%
14    76.3%
15    76.3%
16    76.0%
17    75.9%
18    75.9%
19    75.9%
20    75.7%
21    75.7%
22    75.6%
23    75.6%
24    75.4%
25    75.4%
26    75.2%
27    73.9%
28    73.9%
29    73.5%
30    73.5%
31    73.3%
32    73.3%
33    73.2%
34    73.1%
35    72.9%
36    72.5%
37    72.4%
38    72.3%
39    71.4%
40    71.3%
41    70.6%
42    70.4%
43    70.3%
44    70.2%
45    70.2%
46    70.0%
47    70.0%
48    69.0%
49    68.8%
50    68.8%
51    67.9%
52    67.6%
53    67.3%
54    65.1%
55    64.7%
56    64.4%
58    63.1%
59    63.1%
60    62.2%
61    61.1%
62    58.6%
63    58.0%
64    57.4%
65    56.3%
66    56.3%
67    55.6%
68    55.4%
69    55.1%
70    53.7%
71    53.5%
72    53.1%
74    51.6%
75    50.5%
76    49.0%
77    48.7%
78    48.2%
79    48.2%
80    45.4%
81    44.6%
82    44.2%
83    44.1%
84    43.6%
85    43.0%
86    43.0%
87    43.0%
88    41.5%
89    41.5%
90    41.4%
91    41.2%
92    39.9%
93    39.6%
94    39.5%
95    39.3%
96    39.0%
97    39.0%
98    38.3%
99    38.2%
100   38.1%
101   38.0%
102   37.8%
103   37.6%
104   37.5%
105   36.7%
106   36.5%
107   36.2%
108   36.1%
109   34.8%
110   34.4%
111   34.3%
112   33.7%
113   33.5%
114   33.1%
115   32.0%
116   31.8%
117   31.2%
118   31.0%
119   26.5%
120   26.2%
121   25.3%
122   23.5%
123   22.9%
124   20.5%
125   17.6%
126   15.9%

FAQ

What does AA-IFBench measure?

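AA-IFBench is the Artificial Analysis IFBench score, which measures how accurately a model satisfies verifiable instruction constraints (constraint satisfaction accuracy). BenchLM shows it for reference only.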

Which model scores highest on AA-IFBench?

MiniMax M3 by MiniMax currently leads with a score of 82.9% on AA-IFBench.

How many models are evaluated on AA-IFBench?

BenchLM lists AA-IFBench results for 126 AI models.

Last updated: July 4, 2026 · BenchLM version AA-IFBench 2026
