LLM Speed & Latency Comparison

Compare inference speed across every major AI model. Tokens/sec measures output generation speed. TTFT (Time to First Token) measures response latency.

Speed data from Artificial Analysis. Last updated: 2026-04-07. The table lists each model's median output speed (tokens/s), median time to first answer chunk (seconds), and overall score.

Fastest Output: Mercury 2 · 789 tok/s · Inception

Lowest Latency: Ministral 3 3B · 0.42s TTFT · Mistral

Fastest (Score 70+): Grok 4.20 · 233 tok/s · Score: 78

Top 25 — Output Speed (tok/s)

[Bar chart of the top 25 models by output speed, bucketed Ultra Fast / Fast / Medium / Slow; full data in the table below.]

Average Speed by Provider

NVIDIA: 260 tok/s avg · 2 models · 1.3s avg TTFT
xAI: 157 tok/s avg · 5 models · 6s avg TTFT
Google: 136 tok/s avg · 7 models · 13.5s avg TTFT
Mistral: 126 tok/s avg · 7 models · 0.8s avg TTFT
OpenAI: 121 tok/s avg · 25 models · 42.3s avg TTFT
Meta: 93 tok/s avg · 3 models · 1.3s avg TTFT
Z.AI: 82 tok/s avg · 5 models · 1.3s avg TTFT
Anthropic: 52 tok/s avg · 7 models · 3.3s avg TTFT
DeepSeek: 48 tok/s avg · 2 models · 2.3s avg TTFT
MiniMax: 46 tok/s avg · 2 models · 2.3s avg TTFT
Moonshot AI: 44 tok/s avg · 2 models · 1.9s avg TTFT

Model · Provider · License · Output Speed (tok/s) · TTFT (s) · Score

Mercury 2 · Inception · Proprietary · 789 · 3.88 · 13
Nemotron 3 Super 100B · NVIDIA · Open Weight · 367 · 0.71 · 46
GPT-OSS 20B · OpenAI · Open Weight · 313 · 0.65 · 20
Ministral 3 3B · Mistral · Open Weight · 274 · 0.42 · 1
GPT-OSS 120B · OpenAI · Open Weight · 262 · 0.79 · 38
Grok 4.20 · xAI · Proprietary · 233 · 10.33 · 78
Gemini 2.5 Flash · Google · Proprietary · 221 · 0.5 · 41
Gemini 3.1 Flash-Lite · Google · Proprietary · 205 · 7.5 · 51
GPT-5.4 mini · OpenAI · Proprietary · 201 · 3.85 · 73
GPT-5.4 nano · OpenAI · Proprietary · 191 · 3.64 · 63
Grok 3 Mini · xAI · Proprietary · 190 · 0.54 · 48
Ministral 3 8B · Mistral · Open Weight · 182 · 0.52 · 3
GPT-4.1 nano · OpenAI · Proprietary · 181 · 0.63 · 28
Mistral Small 4 · Mistral · Open Weight · 175 · 0.64 · 47
Grok Code Fast 1 · xAI · Proprietary · 172 · 2.81 · 42
o4-mini (high) · OpenAI · Proprietary · 161 · 21.94 · 46
o3-mini · OpenAI · Proprietary · 160 · 7.12 · 58
Gemini 3 Flash · Google · Proprietary · 159 · 1.19 · 67
Nemotron 3 Nano 30B · NVIDIA · Open Weight · 152 · 1.9 · 27
Nova Pro · Amazon · Proprietary · 141 · 0.81 · 11
Grok 4.1 Fast · xAI · Proprietary · 138 · 0.54 · 72
Claude 3 Haiku · Anthropic · Proprietary · 138 · 1.16 · 25
GPT-5 nano · OpenAI · Proprietary · 137 · 83.31 · 1
GPT-4o · OpenAI · Proprietary · 131 · 0.81 · 41
MiMo-V2-Flash · Xiaomi · Open Weight · 129 · 2.14 · 63
Llama 4 Scout · Meta · Open Weight · 128 · 0.72 · 4
GPT-5.2-Codex · OpenAI · Proprietary · 123 · 87.34 · 80
Llama 4 Maverick · Meta · Open Weight · 121 · 0.95 · 18
o3 · OpenAI · Proprietary · 118 · 5.38 · 60
Gemini 2.5 Pro · Google · Proprietary · 117 · 21.19 · 67
GPT-5.1 · OpenAI · Proprietary · 111 · 57.47 · 81
Ministral 3 14B · Mistral · Open Weight · 110 · 0.6 · 6
Gemini 3.1 Pro · Google · Proprietary · 109 · 29.71 · 94
Gemini 3 Pro · Google · Proprietary · 109 · 32.65 · 83
GPT-4.1 · OpenAI · Proprietary · 108 · 1.02 · 61
GLM-4.5-Air · Z.AI · Proprietary · 106 · 1.18 · 22
o1 · OpenAI · Proprietary · 98 · 32.29 · 60
Qwen3.5 397B · Alibaba · Open Weight · 96 · 2.44 · 66
GLM-4.7-Flash · Z.AI · Open Weight · 95 · 0.91 · 13
LFM2-24B-A2B · LiquidAI · Proprietary · 92 · 0.42 · 3
Step 3.5 Flash · StepFun · Open Weight · 87 · 3.03 · 15
GPT-5 mini · OpenAI · Proprietary · 86 · 65.32 · 11
GPT-5 (high) · OpenAI · Proprietary · 83 · 36.28 · 80
GPT-5 (medium) · OpenAI · Proprietary · 83 · 36.28 · 74
GLM-4.7 · Z.AI · Open Weight · 82 · 1.1 · 72
GPT-4.1 mini · OpenAI · Proprietary · 80 · 0.76 · 47
GPT-5.3 Codex · OpenAI · Proprietary · 79 · 88.26 · 89
GPT-5.4 · OpenAI · Proprietary · 74 · 151.79 · 94
GPT-5.4 Pro · OpenAI · Proprietary · 74 · 151.79 · 92
GLM-5 · Z.AI · Open Weight · 74 · 1.64 · 77
GPT-5.2 · OpenAI · Proprietary · 73 · 130.34 · 84
DeepSeek R1 Distill Qwen 32B · DeepSeek · Open Weight · 60 · 0.84 · 7
Mistral Medium 3 · Mistral · Proprietary · 57 · 1.2 · 45
Grok 4 · xAI · Proprietary · 54 · 15.6 · 67
GLM-4.5 · Z.AI · Proprietary · 51 · 1.45 · 29
Mistral Large 3 · Mistral · Proprietary · 48 · 1.04 · 52
Claude Opus 4.5 · Anthropic · Proprietary · 46 · 1.01 · 80
MiniMax M2.5 · MiniMax · Proprietary · 46 · 2.12 · 17
Kimi K2.5 · Moonshot AI · Open Weight · 45 · 2.38 · 68
MiniMax M2.7 · MiniMax · Proprietary · 45 · 2.53 · 64
Claude Sonnet 4.6 · Anthropic · Proprietary · 44 · 1.48 · 86
Kimi K2 · Moonshot AI · Proprietary · 43 · 1.51 · 44
Claude Opus 4.6 · Anthropic · Proprietary · 40 · 1.78 · 92
Claude 4 Sonnet · Anthropic · Proprietary · 40 · 1.33 · 52
Mistral Large 2 · Mistral · Proprietary · 38 · 1.45 · 40
DeepSeek V3.2 · DeepSeek · Open Weight · 35 · 3.75 · 60
Phi-4 · Microsoft · Open Weight · 35 · 2.02 · 29
GPT-4o mini · OpenAI · Proprietary · 33 · 3.16 · 45
Gemma 3 27B · Google · Open Weight · 31 · 2.04 · 19
GPT-4 Turbo · OpenAI · Proprietary · 30 · 2.84 · 27
Claude 4.1 Opus · Anthropic · Proprietary · 29 · 1.66 · 53
Claude 4.1 Opus Thinking · Anthropic · Proprietary · 29 · 15 · 45
Llama 3.1 405B · Meta · Open Weight · 29 · 2.19 · 43
o3-pro · OpenAI · Proprietary · 27 · 84.93 · 60

Speed data sourced from Artificial Analysis. Metrics reflect median performance across providers. Reasoning models typically show higher TTFT due to chain-of-thought processing.

Frequently Asked Questions

What does tokens per second mean for LLMs?

Tokens per second (tok/s) measures how fast an LLM generates output text. Higher is better. A model at 200 tok/s produces roughly 150 words per second — fast enough for real-time streaming. Models below 50 tok/s may feel sluggish in interactive applications.
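
The words-per-second figure follows from the common rule of thumb that one token is roughly 0.75 English words, so the conversion is a single multiplication. A minimal sketch (the 0.75 ratio is an approximation, not an exact constant):

```python
# Convert an output-speed figure (tokens/sec) into approximate words/sec.
# Assumes ~0.75 English words per token, a common rule-of-thumb ratio.
WORDS_PER_TOKEN = 0.75

def tok_s_to_words_s(tok_s: float) -> float:
    return tok_s * WORDS_PER_TOKEN

if __name__ == "__main__":
    for tok_s in (200, 50):
        print(f"{tok_s} tok/s ~ {tok_s_to_words_s(tok_s):.0f} words/sec")
```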

What is TTFT (Time to First Token)?

TTFT (Time to First Token) measures the latency between sending a request and receiving the first token of the response. Lower is better. For chat applications, TTFT under 1 second feels instant. Reasoning models often have high TTFT (10-150s) because they "think" before responding.
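
You can measure TTFT yourself by timing a streaming request until the first content chunk arrives. A minimal sketch using Python's requests library against an OpenAI-compatible streaming endpoint; the URL, key, and model name below are placeholders, not values from this page:

```python
import time
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                                  # placeholder key

def measure_ttft(model: str, prompt: str) -> float:
    """Return seconds from sending the request to the first streamed chunk."""
    start = time.perf_counter()
    resp = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        },
        stream=True,
        timeout=300,
    )
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line:  # first non-empty SSE line approximates the first token
            return time.perf_counter() - start
    raise RuntimeError("stream ended without any data")

# ttft = measure_ttft("some-model", "Say hello.")
```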

Which LLM is the fastest?

Currently, Mercury 2 by Inception is the fastest at 789 tokens/second. The fastest model scoring above 70 overall is Grok 4.20 at 233 tok/s.

Why are reasoning models slower?

Reasoning models (like o3, GPT-5, Gemini Deep Think) use chain-of-thought processing — they generate internal "thinking" tokens before producing the final answer. This adds significant TTFT latency (often 10-150 seconds) but can dramatically improve accuracy on complex tasks. The output speed (tok/s) once generation starts is usually comparable to standard models.
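
A model's end-to-end response time is therefore roughly TTFT plus output tokens divided by output speed, which is why a high-TTFT reasoning model feels much slower even when its generation speed is comparable. A minimal sketch of this estimate; the two profiles below are illustrative numbers within the ranges quoted above, not specific rows from the table:

```python
# Estimate end-to-end response time: time to first token plus generation time.
def total_latency_s(ttft_s: float, output_tokens: int, tok_per_s: float) -> float:
    return ttft_s + output_tokens / tok_per_s

profiles = {
    # Illustrative values, not specific rows from the table above.
    "standard model":  {"ttft_s": 0.5,  "tok_per_s": 150},
    "reasoning model": {"ttft_s": 60.0, "tok_per_s": 120},
}

for name, p in profiles.items():
    t = total_latency_s(p["ttft_s"], 500, p["tok_per_s"])
    print(f"{name}: ~{t:.1f}s for a 500-token answer")
```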

AI models change fast. We track them for you.

For engineers, researchers, and the plain curious — a weekly brief on new models, ranking shifts, and pricing changes.

Free. No spam. Unsubscribe anytime.