
LLM Speed & Latency Comparison

Compare inference speed across every major AI model. Tokens/sec measures output generation speed. TTFT (Time to First Token) measures response latency.

Speed data from Artificial Analysis. Last updated: 2026-05-01. Metrics: median output speed (tokens/s) and latency to the first answer chunk (s).

Fastest Output: Mercury 2 · 789 tok/s · Inception

Lowest Latency: Ministral 3 3B · 0.42s TTFT · Mistral

Fastest (Score 70+): Grok 4.3 · 209 tok/s · Score: 77

[Chart: Top 25 models by output speed (tok/s); legend: Ultra Fast, Fast, Medium, Slow]

Average Speed by Provider

NVIDIA · 260 tok/s avg · 2 models · 1.3s avg TTFT

xAI · 166 tok/s avg · 6 models · 7s avg TTFT

Google · 136 tok/s avg · 7 models · 13.5s avg TTFT

Mistral · 126 tok/s avg · 7 models · 0.8s avg TTFT

OpenAI · 121 tok/s avg · 25 models · 42.3s avg TTFT

Meta · 93 tok/s avg · 3 models · 1.3s avg TTFT

Z.AI · 82 tok/s avg · 5 models · 1.3s avg TTFT

Anthropic · 52 tok/s avg · 7 models · 3.3s avg TTFT

DeepSeek · 48 tok/s avg · 2 models · 2.3s avg TTFT

MiniMax · 46 tok/s avg · 2 models · 2.3s avg TTFT

Moonshot AI · 44 tok/s avg · 2 models · 1.9s avg TTFT
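The provider figures above appear to be simple averages of each provider's per-model output speeds. A minimal sketch of that calculation, assuming an unweighted arithmetic mean (the page does not state its method) and using the NVIDIA and Anthropic speeds listed in the model table; the `provider_average` helper is a hypothetical name:

```python
from statistics import mean

# Output speeds (tok/s) for each provider's models, as listed on this page.
speeds = {
    "NVIDIA": [367, 152],
    "Anthropic": [138, 46, 44, 40, 40, 29, 29],
}

def provider_average(values):
    """Unweighted mean of per-model output speeds (assumed method)."""
    return mean(values)

for provider, values in speeds.items():
    avg = provider_average(values)
    print(f"{provider}: {avg:.1f} tok/s avg · {len(values)} models")
```

The results (259.5 and about 52.3) are consistent with the 260 and 52 tok/s shown for these providers, though that does not confirm the exact method used.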

| Model | Developer | License | Output speed (tok/s) | TTFT (s) + Score (run together in source) |
|---|---|---|---|---|
| Mercury 2 | Inception | Proprietary | 789 | 3.8811 |
| Nemotron 3 Super 100B | NVIDIA | Open Weight | 367 | 0.7144 |
| GPT-OSS 20B | OpenAI | Open Weight | 313 | 0.6517 |
| Ministral 3 3B | Mistral | Open Weight | 274 | 0.421 |
| GPT-OSS 120B | OpenAI | Open Weight | 262 | 0.7935 |
| Grok 4.20 | xAI | Proprietary | 233 | 10.3365 |
| Gemini 2.5 Flash | Google | Proprietary | 221 | 0.538 |
| Ling 2.6 Flash | InclusionAI | Open Weight | 209.5 | 1.0736 |
| Grok 4.3 | xAI | Proprietary | 209 | 12.3677 |
| Gemini 3.1 Flash-Lite | Google | Proprietary | 205 | 7.548 |
| GPT-5.4 mini | OpenAI | Proprietary | 201 | 3.8571 |
| GPT-5.4 nano | OpenAI | Proprietary | 191 | 3.6461 |
| Grok 3 Mini | xAI | Proprietary | 190 | 0.5446 |
| Ministral 3 8B | Mistral | Open Weight | 182 | 0.523 |
| GPT-4.1 nano | OpenAI | Proprietary | 181 | 0.6327 |
| Mistral Small 4 | Mistral | Open Weight | 175 | 0.6446 |
| Grok Code Fast 1 | xAI | Proprietary | 172 | 2.8140 |
| o4-mini (high) | OpenAI | Proprietary | 161 | 21.9444 |
| o3-mini | OpenAI | Proprietary | 160 | 7.1256 |
| Gemini 3 Flash | Google | Proprietary | 159 | 1.1965 |
| Nemotron 3 Nano 30B | NVIDIA | Open Weight | 152 | 1.926 |
| Nova Pro | Amazon | Proprietary | 141 | 0.8110 |
| Grok 4.1 Fast | xAI | Proprietary | 138 | 0.5470 |
| Claude 3 Haiku | Anthropic | Proprietary | 138 | 1.1624 |
| GPT-5 nano | OpenAI | Proprietary | 137 | 83.3 |
| GPT-4o | OpenAI | Proprietary | 131 | 0.8143 |
| MiMo-V2-Flash | Xiaomi | Open Weight | 129 | 2.1460 |
| Llama 4 Scout | Meta | Open Weight | 128 | 0.722 |
| GPT-5.2-Codex | OpenAI | Proprietary | 123 | 87.3477 |
| Llama 4 Maverick | Meta | Open Weight | 121 | 0.9517 |
| o3 | OpenAI | Proprietary | 118 | 5.3858 |
| Gemini 2.5 Pro | Google | Proprietary | 117 | 21.1965 |
| GPT-5.1 | OpenAI | Proprietary | 111 | 57.4779 |
| Ministral 3 14B | Mistral | Open Weight | 110 | 0.65 |
| Gemini 3.1 Pro | Google | Proprietary | 109 | 29.7192 |
| Gemini 3 Pro | Google | Proprietary | 109 | 32.6581 |
| GPT-4.1 | OpenAI | Proprietary | 108 | 1.0258 |
| GLM-4.5-Air | Z.AI | Proprietary | 106 | 1.1819 |
| o1 | OpenAI | Proprietary | 98 | 32.2957 |
| Qwen3.5 397B | Alibaba | Open Weight | 96 | 2.4464 |
| GLM-4.7-Flash | Z.AI | Open Weight | 95 | 0.9111 |
| LFM2-24B-A2B | LiquidAI | Proprietary | 92 | 0.422 |
| Step 3.5 Flash | StepFun | Open Weight | 87 | 3.0314 |
| GPT-5 mini | OpenAI | Proprietary | 86 | 65.32 |
| GPT-5 (high) | OpenAI | Proprietary | 83 | 36.2878 |
| GPT-5 (medium) | OpenAI | Proprietary | 83 | 36.2871 |
| GLM-4.7 | Z.AI | Open Weight | 82 | 1.169 |
| GPT-4.1 mini | OpenAI | Proprietary | 80 | 0.7645 |
| GPT-5.3 Codex | OpenAI | Proprietary | 79 | 88.2687 |
| GPT-5.4 Pro | OpenAI | Proprietary | 74 | 151.7991 |
| GPT-5.4 | OpenAI | Proprietary | 74 | 151.7989 |
| GLM-5 | Z.AI | Open Weight | 74 | 1.6467 |
| GPT-5.2 | OpenAI | Proprietary | 73 | 130.3481 |
| DeepSeek R1 Distill Qwen 32B | DeepSeek | Open Weight | 60 | 0.846 |
| Mistral Medium 3 | Mistral | Proprietary | 57 | 1.243 |
| Grok 4 | xAI | Proprietary | 54 | 15.665 |
| GLM-4.5 | Z.AI | Proprietary | 51 | 1.4527 |
| Mistral Large 3 | Mistral | Proprietary | 48 | 1.0449 |
| Claude Opus 4.5 | Anthropic | Proprietary | 46 | 1.0177 |
| MiniMax M2.5 | MiniMax | Proprietary | 46 | 2.12 |
| Kimi K2.5 | Moonshot AI | Open Weight | 45 | 2.3864 |
| MiniMax M2.7 | MiniMax | Open Weight | 45 | 2.5362 |
| Claude Sonnet 4.6 | Anthropic | Proprietary | 44 | 1.4883 |
| Kimi K2 | Moonshot AI | Proprietary | 43 | 1.5142 |
| Claude Opus 4.6 | Anthropic | Proprietary | 40 | 1.7887 |
| Claude 4 Sonnet | Anthropic | Proprietary | 40 | 1.3351 |
| Mistral Large 2 | Mistral | Proprietary | 38 | 1.4538 |
| DeepSeek V3.2 | DeepSeek | Open Weight | 35 | 3.7558 |
| Phi-4 | Microsoft | Open Weight | 35 | 2.0228 |
| GPT-4o mini | OpenAI | Proprietary | 33 | 3.1650 |
| Gemma 3 27B | Google | Open Weight | 31 | 2.0417 |
| GPT-4 Turbo | OpenAI | Proprietary | 30 | 2.8425 |
| Claude 4.1 Opus | Anthropic | Proprietary | 29 | 1.6652 |
| Claude 4.1 Opus Thinking | Anthropic | Proprietary | 29 | 1544 |
| Llama 3.1 405B | Meta | Open Weight | 29 | 2.1941 |
| o3-pro | OpenAI | Proprietary | 27 | 84.9358 |

Speed data sourced from Artificial Analysis. Metrics reflect median performance across providers. Reasoning models typically show higher TTFT due to chain-of-thought processing.

Frequently Asked Questions

What does tokens per second mean for LLMs?

Tokens per second (tok/s) measures how fast an LLM generates output text. Higher is better. A model at 200 tok/s produces roughly 150 words per second (at about 0.75 English words per token), fast enough for real-time streaming. Models below 50 tok/s may feel sluggish in interactive applications.
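The words-per-second figure follows from a common rule of thumb of about 0.75 English words per token. A minimal sketch of that conversion; the 0.75 ratio and the `words_per_second` helper are assumptions for illustration, and the real ratio varies by tokenizer and language:

```python
# Assumed rule of thumb: ~0.75 English words per token (varies by tokenizer).
WORDS_PER_TOKEN = 0.75

def words_per_second(tokens_per_second: float,
                     words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert an output speed in tokens/s to an approximate words/s."""
    return tokens_per_second * words_per_token

# 789 and 200 tok/s appear on this page; 50 tok/s is the "sluggish" threshold.
for speed in (789, 200, 50):
    print(f"{speed} tok/s is about {words_per_second(speed):.0f} words/s")
```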

What is TTFT (Time to First Token)?

TTFT (Time to First Token) measures the latency between sending a request and receiving the first token of the response. Lower is better. For chat applications, TTFT under 1 second feels instant. Reasoning models often have high TTFT (10-150s) because they "think" before responding.
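TTFT and output speed together give a rough estimate of how long a full response takes: the wait for the first token plus the time to stream the rest. A minimal sketch, assuming a constant output speed after the first token; the `response_time` helper, the 500-token response length, and the second model's figures are hypothetical:

```python
def response_time(ttft_s: float, tokens_per_second: float,
                  output_tokens: int) -> float:
    """Approximate seconds until the full response has arrived."""
    return ttft_s + output_tokens / tokens_per_second

# Ministral 3 3B figures from this page: 0.42 s TTFT, 274 tok/s.
# The 500-token response length is a hypothetical choice.
fast = response_time(0.42, 274, 500)

# Hypothetical reasoning model: 30 s TTFT, 100 tok/s.
slow = response_time(30.0, 100, 500)

print(f"low-latency model: {fast:.1f} s, reasoning model: {slow:.1f} s")
```

For short responses TTFT dominates the total; for long responses output speed matters more.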

Which LLM is the fastest?

As of the 2026-05-01 update, Mercury 2 by Inception is the fastest at 789 tokens/second. The fastest model with a score of 70 or higher is Grok 4.3 at 209 tok/s.

Why are reasoning models slower?

Reasoning models (like o3, GPT-5, Gemini Deep Think) use chain-of-thought processing: they generate internal "thinking" tokens before producing the final answer. This adds significant TTFT latency (often 10-150 seconds) but can dramatically improve accuracy on complex tasks. Once generation starts, the output speed (tok/s) is usually comparable to standard models.
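One way to see the effect of thinking time is to compute an effective throughput that includes TTFT, rather than the raw tok/s alone. A minimal sketch with two hypothetical models that share the same output speed but differ in TTFT; all figures and the `effective_tokens_per_second` helper are illustrative assumptions, not measurements from this page:

```python
def effective_tokens_per_second(ttft_s: float, tokens_per_second: float,
                                output_tokens: int) -> float:
    """Visible output tokens divided by total wall-clock time, TTFT included."""
    total_s = ttft_s + output_tokens / tokens_per_second
    return output_tokens / total_s

# Hypothetical models: both stream at 100 tok/s once generation starts.
models = {"standard (1 s TTFT)": 1.0, "reasoning (60 s TTFT)": 60.0}

for output_tokens in (200, 5000):
    for name, ttft_s in models.items():
        rate = effective_tokens_per_second(ttft_s, 100, output_tokens)
        print(f"{output_tokens} tokens, {name}: {rate:.1f} effective tok/s")
```

Under these assumptions the gap between the two models narrows as the response gets longer, because the fixed TTFT cost is spread over more output tokens.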
