LLM Speed & Latency Comparison

Compare inference speed across every major AI model. Tokens/sec measures output generation speed. TTFT (Time to First Token) measures response latency.

Speed data from Artificial Analysis. Last updated: 2026-03-31. Figures are median output tokens per second and median latency to the first answer chunk (seconds).
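Both metrics can be derived from a streamed response: TTFT is the gap between sending the request and the first chunk arriving, and tokens/sec divides the token count by the generation window after that first token. A minimal sketch of the arithmetic (the helper name and timestamp convention are illustrative, not from any particular API):

```python
def speed_metrics(request_start, first_token_at, last_token_at, total_tokens):
    """Derive TTFT and output speed from stream timestamps (in seconds).

    TTFT is measured up to the first token; output speed only counts
    the window in which tokens were actually being generated.
    """
    ttft = first_token_at - request_start
    generation_window = last_token_at - first_token_at
    tok_per_s = total_tokens / generation_window if generation_window > 0 else float("nan")
    return ttft, tok_per_s

# Example: request sent at t=0s, first token at t=0.5s,
# last of 400 tokens at t=2.5s.
ttft, tok_s = speed_metrics(0.0, 0.5, 2.5, 400)
print(f"TTFT: {ttft:.2f}s, speed: {tok_s:.0f} tok/s")  # TTFT: 0.50s, speed: 200 tok/s
```

In real measurements the timestamps would come from a monotonic clock (e.g. `time.perf_counter()`) sampled around a streaming API call.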

Fastest Output

Mercury 2

789 tok/s · Inception

Lowest Latency

Ministral 3 3B

0.42s TTFT · Mistral

Fastest (Score 70+)

Grok 4.1 Fast

138 tok/s · Score: 70

Top 25 — Output Speed (tok/s)

Speed tiers: Ultra Fast · Fast · Medium · Slow

Average Speed by Provider

NVIDIA

260 tok/s avg · 2 models

1.3s avg TTFT

xAI

157 tok/s avg · 5 models

6s avg TTFT

Google

136 tok/s avg · 7 models

13.5s avg TTFT

Mistral

126 tok/s avg · 7 models

0.8s avg TTFT

OpenAI

121 tok/s avg · 25 models

42.3s avg TTFT

Meta

93 tok/s avg · 3 models

1.3s avg TTFT

Zhipu AI

82 tok/s avg · 5 models

1.3s avg TTFT

Anthropic

52 tok/s avg · 7 models

3.3s avg TTFT

DeepSeek

48 tok/s avg · 2 models

2.3s avg TTFT

MiniMax

46 tok/s avg · 2 models

2.3s avg TTFT

Moonshot AI

44 tok/s avg · 2 models

1.9s avg TTFT
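The provider figures above are simple unweighted means over each provider's listed models. The aggregation can be sketched as follows (field names are illustrative):

```python
from collections import defaultdict
from statistics import mean

def provider_averages(models):
    """Group models by provider and average speed and TTFT (unweighted)."""
    by_provider = defaultdict(list)
    for m in models:
        by_provider[m["provider"]].append(m)
    return {
        provider: {
            "models": len(entries),
            "avg_tok_s": round(mean(e["tok_s"] for e in entries)),
            "avg_ttft_s": round(mean(e["ttft_s"] for e in entries), 1),
        }
        for provider, entries in by_provider.items()
    }

# Illustrative two-model provider:
stats = provider_averages([
    {"provider": "ExampleAI", "tok_s": 300, "ttft_s": 1.0},
    {"provider": "ExampleAI", "tok_s": 200, "ttft_s": 2.0},
])
print(stats["ExampleAI"])  # {'models': 2, 'avg_tok_s': 250, 'avg_ttft_s': 1.5}
```

Note that an unweighted mean lets a single outlier (such as one very high-TTFT reasoning model) pull a provider's average up sharply.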

Model | Developer · License | Output (tok/s) | TTFT (s) | Score

Mercury 2 | Inception · Proprietary | 789 | 3.88 | 49
Nemotron 3 Super 100B | NVIDIA · Open Weight | 367 | 0.71 | 56
GPT-OSS 20B | OpenAI · Open Weight | 313 | 0.65 | 36
Ministral 3 3B | Mistral · Open Weight | 274 | 0.42 | 19
GPT-OSS 120B | OpenAI · Open Weight | 262 | 0.79 | 50
Grok 4.20 | xAI · Proprietary | 233 | 10.3 | 39
Gemini 2.5 Flash | Google · Proprietary | 221 | 0.5 | 50
Gemini 3.1 Flash-Lite | Google · Proprietary | 205 | 7.5 | 56
GPT-5.4 mini | OpenAI · Proprietary | 201 | 3.85 | 66
GPT-5.4 nano | OpenAI · Proprietary | 191 | 3.64 | 58
Grok 3 Mini | xAI · Proprietary | 190 | 0.54 | 49
Ministral 3 8B | Mistral · Open Weight | 182 | 0.52 | 23
GPT-4.1 nano | OpenAI · Proprietary | 181 | 0.63 | 44
Mistral Small 4 | Mistral · Open Weight | 175 | 0.64 | 62
Grok Code Fast 1 | xAI · Proprietary | 172 | 2.81 | 56
o4-mini (high) | OpenAI · Proprietary | 161 | 21.94 | 58
o3-mini | OpenAI · Proprietary | 160 | 7.12 | 65
Gemini 3 Flash | Google · Proprietary | 159 | 1.19 | 67
Nemotron 3 Nano 30B | NVIDIA · Open Weight | 152 | 1.9 | 42
Nova Pro | Amazon · Proprietary | 141 | 0.81 | 33
Grok 4.1 Fast | xAI · Proprietary | 138 | 0.54 | 70
Claude 3 Haiku | Anthropic · Proprietary | 138 | 1.16 | 43
GPT-5 nano | OpenAI · Proprietary | 137 | 83.3 | 36
GPT-4o | OpenAI · Proprietary | 131 | 0.81 | 50
MiMo-V2-Flash | Xiaomi · Open Weight | 129 | 2.14 | 67
Llama 4 Scout | Meta · Open Weight | 128 | 0.7 | 44
GPT-5.2-Codex | OpenAI · Proprietary | 123 | 87.34 | 82
Llama 4 Maverick | Meta · Open Weight | 121 | 0.95 | 39
o3 | OpenAI · Proprietary | 118 | 5.38 | 64
Gemini 2.5 Pro | Google · Proprietary | 117 | 21.19 | 65
GPT-5.1 | OpenAI · Proprietary | 111 | 57.47 | 78
Ministral 3 14B | Mistral · Open Weight | 110 | 0.6 | 39
Gemini 3.1 Pro | Google · Proprietary | 109 | 29.71 | 87
Gemini 3 Pro | Google · Proprietary | 109 | 32.65 | 79
GPT-4.1 | OpenAI · Proprietary | 108 | 1.02 | 64
GLM-4.5-Air | Zhipu AI · Proprietary | 106 | 1.18 | 38
o1 | OpenAI · Proprietary | 98 | 32.29 | 64
Qwen3.5 397B | Alibaba · Open Weight | 96 | 2.44 | 68
GLM-4.7-Flash | Zhipu AI · Open Weight | 95 | 0.91 | 47
LFM2-24B-A2B | LiquidAI · Proprietary | 92 | 0.42 | 27
Step 3.5 Flash | StepFun · Open Weight | 87 | 3.03 | 53
GPT-5 mini | OpenAI · Proprietary | 86 | 65.32 | 51
GPT-5 (high) | OpenAI · Proprietary | 83 | 36.28 | 82
GPT-5 (medium) | OpenAI · Proprietary | 83 | 36.28 | 76
GLM-4.7 | Zhipu AI · Open Weight | 82 | 1.1 | 74
GPT-4.1 mini | OpenAI · Proprietary | 80 | 0.76 | 57
GPT-5.3 Codex | OpenAI · Proprietary | 79 | 88.26 | 85
GPT-5.4 Pro | OpenAI · Proprietary | 74 | 151.79 | 92
GPT-5.4 | OpenAI · Proprietary | 74 | 151.79 | 82
GLM-5 | Zhipu AI · Open Weight | 74 | 1.64 | 75
GPT-5.2 | OpenAI · Proprietary | 73 | 130.34 | 82
DeepSeek R1 Distill Qwen 32B | DeepSeek · Open Weight | 60 | 0.8 | 43
Mistral Medium 3 | Mistral · Proprietary | 57 | 1.2 | 53
Grok 4 | xAI · Proprietary | 54 | 15.6 | 68
GLM-4.5 | Zhipu AI · Proprietary | 51 | 1.45 | 40
Mistral Large 3 | Mistral · Proprietary | 48 | 1.04 | 58
Claude Opus 4.5 | Anthropic · Proprietary | 46 | 1.01 | 76
MiniMax M2.5 | MiniMax · Proprietary | 46 | 2.12 | 48
Kimi K2.5 | Moonshot AI · Open Weight | 45 | 2.38 | 72
MiniMax M2.7 | MiniMax · Proprietary | 45 | 2.53 | 66
Claude Sonnet 4.6 | Anthropic · Proprietary | 44 | 1.48 | 84
Kimi K2 | Moonshot AI · Proprietary | 43 | 1.51 | 53
Claude Opus 4.6 | Anthropic · Proprietary | 40 | 1.78 | 85
Claude 4 Sonnet | Anthropic · Proprietary | 40 | 1.33 | 62
Mistral Large 2 | Mistral · Proprietary | 38 | 1.45 | 52
DeepSeek V3.2 | DeepSeek · Open Weight | 35 | 3.75 | 61
Phi-4 | Microsoft · Open Weight | 35 | 2.02 | 40
GPT-4o mini | OpenAI · Proprietary | 33 | 3.16 | 54
Gemma 3 27B | Google · Open Weight | 31 | 2.04 | 35
GPT-4 Turbo | OpenAI · Proprietary | 30 | 2.84 | 43
Claude 4.1 Opus | Anthropic · Proprietary | 29 | 1.66 | 62
Claude 4.1 Opus Thinking | Anthropic · Proprietary | 29 | 15 | 57
Llama 3.1 405B | Meta · Open Weight | 29 | 2.19 | 53
o3-pro | OpenAI · Proprietary | 27 | 84.93 | 67

Speed data sourced from Artificial Analysis. Metrics reflect median performance across providers. Reasoning models typically show higher TTFT due to chain-of-thought processing.

Frequently Asked Questions

What does tokens per second mean for LLMs?

Tokens per second (tok/s) measures how fast an LLM generates output text. Higher is better. A model at 200 tok/s produces roughly 150 words per second — fast enough for real-time streaming. Models below 50 tok/s may feel sluggish in interactive applications.
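The tokens-to-words conversion above uses the common rule of thumb of roughly 0.75 English words per token; the exact ratio varies by tokenizer and language. As a quick sketch:

```python
def words_per_second(tok_per_s, words_per_token=0.75):
    """Rough English words/sec from tokens/sec (~0.75 words per token heuristic)."""
    return tok_per_s * words_per_token

print(words_per_second(200))  # 150.0 -> comfortably faster than streaming can be read
print(words_per_second(50))   # 37.5
```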

What is TTFT (Time to First Token)?

TTFT (Time to First Token) measures the latency between sending a request and receiving the first token of the response. Lower is better. For chat applications, TTFT under 1 second feels instant. Reasoning models often have high TTFT (10-150s) because they "think" before responding.

Which LLM is the fastest?

Currently, Mercury 2 by Inception is the fastest at 789 tokens/second. The fastest model scoring above 70 overall is Grok 4.1 Fast at 138 tok/s.

Why are reasoning models slower?

Reasoning models (like o3, GPT-5, Gemini Deep Think) use chain-of-thought processing — they generate internal "thinking" tokens before producing the final answer. This adds significant TTFT latency (often 10-150 seconds) but can dramatically improve accuracy on complex tasks. The output speed (tok/s) once generation starts is usually comparable to standard models.
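The practical effect is easiest to see in total wall-clock time, which is TTFT plus generation time. A hypothetical comparison (the figures below are illustrative, not taken from the table):

```python
def total_response_time(ttft_s, tok_per_s, output_tokens):
    """Wall-clock seconds from request to last token: waiting plus generation."""
    return ttft_s + output_tokens / tok_per_s

# A 600-token answer: fast standard model vs. a reasoning model.
fast = total_response_time(ttft_s=0.5, tok_per_s=200, output_tokens=600)
reasoning = total_response_time(ttft_s=30.0, tok_per_s=150, output_tokens=600)
print(fast, reasoning)  # 3.5 34.0
```

Despite comparable generation speeds (3s vs. 4s of actual output), the reasoning model's TTFT dominates, so for short answers the latency gap is almost entirely "thinking" time.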
