LLM Speed & Latency Comparison

Compare inference speed across every major AI model. Tokens/sec measures output generation speed. TTFT (Time to First Token) measures response latency.

Speed data from Artificial Analysis. Last updated: 2026-05-01. Metrics: median output speed (tokens/s) and latency to the first answer chunk (s).

Fastest Output

Mercury 2

789 tok/s · Inception

Lowest Latency

Ministral 3 3B

0.42s TTFT · Mistral

Fastest (Score 70+)

Grok 4.3

209 tok/s · Score: 77

[Chart: Top 25 models by output speed (tok/s), color-coded by tier: Ultra Fast, Fast, Medium, Slow]

Average Speed by Provider

NVIDIA

260 tok/s avg · 2 models

1.3s avg TTFT

xAI

166 tok/s avg · 6 models

7s avg TTFT

Google

136 tok/s avg · 7 models

13.5s avg TTFT

Mistral

126 tok/s avg · 7 models

0.8s avg TTFT

OpenAI

121 tok/s avg · 25 models

42.3s avg TTFT

Meta

93 tok/s avg · 3 models

1.3s avg TTFT

Z.AI

82 tok/s avg · 5 models

1.3s avg TTFT

Anthropic

52 tok/s avg · 7 models

3.3s avg TTFT

DeepSeek

48 tok/s avg · 2 models

2.3s avg TTFT

MiniMax

46 tok/s avg · 2 models

2.3s avg TTFT

Moonshot AI

44 tok/s avg · 2 models

1.9s avg TTFT
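
The provider figures above look like plain arithmetic means over each provider's rows in the model table further down. That is an inference from the numbers, not something the page states, and it was only checked for NVIDIA and Meta. A minimal sketch under that assumption (the helper name `provider_average` is hypothetical):

```python
from statistics import mean

# (model, provider, tok/s, TTFT in seconds), copied from the model table
ROWS = [
    ("Nemotron 3 Super 100B", "NVIDIA", 367, 0.71),
    ("Nemotron 3 Nano 30B", "NVIDIA", 152, 1.9),
    ("Llama 4 Scout", "Meta", 128, 0.7),
    ("Llama 4 Maverick", "Meta", 121, 0.95),
    ("Llama 3.1 405B", "Meta", 29, 2.19),
]


def provider_average(rows, provider):
    """Return (mean tok/s, mean TTFT, model count) for one provider.

    Assumes an unweighted mean over the provider's rows.
    """
    speeds = [r[2] for r in rows if r[1] == provider]
    ttfts = [r[3] for r in rows if r[1] == provider]
    return mean(speeds), mean(ttfts), len(speeds)


for name in ("NVIDIA", "Meta"):
    speed, ttft, n = provider_average(ROWS, name)
    print(f"{name}: {speed:.0f} tok/s avg · {n} models · {ttft:.1f}s avg TTFT")
```

An unweighted mean lets a single slow model, such as Llama 3.1 405B at 29 tok/s, pull a provider's average well below its faster models, so the per-model rows are the better guide when picking a specific model.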

Model (Provider · License)	tok/s	TTFT (s)	Tier	Score	Price	Reasoning
Mercury 2 Inception · Proprietary	789	3.88	Ultra Fast	11	$0.75	Reasoning
Nemotron 3 Super 100B NVIDIA · Open Weight	367	0.71	Ultra Fast	44	$0	Standard
GPT-OSS 20B OpenAI · Open Weight	313	0.65	Ultra Fast	17	$0	Standard
Ministral 3 3B Mistral · Open Weight	274	0.42	Ultra Fast	1	$0.1	Standard
GPT-OSS 120B OpenAI · Open Weight	262	0.79	Ultra Fast	35	$0	Standard
Grok 4.20 xAI · Proprietary	233	10.33	Ultra Fast	65	$6	Reasoning
Gemini 2.5 Flash Google · Proprietary	221	0.5	Ultra Fast	38	$2.5	Standard
Ling 2.6 Flash InclusionAI · Open Weight	209.5	1.07	Ultra Fast	36	—	Standard
Grok 4.3 xAI · Proprietary	209	12.36	Ultra Fast	77	$2.5	Reasoning
Gemini 3.1 Flash-Lite Google · Proprietary	205	7.5	Ultra Fast	48	$1.5	Standard
GPT-5.4 mini OpenAI · Proprietary	201	3.85	Ultra Fast	71	$4.5	Reasoning
GPT-5.4 nano OpenAI · Proprietary	191	3.64	Fast	61	$1.25	Reasoning
Grok 3 Mini xAI · Proprietary	190	0.54	Fast	46	$0.5	Reasoning
Ministral 3 8B Mistral · Open Weight	182	0.52	Fast	3	$0.15	Standard
GPT-4.1 nano OpenAI · Proprietary	181	0.63	Fast	27	$0.4	Standard
Mistral Small 4 Mistral · Open Weight	175	0.64	Fast	46	$0.6	Standard
Grok Code Fast 1 xAI · Proprietary	172	2.81	Fast	40	$1.5	Standard
o4-mini (high) OpenAI · Proprietary	161	21.94	Fast	44	—	Reasoning
o3-mini OpenAI · Proprietary	160	7.12	Fast	56	$4.4	Reasoning
Gemini 3 Flash Google · Proprietary	159	1.19	Fast	65	$3	Standard
Nemotron 3 Nano 30B NVIDIA · Open Weight	152	1.9	Fast	26	$0	Standard
Nova Pro Amazon · Proprietary	141	0.81	Fast	10	—	Standard
Grok 4.1 Fast xAI · Proprietary	138	0.54	Fast	70	$0.5	Standard
Claude 3 Haiku Anthropic · Proprietary	138	1.16	Fast	24	$1.25	Standard
GPT-5 nano OpenAI · Proprietary	137	83.3	Fast	—	$0.4	Reasoning
GPT-4o OpenAI · Proprietary	131	0.81	Fast	43	$10	Standard
MiMo-V2-Flash Xiaomi · Open Weight	129	2.14	Fast	60	$0	Reasoning
Llama 4 Scout Meta · Open Weight	128	0.7	Fast	22	$0	Standard
GPT-5.2-Codex OpenAI · Proprietary	123	87.34	Fast	77	$14	Reasoning
Llama 4 Maverick Meta · Open Weight	121	0.95	Fast	17	$0	Standard
o3 OpenAI · Proprietary	118	5.38	Fast	58	$8	Reasoning
Gemini 2.5 Pro Google · Proprietary	117	21.19	Fast	65	$10	Standard
GPT-5.1 OpenAI · Proprietary	111	57.47	Fast	79	$10	Reasoning
Ministral 3 14B Mistral · Open Weight	110	0.6	Fast	5	$0.2	Standard
Gemini 3.1 Pro Google · Proprietary	109	29.71	Fast	92	$12	Standard
Gemini 3 Pro Google · Proprietary	109	32.65	Fast	81	$12	Standard
GPT-4.1 OpenAI · Proprietary	108	1.02	Fast	58	$8	Standard
GLM-4.5-Air Z.AI · Proprietary	106	1.18	Fast	19	$1.1	Standard
o1 OpenAI · Proprietary	98	32.29	Medium	57	$60	Reasoning
Qwen3.5 397B Alibaba · Open Weight	96	2.44	Medium	64	$3.6	Standard
GLM-4.7-Flash Z.AI · Open Weight	95	0.91	Medium	11	$0	Reasoning
LFM2-24B-A2B LiquidAI · Proprietary	92	0.42	Medium	2	$0	Standard
Step 3.5 Flash StepFun · Open Weight	87	3.03	Medium	14	$0.3	Standard
GPT-5 mini OpenAI · Proprietary	86	65.32	Medium	—	$2	Reasoning
GPT-5 (high) OpenAI · Proprietary	83	36.28	Medium	78	$10	Reasoning
GPT-5 (medium) OpenAI · Proprietary	83	36.28	Medium	71	—	Reasoning
GLM-4.7 Z.AI · Open Weight	82	1.1	Medium	69	$0	Reasoning
GPT-4.1 mini OpenAI · Proprietary	80	0.76	Medium	45	$1.6	Standard
GPT-5.3 Codex OpenAI · Proprietary	79	88.26	Medium	87	$14	Reasoning
GPT-5.4 Pro OpenAI · Proprietary	74	151.79	Medium	91	$180	Reasoning
GPT-5.4 OpenAI · Proprietary	74	151.79	Medium	89	$15	Reasoning
GLM-5 Z.AI · Open Weight	74	1.64	Medium	67	$3.2	Standard
GPT-5.2 OpenAI · Proprietary	73	130.34	Medium	81	$14	Reasoning
DeepSeek R1 Distill Qwen 32B DeepSeek · Open Weight	60	0.84	Medium	6	$0	Reasoning
Mistral Medium 3 Mistral · Proprietary	57	1.2	Medium	43	$2	Standard
Grok 4 xAI · Proprietary	54	15.6	Medium	65	—	Standard
GLM-4.5 Z.AI · Proprietary	51	1.45	Medium	27	$2.2	Standard
Mistral Large 3 Mistral · Proprietary	48	1.04	Slow	49	$1.5	Standard
Claude Opus 4.5 Anthropic · Proprietary	46	1.01	Slow	77	$25	Standard
MiniMax M2.5 MiniMax · Proprietary	46	2.12	Slow	—	$1.2	Standard
Kimi K2.5 Moonshot AI · Open Weight	45	2.38	Slow	64	$3	Standard
MiniMax M2.7 MiniMax · Open Weight	45	2.53	Slow	62	$1.2	Standard
Claude Sonnet 4.6 Anthropic · Proprietary	44	1.48	Slow	83	$15	Standard
Kimi K2 Moonshot AI · Proprietary	43	1.51	Slow	42	$2.5	Standard
Claude Opus 4.6 Anthropic · Proprietary	40	1.78	Slow	87	$25	Standard
Claude 4 Sonnet Anthropic · Proprietary	40	1.33	Slow	51	$15	Standard
Mistral Large 2 Mistral · Proprietary	38	1.45	Slow	38	—	Standard
DeepSeek V3.2 DeepSeek · Open Weight	35	3.75	Slow	58	$0.42	Standard
Phi-4 Microsoft · Open Weight	35	2.02	Slow	28	$0	Standard
GPT-4o mini OpenAI · Proprietary	33	3.16	Slow	50	$0.6	Standard
Gemma 3 27B Google · Open Weight	31	2.04	Slow	17	$0	Standard
GPT-4 Turbo OpenAI · Proprietary	30	2.84	Slow	25	$30	Standard
Claude 4.1 Opus Anthropic · Proprietary	29	1.66	Slow	52	$75	Standard
Claude 4.1 Opus Thinking Anthropic · Proprietary	29	15	Slow	44	—	Reasoning
Llama 3.1 405B Meta · Open Weight	29	2.19	Slow	41	$0	Standard
o3-pro OpenAI · Proprietary	27	84.93	Slow	58	$80	Reasoning

Speed data sourced from Artificial Analysis. Metrics reflect median performance across providers. Reasoning models typically show higher TTFT due to chain-of-thought processing.
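
The page does not publish the cut-offs behind the Ultra Fast, Fast, Medium, and Slow tiers. The table rows are consistent with boundaries at 200, 100, and 50 tok/s (for example, 201 is Ultra Fast while 191 is Fast), so the sketch below assumes those values. The function name `speed_tier` and the exact thresholds are assumptions:

```python
def speed_tier(tok_per_s: float) -> str:
    """Map output speed to a tier. Cut-offs are inferred, not published."""
    if tok_per_s >= 200:
        return "Ultra Fast"
    if tok_per_s >= 100:
        return "Fast"
    if tok_per_s >= 50:
        return "Medium"
    return "Slow"


# Rows on either side of each tier boundary in the table
boundary_rows = [
    ("GPT-5.4 mini", 201), ("GPT-5.4 nano", 191),
    ("GLM-4.5-Air", 106), ("o1", 98),
    ("GLM-4.5", 51), ("Mistral Large 3", 48),
]
for model, speed in boundary_rows:
    print(f"{model}: {speed} tok/s -> {speed_tier(speed)}")
```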

Frequently Asked Questions

What does tokens per second mean for LLMs?

Tokens per second (tok/s) measures how fast an LLM generates output text. Higher is better. At roughly 0.75 English words per token, a model running at 200 tok/s produces about 150 words per second, fast enough for real-time streaming. Models below 50 tok/s may feel sluggish in interactive applications.
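
The words-per-second figure follows from a common rule of thumb of about 0.75 English words per token. The real ratio depends on the tokenizer and the language, so treat the constant below as an assumption. A small sketch of the conversion (function names are hypothetical):

```python
WORDS_PER_TOKEN = 0.75  # assumed rule of thumb for English; varies by tokenizer


def words_per_second(tok_per_s, words_per_token=WORDS_PER_TOKEN):
    """Convert output speed from tokens/s to approximate words/s."""
    return tok_per_s * words_per_token


def seconds_to_generate(n_words, tok_per_s, words_per_token=WORDS_PER_TOKEN):
    """Time to stream n_words of output once generation starts (TTFT excluded)."""
    return n_words / words_per_second(tok_per_s, words_per_token)


for speed in (200, 50):
    print(
        f"{speed} tok/s ~ {words_per_second(speed):.1f} words/s, "
        f"300-word answer in {seconds_to_generate(300, speed):.1f}s"
    )
```

Under this assumption, a 300-word answer streams in about 2 seconds at 200 tok/s and about 8 seconds at 50 tok/s, which is where the difference starts to be noticeable in a chat interface.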

What is TTFT (Time to First Token)?

TTFT (Time to First Token) measures the latency between sending a request and receiving the first token of the response. Lower is better. For chat applications, TTFT under 1 second feels instant. Reasoning models often have high TTFT (10-150s) because they "think" before responding.
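
To make the definition concrete, here is a rough sketch of timing TTFT and output speed on any streamed response. It is not tied to a particular provider SDK: `measure_stream` and `simulated_stream` are hypothetical names, each chunk is treated as one token (real streams may bundle several tokens per chunk), and the request is assumed to be sent when iteration begins.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[float]]:
    """Return (ttft_seconds, chunks_per_second) for one streamed response."""
    start = clock()
    first = last = None
    count = 0
    for _ in chunks:
        last = clock()
        if first is None:
            first = last  # arrival time of the first chunk
        count += 1
    if first is None:
        return None, None  # the stream produced nothing
    ttft = first - start
    # Speed is measured from the first chunk onward, so TTFT is excluded.
    speed = (count - 1) / (last - first) if count > 1 and last > first else None
    return ttft, speed


def simulated_stream(ttft: float, gap: float, n: int):
    """Stand-in for a model stream: waits ttft seconds, then yields n chunks."""
    time.sleep(ttft)
    yield "tok"
    for _ in range(n - 1):
        time.sleep(gap)
        yield "tok"


ttft, speed = measure_stream(simulated_stream(ttft=0.2, gap=0.01, n=20))
print(f"TTFT {ttft:.2f}s, {speed:.0f} chunks/s")
```

Keeping the two numbers separate mirrors how the table reports them: a model can have a long wait before the first chunk and still stream quickly afterwards.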

Which LLM is the fastest?

As of the 2026-05-01 update, Mercury 2 by Inception has the highest output speed at 789 tok/s. The fastest model with a score of 70 or higher is Grok 4.3 at 209 tok/s (score 77).
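
Both answers come from the same table, with and without a minimum score. The sketch below shows that selection on a handful of rows copied from the table. The `Row` type and `fastest` helper are hypothetical, and rows where the table shows a dash for the score are treated as not meeting any threshold.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    tok_per_s: float
    score: Optional[int]  # None where the table shows a dash


ROWS = [
    Row("Mercury 2", 789, 11),
    Row("Grok 4.20", 233, 65),
    Row("Grok 4.3", 209, 77),
    Row("GPT-5.4 mini", 201, 71),
    Row("GPT-5 nano", 137, None),
]


def fastest(rows, min_score=None):
    """Return the row with the highest tok/s, optionally requiring a minimum score."""
    eligible = [
        r for r in rows
        if min_score is None or (r.score is not None and r.score >= min_score)
    ]
    return max(eligible, key=lambda r: r.tok_per_s, default=None)


print(fastest(ROWS).model)                # fastest overall
print(fastest(ROWS, min_score=70).model)  # fastest with score 70 or higher
```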

Why are reasoning models slower?

Reasoning models (like o3, GPT-5, Gemini Deep Think) use chain-of-thought processing: they generate internal "thinking" tokens before producing the final answer. This delays the first answer token, often by 10-150 seconds, but can dramatically improve accuracy on complex tasks. Once generation starts, output speed (tok/s) is usually comparable to standard models.
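
One way to weigh TTFT against output speed is a simple estimate: total time is roughly TTFT plus answer length divided by tok/s. The sketch below applies that to two rows from the table. It assumes the reported TTFT already includes thinking time and that `output_tokens` counts only the visible answer. The function names are hypothetical.

```python
def total_latency(ttft_s, tok_per_s, output_tokens):
    """Estimated seconds until the full answer has streamed."""
    return ttft_s + output_tokens / tok_per_s


def break_even_tokens(ttft_a, speed_a, ttft_b, speed_b):
    """Answer length at which model A (higher TTFT, faster output) catches model B.

    Returns None when A is not both slower to start and faster to stream.
    """
    if ttft_a <= ttft_b or speed_a <= speed_b:
        return None
    return (ttft_a - ttft_b) / (1 / speed_b - 1 / speed_a)


# Figures from the table: Grok 4.3 (reasoning) vs Claude Sonnet 4.6 (standard)
grok = {"ttft": 12.36, "speed": 209}
sonnet = {"ttft": 1.48, "speed": 44}

for n in (200, 2000):
    a = total_latency(grok["ttft"], grok["speed"], n)
    b = total_latency(sonnet["ttft"], sonnet["speed"], n)
    print(f"{n} tokens: Grok 4.3 {a:.1f}s, Claude Sonnet 4.6 {b:.1f}s")

n_star = break_even_tokens(grok["ttft"], grok["speed"], sonnet["ttft"], sonnet["speed"])
print(f"break-even at about {n_star:.0f} output tokens")
```

Under these assumptions, short answers finish sooner on the low-TTFT model, while long answers favor the model with the higher tok/s despite its slower start.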
