Speed Benchmarks
Compare inference speed (tokens/sec) across AI providers. Speed-optimized providers like Groq use custom silicon for dramatically faster output.
Fastest: Gemma 2 9B (Groq) at 950 tok/s
Output Speed (tokens/sec)
| Model | Provider | Tokens/sec | TTFT (ms) | Category |
|---|---|---|---|---|
| 🥇 Gemma 2 9B (Groq) | Groq | 950 | 30 | Speed-Optimized |
| 🥈 Llama 3.3 70B (Groq) | Groq | 820 | 45 | Speed-Optimized |
| 🥉 Llama 4 Maverick (Groq) | Groq | 710 | 55 | Speed-Optimized |
| Mixtral 8x7B (Groq) | Groq | 580 | 60 | Speed-Optimized |
| Gemini 2.5 Flash Lite | Google | 480 | 50 | Standard |
| Gemini 2.5 Flash | Google | 350 | 80 | Standard |
| GPT-5 Nano | OpenAI | 260 | 95 | Standard |
| GPT-5 Mini | OpenAI | 155 | 180 | Standard |
| Grok 3 | xAI | 140 | 200 | Standard |
| Devstral 2 | Mistral | 130 | 170 | Open Source |
| Claude Sonnet 4 | Anthropic | 125 | 230 | Standard |
| Gemini 2.5 Pro | Google | 120 | 250 | Standard |
| Gemini 3 Pro Preview | Google | 110 | 280 | Standard |
| Llama 4 Maverick | Meta | 100 | 260 | Open Source |
| DeepSeek V3 | DeepSeek | 95 | 280 | Open Source |
| Mistral Large | Mistral | 90 | 240 | Open Source |
| GPT-5 | OpenAI | 85 | 320 | Standard |
| GPT-5.2 | OpenAI | 75 | 380 | Standard |
| DeepSeek R1 | DeepSeek | 55 | 400 | Open Source |
| Claude Opus 4 | Anthropic | 42 | 450 | Standard |
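Throughput and TTFT combine into perceived latency: total response time is roughly TTFT plus output length divided by tokens/sec. A minimal sketch of that arithmetic, using approximate figures from the table above:

```python
def total_time_s(ttft_ms: float, tok_per_s: float, n_tokens: int) -> float:
    """Estimate end-to-end response time: time-to-first-token
    plus steady-state generation of n_tokens."""
    return ttft_ms / 1000 + n_tokens / tok_per_s

# Approximate figures from the benchmark table above, for a 500-token reply.
groq_gemma = total_time_s(ttft_ms=30, tok_per_s=950, n_tokens=500)
claude_opus = total_time_s(ttft_ms=450, tok_per_s=42, n_tokens=500)

print(f"Gemma 2 9B (Groq): {groq_gemma:.2f} s")   # ~0.56 s
print(f"Claude Opus 4:     {claude_opus:.2f} s")  # ~12.36 s
```

For long outputs, throughput dominates; for short replies, TTFT is most of what the user feels.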
Custom Silicon
Groq's LPU (Language Processing Unit) is purpose-built for LLM inference, delivering 5–10× faster output than GPU-based providers.
TTFT Matters
Time-to-first-token (TTFT) drives perceived responsiveness. Speed-optimized providers often respond in under 60 ms, which feels instant to users.
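TTFT can be measured client-side by timing the arrival of the first chunk of a streaming response. A minimal sketch, where the hypothetical `fake_stream` generator stands in for any provider's streaming API:

```python
import time
from typing import Iterator, List, Tuple

def measure_ttft(stream: Iterator[str]) -> Tuple[float, List[str]]:
    """Return (seconds until first token, all tokens) for a token stream."""
    start = time.perf_counter()
    ttft = 0.0
    tokens: List[str] = []
    for tok in stream:
        if not tokens:
            ttft = time.perf_counter() - start  # first token arrived
        tokens.append(tok)
    return ttft, tokens

def fake_stream() -> Iterator[str]:
    """Stand-in for a provider's streaming endpoint (hypothetical)."""
    time.sleep(0.05)  # simulated 50 ms time-to-first-token
    for tok in ["Hello", ",", " world"]:
        yield tok

ttft, tokens = measure_ttft(fake_stream())
print(f"TTFT: {ttft * 1000:.0f} ms, {len(tokens)} tokens")
```

The same timing pattern works against any real streaming endpoint by swapping `fake_stream()` for the provider's chunk iterator.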
Trade-offs
Speed-optimized providers currently support fewer models. Frontier models (GPT-5, Claude Opus) prioritize quality over raw throughput.
Benchmarks are approximate figures based on publicly available data and community testing. Actual performance varies by prompt length, concurrency, region, and model configuration.