At a glance

DeepInfrapricing, performance & catalog

The citable facts about DeepInfra's 12 models — sourced from provider APIs and refreshed continuously.

Lowest price
Nemotron 3 Nano (30B A3B) at $0.060 per 1M input tokens
Highest throughput
Qwen3 30B A3B at 83 tokens/s
Lowest latency
Qwen3 30B A3B at 0.84s
Largest context
DeepSeek-V4-Pro-Max at 1.0M tokens
Catalog
12 active models from 12 organizations

FAQ

Common questions about DeepInfra.

What is DeepInfra?

DeepInfra is an API provider that hosts large language models. Active models: 12; From (input): $0.06 / 1M tok; Avg throughput: 55 tok/s; Avg latency: 1.01 s; Max context: 1.0M.

How many models does DeepInfra offer?

DeepInfra currently serves 12 active models out of 47 historical offerings on LLM Stats.

What is DeepInfra's API pricing?

DeepInfra input pricing starts from $0.06 per 1M tokens, with the most expensive offering at $1.74 per 1M tokens. See the Pricing tab above for the full per-model breakdown.

How fast is DeepInfra?

DeepInfra averages 55 output tokens per second across its catalog, with average latency of 1.01s. Per-model performance is shown in the Performance tab.

Is DeepInfra OpenAI compatible?

Most providers expose an OpenAI-compatible /v1/chat/completions endpoint so you can switch from OpenAI to DeepInfra by changing only the base URL and API key. Check https://deepinfra.com/ for the exact endpoint format and any provider-specific parameters.

Does DeepInfra support multimodal models?

Yes. DeepInfra's catalog includes 6 vision-capable models. See the Models and Capabilities tabs for the full per-model breakdown.

Whose models does DeepInfra host?

DeepInfra hosts models from DeepSeek, NVIDIA, OpenAI, Alibaba Cloud / Qwen Team, Xiaomi, and Google, plus 6 more. See the Models tab for the full catalog grouped by creator.

How do I start using DeepInfra?

Sign up at https://deepinfra.com/ to get an API key, then call DeepInfra's API directly from your application. Most clients work out of the box by pointing the OpenAI SDK at DeepInfra's base URL with your key. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.