At a glance

Novitapricing, performance & catalog

The citable facts about Novita's 21 models — sourced from provider APIs and refreshed continuously.

Lowest price
Qwen3 VL 8B Instruct at $0.080 per 1M input tokens
Highest throughput
Qwen3 30B A3B at 89 tokens/s
Lowest latency
Qwen3 30B A3B at 0.73s
Largest context
MiMo-V2.5-Pro at 1.0M tokens
Catalog
21 active models from 10 organizations

FAQ

Common questions about Novita.

What is Novita?

Novita is an API provider that hosts large language models. Active models: 21; From (input): $0.08 / 1M tok; Avg throughput: 45 tok/s; Avg latency: 0.95 s; Max context: 1.0M.

How many models does Novita offer?

Novita currently serves 21 active models out of 45 historical offerings on LLM Stats.

What is Novita's API pricing?

Novita input pricing starts from $0.08 per 1M tokens, with the most expensive offering at $2 per 1M tokens. See the Pricing tab above for the full per-model breakdown.

How fast is Novita?

Novita averages 45 output tokens per second across its catalog, with average latency of 0.95s. Per-model performance is shown in the Performance tab.

Is Novita OpenAI compatible?

Most providers expose an OpenAI-compatible /v1/chat/completions endpoint so you can switch from OpenAI to Novita by changing only the base URL and API key. Check https://novita.ai/ for the exact endpoint format and any provider-specific parameters.

Does Novita support multimodal models?

Yes. Novita's catalog includes 12 vision-capable models. See the Models and Capabilities tabs for the full per-model breakdown.

Whose models does Novita host?

Novita hosts models from DeepSeek, Google, MiniMax, Moonshot AI, OpenAI, and Alibaba Cloud / Qwen Team, plus 4 more. See the Models tab for the full catalog grouped by creator.

How do I start using Novita?

Sign up at https://novita.ai/ to get an API key, then call Novita's API directly from your application. Most clients work out of the box by pointing the OpenAI SDK at Novita's base URL with your key. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.