At a glance

Replicatepricing, performance & catalog

The citable facts about Replicate's 35 models — sourced from provider APIs and refreshed continuously.

Largest context
Flux 2 Klein at 10K tokens
Catalog
35 active models from 13 organizations

Most affordable

No public pricing data.

Fastest

No throughput data yet.

FAQ

Common questions about Replicate.

What is Replicate?

Replicate is an API provider that hosts large language models. Active models: 35; Max context: 10K.

How many models does Replicate offer?

Replicate currently serves 35 active models out of 39 historical offerings on LLM Stats.

Is Replicate OpenAI compatible?

Most providers expose an OpenAI-compatible /v1/chat/completions endpoint so you can switch from OpenAI to Replicate by changing only the base URL and API key. Check https://replicate.com/ for the exact endpoint format and any provider-specific parameters.

Does Replicate support multimodal models?

Yes. Replicate's catalog includes 46 vision-capable, 39 image generation, 16 audio, and 44 video models. See the Models and Capabilities tabs for the full per-model breakdown.

Whose models does Replicate host?

Replicate hosts models from Alibaba, Black Forest Labs, ByteDance, Kling AI, MiniMax, and Alibaba Cloud / Qwen Team, plus 7 more. See the Models tab for the full catalog grouped by creator.

How do I start using Replicate?

Sign up at https://replicate.com/ to get an API key, then call Replicate's API directly from your application. Most clients work out of the box by pointing the OpenAI SDK at Replicate's base URL with your key. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.