At a glance

Googlepricing, performance & catalog

The citable facts about Google's 12 models — sourced from provider APIs and refreshed continuously.

Lowest price
Gemini 3.1 Flash-Lite at $0.250 per 1M input tokens
Highest throughput
Gemini 3.1 Pro at 90 tokens/s
Lowest latency
Gemini 3.1 Pro at 0.60s
Largest context
Gemini 3.5 Flash at 1.0M tokens
Catalog
12 active models from 5 organizations

FAQ

Common questions about Google.

What is Google?

Google is an API provider that hosts large language models. Active models: 12; From (input): $0.25 / 1M tok; Avg throughput: 88 tok/s; Avg latency: 0.65 s; Max context: 1.0M.

How many models does Google offer?

Google currently serves 12 active models out of 41 historical offerings on LLM Stats.

What is Google's API pricing?

Google input pricing starts from $0.25 per 1M tokens, with the most expensive offering at $2.5 per 1M tokens. See the Pricing tab above for the full per-model breakdown.

How fast is Google?

Google averages 88 output tokens per second across its catalog, with average latency of 0.65s. Per-model performance is shown in the Performance tab.

Is Google OpenAI compatible?

Most providers expose an OpenAI-compatible /v1/chat/completions endpoint so you can switch from OpenAI to Google by changing only the base URL and API key. Check https://ai.google.dev for the exact endpoint format and any provider-specific parameters.

Does Google support multimodal models?

Yes. Google's catalog includes 18 vision-capable, 10 image generation, and 6 video models. See the Models and Capabilities tabs for the full per-model breakdown.

Whose models does Google host?

Google hosts models from Google, AI21 Labs, Anthropic, Meta, and Mistral AI. See the Models tab for the full catalog grouped by creator.

How do I start using Google?

Sign up at https://ai.google.dev to get an API key, then call Google's API directly from your application. Most clients work out of the box by pointing the OpenAI SDK at Google's base URL with your key. Use the Pricing and Performance tabs above to pick the right model for your latency, cost, and context-window requirements.