Pricing
Free
Get started with Ollama
- Automate coding, document analysis, and other tasks with open models
- Keep your data private
- Run models on your hardware
- Access cloud models
- CLI, API, and desktop apps
- 40,000+ community integrations
- Unlimited public models
Pro
Solve harder tasks, faster
- Run 3 cloud models at a time
- 50x more cloud usage than Free
- Upload and share private models
Max
For your most demanding work
- Run 10 cloud models at a time
- 5x more usage than Pro
Frequently asked questions
Models
- Which models are available?
See the full list of cloud-enabled models here.
- Do models support tool calling?
Yes. Cloud models that are trained for tool use are tested for tool calling, including against real agent workflows, before they go live. If something isn't working, let us know at [email protected].
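As an illustration, here is a minimal tool-calling sketch using the Ollama Python client. The model name, prompt, and `get_weather` helper are placeholders rather than part of Ollama's documented examples, and the schema-from-function behavior assumes a recent version of the Python client.

```python
# Minimal tool-calling sketch with the Ollama Python client.
# The model name, prompt, and get_weather helper are illustrative placeholders.
import ollama


def get_weather(city: str) -> str:
    """Hypothetical tool the model may choose to call."""
    return f"Sunny and 22°C in {city}"


response = ollama.chat(
    model="gpt-oss:120b-cloud",  # substitute any cloud model that supports tools
    messages=[{"role": "user", "content": "What's the weather in Toronto right now?"}],
    tools=[get_weather],  # recent clients build the JSON tool schema from the function
)

# If the model decided to call the tool, run it and print the result.
for call in response.message.tool_calls or []:
    if call.function.name == "get_weather":
        print(get_weather(**call.function.arguments))
```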
- What quantization or data format do cloud models use?
Native weights, as released by the model provider. On modern NVIDIA hardware, models may use accelerated data formats supported by Blackwell and Vera Rubin architectures (e.g. NVFP4).
- How fast is Ollama?
Speed depends on model size, architecture, and hardware optimization. We target, and continuously monitor, low time-to-first-token and high throughput across all cloud models. Priority tiers with faster performance may be available in the future.
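If you want to gauge latency yourself, a rough sketch like the one below measures time-to-first-token on a streaming request. The model name is a placeholder, and the numbers you see will depend on the model, current load, and your network.

```python
# Rough time-to-first-token measurement over a streaming chat request.
# The model name is a placeholder; results depend on the model, load, and network.
import time

import ollama

start = time.perf_counter()
first_token_at = None

stream = ollama.chat(
    model="gpt-oss:120b-cloud",
    messages=[{"role": "user", "content": "Explain prompt caching in two sentences."}],
    stream=True,
)

for chunk in stream:
    if first_token_at is None and chunk.message.content:
        first_token_at = time.perf_counter()

if first_token_at is not None:
    print(f"time to first token: {first_token_at - start:.2f}s")
```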
Usage
- What are the usage limits for each plan?
Running models on your own hardware is always unlimited. Cloud usage varies by plan:
| Plan | Usage | Example use cases |
|------|-------|-------------------|
| Free | Light usage | Chatting with models, evaluating larger models, coding and AI assistants with smaller models |
| Pro | Day-to-day work | Larger models, coding automation, deep research |
| Max | Heavy, sustained usage | Continuous agent tasks, multiple concurrent agents, large models over extended sessions |

Each plan has session limits that reset every 5 hours and weekly limits that reset every 7 days.
- How is usage measured?
Usage reflects actual utilization of Ollama's cloud infrastructure, primarily GPU time, which depends on model size and request duration. Shorter requests and prompts that share cached context consume less.
This is different from fixed token- or request-based plans: Ollama doesn't cap you at a set number of tokens. As hardware and model architectures get more efficient, you'll get more out of your plan over time.
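As an illustration of cache-friendly prompting, the sketch below sends two requests that share the same long prefix (a system prompt plus a document). The model name and document text are placeholders; per the answer above, follow-up requests that reuse cached context should consume less usage than the first.

```python
# Sketch of cache-friendly prompting: both requests share the same long prefix,
# so the second one can reuse cached context and consume less GPU time.
# The document text and model name are placeholders.
import ollama

long_document = "...full text of the document being analyzed..."  # placeholder
system_prompt = "You are a contract-review assistant.\n\n" + long_document

questions = [
    "List the termination clauses.",
    "Which obligations survive termination?",
]

for question in questions:
    response = ollama.chat(
        model="gpt-oss:120b-cloud",
        messages=[
            {"role": "system", "content": system_prompt},  # shared, cacheable prefix
            {"role": "user", "content": question},
        ],
    )
    print(response.message.content[:200])
```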
- Can I purchase additional usage?
Soon. Additional usage at competitive per-token rates, including cache-aware pricing, is coming.
- How much more usage does Pro include?
50x more than Free.
- How much more usage does Max include?
5x more than Pro.
- How do I know when I've hit my limit?
Check your usage here anytime. At 90% of your plan's limit, Ollama sends an email reminder. You can turn this off in settings.
- How many cloud models can I run at once?
Concurrency limits ensure dedicated capacity for workflows that need multiple models running simultaneously:
| Plan | Concurrent models |
|------|-------------------|
| Free | 1 |
| Pro | 3 |
| Max | 10 |

Requests beyond your plan's concurrency limit are queued and processed as soon as a slot is available. Queued requests are held up to a fixed limit; if the queue is full, the request will be rejected until one of your concurrency slots opens.
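Here is a sketch of what concurrency looks like from the client side, using the async Ollama Python client: more requests are started than a plan's concurrent-model slots, and anything beyond the limit is queued server-side as described above. The model name and the error handling are illustrative assumptions, not documented behavior.

```python
# Sketch: start several cloud requests at once with the async client.
# On the Pro plan (3 concurrent models), requests beyond the limit are queued
# server-side; if the queue is full, the call fails and can be retried later.
import asyncio

import ollama

prompts = [f"Write test case #{i} for a login form." for i in range(5)]


async def ask(client: ollama.AsyncClient, prompt: str) -> str:
    try:
        response = await client.chat(
            model="gpt-oss:120b-cloud",  # placeholder cloud model
            messages=[{"role": "user", "content": prompt}],
        )
        return response.message.content
    except ollama.ResponseError as err:  # surfaced if a queued request is rejected
        return f"request rejected (status {err.status_code}); retry when a slot opens"


async def main() -> None:
    client = ollama.AsyncClient()
    results = await asyncio.gather(*(ask(client, p) for p in prompts))
    for result in results:
        print(result[:120])


asyncio.run(main())
```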
Privacy
- Where are models hosted?
Ollama hosts models and compute resources primarily in the United States. To serve global demand, we may route requests to Europe and Singapore for additional capacity.
- Is my prompt or response data trained on?
No. Prompt and response data are never logged or trained on.
- Who does Ollama partner with to host models?
Ollama collaborates with NVIDIA Cloud Providers (NCPs) to host open models.
When Ollama partners with providers, we require no-logging, no-training, and zero-data-retention policies to be in place.