nemotron-3-ultra:cloud

13K downloads, updated 3 weeks ago

NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

Tags: tools, thinking, cloud

Usage: high

Context: 256K tokens

Size: 550B parameters

CLI

```shell
ollama run nemotron-3-ultra:cloud
```

cURL

```shell
curl http://localhost:11434/api/chat \
  -d '{
    "model": "nemotron-3-ultra:cloud",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Python

```python
from ollama import chat

response = chat(
    model='nemotron-3-ultra:cloud',
    messages=[{'role': 'user', 'content': 'Hello!'}],
)
print(response.message.content)
```

JavaScript

```javascript
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'nemotron-3-ultra:cloud',
  messages: [{role: 'user', content: 'Hello!'}],
})
console.log(response.message.content)
```
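The page tags this model with `thinking`, so it may help to see how a request could ask for the reasoning trace separately from the answer. The following is a minimal sketch using only the Python standard library against the same `/api/chat` endpoint as the curl example. The helper names (`build_chat_request`, `split_reply`, `chat_once`) are hypothetical, and it assumes the local Ollama server accepts the `stream` and `think` request fields and returns a `thinking` field in the message; check this against your Ollama version.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # same endpoint as the curl example
MODEL = "nemotron-3-ultra:cloud"


def build_chat_request(prompt, think=True):
    """Build the JSON body for a single-turn, non-streaming chat request."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a stream of chunks
        "think": think,   # assumption: request the reasoning trace as a separate field
    }


def split_reply(response_body):
    """Return (thinking, answer) from a decoded /api/chat response body."""
    message = response_body.get("message", {})
    return message.get("thinking", ""), message.get("content", "")


def chat_once(prompt, think=True):
    """POST the request to a local Ollama server and return (thinking, answer)."""
    data = json.dumps(build_chat_request(prompt, think)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return split_reply(json.load(reply))
```

With a server running, `thinking, answer = chat_once("Hello!")` would return the two parts separately. Building the request and parsing the reply are kept in their own functions so they can be checked without network access.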

Readme

NVIDIA Nemotron 3 Ultra is an open model with 550 billion total parameters (55B active per token), built for long-running agentic workflows that stay fast and affordable across hundreds of tool calls.
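A workflow that spans hundreds of tool calls usually reduces to a loop: send the conversation, run whatever tools the model requests, append the results, and repeat until the model answers in plain text. The sketch below illustrates that loop under stated assumptions. The `add` tool and the `run_agent` helper are hypothetical, `send` stands in for any function that posts the messages to `/api/chat` and returns the assistant message as a dict, and the `tool_calls` and `tool_name` field names follow the shape the Ollama chat API is assumed to use.

```python
import json


def add(a, b):
    """Hypothetical tool for illustration; a real agent registers its own."""
    return a + b


TOOLS = {"add": add}


def run_agent(send, messages, tools=TOOLS, max_steps=100):
    """Run the tool loop until the model replies without requesting a tool.

    `send` takes the message list and returns the assistant message as a dict.
    Returns (final_answer, full_message_history).
    """
    messages = list(messages)  # work on a copy so the caller's list is untouched
    for _ in range(max_steps):
        reply = send(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            return reply.get("content", ""), messages
        for call in calls:
            function = call["function"]
            name = function["name"]
            arguments = function.get("arguments", {})
            if name in tools:
                result = tools[name](**arguments)
            else:
                result = f"unknown tool: {name}"
            # Assumed message shape for returning a tool result to the model.
            messages.append(
                {"role": "tool", "tool_name": name, "content": json.dumps(result)}
            )
    raise RuntimeError("agent did not finish within max_steps")
```

Passing `send` in as a parameter keeps the loop independent of the transport, so it can be exercised with a scripted stand-in before pointing it at a live model. The `max_steps` cap is a design choice to stop a runaway loop and is not something the source specifies.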

### Model highlights

- **Built for long-running agents:** Tuned for agent orchestration, coding agents, deep research, and complex enterprise workflows that run across hundreds of steps.
- **1M token context:** Keep entire codebases, long tool histories, and research trails in context without losing the thread. The `cloud` tag on this page lists a 256K token context.
- **Frontier reasoning, high efficiency:** 550B total parameters with only 55B active per token, optimized for NVFP4, NVIDIA's 4-bit floating-point format, which packs the model into less memory and runs faster.
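The parameter counts and the 4-bit format in the highlights above imply some quick arithmetic. The sketch below is a back-of-the-envelope estimate, not a published figure: it counts raw weight storage only, ignores quantization scale factors, activations, and KV cache, and uses a 16-bit baseline purely as an assumed point of comparison.

```python
TOTAL_PARAMS = 550_000_000_000   # 550B total parameters (from the highlights)
ACTIVE_PARAMS = 55_000_000_000   # 55B active per token (from the highlights)


def weight_gigabytes(params, bits_per_param):
    """Raw weight storage in GB (10^9 bytes), ignoring any format overhead."""
    return params * bits_per_param / 8 / 1e9


# Fraction of the model's parameters used for each token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# 4 bits per weight for NVFP4; 16 bits is an assumed baseline for comparison.
nvfp4_gb = weight_gigabytes(TOTAL_PARAMS, 4)
sixteen_bit_gb = weight_gigabytes(TOTAL_PARAMS, 16)

print(f"active fraction: {active_fraction:.0%}")
print(f"4-bit weights:  ~{nvfp4_gb:.0f} GB")
print(f"16-bit weights: ~{sixteen_bit_gb:.0f} GB")
```

Under these assumptions, each token uses about a tenth of the parameters, and 4-bit weights take roughly a quarter of the space of 16-bit weights.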

### Benchmarks

Nemotron 3 Ultra leads on accuracy across agent productivity, instruction following, and long-context tasks. It also delivers leading throughput, saving up to 30% on costs compared with other leading open models.

![Table showing Nemotron 3 Ultra leading among open models on agentic benchmarks for agent productivity, coding, and instruction following.](https://files.ollama.com/nemotron-3-ultra-agentic-benchmarks.png)
*Figure 1: Nemotron 3 Ultra leads among open models on agentic benchmarks for agent productivity, coding, and instruction following.*

### Reference

- [NVIDIA Nemotron 3 Ultra blog](https://developer.nvidia.com/blog/nvidia-nemotron-3-ultra-powers-faster-more-efficient-reasoning-for-long-running-agents/)