391 20 hours ago

NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

tools thinking cloud
Usage
high
Context
256K tokens
Size
550B parameters
ollama run nemotron-3-ultra:cloud

Readme

NVIDIA Nemotron 3 Ultra is a 550 billion parameter (55B active) open model from NVIDIA built for long-running, agentic workflows with fast and affordable performance across hundreds of tool calls.

Model highlights

  • Built for long-running agents: Tuned for agent orchestration, coding agents, deep research, and complex enterprise workflows that run across hundreds of steps.
  • 1M token context: Keep entire codebases, long tool histories, and research trails in context without losing the thread.
  • Frontier reasoning, high efficiency: 550B total parameters with only 55B active per token, and optimized for NVFP4, NVIDIA’s 4-bit floating point format that packs the model into less memory and runs faster.

Benchmarks

Nemotron 3 Ultra leads on accuracy across agent productivity, instruction following, and long-context tasks, while delivering leading throughput—saving up to 30% on costs compared to other leading open models.

Table showing Nemotron 3 Ultra leading among open models on agentic benchmarks for agent productivity, coding, and instruction following. Figure 1: Nemotron 3 Ultra leads among open models on agentic benchmarks for agent productivity, coding, and instruction following.

Reference