179.4K pulls · Updated 3 weeks ago

NVIDIA Nemotron 3 Super is a 120B open MoE model activating just 12B parameters to deliver maximum compute efficiency and accuracy for complex multi-agent applications.

Capabilities: tools, thinking, cloud · Size: 120b

```shell
ollama run nemotron-3-super
```

Applications

- Claude Code: `ollama launch claude --model nemotron-3-super`
- Codex: `ollama launch codex --model nemotron-3-super`
- OpenCode: `ollama launch opencode --model nemotron-3-super`
- OpenClaw: `ollama launch openclaw --model nemotron-3-super`


Readme

NVIDIA Nemotron™ is a family of open models with open weights, training data, and recipes, delivering leading efficiency and accuracy for building specialized AI agents.

Nemotron-3-Super is a large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities. It is optimized for collaborative agents and high-volume workloads such as IT ticket automation. Like other models in the family, it responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model’s reasoning capabilities can be configured through a flag in the chat template.
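Ollama surfaces this reasoning toggle as a `think` field on its chat API. A minimal sketch of the request body, assuming a local Ollama server on the default port; the prompt is illustrative:

```python
import json

# Request body for Ollama's /api/chat endpoint. Setting "think" to False asks
# the model to skip its reasoning trace and answer directly; True includes it.
request_body = {
    "model": "nemotron-3-super",
    "messages": [
        {"role": "user", "content": "Summarize this IT ticket in one line."},
    ],
    "think": False,  # flip to True to receive the reasoning trace
    "stream": False,
}

payload = json.dumps(request_body)
# POST this to http://localhost:11434/api/chat (e.g. with urllib.request);
# the network call is omitted so the sketch runs without a live server.
print(payload)
```

When `think` is true, the response separates the trace into a `thinking` field alongside the final `content`, so agent frameworks can log or discard it independently.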

The model has 12B active parameters and 120B parameters in total.

Supported languages: English, French, German, Italian, Japanese, Spanish, and Chinese.

This model is ready for commercial use.
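Since the model is tagged for tool use, here is a hedged sketch of a tool-calling request in the format Ollama's chat API accepts (OpenAI-style JSON Schema function definitions). The `get_ticket_status` function is a made-up example, not part of the model or of Ollama:

```python
import json

# One illustrative tool definition; "get_ticket_status" is hypothetical.
tools = [{
    "type": "function",
    "function": {
        "name": "get_ticket_status",
        "description": "Look up the status of an IT ticket by its ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticket_id": {
                    "type": "string",
                    "description": "Ticket identifier, e.g. IT-4821",
                },
            },
            "required": ["ticket_id"],
        },
    },
}]

request_body = {
    "model": "nemotron-3-super",
    "messages": [
        {"role": "user", "content": "What is the status of ticket IT-4821?"},
    ],
    "tools": tools,
    "stream": False,
}

# POST to http://localhost:11434/api/chat; if the model decides to call the
# tool, the response message carries a "tool_calls" list with the arguments.
print(json.dumps(request_body, indent=2))
```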


Benchmarks

| Benchmark | Nemotron 3 Super | Qwen3.5-122B-A10B | GPT-OSS-120B |
|---|---|---|---|
| **General Knowledge** | | | |
| MMLU-Pro | 83.73 | 86.70 | 81.00 |
| **Reasoning** | | | |
| AIME25 (no tools) | 90.21 | 90.36 | 92.50 |
| HMMT Feb25 (no tools) | 93.67 | 91.40 | 90.00 |
| HMMT Feb25 (with tools) | 94.73 | 89.55 | n/a |
| GPQA (no tools) | 79.23 | 86.60 | 80.10 |
| GPQA (with tools) | 82.70 | 80.09 | n/a |
| LiveCodeBench (v5, 2024-07 to 2024-12) | 81.19 | 78.93 | 88.00 |
| SciCode (subtask) | 42.05 | 42.00 | 39.00 |
| HLE (no tools) | 18.26 | 25.30 | 14.90 |
| HLE (with tools) | 22.82 | 19.00 | n/a |
| **Agentic** | | | |
| Terminal Bench (hard subset) | 25.78 | 26.80 | 24.00 |
| Terminal Bench Core 2.0 | 31.00 | 37.50 | 18.70 |
| SWE-Bench (OpenHands) | 60.47 | 66.40 | 41.90 |
| SWE-Bench (OpenCode) | 59.20 | 67.40 | n/a |
| SWE-Bench (Codex) | 53.73 | 61.20 | n/a |
| SWE-Bench Multilingual (OpenHands) | 45.78 | 30.80 | n/a |
| TauBench V2: Airline | 56.25 | 66.00 | 49.20 |
| TauBench V2: Retail | 62.83 | 62.60 | 67.80 |
| TauBench V2: Telecom | 64.36 | 95.00 | 66.00 |
| TauBench V2: Average | 61.15 | 74.53 | 61.00 |
| BrowseComp with Search | 31.28 | 33.89 | n/a |
| BIRD Bench | 41.80 | 38.25 | n/a |
| **Chat & Instruction Following** | | | |
| IFBench (prompt) | 72.56 | 73.77 | 68.32 |
| Scale AI Multi-Challenge | 55.23 | 61.50 | 58.29 |
| Arena-Hard-V2 | 73.88 | 75.15 | 90.26 |
| **Long Context** | | | |
| AA-LCR | 58.31 | 66.90 | 51.00 |
| RULER @ 256k | 96.30 | 96.74 | 52.30 |
| RULER @ 512k | 95.67 | 95.95 | 46.70 |
| RULER @ 1M | 91.75 | 91.33 | 22.30 |
| **Multilingual** | | | |
| MMLU-ProX (avg over langs) | 79.36 | 85.06 | 76.59 |
| WMT24++ (en→xx) | 86.67 | 87.84 | 88.89 |

n/a: score not reported in the source listing.