nvidia-nemotron-120b-a12b
nvidia/Nemotron-120B-A12B (Nemotron 3 Super) is a 120B-total-parameter model with 12B active parameters, built on a hybrid Mamba-Transformer mixture-of-experts (MoE) architecture. It delivers over 5x the throughput of the previous Nemotron Super and has a native 1M-token context window for long-term memory in multi-agent systems. The model excels at agentic reasoning, scoring 85.6% on PinchBench (best in its class), and is optimized for applications such as software development and cybersecurity triage.
| Metric | Value |
|---|---|
| Parameter Count | 120 billion |
| Mixture of Experts | Yes |
| Active Parameter Count | 12 billion |
| Context Length | 1,000,000 tokens |
| Multilingual | Yes |
| Quantized | Yes |
| Precision | NVFP4 |
Example request
- Minimal: model and messages only (see the sketch after this list)
- Basic parameters: the minimal request plus common sampling controls
- All parameters: every field the model accepts, as described by its `json_request_schema`
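A minimal sketch of the first two variants, assuming an OpenAI-compatible chat completions endpoint. The base URL, `API_KEY` environment variable, sampling values, and lowercase model ID are placeholders, not confirmed by this page:

```python
import os

import requests

# Placeholder base URL; substitute your provider's actual endpoint.
BASE_URL = "https://api.example.com/v1"
headers = {"Authorization": f"Bearer {os.environ['API_KEY']}"}

# Minimal request: model and messages only.
minimal = {
    "model": "nvidia/nemotron-120b-a12b",  # assumed slug form of the model ID
    "messages": [{"role": "user", "content": "Summarize this stack trace in two sentences."}],
}

# Basic parameters: the minimal request plus common sampling controls.
basic = {
    **minimal,
    "temperature": 0.7,
    "top_p": 0.95,
    "max_tokens": 512,
}

resp = requests.post(f"{BASE_URL}/chat/completions", json=basic, headers=headers, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The full parameter set is best taken from the model's `json_request_schema` (see below) rather than hard-coded, since accepted fields can differ between models.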
Fetch model details
The models endpoint returns the full model object, including its `json_request_schema`.
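A sketch of fetching that object, assuming a `GET /models/{id}` path modeled on common OpenAI-compatible APIs; the path and base URL are assumptions:

```python
import os

import requests

# Same placeholder base URL as the request example above.
BASE_URL = "https://api.example.com/v1"
headers = {"Authorization": f"Bearer {os.environ['API_KEY']}"}

resp = requests.get(
    f"{BASE_URL}/models/nvidia/nemotron-120b-a12b", headers=headers, timeout=30
)
resp.raise_for_status()
model = resp.json()

# Per the docs, the model object includes its json_request_schema;
# inspecting it shows every parameter the model accepts.
print(model["json_request_schema"])
```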