# Nemotron 3 Super

> Agentic reasoning MoE, 1M context

<CardGroup cols={1}>
  <Card title="Try Nemotron 3 Super in the Workbench" icon="flask" href="https://www.oxen.ai/ai/workbench?model=nvidia-nemotron-120b-a12b">
    Run this model interactively, tune parameters, and compare outputs.
  </Card>
</CardGroup>

**Model ID:** `nvidia-nemotron-120b-a12b`

nvidia/Nemotron-120B-A12B (Nemotron 3 Super) is a hybrid Mamba-Transformer mixture-of-experts (MoE) model with 120B total parameters, of which 12B are active. It delivers over 5x the throughput of the previous Nemotron Super and has a native 1M-token context window, which supports long-term memory in multi-agent systems. The model excels at agentic reasoning, scoring 85.6% on PinchBench (best in its class), and is optimized for applications such as software development and cybersecurity triage.

| Metric                 | Value            |
| ---------------------- | ---------------- |
| Parameter Count        | 120 billion      |
| Mixture of Experts     | Yes              |
| Active Parameter Count | 12 billion       |
| Context Length         | 1,000,000 tokens |
| Multilingual           | Yes              |
| Quantized\*            | Yes              |
| Precision\*            | NVFP4            |

\**Quantization is specific to the inference provider; other providers may offer the model at different quantization levels.*

## Example request

<Tip>
  Use the [Workbench](https://www.oxen.ai/ai/workbench?model=nvidia-nemotron-120b-a12b) as a request builder: configure parameters for this model in the UI, then open the **API** tab to copy the exact cURL or Python call.
</Tip>

<Tabs>
  <Tab title="Minimal">
    <CodeGroup>
      ```bash cURL theme={null}
      curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OXEN_API_KEY" \
        -d '{
        "model": "nvidia-nemotron-120b-a12b",
        "messages": [
          {
            "role": "user",
            "content": "Hello, what can you do?"
          }
        ]
      }'
      ```

      ```python Python theme={null}
      import os
      import requests

      response = requests.post(
          "https://hub.oxen.ai/api/ai/chat/completions",
          headers={
              "Content-Type": "application/json",
              "Authorization": f"Bearer {os.environ['OXEN_API_KEY']}",
          },
          json={
              "model": "nvidia-nemotron-120b-a12b",
              "messages": [
                  {
                      "role": "user",
                      "content": "Hello, what can you do?"
                  }
              ]
          },
      )
      response.raise_for_status()
      print(response.json())
      ```
    </CodeGroup>
  </Tab>

  <Tab title="Basic parameters">
    <CodeGroup>
      ```bash cURL theme={null}
      curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OXEN_API_KEY" \
        -d '{
        "model": "nvidia-nemotron-120b-a12b",
        "messages": [
          {
            "role": "user",
            "content": "Hello, what can you do?"
          }
        ],
        "temperature": 0.7,
        "max_tokens": 1024,
        "stream": false
      }'
      ```

      ```python Python theme={null}
      import os
      import requests

      response = requests.post(
          "https://hub.oxen.ai/api/ai/chat/completions",
          headers={
              "Content-Type": "application/json",
              "Authorization": f"Bearer {os.environ['OXEN_API_KEY']}",
          },
          json={
              "model": "nvidia-nemotron-120b-a12b",
              "messages": [
                  {
                      "role": "user",
                      "content": "Hello, what can you do?"
                  }
              ],
              "temperature": 0.7,
              "max_tokens": 1024,
              "stream": False
          },
      )
      response.raise_for_status()
      print(response.json())
      ```
    </CodeGroup>
  </Tab>

  <Tab title="All parameters">
    <CodeGroup>
      ```bash cURL theme={null}
      curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer $OXEN_API_KEY" \
        -d '{
        "model": "nvidia-nemotron-120b-a12b",
        "messages": [
          {
            "role": "user",
            "content": "Hello, what can you do?"
          }
        ],
        "temperature": 0.7,
        "max_tokens": 1024,
        "stream": false,
        "top_p": 1.0
      }'
      ```

      ```python Python theme={null}
      import os
      import requests

      response = requests.post(
          "https://hub.oxen.ai/api/ai/chat/completions",
          headers={
              "Content-Type": "application/json",
              "Authorization": f"Bearer {os.environ['OXEN_API_KEY']}",
          },
          json={
              "model": "nvidia-nemotron-120b-a12b",
              "messages": [
                  {
                      "role": "user",
                      "content": "Hello, what can you do?"
                  }
              ],
              "temperature": 0.7,
              "max_tokens": 1024,
              "stream": False,
              "top_p": 1.0
          },
      )
      response.raise_for_status()
      print(response.json())
      ```
    </CodeGroup>
  </Tab>
</Tabs>

## Fetch model details

The [models endpoint](/inference-api/reference/models/overview) returns the full model object, including its `json_request_schema`.

```bash theme={null}
curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/nvidia-nemotron-120b-a12b
```
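
The same lookup can be sketched in Python. This is a minimal example that uses only the standard library. The helper names `fetch_model` and `find_key` are hypothetical, and because the exact shape of the returned model object is not shown here, the sketch searches the response recursively for `json_request_schema` rather than assuming where it sits. The network call runs only when `OXEN_API_KEY` is set.

```python
import json
import os
import urllib.request

BASE_URL = "https://hub.oxen.ai/api/ai/models"
MODEL_ID = "nvidia-nemotron-120b-a12b"


def find_key(obj, key):
    """Depth-first search for `key` in nested dicts and lists; None if absent."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def fetch_model(model_id, api_key):
    """GET the model object from the models endpoint and decode the JSON body."""
    request = urllib.request.Request(
        f"{BASE_URL}/{model_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


api_key = os.environ.get("OXEN_API_KEY")
if api_key:  # skip the network call when no key is configured
    model = fetch_model(MODEL_ID, api_key)
    # Assumption: the schema lives somewhere in the response under this key.
    print(json.dumps(find_key(model, "json_request_schema"), indent=2))
```

Inspecting the schema before sending requests is one way to check which parameters the provider accepts for this model.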

## Request parameters

This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list.
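
As a rough illustration, the request bodies shown in the examples above can all be produced by one small helper that omits any optional parameter left unset. The function name `build_payload` is hypothetical, and the sketch covers only the parameters that appear on this page (`temperature`, `max_tokens`, `top_p`, `stream`); whether further OpenAI parameters are accepted is best checked against the model's `json_request_schema`.

```python
import json

MODEL_ID = "nvidia-nemotron-120b-a12b"


def build_payload(prompt, temperature=None, max_tokens=None, top_p=None, stream=None):
    """Build a chat completions body, omitting any parameter left as None."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    optional = {
        "temperature": temperature,
        "max_tokens": max_tokens,
        "top_p": top_p,
        "stream": stream,
    }
    # Only send parameters the caller explicitly set; False and 0 are kept.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


body = build_payload("Hello, what can you do?", temperature=0.7, max_tokens=1024)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`, as in the Python examples above. Note that Python's `False` serializes to JSON `false`, so `stream=False` is written with a capital `F` in Python code.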
