# Chat Completions

> Generate text responses from language models with support for streaming, vision, and tool calling

## Endpoint

```
POST /api/ai/chat/completions
```

Compatible with the OpenAI chat completions format. Supports streaming, multimodal input (images and video), tool calling, and structured output.

## Request Parameters

| Parameter             | Type          | Required | Default | Description                                                                                  |
| --------------------- | ------------- | -------- | ------- | -------------------------------------------------------------------------------------------- |
| `model`               | string        | **yes**  | --      | Model name (e.g. `claude-sonnet-4-6`, `gpt-5-4-2026-03-05`, `gemini-3-1-flash-lite-preview`) |
| `messages`            | array         | **yes**  | --      | Array of message objects. Must not be empty.                                                 |
| `stream`              | boolean       | no       | `false` | Stream the response as server-sent events.                                                   |
| `max_tokens`          | integer       | no       | varies  | Maximum tokens in the response.                                                              |
| `temperature`         | number        | no       | varies  | Sampling temperature (0-2).                                                                  |
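| `top_p`               | number        | no       | --      | Nucleus sampling: sample only from the smallest set of likely tokens whose cumulative probability reaches `top_p`. |
| `frequency_penalty`   | number        | no       | --      | Penalize tokens in proportion to how often they have already appeared, reducing repetition.  |
| `presence_penalty`    | number        | no       | --      | Penalize tokens that have already appeared at least once, encouraging new topics.            |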
| `tools`               | array         | no       | --      | Tool/function definitions for tool calling.                                                  |
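| `tool_choice`         | string/object | no       | --      | Control whether and which tool the model calls (OpenAI-style values such as `"auto"`, `"none"`, `"required"`, or an object naming one function). |
| `parallel_tool_calls` | boolean       | no       | --      | Allow the model to request more than one tool call in a single response.                     |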
| `response_format`     | object        | no       | --      | Constrain response format (e.g. `{"type": "json_object"}`). Support varies by provider.      |
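The table maps directly onto a JSON request body. The sketch below uses a hypothetical helper, `build_chat_request`, which is not part of any SDK. It assembles a body from the two required fields and a few optional ones. Parameters left unset are omitted so the server applies its own defaults (this assumes omitting a parameter is equivalent to accepting its default). It also enforces the documented 0-2 `temperature` range before anything is sent.

```python
from typing import Any, Optional


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    *,
    stream: bool = False,
    max_tokens: Optional[int] = None,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    response_format: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Assemble a chat completions request body (hypothetical helper)."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages must not be empty")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")

    body: dict[str, Any] = {"model": model, "messages": messages}
    # Include only optional fields that were explicitly set; omitted ones
    # fall back to the server-side (model-dependent) defaults.
    optional = {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "response_format": response_format,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    if stream:  # `stream` defaults to false, so only send it when enabling it
        body["stream"] = True
    return body


body = build_chat_request(
    "claude-sonnet-4-6",
    [{"role": "user", "content": "Say hello in exactly 3 words."}],
    max_tokens=50,
    temperature=0.1,
)
```

The resulting dict can be serialized with `json.dumps` and used as the `-d` payload in the cURL examples below.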

## Message Format

Each message has a `role` and `content`:

```json theme={null}
[
  {"role": "system", "content": "You are a helpful assistant."},
  {"role": "user", "content": "Hello!"},
  {"role": "assistant", "content": "Hi there!"}
]
```
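The example above is a short conversation history. This sketch assumes the endpoint, like the OpenAI format, keeps no state between requests. Under that assumption, each call must resend the full history, so a multi-turn chat is usually built by appending messages to a list. `append_message` is a hypothetical helper, and it only covers the three roles shown above.

```python
from typing import Any


def append_message(history: list[dict[str, Any]], role: str, content: str) -> list[dict[str, Any]]:
    """Return a new history with one more message; the input list is not mutated."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unexpected role: {role!r}")
    return [*history, {"role": role, "content": content}]


history = [{"role": "system", "content": "You are a helpful assistant."}]
history = append_message(history, "user", "Hello!")
# ...send `history` as `messages`, then record the reply before the next turn:
history = append_message(history, "assistant", "Hi there!")
history = append_message(history, "user", "What did I just say?")
```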

### Vision (multimodal)

Use a content array to include images or video:

```json theme={null}
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}
```

Video input:

```json theme={null}
{
  "role": "user",
  "content": [
    {"type": "text", "text": "Describe this video"},
    {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
  ]
}
```

Image and video URLs must be publicly accessible.
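Both payloads follow the same pattern: one `text` part followed by one or more media parts. The hypothetical helper below assembles such a message. It also rejects obviously non-public URLs, such as `file://` paths, before a request is made. The check is only a cheap sanity test and cannot confirm that a URL is actually reachable from the public internet.

```python
from typing import Any, Iterable
from urllib.parse import urlparse


def _check_url(url: str) -> str:
    # Confirms an http(s) URL with a host; does NOT verify public reachability.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"expected a public http(s) URL, got {url!r}")
    return url


def user_message_with_media(
    text: str,
    image_urls: Iterable[str] = (),
    video_urls: Iterable[str] = (),
) -> dict[str, Any]:
    """Build a user message whose content array mixes text, images, and videos."""
    parts: list[dict[str, Any]] = [{"type": "text", "text": text}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": _check_url(url)}})
    for url in video_urls:
        parts.append({"type": "video_url", "video_url": {"url": _check_url(url)}})
    return {"role": "user", "content": parts}


msg = user_message_with_media(
    "What's in this image?", image_urls=["https://example.com/photo.jpg"]
)
```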

## Examples

### Basic text generation

<CodeGroup>
  ```python Python theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://hub.oxen.ai/api/ai",
      api_key="YOUR_API_KEY",
  )

  response = client.chat.completions.create(
      model="claude-sonnet-4-6",
      messages=[{"role": "user", "content": "Say hello in exactly 3 words."}],
      max_tokens=50,
      temperature=0.1,
  )

  print(response.choices[0].message.content)
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "claude-sonnet-4-6",
      "messages": [{"role": "user", "content": "Say hello in exactly 3 words."}],
      "max_tokens": 50,
      "temperature": 0.1
    }'
  ```
</CodeGroup>

### Response

```json theme={null}
{
  "id": "chatcmpl-97eab7db-fe67-4b29-900c-ed5260c654d4",
  "object": "chat.completion",
  "created": 1775090332,
  "model": "claude-sonnet-4-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello, how are you?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 15,
    "completion_tokens": 5,
    "total_tokens": 20
  }
}
```
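When you call the endpoint without the SDK, for example from cURL output, the reply text is at `choices[0].message.content` and the token counts are under `usage`. The following is a minimal standard-library sketch. `parse_completion` is a hypothetical name. Treating `finish_reason == "length"` as truncation follows the OpenAI convention, which is assumed to carry over here.

```python
import json
from typing import Any


def parse_completion(raw: str) -> tuple[str, dict[str, int]]:
    """Return (assistant text, usage) from a raw non-streaming response body."""
    data: dict[str, Any] = json.loads(raw)
    choice = data["choices"][0]
    # In the OpenAI format, "length" means generation stopped at max_tokens,
    # so the text is probably incomplete.
    if choice["finish_reason"] == "length":
        raise RuntimeError("reply was truncated; consider raising max_tokens")
    return choice["message"]["content"], data["usage"]


# Trimmed copy of the example response above
raw = """
{
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Hello, how are you?"},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 15, "completion_tokens": 5, "total_tokens": 20}
}
"""
text, usage = parse_completion(raw)
print(text)  # Hello, how are you?
```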

### Streaming

<CodeGroup>
  ```python Python theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://hub.oxen.ai/api/ai",
      api_key="YOUR_API_KEY",
  )

  stream = client.chat.completions.create(
      model="gemini-3-1-flash-lite-preview",
      messages=[{"role": "user", "content": "Say hello"}],
      stream=True,
  )

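  for chunk in stream:
      # Guard against chunks that carry no choices (e.g. a trailing usage-only chunk)
      if not chunk.choices:
          continue
      content = chunk.choices[0].delta.content
      if content:
          print(content, end="", flush=True)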
  print()
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gemini-3-1-flash-lite-preview",
      "messages": [{"role": "user", "content": "Say hello"}],
      "stream": true
    }'
  ```
</CodeGroup>

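The response is a stream of server-sent events. Each chunk carries a `delta` holding only the newly generated text instead of a full `message`, and the stream ends with `data: [DONE]`: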

```
data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0}],"created":1775090334,"id":"chatcmpl-...","model":"gemini-3-1-flash-lite-preview","object":"chat.completion.chunk"}

data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}],...}

data: [DONE]
```
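Without the SDK, you have to reassemble the reply from the raw `data:` lines yourself. The sketch below does this with the standard library. It skips blank separator lines, stops at `[DONE]`, and concatenates each `delta.content` for the first choice. `iter_deltas` is a hypothetical name. The sample lines are complete versions of the abbreviated chunks above.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from raw server-sent event lines until [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data fields carry no content
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # only follow the first choice
            content = choice.get("delta", {}).get("content")
            if content:  # delta.content may be null or absent
                yield content


sample = [
    'data: {"choices":[{"delta":{"content":"Hello"},"finish_reason":null,"index":0}]}',
    "",
    'data: {"choices":[{"delta":{"content":" there"},"finish_reason":null,"index":0}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # Hello there
```

With the `requests` library, the same generator can consume `response.iter_lines(decode_unicode=True)` from a request made with `stream=True`.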

### Tool calling

<CodeGroup>
  ```python Python theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://hub.oxen.ai/api/ai",
      api_key="YOUR_API_KEY",
  )

  response = client.chat.completions.create(
      model="gpt-5-4-2026-03-05",
      messages=[
          {"role": "system", "content": "Use tools when appropriate."},
          {"role": "user", "content": "What is the weather in San Francisco?"},
      ],
      tools=[{
          "type": "function",
          "function": {
              "name": "get_weather",
              "description": "Get current weather",
              "parameters": {
                  "type": "object",
                  "properties": {"location": {"type": "string"}},
                  "required": ["location"],
              },
          },
      }],
  )

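  message = response.choices[0].message
  # tool_calls is None when the model answers directly instead of calling a tool
  if message.tool_calls:
      tool_call = message.tool_calls[0]
      print(f"{tool_call.function.name}({tool_call.function.arguments})")
  else:
      print(message.content)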
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-5-4-2026-03-05",
      "messages": [
        {"role": "system", "content": "Use tools when appropriate."},
        {"role": "user", "content": "What is the weather in San Francisco?"}
      ],
      "tools": [{
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get current weather",
          "parameters": {
            "type": "object",
            "properties": {"location": {"type": "string"}},
            "required": ["location"]
          }
        }
      }]
    }'
  ```
</CodeGroup>

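When the model calls a tool, `finish_reason` is `"tool_calls"`, `content` is typically `null`, and each entry in `tool_calls` carries the function name and its `arguments` as a JSON-encoded string: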

```json theme={null}
{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "content": null,
      "role": "assistant",
      "tool_calls": [{
        "id": "call_GRNwPXnbuQW4Sa3QNB3FYkYw",
        "index": 0,
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"location\":\"San Francisco\"}"
        }
      }]
    }
  }]
}
```
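A tool call is only a request: your code must run the function and send the result back. The sketch below assumes the endpoint follows the OpenAI convention for the follow-up turn: a `tool` role message that carries the matching `tool_call_id`. This page does not document that turn. `get_weather` and `TOOLS` are hypothetical stand-ins for your own functions.

```python
import json
from typing import Any, Callable


def get_weather(location: str) -> dict[str, Any]:
    # Hypothetical stand-in; a real tool would query a weather service.
    return {"location": location, "forecast": "sunny"}


# Map tool names (as declared in `tools`) to local Python callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(message: dict[str, Any]) -> list[dict[str, Any]]:
    """Execute each requested tool and return the messages to append to the history."""
    follow_up = [message]  # the assistant turn that requested the tools
    for call in message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]  # KeyError for an undeclared tool
        args = json.loads(call["function"]["arguments"])  # arguments is a JSON string
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return follow_up


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_GRNwPXnbuQW4Sa3QNB3FYkYw",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"},
    }],
}
new_messages = run_tool_calls(assistant_message)
```

Append `new_messages` to the original `messages` and call the endpoint again with the same `tools` so the model can write its final answer from the tool output.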

### Structured output (JSON mode)

<CodeGroup>
  ```python Python theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://hub.oxen.ai/api/ai",
      api_key="YOUR_API_KEY",
  )

  response = client.chat.completions.create(
      model="gpt-5-4-2026-03-05",
      messages=[{"role": "user", "content": "List 3 colors as a JSON object with a colors array"}],
      response_format={"type": "json_object"},
      max_tokens=100,
  )

  print(response.choices[0].message.content)
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/chat/completions \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-5-4-2026-03-05",
      "messages": [{"role": "user", "content": "List 3 colors as a JSON object with a colors array"}],
      "response_format": {"type": "json_object"},
      "max_tokens": 100
    }'
  ```
</CodeGroup>
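Even in JSON mode, the reply arrives as a string in `message.content`. Because support varies by provider, and `max_tokens` can cut the output short, it is worth validating the reply before using it. The following is a minimal sketch. `parse_json_reply` is hypothetical, and the `{"colors": [...]}` shape is only an example.

```python
import json
from typing import Any, Optional


def parse_json_reply(content: Optional[str]) -> dict[str, Any]:
    """Decode a JSON-mode reply, failing loudly if it is missing or not an object."""
    if content is None:
        raise ValueError("no content returned")
    try:
        data = json.loads(content)
    except json.JSONDecodeError as exc:
        # e.g. the provider ignored response_format, or max_tokens truncated the output
        raise ValueError(f"reply is not valid JSON: {exc}") from None
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    return data


colors = parse_json_reply('{"colors": ["red", "green", "blue"]}')["colors"]
```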

## Errors

| Condition            | Error                                |
| -------------------- | ------------------------------------ |
| No model specified   | `"You must specify a model to call"` |
| Model not found      | `"Model not found: <name>"`          |
| Empty messages       | `"Messages array cannot be empty"`   |
| Insufficient credits | Credit-related error message         |
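The first three conditions can be caught on the client before a round trip. The sketch below mirrors the table's messages in a hypothetical `preflight_error` function. The order of the checks and the caller-supplied `known_models` set are assumptions. Credit checks can only happen server-side.

```python
from typing import Any, Optional


def preflight_error(body: dict[str, Any], known_models: Optional[set[str]] = None) -> Optional[str]:
    """Return the error the endpoint would be expected to report, or None."""
    model = body.get("model")
    if not model:
        return "You must specify a model to call"
    # known_models is a caller-maintained allow-list (assumption, not an API feature)
    if known_models is not None and model not in known_models:
        return f"Model not found: {model}"
    if not body.get("messages"):
        return "Messages array cannot be empty"
    return None


ok = preflight_error(
    {"model": "claude-sonnet-4-6", "messages": [{"role": "user", "content": "hi"}]},
    known_models={"claude-sonnet-4-6"},
)
```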
