# Kling O3 Pro: Reference to Video

> Generate videos from text prompts with optional reference images, multi-shot support, and native audio

Transforms reference images into dynamic video sequences. Preserves identity, layout, and text from reference images while adding realistic motion, camera movements, and scene progression. Supports multi-shot generation with per-shot prompts and durations, and optional native audio (Chinese/English).

**Model name:** `kling-video-o3-pro-reference-to-video`

## Endpoint

```
POST /api/ai/videos/generate
```

Video generation is synchronous: the request blocks until the video is ready (typically 1-5 minutes). For long-running jobs, use [`/ai/queue`](/inference-api/reference/async_queue) instead to avoid holding an HTTP connection open.

## Request Parameters

| Parameter          | Type             | Required   | Default                            | Description                                                                                                              |
| ------------------ | ---------------- | ---------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| `model`            | string           | **yes**    | --                                 | `"kling-video-o3-pro-reference-to-video"`                                                                                |
| `prompt`           | string           | **one of** | --                                 | Single prompt for the video. Use this or `multi_prompt`, not both. Max 512 characters.                                   |
| `multi_prompt`     | array            | **one of** | --                                 | Multi-shot prompts. See [multi\_prompt](#multi_prompt) below.                                                            |
| `duration`         | integer          | no         | 5                                  | Duration in seconds when using `prompt` (3-15).                                                                          |
| `input_image`      | array of URIs    | no         | --                                 | Reference images for style/appearance (max 4 combined with elements). Reference in prompts as `@Image1`, `@Image2`, etc. |
| `start_image_url`  | string (URI)     | no         | --                                 | First frame of the video. The model extends from this image.                                                             |
| `tail_image_url`   | string (URI)     | no         | --                                 | Last frame of the video. Requires `start_image_url`. The model fills in between the frames.                              |
| `elements`         | array of objects | no         | --                                 | Structured element references for characters/objects. See [elements](#elements) below.                                   |
| `negative_prompt`  | string           | no         | `"blur, distort, and low quality"` | Text describing what to avoid in the generated video.                                                                    |
| `aspect_ratio`     | string           | no         | `"16:9"`                           | `"9:16"`, `"1:1"`, or `"16:9"`.                                                                                          |
| `generate_audio`   | boolean          | no         | `false`                            | Generate native audio. Supports Chinese and English voice output.                                                        |
| `response_format`  | string           | no         | `"url"`                            | `"url"` returns a hosted URL. `"b64_json"` returns base64-encoded video bytes inline.                                    |
| `target_namespace` | string           | no         | current user                       | Namespace to save results and bill to. Can be an organization name.                                                      |
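
A few of the rules in the table can be checked locally before a request is sent. The sketch below is a hypothetical client-side helper (`check_optional_fields` is not part of the Oxen API). It assumes the values listed in the table are the only ones accepted; the server remains the source of truth.

```python
from typing import List

# Values taken from the parameter table above.
ALLOWED_ASPECT_RATIOS = {"9:16", "1:1", "16:9"}
ALLOWED_RESPONSE_FORMATS = {"url", "b64_json"}


def check_optional_fields(payload: dict) -> List[str]:
    """Return a list of problems found in the optional fields (empty if none)."""
    problems = []
    # tail_image_url is documented as requiring start_image_url.
    if "tail_image_url" in payload and "start_image_url" not in payload:
        problems.append("tail_image_url requires start_image_url")
    # Omitted fields fall back to the documented defaults.
    if payload.get("aspect_ratio", "16:9") not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect_ratio must be one of 9:16, 1:1, 16:9")
    if payload.get("response_format", "url") not in ALLOWED_RESPONSE_FORMATS:
        problems.append("response_format must be url or b64_json")
    return problems
```

An empty list means none of these local checks failed; it does not guarantee the request will be accepted.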

### prompt vs multi\_prompt

Use **either** `prompt` or `multi_prompt`, not both. Sending both returns:

```
"Cannot provide both 'prompt' and 'multi_prompt'."
```

Sending neither (or an empty `multi_prompt: []`) returns:

```
"Either 'prompt' or 'multi_prompt' must be provided."
```

When using `prompt`, the duration defaults to 5 seconds. Override with `duration`:

```json theme={null}
{"model": "kling-video-o3-pro-reference-to-video", "prompt": "A flower blooming in timelapse", "duration": 10}
```
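
The either/or rule in this section can be mirrored client-side to catch mistakes before a slow request is made. This is a minimal sketch, not the server's validation logic: `check_prompt_fields` is a hypothetical helper, and the wording of the length message is made up here (the other two strings are the documented errors).

```python
from typing import Optional

MAX_PROMPT_CHARS = 512  # documented limit for `prompt` and for each shot prompt


def check_prompt_fields(payload: dict) -> Optional[str]:
    """Return an error message, or None if the prompt fields look valid."""
    has_prompt = payload.get("prompt") is not None
    # An empty multi_prompt array is documented to count as "not provided".
    has_multi = bool(payload.get("multi_prompt"))
    if has_prompt and has_multi:
        return "Cannot provide both 'prompt' and 'multi_prompt'."
    if not has_prompt and not has_multi:
        return "Either 'prompt' or 'multi_prompt' must be provided."
    if has_prompt:
        prompts = [payload["prompt"]]
    else:
        prompts = [shot.get("prompt", "") for shot in payload["multi_prompt"]]
    if any(len(p) > MAX_PROMPT_CHARS for p in prompts):
        # Local message only; the API's own wording for this case is not documented here.
        return "A prompt is longer than 512 characters."
    return None
```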

### multi\_prompt

Array of shot objects. Each shot generates a segment of the video.

| Field      | Type    | Required | Default | Description                               |
| ---------- | ------- | -------- | ------- | ----------------------------------------- |
| `prompt`   | string  | **yes**  | --      | Prompt for this shot. Max 512 characters. |
| `duration` | integer | no       | 5       | Duration of this shot in seconds (1-15).  |

### Duration Constraints

| Constraint             | Value          |
| ---------------------- | -------------- |
| Minimum total duration | **3 seconds**  |
| Maximum total duration | **15 seconds** |
| Maximum per shot       | 15 seconds     |
| Default per shot       | 5 seconds      |

Individual shots can be as short as 1 second, as long as the total across all shots is between 3 and 15 seconds.

| Configuration                                              | Total | Result    |
| ---------------------------------------------------------- | ----- | --------- |
| Single shot, `duration: 1`                                 | 1s    | **Fails** |
| Single shot, `duration: 2`                                 | 2s    | **Fails** |
| Single shot, `duration: 3`                                 | 3s    | Works     |
| Two shots: `duration: 2` + `duration: 1`                   | 3s    | Works     |
| Two shots: `duration: 1` + `duration: 1`                   | 2s    | **Fails** |
| Single shot, `duration: 15`                                | 15s   | Works     |
| Three shots: `duration: 5` + `duration: 5` + `duration: 5` | 15s   | Works     |
| Three shots: `duration: 5` + `duration: 5` + `duration: 6` | 16s   | **Fails** |

When the total duration is less than 3 seconds:

```
"duration value '2' is invalid. Try using duration='5' instead, as duration support may vary by model and mode."
```

When the total duration exceeds 15 seconds:

```
"Total shot duration (16s) exceeds maximum allowed (15s)."
```

When a single shot exceeds 15 seconds:

```
"Input should be '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14' or '15'"
```
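
The duration rules above reduce to two checks: each shot must be 1-15 seconds, and the sum must be 3-15 seconds. The following sketch applies them locally. `check_durations` is a hypothetical helper and its messages are illustrative, not the API's.

```python
from typing import List, Optional

MIN_TOTAL, MAX_TOTAL = 3, 15   # limits on the sum of all shots
MIN_SHOT, MAX_SHOT = 1, 15     # limits on a single shot
DEFAULT_SHOT = 5               # used when a shot omits `duration`


def check_durations(shots: List[dict]) -> Optional[str]:
    """Return a description of the first violated rule, or None if all pass."""
    durations = [shot.get("duration", DEFAULT_SHOT) for shot in shots]
    for d in durations:
        if not MIN_SHOT <= d <= MAX_SHOT:
            return f"shot duration {d}s is outside {MIN_SHOT}-{MAX_SHOT}s"
    total = sum(durations)
    if total < MIN_TOTAL:
        return f"total duration {total}s is below the {MIN_TOTAL}s minimum"
    if total > MAX_TOTAL:
        return f"total duration {total}s exceeds the {MAX_TOTAL}s maximum"
    return None
```

For example, `check_durations([{"duration": 2}, {"duration": 1}])` returns `None`, while `check_durations([{"duration": 5}, {"duration": 5}, {"duration": 6}])` reports that the 16-second total is too long, matching the table above.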

### elements

Array of element objects for character/object reference. Use `@Element1`, `@Element2`, etc. in prompts.

| Field                  | Type          | Required | Description                                      |
| ---------------------- | ------------- | -------- | ------------------------------------------------ |
| `frontal_image_url`    | string (URI)  | **yes**  | Front view of the reference object or character. |
| `reference_image_urls` | array of URIs | no       | Additional angles. Max 3 images per element.     |

Maximum 4 total images across all elements and `input_image` references.
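
The image limits can be counted before sending a request. This sketch assumes that every `frontal_image_url` and every entry in `reference_image_urls` counts as one image toward the limit of 4, and that `start_image_url` and `tail_image_url` do not; the text above does not spell this out, so treat the counting rule as an assumption. `check_image_budget` is a hypothetical helper.

```python
from typing import List

MAX_TOTAL_IMAGES = 4                   # across elements and input_image
MAX_REFERENCE_IMAGES_PER_ELEMENT = 3   # reference_image_urls per element


def check_image_budget(payload: dict) -> List[str]:
    """Return a list of image-limit problems (empty if none)."""
    problems = []
    total = len(payload.get("input_image", []))
    for index, element in enumerate(payload.get("elements", []), start=1):
        extra = element.get("reference_image_urls", [])
        if len(extra) > MAX_REFERENCE_IMAGES_PER_ELEMENT:
            problems.append(
                f"Element{index} has {len(extra)} reference images "
                f"(max {MAX_REFERENCE_IMAGES_PER_ELEMENT})"
            )
        # Assumption: the frontal image counts as one image, plus each extra angle.
        total += 1 + len(extra)
    if total > MAX_TOTAL_IMAGES:
        problems.append(f"{total} images in total (max {MAX_TOTAL_IMAGES})")
    return problems
```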

## Examples

### Minimal: text only

`input_image` is optional. Without it the model generates purely from the prompt.

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://hub.oxen.ai/api/ai/videos/generate",
      headers={
          "Authorization": "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "kling-video-o3-pro-reference-to-video",
          "prompt": "A puppy runs through a park",
      },
  )

  response.raise_for_status()  # stop here on an HTTP error status
  data = response.json()
  print("Video URL:", data["videos"][0]["url"])
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "kling-video-o3-pro-reference-to-video",
      "prompt": "A puppy runs through a park"
    }'
  ```
</CodeGroup>

### Single prompt with reference image

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://hub.oxen.ai/api/ai/videos/generate",
      headers={
          "Authorization": "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "kling-video-o3-pro-reference-to-video",
          "prompt": "A dog runs across a sunny field",
          "input_image": ["https://example.com/dog.jpg"],
      },
  )

  response.raise_for_status()  # stop here on an HTTP error status
  data = response.json()
  print("Video URL:", data["videos"][0]["url"])
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "kling-video-o3-pro-reference-to-video",
      "prompt": "A dog runs across a sunny field",
      "input_image": ["https://example.com/dog.jpg"]
    }'
  ```
</CodeGroup>

### Multi-shot with reference image

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://hub.oxen.ai/api/ai/videos/generate",
      headers={
          "Authorization": "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "kling-video-o3-pro-reference-to-video",
          "multi_prompt": [
              {"prompt": "A woman walks toward the camera smiling, cinematic lighting", "duration": 5},
              {"prompt": "She turns and looks out a window, soft focus background", "duration": 5},
          ],
          "input_image": ["https://example.com/reference-face.jpg"],
      },
  )

  response.raise_for_status()  # stop here on an HTTP error status
  data = response.json()
  print("Video URL:", data["videos"][0]["url"])
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "kling-video-o3-pro-reference-to-video",
      "multi_prompt": [
        {"prompt": "A woman walks toward the camera smiling, cinematic lighting", "duration": 5},
        {"prompt": "She turns and looks out a window, soft focus background", "duration": 5}
      ],
      "input_image": ["https://example.com/reference-face.jpg"]
    }'
  ```
</CodeGroup>

### With start/end frames and elements

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://hub.oxen.ai/api/ai/videos/generate",
      headers={
          "Authorization": "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "kling-video-o3-pro-reference-to-video",
          "multi_prompt": [
              {"prompt": "@Element1 picks up a coffee cup from the table", "duration": 5},
          ],
          "start_image_url": "https://example.com/first-frame.jpg",
          "tail_image_url": "https://example.com/last-frame.jpg",
          "elements": [
              {
                  "frontal_image_url": "https://example.com/character-front.jpg",
                  "reference_image_urls": ["https://example.com/character-side.jpg"],
              }
          ],
          "aspect_ratio": "16:9",
          "generate_audio": True,
      },
  )

  response.raise_for_status()  # stop here on an HTTP error status
  data = response.json()
  print("Video URL:", data["videos"][0]["url"])
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "kling-video-o3-pro-reference-to-video",
      "multi_prompt": [
        {"prompt": "@Element1 picks up a coffee cup from the table", "duration": 5}
      ],
      "start_image_url": "https://example.com/first-frame.jpg",
      "tail_image_url": "https://example.com/last-frame.jpg",
      "elements": [
        {
          "frontal_image_url": "https://example.com/character-front.jpg",
          "reference_image_urls": ["https://example.com/character-side.jpg"]
        }
      ],
      "aspect_ratio": "16:9",
      "generate_audio": true
    }'
  ```
</CodeGroup>

### Response (`response_format: "url"`)

```json theme={null}
{
  "created": 1775090723,
  "model": "kling-video-o3-pro-reference-to-video",
  "videos": [
    {
      "url": "https://hub.oxen.ai/api/repos/.../files/.../video.mp4?..."
    }
  ]
}
```

The URL is temporary and stops working once it expires, so download the video if you need to keep it.

### Response (`response_format: "b64_json"`)

```json theme={null}
{
  "created": 1775090723,
  "model": "kling-video-o3-pro-reference-to-video",
  "videos": [
    {
      "b64_json": "<base64-encoded mp4 bytes>"
    }
  ]
}
```
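
With `response_format: "b64_json"`, the video bytes have to be decoded before they can be played. A minimal sketch, assuming `data` is the parsed JSON response shown above; `save_b64_videos` and the `video_<n>.mp4` file names are illustrative choices, not part of the API.

```python
import base64
from pathlib import Path
from typing import List


def save_b64_videos(data: dict, out_dir: str = ".") -> List[Path]:
    """Decode each `b64_json` entry in `data["videos"]` and write it as an MP4 file."""
    paths = []
    for index, video in enumerate(data["videos"]):
        path = Path(out_dir) / f"video_{index}.mp4"
        # b64decode turns the base64 text back into the raw MP4 bytes.
        path.write_bytes(base64.b64decode(video["b64_json"]))
        paths.append(path)
    return paths
```

Base64 adds roughly a third to the payload size, so `"url"` is usually the lighter option when the video does not need to arrive inline.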

## Using with /ai/queue

The queue endpoint is the recommended way to generate videos: it returns immediately and processes the job in the background.

### Enqueue

<CodeGroup>
  ```python Python theme={null}
  import requests

  response = requests.post(
      "https://hub.oxen.ai/api/ai/queue",
      headers={
          "Authorization": "Bearer YOUR_API_KEY",
          "Content-Type": "application/json",
      },
      json={
          "model": "kling-video-o3-pro-reference-to-video",
          "multi_prompt": [{"prompt": "A person speaking into a microphone", "duration": 5}],
          "generate_audio": True,
          "num_generations": 2,
      },
  )

  response.raise_for_status()  # stop here on an HTTP error status
  generations = response.json()["generations"]
  for g in generations:
      print(f"ID: {g['generation_id']}, Status: {g['status']}")
  ```

  ```bash cURL theme={null}
  curl -X POST https://hub.oxen.ai/api/ai/queue \
    -H "Authorization: Bearer $OXEN_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "kling-video-o3-pro-reference-to-video",
      "multi_prompt": [{"prompt": "A person speaking into a microphone", "duration": 5}],
      "generate_audio": true,
      "num_generations": 2
    }'
  ```
</CodeGroup>

### Poll

<CodeGroup>
  ```python Python theme={null}
  import requests
  import time

  generation_id = "4ef840a4-..."
  while True:
      data = requests.get(
          f"https://hub.oxen.ai/api/ai/queue/{generation_id}",
          headers={"Authorization": "Bearer YOUR_API_KEY"},
      ).json()
      if data["status"] in {"succeeded", "failed", "cancelled"}:
          break
      time.sleep(10)

  if data["status"] == "succeeded":
      print(f"Result: {data['result_url']}")
  else:
      print(f"Generation {data['status']}: {data.get('error_message')}")
  ```

  ```bash cURL theme={null}
  curl -H "Authorization: Bearer $OXEN_API_KEY" \
    "https://hub.oxen.ai/api/ai/queue/4ef840a4-..."
  ```
</CodeGroup>

A generation is done when its `status` is `succeeded`, `failed`, or `cancelled`. On success, `result_url` points to the output file.
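
When `num_generations` is greater than 1, several IDs need to be polled until each one reaches a terminal status. The sketch below generalizes the polling loop above. `wait_for_all` is a hypothetical helper, and the 10-second interval and 15-minute timeout are assumptions rather than documented limits. The status lookup is passed in as a function so the loop makes no assumptions about how the request is sent.

```python
import time
from typing import Callable, Dict, Iterable

# Terminal statuses as documented above.
TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_all(
    generation_ids: Iterable[str],
    fetch_status: Callable[[str], dict],
    interval: float = 10.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Poll every ID until it is terminal; return the final payload per ID."""
    pending = list(generation_ids)
    finished: Dict[str, dict] = {}
    deadline = time.monotonic() + timeout
    while pending:
        still_pending = []
        for generation_id in pending:
            data = fetch_status(generation_id)
            if data["status"] in TERMINAL_STATUSES:
                finished[generation_id] = data
            else:
                still_pending.append(generation_id)
        pending = still_pending
        if pending:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"still pending after {timeout}s: {pending}")
            sleep(interval)
    return finished
```

Here `fetch_status` can wrap the same `requests.get(...).json()` call used in the polling example, and the IDs can come from the `generation_id` values returned by the enqueue request.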

### Cancel

<CodeGroup>
  ```python Python theme={null}
  import requests

  generation_id = "4ef840a4-..."
  response = requests.delete(
      f"https://hub.oxen.ai/api/ai/queue/{generation_id}",
      headers={"Authorization": "Bearer YOUR_API_KEY"},
  )

  print(response.json())
  ```

  ```bash cURL theme={null}
  curl -X DELETE -H "Authorization: Bearer $OXEN_API_KEY" \
    "https://hub.oxen.ai/api/ai/queue/4ef840a4-..."
  ```
</CodeGroup>

## Errors

| Error                                                                                                  | Cause                                | Fix                                  |
| ------------------------------------------------------------------------------------------------------ | ------------------------------------ | ------------------------------------ |
| `Getting model response error: 422 - Value error, Cannot provide both 'prompt' and 'multi_prompt'.`    | Sent both fields                     | Use one or the other                 |
| `Getting model response error: 422 - Value error, Either 'prompt' or 'multi_prompt' must be provided.` | Neither sent, or empty array         | Provide at least one                 |
| `Field required`                                                                                       | `multi_prompt` item missing `prompt` | Every shot needs a `prompt` string   |
| `duration value '2' is invalid`                                                                        | Total duration \< 3 seconds          | Ensure total across shots >= 3       |
| `Total shot duration (16s) exceeds maximum allowed (15s)`                                              | Total duration > 15 seconds          | Keep total at 15 seconds or less     |
| `Input should be '1', '2', ... or '15'`                                                                | Single shot > 15                     | Keep each shot at 15 seconds or less |
| `num_generations must be an integer between 1 and 4`                                                   | Invalid count (via `/ai/queue`)      | Use 1-4                              |
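
The table above can be turned into a small lookup that maps an error message to its fix. This is only a sketch: `hint_for` is a hypothetical helper, it matches on substrings of the messages listed above, and it takes the message as a plain string because where the message sits in the error response is not specified here.

```python
from typing import Optional

# (substring of the error message, suggested fix), in the order of the table above.
ERROR_HINTS = [
    ("Cannot provide both", "Use either 'prompt' or 'multi_prompt', not both."),
    ("must be provided", "Provide 'prompt' or a non-empty 'multi_prompt'."),
    ("Field required", "Give every multi_prompt shot a 'prompt' string."),
    ("duration value", "Make the total duration at least 3 seconds."),
    ("exceeds maximum allowed", "Keep the total duration at 15 seconds or less."),
    ("Input should be", "Keep each shot at 15 seconds or less."),
    ("num_generations must be", "Use a num_generations value from 1 to 4."),
]


def hint_for(error_message: str) -> Optional[str]:
    """Return the suggested fix for a known error message, or None if unrecognized."""
    for fragment, hint in ERROR_HINTS:
        if fragment in error_message:
            return hint
    return None
```

Substring matching is brittle if the server wording changes, so an unrecognized message should still be shown to the user as-is.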

## Other Kling Models

| Model                                    | Input                | Use Case                          | Cost/sec |
| ---------------------------------------- | -------------------- | --------------------------------- | -------- |
| `kling-video-v2-6-pro-text-to-video`     | Text only            | Simple text-to-video              | \$0.070  |
| `kling-video-v2-6-pro-image-to-video`    | Image                | Animate a single image            | \$0.070  |
| `kling-video-o3-pro-image-to-video`      | Image + text         | Higher quality image animation    | \$0.224  |
| `kling-video-o3-pro-reference-to-video`  | Images + text        | Reference-conditioned, multi-shot | \$0.224  |
| `kling-video-o3-pro-video-to-video-edit` | Video                | Edit existing video               | \$0.336  |
| `kling-video-v3-pro-motion-control`      | Text + image + video | Camera/motion control             | \$0.168  |

The O3 Pro models produce higher quality output than v2.x but cost roughly 3-5x as much per second (\$0.224-\$0.336 vs. \$0.070).
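
As a rough illustration of the per-second prices in the table, the sketch below estimates the cost of a request. It assumes cost is simply price per second times output seconds times number of generations; actual billing may include factors not listed here. `estimate_cost` is a hypothetical helper.

```python
# Prices in USD per second of output, copied from the table above.
COST_PER_SECOND = {
    "kling-video-v2-6-pro-text-to-video": 0.070,
    "kling-video-v2-6-pro-image-to-video": 0.070,
    "kling-video-o3-pro-image-to-video": 0.224,
    "kling-video-o3-pro-reference-to-video": 0.224,
    "kling-video-o3-pro-video-to-video-edit": 0.336,
    "kling-video-v3-pro-motion-control": 0.168,
}


def estimate_cost(model: str, total_seconds: int, num_generations: int = 1) -> float:
    """Estimate the cost in USD, assuming linear per-second pricing."""
    return round(COST_PER_SECOND[model] * total_seconds * num_generations, 3)
```

Under that assumption, a 10-second clip from `kling-video-o3-pro-reference-to-video` comes to about \$2.24, compared with about \$0.70 for the same length on a v2.6 Pro model.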