Transforms reference images into dynamic video sequences. Preserves identity, layout, and text from reference images while adding realistic motion, camera movements, and scene progression. Supports multi-shot generation with per-shot prompts and durations, and optional native audio (Chinese/English).
Model name: kling-video-o3-pro-reference-to-video
Endpoint
POST /api/ai/videos/generate
Video generation is synchronous: the request blocks until the video is ready (typically 1-5 minutes). For long-running jobs, use /ai/queue instead so that you don't hold an HTTP request open for minutes.
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | yes | — | "kling-video-o3-pro-reference-to-video" |
| prompt | string | one of | — | Single prompt for the video. Use this or multi_prompt, not both. Max 512 characters. |
| multi_prompt | array | one of | — | Multi-shot prompts. See multi_prompt below. |
| duration | integer | no | 5 | Duration in seconds when using prompt. |
| input_image | array of URIs | no | — | Reference images for style/appearance (max 4 combined with elements). Reference in prompts as @Image1, @Image2, etc. |
| start_image_url | string (URI) | no | — | First frame of the video. The model extends from this image. |
| tail_image_url | string (URI) | no | — | Last frame of the video. Requires start_image_url. The model fills in between the frames. |
| elements | array of objects | no | — | Structured element references for characters/objects. See elements below. |
| negative_prompt | string | no | "blur, distort, and low quality" | Text describing what to avoid in the generated video. |
| aspect_ratio | string | no | "16:9" | "9:16", "1:1", or "16:9". |
| generate_audio | boolean | no | false | Generate native audio. Supports Chinese and English voice output. |
| response_format | string | no | "url" | "url" returns a hosted URL. "b64_json" returns base64-encoded video bytes inline. |
| target_namespace | string | no | current user | Namespace to save results and bill to. Can be an organization name. |
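A few of the constraints in the table can be checked locally before sending a request. The following is a minimal sketch, not part of the API: `check_request` and its message strings are hypothetical names chosen for illustration, and the server remains the authority on what is valid.

```python
ALLOWED_ASPECT_RATIOS = {"9:16", "1:1", "16:9"}
ALLOWED_RESPONSE_FORMATS = {"url", "b64_json"}


def check_request(payload: dict) -> list:
    """Return a list of locally detectable problems (empty if none were found).

    Hypothetical helper: the messages are ours, not the API's error strings.
    """
    problems = []
    # Documented in the table: tail_image_url requires start_image_url.
    if "tail_image_url" in payload and "start_image_url" not in payload:
        problems.append("tail_image_url requires start_image_url")
    # Only check optional fields the caller set; the server applies defaults.
    ratio = payload.get("aspect_ratio")
    if ratio is not None and ratio not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio: {ratio!r}")
    fmt = payload.get("response_format")
    if fmt is not None and fmt not in ALLOWED_RESPONSE_FORMATS:
        problems.append(f"unsupported response_format: {fmt!r}")
    return problems
```

Calling `check_request(payload)` before `requests.post(...)` catches these mistakes without waiting on a round trip.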
prompt vs multi_prompt
Use either prompt or multi_prompt, not both. Sending both returns:
"Cannot provide both 'prompt' and 'multi_prompt'."
Sending neither (or an empty multi_prompt: []) returns:
"Either 'prompt' or 'multi_prompt' must be provided."
When using prompt, the duration defaults to 5 seconds. Override with duration:
{"model": "kling-video-o3-pro-reference-to-video", "prompt": "A flower blooming in timelapse", "duration": 10}
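The either/or rule can also be enforced on the client. This sketch raises `ValueError` with the same two messages quoted above; `check_prompt_fields` is a hypothetical helper, and treating an empty `prompt` string as missing is an assumption, since only the empty `multi_prompt` case is documented.

```python
MAX_PROMPT_CHARS = 512


def check_prompt_fields(payload: dict) -> None:
    """Raise ValueError unless exactly one of prompt / multi_prompt is usable."""
    has_prompt = bool(payload.get("prompt"))  # assumption: "" counts as missing
    has_multi = bool(payload.get("multi_prompt"))  # documented: [] counts as missing
    if has_prompt and has_multi:
        raise ValueError("Cannot provide both 'prompt' and 'multi_prompt'.")
    if not has_prompt and not has_multi:
        raise ValueError("Either 'prompt' or 'multi_prompt' must be provided.")
    if has_prompt and len(payload["prompt"]) > MAX_PROMPT_CHARS:
        # Our own message; the server's wording for this case is not documented here.
        raise ValueError("prompt exceeds 512 characters")
```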
multi_prompt
Array of shot objects. Each shot generates a segment of the video.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| prompt | string | yes | — | Prompt for this shot. Max 512 characters. |
| duration | integer | no | 5 | Duration of this shot in seconds (1-15). |
Duration Constraints
| Constraint | Value |
|---|---|
| Minimum total duration | 3 seconds |
| Maximum total duration | 15 seconds |
| Maximum per shot | 15 seconds |
| Default per shot | 5 seconds |
Individual shots can be as short as 1 second, as long as the total across all shots is between 3 and 15 seconds.
| Configuration | Total | Result |
|---|---|---|
| Single shot, duration: 1 | 1s | Fails |
| Single shot, duration: 2 | 2s | Fails |
| Single shot, duration: 3 | 3s | Works |
| Two shots: duration: 2 + duration: 1 | 3s | Works |
| Two shots: duration: 1 + duration: 1 | 2s | Fails |
| Single shot, duration: 15 | 15s | Works |
| Three shots: duration: 5 + duration: 5 + duration: 5 | 15s | Works |
| Three shots: duration: 5 + duration: 5 + duration: 6 | 16s | Fails |
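The rows above follow from two rules: each shot is 1-15 seconds, and the total is 3-15 seconds. A short sketch of that arithmetic, using hypothetical helper names and the documented default of 5 seconds for shots that omit duration:

```python
DEFAULT_SHOT_SECONDS = 5
MIN_SHOT_SECONDS, MAX_SHOT_SECONDS = 1, 15
MIN_TOTAL_SECONDS, MAX_TOTAL_SECONDS = 3, 15


def total_duration(multi_prompt: list) -> int:
    """Sum shot durations, applying the default to shots without one."""
    return sum(shot.get("duration", DEFAULT_SHOT_SECONDS) for shot in multi_prompt)


def durations_are_valid(multi_prompt: list) -> bool:
    """True if every shot and the overall total fall within the documented limits."""
    per_shot_ok = all(
        MIN_SHOT_SECONDS <= shot.get("duration", DEFAULT_SHOT_SECONDS) <= MAX_SHOT_SECONDS
        for shot in multi_prompt
    )
    total_ok = MIN_TOTAL_SECONDS <= total_duration(multi_prompt) <= MAX_TOTAL_SECONDS
    return per_shot_ok and total_ok
```

For example, two shots of 2 and 1 seconds pass (total 3), while three shots of 5, 5, and 6 seconds fail (total 16).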
When total duration is too short:
"duration value '2' is invalid. Try using duration='5' instead, as duration support may vary by model and mode."
When total duration exceeds 15 seconds:
"Total shot duration (16s) exceeds maximum allowed (15s)."
When a single shot exceeds 15 seconds:
"Input should be '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14' or '15'"
elements
Array of element objects for character/object reference. Use @Element1, @Element2, etc. in prompts.
| Field | Type | Required | Description |
|---|---|---|---|
| frontal_image_url | string (URI) | yes | Front view of the reference object or character. |
| reference_image_urls | array of URIs | no | Additional angles. Max 3 images per element. |
Maximum 4 total images across all elements and input_image references.
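A rough way to count images before sending a request is sketched below. It assumes that every image URL counts toward the limit of 4, that is, each element's frontal image plus each of its additional angles plus each input_image entry. The text above does not spell out whether the limit is counted per URL or per element, so treat this as one plausible reading; the helper names are hypothetical.

```python
MAX_TOTAL_IMAGES = 4
MAX_EXTRA_ANGLES_PER_ELEMENT = 3


def count_reference_images(payload: dict) -> int:
    """Count image URLs across input_image and elements (assumed counting rule)."""
    count = len(payload.get("input_image", []))
    for element in payload.get("elements", []):
        count += 1  # frontal_image_url is required on every element
        count += len(element.get("reference_image_urls", []))
    return count


def within_image_limits(payload: dict) -> bool:
    """Check the per-element angle limit and the assumed overall limit."""
    for element in payload.get("elements", []):
        if len(element.get("reference_image_urls", [])) > MAX_EXTRA_ANGLES_PER_ELEMENT:
            return False
    return count_reference_images(payload) <= MAX_TOTAL_IMAGES
```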
Examples
Minimal: text only
input_image is optional. Without it the model generates purely from the prompt.
import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "prompt": "A puppy runs through a park",
    },
)
data = response.json()
print("Video URL:", data["videos"][0]["url"])
Single prompt with reference image
import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "prompt": "A dog runs across a sunny field",
        "input_image": ["https://example.com/dog.jpg"],
    },
)
data = response.json()
print("Video URL:", data["videos"][0]["url"])
Multi-shot with reference image
import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "multi_prompt": [
            {"prompt": "A woman walks toward the camera smiling, cinematic lighting", "duration": 5},
            {"prompt": "She turns and looks out a window, soft focus background", "duration": 5},
        ],
        "input_image": ["https://example.com/reference-face.jpg"],
    },
)
data = response.json()
print("Video URL:", data["videos"][0]["url"])
With start/end frames and elements
import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "multi_prompt": [
            {"prompt": "@Element1 picks up a coffee cup from the table", "duration": 5},
        ],
        "start_image_url": "https://example.com/first-frame.jpg",
        "tail_image_url": "https://example.com/last-frame.jpg",
        "elements": [
            {
                "frontal_image_url": "https://example.com/character-front.jpg",
                "reference_image_urls": ["https://example.com/character-side.jpg"],
            }
        ],
        "aspect_ratio": "16:9",
        "generate_audio": True,
    },
)
data = response.json()
print("Video URL:", data["videos"][0]["url"])
Response
With response_format "url" (the default):
{
  "created": 1775090723,
  "model": "kling-video-o3-pro-reference-to-video",
  "videos": [
    {
      "url": "https://hub.oxen.ai/api/repos/.../files/.../video.mp4?..."
    }
  ]
}
The URL is a temporary link that expires after a period of time.
With response_format "b64_json":
{
  "created": 1775090723,
  "model": "kling-video-o3-pro-reference-to-video",
  "videos": [
    {
      "b64_json": "<base64-encoded mp4 bytes>"
    }
  ]
}
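Both response shapes can be handled with one small helper. The sketch below uses only the fields shown in the two responses above (`videos`, `url`, `b64_json`); `extract_video` and `save_inline_video` are hypothetical names, and the code reads only the first entry of `videos`.

```python
import base64
from pathlib import Path


def extract_video(response_json: dict):
    """Return ("bytes", data) for an inline video or ("url", link) for a hosted one."""
    video = response_json["videos"][0]
    if "b64_json" in video:
        return "bytes", base64.b64decode(video["b64_json"])
    return "url", video["url"]


def save_inline_video(response_json: dict, path: str) -> int:
    """Write an inline (b64_json) video to disk and return the byte count."""
    kind, value = extract_video(response_json)
    if kind != "bytes":
        raise ValueError("response holds a URL, not inline bytes; download it instead")
    return Path(path).write_bytes(value)
```

With `"response_format": "b64_json"` in the request, `save_inline_video(response.json(), "out.mp4")` writes the decoded file locally, which avoids depending on the temporary URL.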
Using with /ai/queue
Recommended for video generation. Returns immediately, processes in the background.
Enqueue
import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/queue",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "multi_prompt": [{"prompt": "A person speaking into a microphone", "duration": 5}],
        "generate_audio": True,
        "num_generations": 2,
    },
)
generations = response.json()["generations"]
for g in generations:
    print(f"ID: {g['generation_id']}, Status: {g['status']}")
Poll
import requests
import time

generation_id = "4ef840a4-..."
while True:
    data = requests.get(
        f"https://hub.oxen.ai/api/ai/queue/{generation_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    ).json()
    if data["status"] in {"succeeded", "failed", "cancelled"}:
        break
    time.sleep(10)
if data["status"] == "succeeded":
    print(f"Result: {data['result_url']}")
else:
    print(f"Generation {data['status']}: {data.get('error_message')}")
A generation is done when its status is succeeded, failed, or cancelled. On success, result_url points to the output file.
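A polling loop without an upper bound can spin forever if a job never reaches a terminal status. One way to bound it is sketched below, assuming only the three terminal statuses listed above. `wait_for_generation` is a hypothetical helper, and the default interval and timeout are arbitrary choices rather than API recommendations.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_generation(fetch_status, interval=10.0, timeout=900.0,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until the job is terminal or the timeout elapses.

    fetch_status is any zero-argument callable returning the parsed JSON body
    of GET /api/ai/queue/{generation_id}. sleep and clock are injectable so
    the loop can be tested without real waiting.
    """
    deadline = clock() + timeout
    while True:
        data = fetch_status()
        if data["status"] in TERMINAL_STATUSES:
            return data
        if clock() >= deadline:
            raise TimeoutError(f"generation still {data['status']!r} after {timeout}s")
        sleep(interval)
```

In practice, fetch_status could wrap the GET request from the polling example, for instance `lambda: requests.get(url, headers=headers, timeout=30).json()`.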
Cancel
import requests

generation_id = "4ef840a4-..."
response = requests.delete(
    f"https://hub.oxen.ai/api/ai/queue/{generation_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
print(response.json())
Errors
| Error | Cause | Fix |
|---|---|---|
| Getting model response error: 422 - Value error, Cannot provide both 'prompt' and 'multi_prompt'. | Sent both fields | Use one or the other |
| Getting model response error: 422 - Value error, Either 'prompt' or 'multi_prompt' must be provided. | Neither sent, or empty array | Provide at least one |
| Field required | multi_prompt item missing prompt | Every shot needs a prompt string |
| duration value '2' is invalid | Total duration < 3 seconds | Ensure total across shots >= 3 |
| Total shot duration (16s) exceeds maximum allowed (15s) | Total duration > 15 seconds | Keep total at 15 seconds or less |
| Input should be '1', '2', ... or '15' | Single shot > 15 | Keep each shot at 15 seconds or less |
| num_generations must be an integer between 1 and 4 | Invalid count (via /ai/queue) | Use 1-4 |
Other Kling Models
| Model | Input | Use Case | Cost/sec |
|---|---|---|---|
| kling-video-v2-6-pro-text-to-video | Text only | Simple text-to-video | $0.070 |
| kling-video-v2-6-pro-image-to-video | Image | Animate a single image | $0.070 |
| kling-video-o3-pro-image-to-video | Image + text | Higher quality image animation | $0.224 |
| kling-video-o3-pro-reference-to-video | Images + text | Reference-conditioned, multi-shot | $0.224 |
| kling-video-o3-pro-video-to-video-edit | Video | Edit existing video | $0.336 |
| kling-video-v3-pro-motion-control | Text + image + video | Camera/motion control | $0.168 |
The O3 Pro models produce higher quality output than v2.x but cost roughly 3x to 5x more per second ($0.224 to $0.336, versus $0.070).
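For budgeting, the per-second rates in the table can be turned into a rough estimate. The sketch below assumes that cost scales linearly with output seconds and that each generation requested through num_generations is billed separately. Neither assumption is confirmed here, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# Rates copied from the table above, in USD per second.
COST_PER_SECOND = {
    "kling-video-v2-6-pro-text-to-video": Decimal("0.070"),
    "kling-video-v2-6-pro-image-to-video": Decimal("0.070"),
    "kling-video-o3-pro-image-to-video": Decimal("0.224"),
    "kling-video-o3-pro-reference-to-video": Decimal("0.224"),
    "kling-video-o3-pro-video-to-video-edit": Decimal("0.336"),
    "kling-video-v3-pro-motion-control": Decimal("0.168"),
}


def estimate_cost(model: str, seconds: int, num_generations: int = 1) -> Decimal:
    """Estimate cost in USD as rate x seconds x generations (assumed billing model)."""
    return COST_PER_SECOND[model] * seconds * num_generations
```

Under these assumptions, a 10-second clip from kling-video-o3-pro-reference-to-video comes to 0.224 x 10 = $2.24.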