Transforms reference images into dynamic video sequences. Preserves identity, layout, and text from reference images while adding realistic motion, camera movements, and scene progression. Supports multi-shot generation with per-shot prompts and durations, and optional native audio (Chinese/English). Model name: kling-video-o3-pro-reference-to-video

Endpoint

POST /api/ai/videos/generate
Video generation is synchronous: the request blocks until the video is ready (typically 1-5 minutes). For long-running jobs, use /ai/queue instead so that you don’t hold a long-running HTTP request open.

Request Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | yes | | "kling-video-o3-pro-reference-to-video" |
| prompt | string | one of | | Single prompt for the video. Use this or multi_prompt, not both. Max 512 characters. |
| multi_prompt | array | one of | | Multi-shot prompts. See multi_prompt below. |
| duration | integer | no | 5 | Duration in seconds when using prompt. |
| input_image | array of URIs | no | | Reference images for style/appearance (max 4 combined with elements). Reference in prompts as @Image1, @Image2, etc. |
| start_image_url | string (URI) | no | | First frame of the video. The model extends from this image. |
| tail_image_url | string (URI) | no | | Last frame of the video. Requires start_image_url. The model fills in between the frames. |
| elements | array of objects | no | | Structured element references for characters/objects. See elements below. |
| negative_prompt | string | no | "blur, distort, and low quality" | Text describing what to avoid in the generated video. |
| aspect_ratio | string | no | "16:9" | "9:16", "1:1", or "16:9". |
| generate_audio | boolean | no | false | Generate native audio. Supports Chinese and English voice output. |
| response_format | string | no | "url" | "url" returns a hosted URL. "b64_json" returns base64-encoded video bytes inline. |
| target_namespace | string | no | current user | Namespace to save results and bill to. Can be an organization name. |

prompt vs multi_prompt

Use either prompt or multi_prompt, not both. Sending both returns:
"Cannot provide both 'prompt' and 'multi_prompt'."
Sending neither (or an empty multi_prompt: []) returns:
"Either 'prompt' or 'multi_prompt' must be provided."
When using prompt, the duration defaults to 5 seconds. Override with duration:
{"model": "kling-video-o3-pro-reference-to-video", "prompt": "A flower blooming in timelapse", "duration": 10}
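
The rule above can also be checked on the client before a request is sent. The following is a minimal sketch, not part of the Oxen API: check_prompt_fields is a hypothetical helper, the 512-character message is invented for illustration, and treating an empty prompt string like a missing one is an assumption the documentation does not confirm.

```python
def check_prompt_fields(payload: dict) -> None:
    """Raise ValueError if the payload breaks the prompt / multi_prompt rule."""
    # Assumption: an empty string is treated like a missing prompt.
    has_prompt = bool(payload.get("prompt"))
    # An empty multi_prompt list counts as "not provided", as documented.
    has_multi = bool(payload.get("multi_prompt"))

    if has_prompt and has_multi:
        raise ValueError("Cannot provide both 'prompt' and 'multi_prompt'.")
    if not has_prompt and not has_multi:
        raise ValueError("Either 'prompt' or 'multi_prompt' must be provided.")
    # Hypothetical message; the server's wording for this case is not documented.
    if has_prompt and len(payload["prompt"]) > 512:
        raise ValueError("prompt exceeds 512 characters")
```

Calling it on the payload above returns without raising; a payload that carries both fields raises ValueError with the same text the API reports.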

multi_prompt

Array of shot objects. Each shot generates a segment of the video.
| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| prompt | string | yes | | Prompt for this shot. Max 512 characters. |
| duration | integer | no | 5 | Duration of this shot in seconds (1-15). |

Duration Constraints

| Constraint | Value |
| --- | --- |
| Minimum total duration | 3 seconds |
| Maximum total duration | 15 seconds |
| Maximum per shot | 15 seconds |
| Default per shot | 5 seconds |
Individual shots can be as short as 1 second, as long as the total across all shots is between 3 and 15 seconds.
| Configuration | Total | Result |
| --- | --- | --- |
| Single shot, duration: 1 | 1s | Fails |
| Single shot, duration: 2 | 2s | Fails |
| Single shot, duration: 3 | 3s | Works |
| Two shots: duration: 2 + duration: 1 | 3s | Works |
| Two shots: duration: 1 + duration: 1 | 2s | Fails |
| Single shot, duration: 15 | 15s | Works |
| Three shots: duration: 5 + duration: 5 + duration: 5 | 15s | Works |
| Three shots: duration: 5 + duration: 5 + duration: 6 | 16s | Fails |
When total duration is too short:
"duration value '2' is invalid. Try using duration='5' instead, as duration support may vary by model and mode."
When total duration exceeds 15 seconds:
"Total shot duration (16s) exceeds maximum allowed (15s)."
When a single shot exceeds 15 seconds:
"Input should be '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14' or '15'"

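The duration constraints above can be mirrored in a small client-side check so that invalid shot lists are caught before a request is made. This is a sketch only: durations_ok, total_duration, and the constants are illustrative names rather than part of the API, and the server remains the source of truth.

```python
MIN_TOTAL_SECONDS = 3
MAX_TOTAL_SECONDS = 15
MIN_SHOT_SECONDS = 1
MAX_SHOT_SECONDS = 15
DEFAULT_SHOT_SECONDS = 5


def total_duration(multi_prompt: list) -> int:
    """Sum shot durations, applying the documented 5-second default."""
    return sum(shot.get("duration", DEFAULT_SHOT_SECONDS) for shot in multi_prompt)


def durations_ok(multi_prompt: list) -> bool:
    """Return True if every shot and the overall total are within range."""
    per_shot_ok = all(
        MIN_SHOT_SECONDS <= shot.get("duration", DEFAULT_SHOT_SECONDS) <= MAX_SHOT_SECONDS
        for shot in multi_prompt
    )
    total_ok = MIN_TOTAL_SECONDS <= total_duration(multi_prompt) <= MAX_TOTAL_SECONDS
    return per_shot_ok and total_ok
```

For example, two shots of 2 and 1 seconds pass (total 3), while three shots of 5, 5, and 6 seconds fail (total 16).
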
elements

Array of element objects for character/object reference. Use @Element1, @Element2, etc. in prompts.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| frontal_image_url | string (URI) | yes | Front view of the reference object or character. |
| reference_image_urls | array of URIs | no | Additional angles. Max 3 images per element. |
Maximum 4 total images across all elements and input_image references.
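
Because prompts refer to elements and images by position, a quick lint can catch a prompt that mentions @Element2 when only one element was supplied. The sketch below is hypothetical client-side code, not an API feature. It assumes that numbering is 1-based and follows array order, which the @Element1 and @Image1 naming suggests but the documentation does not spell out. Whether the server rejects such references is not documented either.

```python
import re

# Matches tokens such as @Element1 or @Image2 inside a prompt.
REFERENCE_PATTERN = re.compile(r"@(Element|Image)(\d+)")
MAX_REFERENCE_IMAGES_PER_ELEMENT = 3


def dangling_references(payload: dict) -> list:
    """Return @ElementN / @ImageN tokens that point past the end of their array."""
    counts = {
        "Element": len(payload.get("elements", [])),
        "Image": len(payload.get("input_image", [])),
    }
    prompts = [payload["prompt"]] if "prompt" in payload else []
    prompts += [shot["prompt"] for shot in payload.get("multi_prompt", [])]

    missing = []
    for text in prompts:
        for kind, number in REFERENCE_PATTERN.findall(text):
            # Assumption: references are 1-based and follow array order.
            if not 1 <= int(number) <= counts[kind]:
                missing.append(f"@{kind}{number}")
    return missing


def oversized_elements(payload: dict) -> list:
    """Return 1-based positions of elements with more than 3 reference_image_urls."""
    return [
        position
        for position, element in enumerate(payload.get("elements", []), start=1)
        if len(element.get("reference_image_urls", [])) > MAX_REFERENCE_IMAGES_PER_ELEMENT
    ]
```

Both functions return an empty list when nothing is wrong, so they can be used directly in an if statement before sending the request.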

Examples

Minimal: text only

input_image is optional. Without it the model generates purely from the prompt.
import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "prompt": "A puppy runs through a park",
    },
)

data = response.json()
print("Video URL:", data["videos"][0]["url"])

Single prompt with reference image

import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "prompt": "A dog runs across a sunny field",
        "input_image": ["https://example.com/dog.jpg"],
    },
)

data = response.json()
print("Video URL:", data["videos"][0]["url"])

Multi-shot with reference image

import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "multi_prompt": [
            {"prompt": "A woman walks toward the camera smiling, cinematic lighting", "duration": 5},
            {"prompt": "She turns and looks out a window, soft focus background", "duration": 5},
        ],
        "input_image": ["https://example.com/reference-face.jpg"],
    },
)

data = response.json()
print("Video URL:", data["videos"][0]["url"])

With start/end frames and elements

import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/videos/generate",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "multi_prompt": [
            {"prompt": "@Element1 picks up a coffee cup from the table", "duration": 5},
        ],
        "start_image_url": "https://example.com/first-frame.jpg",
        "tail_image_url": "https://example.com/last-frame.jpg",
        "elements": [
            {
                "frontal_image_url": "https://example.com/character-front.jpg",
                "reference_image_urls": ["https://example.com/character-side.jpg"],
            }
        ],
        "aspect_ratio": "16:9",
        "generate_audio": True,
    },
)

data = response.json()
print("Video URL:", data["videos"][0]["url"])

Response (response_format: "url")

{
  "created": 1775090723,
  "model": "kling-video-o3-pro-reference-to-video",
  "videos": [
    {
      "url": "https://hub.oxen.ai/api/repos/.../files/.../video.mp4?..."
    }
  ]
}
The URL is a temporary link that expires after a period of time.

Response (response_format: "b64_json")

{
  "created": 1775090723,
  "model": "kling-video-o3-pro-reference-to-video",
  "videos": [
    {
      "b64_json": "<base64-encoded mp4 bytes>"
    }
  ]
}
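
With response_format set to "b64_json", the video bytes have to be decoded before they can be written to disk. A minimal sketch using only the Python standard library follows; decode_b64_video and save_b64_video are illustrative helper names, and the code assumes the response has the shape shown above.

```python
import base64
from pathlib import Path


def decode_b64_video(data: dict, index: int = 0) -> bytes:
    """Decode the base64 payload of one entry in the "videos" array."""
    return base64.b64decode(data["videos"][index]["b64_json"])


def save_b64_video(data: dict, path: str, index: int = 0) -> int:
    """Write the decoded video to path and return the number of bytes written."""
    return Path(path).write_bytes(decode_b64_video(data, index))
```

For instance, save_b64_video(response.json(), "output.mp4") would store the first video next to the script.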

Using with /ai/queue

Recommended for video generation. Returns immediately, processes in the background.

Enqueue

import requests

response = requests.post(
    "https://hub.oxen.ai/api/ai/queue",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "model": "kling-video-o3-pro-reference-to-video",
        "multi_prompt": [{"prompt": "A person speaking into a microphone", "duration": 5}],
        "generate_audio": True,
        "num_generations": 2,
    },
)

generations = response.json()["generations"]
for g in generations:
    print(f"ID: {g['generation_id']}, Status: {g['status']}")

Poll

import requests
import time

generation_id = "4ef840a4-..."
while True:
    data = requests.get(
        f"https://hub.oxen.ai/api/ai/queue/{generation_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
    ).json()
    if data["status"] in {"succeeded", "failed", "cancelled"}:
        break
    time.sleep(10)

if data["status"] == "succeeded":
    print(f"Result: {data['result_url']}")
else:
    print(f"Generation {data['status']}: {data.get('error_message')}")
A generation is done when its status is succeeded, failed, or cancelled. On success, result_url points to the output file.
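
The loop above runs until the generation reaches a terminal status. If you would rather give up after a fixed time, the same logic can be wrapped in a helper with a deadline. This is a sketch under stated assumptions: wait_for_generation is a hypothetical name, the 15-minute default timeout is an arbitrary choice, and the fetch_status callable is expected to return the parsed JSON of GET /api/ai/queue/{generation_id}.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_generation(fetch_status, interval=10.0, timeout=900.0, sleep=time.sleep):
    """Call fetch_status() until the status is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["status"] in TERMINAL_STATUSES:
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError(f"generation still {data['status']!r} after {timeout}s")
        sleep(interval)


# Example wiring (not executed here): poll one generation with requests.
# import requests
# fetch = lambda: requests.get(
#     f"https://hub.oxen.ai/api/ai/queue/{generation_id}",
#     headers={"Authorization": "Bearer YOUR_API_KEY"},
# ).json()
# data = wait_for_generation(fetch)
```

Passing the fetch function in as an argument keeps the helper independent of the HTTP client and makes it easy to exercise with canned responses.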

Cancel

import requests

generation_id = "4ef840a4-..."
response = requests.delete(
    f"https://hub.oxen.ai/api/ai/queue/{generation_id}",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)

print(response.json())

Errors

| Error | Cause | Fix |
| --- | --- | --- |
| Getting model response error: 422 - Value error, Cannot provide both 'prompt' and 'multi_prompt'. | Sent both fields | Use one or the other |
| Getting model response error: 422 - Value error, Either 'prompt' or 'multi_prompt' must be provided. | Neither sent, or empty array | Provide at least one |
| Field required | multi_prompt item missing prompt | Every shot needs a prompt string |
| duration value '2' is invalid | Total duration < 3 seconds | Ensure total across shots >= 3 |
| Total shot duration (16s) exceeds maximum allowed (15s) | Total duration > 15 seconds | Keep total at 15 seconds or less |
| Input should be '1', '2', ... or '15' | Single shot > 15 | Keep each shot at 15 seconds or less |
| num_generations must be an integer between 1 and 4 | Invalid count (via /ai/queue) | Use 1-4 |
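
If you want to surface the fixes above in your own tooling, one option is to match the error text against known fragments. The sketch below is illustrative only: hint_for and ERROR_HINTS are hypothetical names, the matching is plain substring search, and the exact JSON shape of error responses is not documented here, so the function takes the message string directly.

```python
# (fragment of the error message, suggested fix) pairs taken from the table above.
ERROR_HINTS = [
    ("Cannot provide both 'prompt' and 'multi_prompt'", "Use one or the other"),
    ("Either 'prompt' or 'multi_prompt' must be provided", "Provide at least one"),
    ("Field required", "Every shot needs a prompt string"),
    ("duration value", "Ensure total across shots >= 3"),
    ("exceeds maximum allowed", "Keep total at 15 seconds or less"),
    ("Input should be '1'", "Keep each shot at 15 seconds or less"),
    ("num_generations must be an integer", "Use 1-4"),
]


def hint_for(message: str):
    """Return the suggested fix for a known error message, or None."""
    for fragment, hint in ERROR_HINTS:
        if fragment in message:
            return hint
    return None
```

Note that "Field required" is a generic validation message, so the hint for it may not apply if a different field is missing.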

Other Kling Models

| Model | Input | Use Case | Cost/sec |
| --- | --- | --- | --- |
| kling-video-v2-6-pro-text-to-video | Text only | Simple text-to-video | $0.070 |
| kling-video-v2-6-pro-image-to-video | Image | Animate a single image | $0.070 |
| kling-video-o3-pro-image-to-video | Image + text | Higher quality image animation | $0.224 |
| kling-video-o3-pro-reference-to-video | Images + text | Reference-conditioned, multi-shot | $0.224 |
| kling-video-o3-pro-video-to-video-edit | Video | Edit existing video | $0.336 |
| kling-video-v3-pro-motion-control | Text + image + video | Camera/motion control | $0.168 |
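The O3 Pro image-to-video and reference-to-video models produce higher quality output than v2.x but cost roughly 3x more per second ($0.224 vs. $0.070); kling-video-o3-pro-video-to-video-edit costs closer to 5x ($0.336).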
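
To get a rough idea of what a job will cost, multiply the per-second rate by the requested duration. The sketch below assumes billing is simply rate times output seconds times number of generations, which this page does not state explicitly; estimate_cost and COST_PER_SECOND are illustrative names, and the rates are copied from the table above.

```python
# Per-second rates in USD, copied from the model table.
COST_PER_SECOND = {
    "kling-video-v2-6-pro-text-to-video": 0.070,
    "kling-video-v2-6-pro-image-to-video": 0.070,
    "kling-video-o3-pro-image-to-video": 0.224,
    "kling-video-o3-pro-reference-to-video": 0.224,
    "kling-video-o3-pro-video-to-video-edit": 0.336,
    "kling-video-v3-pro-motion-control": 0.168,
}


def estimate_cost(model: str, seconds: int, num_generations: int = 1) -> float:
    """Estimate the cost in USD, assuming rate x seconds x generations."""
    return round(COST_PER_SECOND[model] * seconds * num_generations, 3)
```

Under that assumption, a 10-second kling-video-o3-pro-reference-to-video clip comes to about $2.24, compared with about $0.70 for the same length on a v2.6 model.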