Try WAN 2.7 - Reference to Video in the Workbench
Run this model interactively, tune parameters, and compare outputs.
Model ID: wan-v2-7-reference-to-video
WAN 2.7 reference-to-video generates video from one or more reference images and reference videos, with optional first-frame joint control. It supports single-character performances, multi-character interactions, and multi-shot narration. You can supply up to 5 reference images and reference videos combined. The identifiers you use in the prompt (“Image 1”, “Image 2”, “Video 1”, …) follow the order of the assets within each type. Output is up to 1080P and 2-15 s long (2-10 s when a reference video is included).
Use the Workbench as a request builder: configure parameters for this model in the UI, then open the API tab to copy the exact cURL or Python call.
Requests can be made in three modes: Sync, Async, and Async with SSE.
Sync: the request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
Minimal:
```bash
curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
    "model": "wan-v2-7-reference-to-video",
    "prompt": "Image 1 walks through a beautiful garden in the style of Image 2, cinematic lighting."
  }'
```
Basic parameters:
```bash
curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
    "model": "wan-v2-7-reference-to-video",
    "prompt": "Image 1 walks through a beautiful garden in the style of Image 2, cinematic lighting.",
    "reference_images": [
      "https://hub.oxen.ai/api/repos/elau/assets/file/main/bloxy/bloxy_cropped_512x512.png"
    ],
    "reference_videos": [
      "https://hub.oxen.ai/api/repos/ox/Oxen-AI-Assets/file/main/images/winter_summer_ox.mp4"
    ],
    "input_image": "https://hub.oxen.ai/api/repos/elau/assets/file/main/bloxy/bloxy_cropped_512x512.png"
  }'
```
All parameters:
```bash
curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
    "model": "wan-v2-7-reference-to-video",
    "prompt": "Image 1 walks through a beautiful garden in the style of Image 2, cinematic lighting.",
    "reference_images": [
      "https://hub.oxen.ai/api/repos/elau/assets/file/main/bloxy/bloxy_cropped_512x512.png"
    ],
    "reference_videos": [
      "https://hub.oxen.ai/api/repos/ox/Oxen-AI-Assets/file/main/images/winter_summer_ox.mp4"
    ],
    "input_image": "https://hub.oxen.ai/api/repos/elau/assets/file/main/bloxy/bloxy_cropped_512x512.png",
    "aspect_ratio": "16:9",
    "resolution": "1080P",
    "duration": 5,
    "prompt_extend": true,
    "watermark": false
  }'
```
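For callers who prefer Python, the sketch below builds a request equivalent to the cURL examples using only the standard library. It assumes the endpoint accepts the same JSON body and that the sync response body is JSON; `build_request` and `send` are hypothetical helper names, and the Workbench's API tab remains the authoritative source for the exact Python call.

```python
import json
import os
import urllib.request

# Endpoint copied from the cURL examples.
API_URL = "https://hub.oxen.ai/api/ai/videos/generate"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build, but do not send, the same POST request the cURL examples make."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def send(req: urllib.request.Request, timeout_s: float = 20 * 60) -> dict:
    """Send synchronously and parse the body as JSON (an assumed response format).

    The sync call can block for many minutes, so the default timeout is generous.
    """
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same body as the "Basic parameters" example.
payload = {
    "model": "wan-v2-7-reference-to-video",
    "prompt": "Image 1 walks through a beautiful garden in the style of Image 2, cinematic lighting.",
    "reference_images": [
        "https://hub.oxen.ai/api/repos/elau/assets/file/main/bloxy/bloxy_cropped_512x512.png"
    ],
    "reference_videos": [
        "https://hub.oxen.ai/api/repos/ox/Oxen-AI-Assets/file/main/images/winter_summer_ox.mp4"
    ],
    "input_image": "https://hub.oxen.ai/api/repos/elau/assets/file/main/bloxy/bloxy_cropped_512x512.png",
}

req = build_request(payload, os.environ.get("OXEN_API_KEY", ""))
# result = send(req)  # uncomment to actually call the API
```

Separating request construction from sending keeps the payload easy to inspect and log before committing to a multi-minute blocking call.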
"Image 1 walks through a beautiful garden in the style of Image 2, cinematic lighting."
Text prompt describing the desired video. Supports Chinese and English. Max 5000 characters. Use ‘Image 1’, ‘Image 2’, … to refer to reference_images in order, and ‘Video 1’, ‘Video 2’, … to refer to reference_videos in order. Numbering is independent per type: the first image is Image 1 and the first video is Video 1.
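Because identifiers are plain text in the prompt, a reference such as “Image 3” with only two images supplied is easy to miss. This hypothetical pre-flight check (`check_prompt_references` is not part of any Oxen.ai SDK) assumes identifiers are written exactly as “Image N” / “Video N” and counts characters as Python string length, which may differ from how the service counts.

```python
import re

MAX_PROMPT_CHARS = 5000
# Matches identifiers written as "Image 1", "Video 2", and so on.
REF_PATTERN = re.compile(r"\b(Image|Video) (\d+)\b")


def check_prompt_references(prompt: str, n_images: int, n_videos: int) -> list[str]:
    """Return problems found in the prompt; an empty list means it looks consistent."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt is {len(prompt)} characters (max {MAX_PROMPT_CHARS})")
    available = {"Image": n_images, "Video": n_videos}
    # Numbering is 1-based and independent per type, so each kind is checked
    # against its own count. dict.fromkeys removes repeats while keeping order.
    for kind, number in dict.fromkeys(REF_PATTERN.findall(prompt)):
        index = int(number)
        if not 1 <= index <= available[kind]:
            problems.append(f"{kind} {index} has no matching asset ({available[kind]} provided)")
    return problems
```

For example, `check_prompt_references("Image 1 dances with the dancer in Video 1.", 1, 1)` returns an empty list.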
reference_images
array<object>
—
Array of reference images for character, object, or scene appearance. Each item has a URL and an optional reference voice. Order maps to ‘Image 1’, ‘Image 2’, etc. Reference images + reference videos must total ≤ 5. JPEG/JPG/PNG/BMP/WEBP, 240-8000 px per side, aspect ratio 1:8 to 8:1, max 20 MB each.
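A local check of each reference image can catch size and shape problems before upload. This sketch assumes you already know the image's format, dimensions, and byte size (for example from an image library), and it reads “20 MB” as 20,000,000 bytes, the stricter interpretation; `check_image` is a hypothetical helper.

```python
IMAGE_FORMATS = {"jpeg", "jpg", "png", "bmp", "webp"}
IMAGE_MIN_SIDE, IMAGE_MAX_SIDE = 240, 8000  # pixels per side
IMAGE_MAX_BYTES = 20_000_000  # assumption: 20 MB read as decimal megabytes
MAX_ASPECT = 8  # 1:8 to 8:1


def check_image(fmt: str, width: int, height: int, size_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means the image looks acceptable."""
    problems = []
    if fmt.lower() not in IMAGE_FORMATS:
        problems.append(f"unsupported format {fmt!r}")
    for name, side in (("width", width), ("height", height)):
        if not IMAGE_MIN_SIDE <= side <= IMAGE_MAX_SIDE:
            problems.append(f"{name} {side}px outside {IMAGE_MIN_SIDE}-{IMAGE_MAX_SIDE}px")
    # 1:8 to 8:1 means the long side may be at most 8x the short side.
    if max(width, height) > MAX_ASPECT * min(width, height):
        problems.append(f"aspect ratio {width}:{height} outside 1:8 to 8:1")
    if size_bytes > IMAGE_MAX_BYTES:
        problems.append(f"{size_bytes} bytes exceeds {IMAGE_MAX_BYTES}")
    return problems
```

For example, a 512x512 PNG of 250,000 bytes passes: `check_image("png", 512, 512, 250_000)` returns `[]`.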
reference_videos
array<object>
—
Array of reference videos for character/object appearance, motion, and voice. Each item has a URL and an optional reference voice. Order maps to ‘Video 1’, ‘Video 2’, etc. Reference images + reference videos must total ≤ 5. MP4/MOV, 1-30s, 240-4096 px per side, aspect ratio 1:8 to 8:1, max 100 MB each.
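Reference videos have tighter limits than images, and the 5-asset cap applies to both types together. The sketch below mirrors those rules; it assumes clip metadata (format, dimensions, length, byte size) is already known, reads “100 MB” as 100,000,000 bytes, and uses the hypothetical helper names `check_video` and `check_reference_count`.

```python
VIDEO_FORMATS = {"mp4", "mov"}
VIDEO_MIN_SIDE, VIDEO_MAX_SIDE = 240, 4096  # pixels per side
VIDEO_MIN_S, VIDEO_MAX_S = 1, 30  # clip length in seconds
VIDEO_MAX_BYTES = 100_000_000  # assumption: 100 MB read as decimal megabytes
MAX_ASPECT = 8  # 1:8 to 8:1
MAX_REFERENCES = 5  # reference images + reference videos combined


def check_video(fmt: str, width: int, height: int, duration_s: float, size_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means the clip looks acceptable."""
    problems = []
    if fmt.lower() not in VIDEO_FORMATS:
        problems.append(f"unsupported format {fmt!r}")
    for name, side in (("width", width), ("height", height)):
        if not VIDEO_MIN_SIDE <= side <= VIDEO_MAX_SIDE:
            problems.append(f"{name} {side}px outside {VIDEO_MIN_SIDE}-{VIDEO_MAX_SIDE}px")
    # Same 1:8 to 8:1 rule as for images.
    if max(width, height) > MAX_ASPECT * min(width, height):
        problems.append(f"aspect ratio {width}:{height} outside 1:8 to 8:1")
    if not VIDEO_MIN_S <= duration_s <= VIDEO_MAX_S:
        problems.append(f"clip is {duration_s}s, outside {VIDEO_MIN_S}-{VIDEO_MAX_S}s")
    if size_bytes > VIDEO_MAX_BYTES:
        problems.append(f"{size_bytes} bytes exceeds {VIDEO_MAX_BYTES}")
    return problems


def check_reference_count(n_images: int, n_videos: int) -> list[str]:
    """The limit is on the combined total, not on each type separately."""
    total = n_images + n_videos
    if total > MAX_REFERENCES:
        return [f"{total} reference assets; images and videos combined must be <= {MAX_REFERENCES}"]
    return []
```

Keeping the combined count in its own function makes it clear that, say, 3 images plus 2 videos is allowed while 4 plus 2 is not.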
input_image
string
—
Optional first-frame image used for joint control: the generated video starts from this frame. JPEG/JPG/PNG/BMP/WEBP, 240-8000 px per side, max 20 MB. When provided, the output aspect ratio is taken from this image and the aspect_ratio parameter is ignored. Format: uri.
aspect_ratio
string
"16:9"
Aspect ratio of the generated video. Ignored when input_image is provided (the output uses that image’s aspect ratio). One of: 16:9, 9:16, 1:1, 4:3, 3:4.
resolution
string
"1080P"
Output video resolution tier. One of: 720P, 1080P.
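Since aspect_ratio is silently ignored whenever input_image is set, a client can make that explicit before sending. This sketch validates the documented aspect_ratio and resolution values and removes aspect_ratio when a first-frame image is present; `apply_layout` is a hypothetical helper, and dropping the field is a client-side design choice, not something the API is documented to require.

```python
from typing import Any

ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")
RESOLUTIONS = ("720P", "1080P")


def apply_layout(payload: dict[str, Any]) -> dict[str, Any]:
    """Validate aspect_ratio/resolution and drop aspect_ratio when input_image is set."""
    out = dict(payload)  # copy so the caller's dict is left untouched
    ratio = out.get("aspect_ratio")
    if ratio is not None and ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {ASPECT_RATIOS}, got {ratio!r}")
    resolution = out.get("resolution")
    if resolution is not None and resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}, got {resolution!r}")
    if out.get("input_image"):
        # The output ratio follows the first-frame image, so aspect_ratio would be ignored anyway.
        out.pop("aspect_ratio", None)
    return out
```

Omitting an ignored field keeps the request honest about which setting will actually determine the output shape.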
duration
integer
5
Output video duration in seconds. 2-15 with reference images only; 2-10 when any reference video is included. Range: 2 – 15.
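The upper bound on duration depends on whether any reference video is supplied. A minimal sketch of that rule, using the hypothetical helpers `max_duration` and `check_duration`:

```python
MIN_DURATION_S = 2


def max_duration(n_reference_videos: int) -> int:
    """15 s with reference images only; 10 s once any reference video is included."""
    return 10 if n_reference_videos > 0 else 15


def check_duration(duration: int, n_reference_videos: int) -> None:
    """Raise if duration is not an integer in the allowed range for this request."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration, bool) or not isinstance(duration, int):
        raise TypeError(f"duration must be an integer, got {type(duration).__name__}")
    upper = max_duration(n_reference_videos)
    if not MIN_DURATION_S <= duration <= upper:
        raise ValueError(f"duration must be {MIN_DURATION_S}-{upper} s here, got {duration}")
```

For instance, a 12-second request passes with reference images only but fails once a reference video is added.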
negative_prompt
string
—
Content to avoid in the video. Supports Chinese and English. Max 500 characters.
prompt_extend
boolean
true
When true, the model rewrites short prompts to improve output quality. This adds processing time.
watermark
boolean
false
When true, adds an ‘AI Generated’ watermark to the bottom-right corner of the video.
seed
integer
—
Random seed for reproducibility. Range: 0 – 2147483647.
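To make a result repeatable, it helps to always send a seed explicitly and record it. This hypothetical helper (`with_seed`) picks a random seed in the documented range when none is supplied. Whether identical seeds and parameters give identical videos depends on the service, so treat reproducibility as best-effort.

```python
import random
from typing import Optional

SEED_MIN, SEED_MAX = 0, 2_147_483_647  # 2**31 - 1, the documented upper bound


def with_seed(payload: dict, seed: Optional[int] = None) -> dict:
    """Return a copy of payload with an explicit seed.

    If no seed is given, one is drawn at random so it can be logged and reused later.
    """
    if seed is None:
        seed = random.randint(SEED_MIN, SEED_MAX)  # randint is inclusive on both ends
    if not SEED_MIN <= seed <= SEED_MAX:
        raise ValueError(f"seed must be in {SEED_MIN}-{SEED_MAX}, got {seed}")
    return {**payload, "seed": seed}
```

Logging the returned seed alongside the other parameters lets you resubmit the same request later.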