Try WAN 2.7 - Reference to Video in the Workbench
Run this model interactively, tune parameters, and compare outputs.
wan-v2-7-reference-to-video
WAN 2.7 reference-to-video generates video from one or more reference images and reference videos, with optional first-frame joint control. It supports single-character performances, multi-character interactions, and multi-shot narration. A request can include up to 5 reference images and reference videos combined. Reference identifiers in the prompt (“Image 1”, “Image 2”, “Video 1”, …) follow the order of assets within each type. Output is up to 1080P and 2-15s long (2-10s when a reference video is included).
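Because identifiers are numbered separately for images and videos, it can help to check a prompt against the number of assets supplied before sending it. The sketch below is a client-side convenience and not part of the API; the function names `referenced_ids` and `missing_references` are hypothetical.

```python
import re


def referenced_ids(prompt: str) -> dict:
    """Collect the 'Image N' / 'Video N' numbers mentioned in a prompt."""
    found = {"Image": set(), "Video": set()}
    for kind, number in re.findall(r"\b(Image|Video) (\d+)\b", prompt):
        found[kind].add(int(number))
    return found


def missing_references(prompt: str, n_images: int, n_videos: int) -> list:
    """Return identifiers in the prompt that have no matching asset.

    Numbering starts at 1 and is independent per type, so 'Image 2'
    needs at least two reference images regardless of how many videos exist.
    """
    ids = referenced_ids(prompt)
    problems = []
    for kind, count in (("Image", n_images), ("Video", n_videos)):
        for n in sorted(ids[kind]):
            if n < 1 or n > count:
                problems.append(f"{kind} {n}")
    return problems


if __name__ == "__main__":
    prompt = "Image 1 walks through a garden in the style of Image 2, following Video 1."
    # One image and one video supplied, so 'Image 2' has nothing to point at.
    print(missing_references(prompt, n_images=1, n_videos=1))
```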
Example request
- Sync
- Async
- Async with SSE
The Sync request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
- Minimal
- Basic parameters
- All parameters
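As a rough sketch of what the Minimal, Basic parameters, and All parameters variants might contain, the Python below builds three request bodies from the fields documented on this page. The top-level `model` key, the `url` key inside each reference item, and the example.com URLs are assumptions for illustration. The endpoint and authentication are not covered here, so nothing is sent.

```python
import json

MODEL_ID = "wan-v2-7-reference-to-video"


def build_payload(prompt, **optional):
    """Return a request body, dropping optional fields left as None."""
    body = {"model": MODEL_ID, "prompt": prompt}  # "model" key is an assumption
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# Minimal: the required prompt plus one reference image.
minimal = build_payload(
    "Image 1 walks through a beautiful garden, cinematic lighting.",
    reference_images=[{"url": "https://example.com/character.png"}],  # "url" key assumed
)

# Basic parameters: common output controls on top of the minimal body.
basic = build_payload(
    "Image 1 walks through a beautiful garden in the style of Image 2.",
    reference_images=[
        {"url": "https://example.com/character.png"},
        {"url": "https://example.com/style.png"},
    ],
    aspect_ratio="9:16",
    resolution="720P",
    duration=8,
)

# All parameters: a reference video caps duration at 10 seconds, and
# aspect_ratio is ignored because input_image is set.
all_params = build_payload(
    "Image 1 repeats the motion of Video 1 on a city street.",
    reference_images=[{"url": "https://example.com/character.png"}],
    reference_videos=[{"url": "https://example.com/motion.mp4"}],
    input_image="https://example.com/first-frame.png",
    aspect_ratio="16:9",
    resolution="1080P",
    duration=10,
    negative_prompt="blurry, distorted faces",
    prompt_extend=True,
    watermark=False,
    seed=42,
)

if __name__ == "__main__":
    print(json.dumps(minimal, indent=2))
```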
Fetch model details
The models endpoint returns the full model object, including its json_request_schema.
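Once the model object has been fetched, its schema can be inspected programmatically. The sketch below assumes that `json_request_schema` follows the usual JSON Schema layout with `properties` and `required` keys, which this page does not confirm. The `sample_model` dict is a hand-written stand-in and not real endpoint output.

```python
def describe_schema(model_object: dict) -> dict:
    """Map each field name in the request schema to whether it is required.

    Assumes JSON Schema-style "properties" and "required" keys.
    """
    schema = model_object["json_request_schema"]
    required = set(schema.get("required", []))
    return {name: name in required for name in schema.get("properties", {})}


# Illustrative stand-in for a fetched model object (not real endpoint output).
sample_model = {
    "id": "wan-v2-7-reference-to-video",
    "json_request_schema": {
        "type": "object",
        "required": ["prompt"],
        "properties": {
            "prompt": {"type": "string"},
            "duration": {"type": "integer"},
        },
    },
}

if __name__ == "__main__":
    for field, is_required in describe_schema(sample_model).items():
        print(field, "required" if is_required else "optional")
```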
Request parameters
Required parameters
| Field | Type | Default | Description |
|---|---|---|---|
| prompt | string | "Image 1 walks through a beautiful garden in the style of Image 2, cinematic lighting." | Text prompt describing the desired video. Supports Chinese and English. Max 5000 characters. Use ‘Image 1, Image 2, …’ to reference reference_images in order, and ‘Video 1, Video 2, …’ to reference reference_videos in order; identifiers are independent across types. |
Optional parameters
| Field | Type | Default | Description |
|---|---|---|---|
| reference_images | array<object> | — | Array of reference images for character/object/scene appearance. Each item has a URL and an optional reference voice. Order maps to ‘Image 1’, ‘Image 2’, etc. Reference images + reference videos must total ≤ 5. JPEG/JPG/PNG/BMP/WEBP, 240-8000 px per side, aspect ratio 1:8 to 8:1, max 20 MB each. |
| reference_videos | array<object> | — | Array of reference videos for character/object appearance, motion, and voice. Each item has a URL and an optional reference voice. Order maps to ‘Video 1’, ‘Video 2’, etc. Reference images + reference videos must total ≤ 5. MP4/MOV, 1-30s, 240-4096 px per side, aspect ratio 1:8 to 8:1, max 100 MB each. |
| input_image | string | — | Optional first-frame image used for joint control; the video is generated starting from this frame. JPEG/JPG/PNG/BMP/WEBP, 240-8000 px per side, max 20 MB. When provided, the output aspect ratio is taken from this image and the aspect_ratio parameter is ignored. Format: uri. |
| aspect_ratio | string | "16:9" | Aspect ratio of the generated video. Ignored when input_image is provided (the output uses that image’s aspect ratio). One of: 16:9, 9:16, 1:1, 4:3, 3:4. |
| resolution | string | "1080P" | Output video resolution tier. One of: 720P, 1080P. |
| duration | integer | 5 | Output video duration in seconds. 2-15 with reference images only; 2-10 when any reference video is included. Range: 2 – 15. |
| negative_prompt | string | — | Content to avoid in the video. Supports Chinese and English. Max 500 characters. |
| prompt_extend | boolean | true | Whether the model rewrites short prompts to improve quality. Adds processing time. |
| watermark | boolean | false | Adds an ‘AI Generated’ watermark to the bottom-right corner. |
| seed | integer | — | Random seed for reproducibility. Range: 0 – 2147483647. |
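Several of the limits in the tables above interact: the combined asset cap, and the shorter duration ceiling when a reference video is present. A minimal client-side check might look like the sketch below. It covers only constraints visible in the request body and does not check file formats, pixel dimensions, or file sizes. The `validate` function is a hypothetical helper, and the server remains the authority on what is accepted.

```python
ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3", "3:4"}
RESOLUTIONS = {"720P", "1080P"}
MAX_SEED = 2147483647


def validate(payload: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    errors = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt:
        errors.append("prompt is required")
    elif len(prompt) > 5000:
        errors.append("prompt exceeds 5000 characters")

    images = payload.get("reference_images") or []
    videos = payload.get("reference_videos") or []
    if len(images) + len(videos) > 5:
        errors.append("reference images + reference videos must total <= 5")

    # Any reference video lowers the duration ceiling from 15 to 10 seconds.
    max_duration = 10 if videos else 15
    duration = payload.get("duration", 5)
    if not isinstance(duration, int) or not 2 <= duration <= max_duration:
        errors.append(f"duration must be an integer in 2-{max_duration}")

    if payload.get("aspect_ratio", "16:9") not in ASPECT_RATIOS:
        errors.append("unsupported aspect_ratio")
    if payload.get("resolution", "1080P") not in RESOLUTIONS:
        errors.append("unsupported resolution")
    if len(payload.get("negative_prompt") or "") > 500:
        errors.append("negative_prompt exceeds 500 characters")

    seed = payload.get("seed")
    if seed is not None and not (isinstance(seed, int) and 0 <= seed <= MAX_SEED):
        errors.append(f"seed must be an integer in 0-{MAX_SEED}")

    return errors


if __name__ == "__main__":
    # The "url" item key is an assumption; this page does not name it.
    request = {
        "prompt": "Image 1 repeats the motion of Video 1.",
        "reference_images": [{"url": "https://example.com/a.png"}],
        "reference_videos": [{"url": "https://example.com/b.mp4"}],
        "duration": 12,
    }
    print(validate(request))
```

Here the request is flagged because a 12-second duration is only allowed when no reference video is included.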