Seedance 2.0 - Reference to Video
Model ID: `bytedance-seedance-2-0-reference-to-video`
ByteDance Seedance 2.0 (Pro) generates video from a text prompt guided by reference images, videos, and/or audio. Reference media are addressed in the prompt as [Image 1], [Video 1], [Audio 1] (1-based positional index within each media type).
Example request
Requests can be made in three modes: Sync, Async, and Async with SSE. Sync blocks until the video is ready (typically 5-15 minutes), so prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
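In Async mode, a client submits the job and then checks on it until it finishes. This section does not document the status endpoint or its response fields, so the following is a minimal sketch under stated assumptions. It takes a caller-supplied `get_status` function and assumes a hypothetical `status` field whose terminal values are `succeeded`, `failed`, or `cancelled`. It backs off exponentially and gives up after a 20-minute budget, which leaves some margin over the typical 5-15 minute generation time.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical status values; the real job schema is not documented in this section.
TERMINAL_STATES = {"succeeded", "failed", "cancelled"}


def poll_until_done(
    get_status: Callable[[], Dict[str, Any]],
    timeout_s: float = 20 * 60,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Call get_status() until it reports a terminal state or the timeout elapses.

    The wait between polls doubles each time, capped at max_delay_s, and never
    overshoots the deadline. sleep and clock are injectable for testing.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        job = get_status()
        if job.get("status") in TERMINAL_STATES:
            return job
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError("video generation did not finish before the deadline")
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

In practice `get_status` would wrap whatever status request the Async API provides. The caller still has to inspect the returned job to tell success from failure. With SSE, the server pushes updates instead, so a loop like this is not needed.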
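As an illustration, a request body that uses one reference of each type might look like the following Python sketch. The field names come from the parameter tables in this section. The URLs are placeholders, and passing media as URLs (rather than, say, uploaded file IDs or data URIs) is an assumption. How the model ID, endpoint path, and authentication are supplied is not specified here, so no HTTP call is shown.

```python
import json

# Placeholder media URLs (assumption: array<string> fields accept URLs).
payload = {
    # [Image 1], [Video 1], [Audio 1] refer to the first entry of each list below.
    "prompt": (
        "The cartoon fox in [Image 1] walks down the street from [Video 1], "
        "stepping in time with the beat of [Audio 1]."
    ),
    "input_images": ["https://example.com/fox.png"],
    "input_videos": ["https://example.com/street.mp4"],
    "input_audios": ["https://example.com/beat.mp3"],
    "resolution": "720p",    # 480p, 720p, or 1080p
    "duration": 8,           # 4-15 seconds, or -1 to let the model decide
    "aspect_ratio": "16:9",  # or "adaptive" to let the model decide
    "generate_audio": False,
}

body = json.dumps(payload, indent=2)
print(body)
```

A cartoon character is used on purpose: a reference image that shows a real human face belongs in `input_face_images` instead.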
Fetch model details
The models endpoint returns the full model object, including its `json_request_schema`.
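One way to use the returned schema is to split it into required fields and optional fields with their defaults. The sketch below assumes `json_request_schema` is a conventional JSON Schema object with top-level `properties` and `required` keys; its exact shape is not shown in this section. `summarize_schema` is a hypothetical helper, and the excerpt is hand-written from the tables in this section, not a fetched response.

```python
from typing import Any, Dict, List, Tuple


def summarize_schema(schema: Dict[str, Any]) -> Tuple[List[str], Dict[str, Any]]:
    """Split a JSON Schema object into required names and {optional name: default}.

    Assumes top-level "properties" and "required" keys; optional fields
    without a "default" map to None.
    """
    properties = schema.get("properties", {})
    required = list(schema.get("required", []))
    optional = {
        name: spec.get("default")
        for name, spec in properties.items()
        if name not in required
    }
    return required, optional


# Hand-written excerpt mirroring the parameter tables, not a fetched response.
# With a real model object this would be something like model["json_request_schema"].
excerpt = {
    "type": "object",
    "required": ["prompt"],
    "properties": {
        "prompt": {"type": "string"},
        "input_images": {"type": "array", "items": {"type": "string"}},
        "resolution": {"type": "string", "enum": ["480p", "720p", "1080p"], "default": "720p"},
        "duration": {"type": "integer", "default": -1},
    },
}

required, optional = summarize_schema(excerpt)
print(required)  # ['prompt']
print(optional)  # {'input_images': None, 'resolution': '720p', 'duration': -1}
```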
Request parameters
Required parameters
| Field | Type | Default | Description |
|---|---|---|---|
prompt | string | — | The text prompt used to generate the video. Use [Image 1], [Video 1], [Audio 1], etc. (1-based positional index within each media type) to reference inputs. Face-containing media is indexed before non-face media within its type. |
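The numbering rule can be made concrete with a small helper that assigns prompt tags for one media type. It assumes "face-containing media" means the entries of `input_face_images` and `input_face_videos`, so one face image plus two other images become `[Image 1]` (the face), then `[Image 2]` and `[Image 3]`. Audio has no face variant, so its face list is empty. `reference_tags` is a hypothetical helper, not part of the API.

```python
from typing import List, Sequence, Tuple


def reference_tags(
    face_media: Sequence[str], other_media: Sequence[str], label: str
) -> List[Tuple[str, str]]:
    """Return (prompt tag, media) pairs for one media type.

    Face-containing media is numbered first, then the rest, using a 1-based
    index within the type, e.g. [Image 1], [Image 2], ...
    """
    ordered = [*face_media, *other_media]
    return [(f"[{label} {i}]", media) for i, media in enumerate(ordered, start=1)]


image_tags = reference_tags(["face.png"], ["street.png", "prop.png"], "Image")
for tag, media in image_tags:
    print(tag, media)
# [Image 1] face.png
# [Image 2] street.png
# [Image 3] prop.png
```

Returning a list of pairs rather than a dict keeps the mapping correct even if the same file is passed twice.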
Optional parameters
| Field | Type | Default | Description |
|---|---|---|---|
input_face_images | array<string> | — | Place reference images here when they show a real human face. Content filters may block the request otherwise. |
input_images | array<string> | — | Reference images that guide the generated video. Up to 9 total reference images and up to 15 total references across all types may be used. |
input_face_videos | array<string> | — | Place reference videos here when they show a real human face. Content filters may block the request otherwise. |
input_videos | array<string> | — | Reference videos that guide the generated video. Up to 3 total reference videos and up to 15 total references across all types may be used. |
input_audios | array<string> | — | Reference audios that guide the generated video. Up to 3 total reference audios and up to 15 total references across all types may be used. |
resolution | string | "720p" | Video resolution. One of: 480p, 720p, 1080p. |
duration | integer | -1 | Duration of the video in seconds: an integer from 4 to 15, or -1 to let the model decide based on the prompt. One of: -1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. |
aspect_ratio | string | "adaptive" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 4:3 or 3:4 for classic landscape or portrait framing, 1:1 for square, 21:9 for ultrawide cinematic, or adaptive to let the model decide. One of: adaptive, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16. |
generate_audio | boolean | false | Whether to generate synchronized audio. |
watermark | boolean | false | Whether to add an ‘AI generated’ watermark to the output. |
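Before submitting, a client can check a request body against the limits and allowed values in these tables. The sketch below is client-side only; the service remains the authority on what it accepts. It assumes face and non-face inputs of the same type count toward one shared cap (9 images, 3 videos), which is how "total reference images" reads but is not stated explicitly. `validate_request` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Allowed values copied from the parameter tables. Tuples are used so that
# membership tests work even for unhashable inputs.
RESOLUTIONS = ("480p", "720p", "1080p")
DURATIONS = (-1, *range(4, 16))  # -1 lets the model decide; otherwise 4-15 s
ASPECT_RATIOS = ("adaptive", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16")


def validate_request(req: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with a request body; empty means none found.

    Assumption: face and non-face inputs of the same type share one cap.
    """
    errors: List[str] = []
    prompt = req.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt is required and must be a non-empty string")

    n_images = len(req.get("input_face_images", [])) + len(req.get("input_images", []))
    n_videos = len(req.get("input_face_videos", [])) + len(req.get("input_videos", []))
    n_audios = len(req.get("input_audios", []))
    if n_images > 9:
        errors.append(f"{n_images} reference images given; at most 9 allowed")
    if n_videos > 3:
        errors.append(f"{n_videos} reference videos given; at most 3 allowed")
    if n_audios > 3:
        errors.append(f"{n_audios} reference audios given; at most 3 allowed")
    if n_images + n_videos + n_audios > 15:
        errors.append("at most 15 references allowed across all types")

    # Missing optional fields fall back to their documented defaults.
    if req.get("resolution", "720p") not in RESOLUTIONS:
        errors.append(f"unsupported resolution: {req['resolution']!r}")
    if req.get("duration", -1) not in DURATIONS:
        errors.append(f"unsupported duration: {req['duration']!r}")
    if req.get("aspect_ratio", "adaptive") not in ASPECT_RATIOS:
        errors.append(f"unsupported aspect_ratio: {req['aspect_ratio']!r}")
    return errors
```

With per-type caps of 9, 3, and 3, the per-type checks already keep the total at or below 15; the explicit total check only matters if the per-type limits are ever raised.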