bytedance-seedance-2-0-reference-to-video
ByteDance Seedance 2 reference-to-video generates video from a text prompt guided by reference images, videos, and/or audio. Reference media are addressed in the prompt as @Image1, @Image2, @Video1, @Video2, @Audio1, etc.
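As a rough illustration of the tagging scheme, the sketch below scans a prompt for @Image, @Video, and @Audio tags and reports any that point past the number of reference files supplied. It assumes tags are 1-based and follow the order of the corresponding input array, which the text above implies but does not state outright. The helper name `find_unresolved_tags` is hypothetical and is not part of any API.

```python
import re

# Matches reference tags such as @Image1, @Video2, @Audio1.
TAG_PATTERN = re.compile(r"@(Image|Video|Audio)(\d+)")


def find_unresolved_tags(prompt, n_images=0, n_videos=0, n_audios=0):
    """Return tags in the prompt that have no matching reference file.

    Assumption: @Image1 refers to the first supplied image, @Image2 to
    the second, and so on (likewise for videos and audio).
    """
    available = {"Image": n_images, "Video": n_videos, "Audio": n_audios}
    unresolved = []
    for kind, index in TAG_PATTERN.findall(prompt):
        # A tag is resolvable only if its index falls within 1..count.
        if not 1 <= int(index) <= available[kind]:
            unresolved.append(f"@{kind}{index}")
    return unresolved


prompt = "The dancer from @Image1 moves to the beat of @Audio1, framed like @Video2."
print(find_unresolved_tags(prompt, n_images=1, n_videos=1, n_audios=1))
```

Running a check like this before submitting can catch a prompt that mentions @Video2 when only one reference video is attached.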
Example request
- Sync
- Async
- Async with SSE
A Sync request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
Fetch model details
The models endpoint returns the full model object, including its json_request_schema.
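Once the model object has been fetched, its schema can be summarized in a few lines. The sketch below assumes that json_request_schema follows the usual JSON Schema layout, with a `properties` mapping and a `required` list; that layout is an assumption and is not confirmed by this page. The function `summarize_schema` and the sample object are hypothetical stand-ins written by hand, not real API output.

```python
def summarize_schema(model):
    """Map each request field to its type, default, and required flag.

    Assumption: model["json_request_schema"] is a JSON Schema object
    with "properties" and "required" keys.
    """
    schema = model.get("json_request_schema", {})
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))
    return {
        name: {
            "type": spec.get("type"),
            "default": spec.get("default"),
            "required": name in required,
        }
        for name, spec in properties.items()
    }


# Hand-written stand-in for a fetched model object (illustrative only).
sample_model = {
    "json_request_schema": {
        "properties": {
            "prompt": {"type": "string"},
            "resolution": {"type": "string", "default": "720p"},
        },
        "required": ["prompt"],
    }
}

for field, info in summarize_schema(sample_model).items():
    print(field, info)
```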
Request parameters
Required parameters
| Field | Type | Default | Description |
|---|---|---|---|
| prompt | string | — | The text prompt used to generate the video. Use @Image1, @Video1, @Audio1, etc. to refer to reference media. |
Optional parameters
| Field | Type | Default | Description |
|---|---|---|---|
| input_images | array<string> | — | Reference images to guide video generation. Refer to them in the prompt as @Image1, @Image2, etc. Supported formats: JPEG, PNG, WebP. Max 30 MB per image. Up to 9 images. Total files across all modalities must not exceed 12. |
| input_videos | array<string> | — | Reference videos to guide video generation. Refer to them in the prompt as @Video1, @Video2, etc. Supported formats: MP4, MOV. Up to 3 videos; combined duration must be between 2 and 15 seconds, and total size must be under 50 MB. Each video must be between ~480p (640x640) and ~720p (834x1112) in resolution. |
| input_audios | array<string> | — | Reference audio to guide video generation. Refer to the files in the prompt as @Audio1, @Audio2, etc. Supported formats: MP3, WAV. Up to 3 files; combined duration must not exceed 15 seconds. Max 15 MB per file. If audio is provided, at least one reference image or video is required. |
| resolution | string | "720p" | Video resolution: 480p for faster generation, 720p for balance, 1080p for highest quality. One of: 480p, 720p, 1080p. |
| duration | string | "auto" | Duration of the video in seconds. Supports 4 to 15 seconds, or auto to let the model decide based on the prompt. One of: auto, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. |
| generate_audio | boolean | true | Whether to generate synchronized audio for the video, including sound effects, ambient sounds, and lip-synced speech. The cost of video generation is the same whether or not audio is generated. |
| aspect_ratio | string | "auto" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or auto to let the model decide. One of: auto, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16. |
| seed | integer | — | Random seed for reproducibility. Results may still vary slightly even with the same seed. |
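
The count and enum constraints in the tables above can be checked on the client before a request is sent. The following is a minimal sketch of such a check, not an official validator: the function name `validate_payload` is hypothetical, and it covers only the rules that can be verified from the payload itself (file counts, the audio dependency, and allowed values). File size, format, resolution, and media duration limits are not checked because they require inspecting the files.

```python
ALLOWED_RESOLUTIONS = {"480p", "720p", "1080p"}
# Duration is typed as a string in the table, so "4".."15" are strings here.
ALLOWED_DURATIONS = {"auto"} | {str(s) for s in range(4, 16)}
ALLOWED_ASPECT_RATIOS = {"auto", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}


def validate_payload(payload):
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    if not isinstance(payload.get("prompt"), str):
        problems.append("prompt is required and must be a string")

    images = payload.get("input_images", [])
    videos = payload.get("input_videos", [])
    audios = payload.get("input_audios", [])
    if len(images) > 9:
        problems.append("input_images: at most 9 images")
    if len(videos) > 3:
        problems.append("input_videos: at most 3 videos")
    if len(audios) > 3:
        problems.append("input_audios: at most 3 files")
    if len(images) + len(videos) + len(audios) > 12:
        problems.append("total reference files must not exceed 12")
    if audios and not (images or videos):
        problems.append("audio requires at least one reference image or video")

    # Enum fields: a missing field falls back to its documented default.
    enum_fields = (
        ("resolution", ALLOWED_RESOLUTIONS, "720p"),
        ("duration", ALLOWED_DURATIONS, "auto"),
        ("aspect_ratio", ALLOWED_ASPECT_RATIOS, "auto"),
    )
    for field, allowed, default in enum_fields:
        if payload.get(field, default) not in allowed:
            problems.append(f"{field}: unsupported value {payload[field]!r}")

    if "generate_audio" in payload and not isinstance(payload["generate_audio"], bool):
        problems.append("generate_audio must be a boolean")
    seed = payload.get("seed")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if seed is not None and (isinstance(seed, bool) or not isinstance(seed, int)):
        problems.append("seed must be an integer")
    return problems


# Placeholder file reference for illustration only.
print(validate_payload({"prompt": "A calm sea at dawn", "input_audios": ["waves.mp3"]}))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.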