@Image1, @Image2, @Video1, @Video2, @Audio1, etc. Supports resolutions up to 720p, durations from 4–15 seconds, and synchronized audio generation including sound effects, ambient sounds, and lip-synced speech.
Model name: bytedance-seedance-2-0-reference-to-video
Endpoint
/ai/queue instead for long-running jobs, so that you don’t have long-running HTTP requests.
Request Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | yes | — | "bytedance-seedance-2-0-reference-to-video" |
| prompt | string | yes | — | Text prompt. Use @Image1, @Video1, @Audio1, etc. to reference input media. |
| input_images | array of URIs | no | — | Reference images (JPEG, PNG, WebP). Max 30 MB each. Up to 9. Use @Image1, @Image2, … in the prompt. |
| input_videos | array of URIs | no | — | Reference videos (MP4, MOV). Up to 3. Combined duration must be 2–15 s, total size < 50 MB. Resolution between ~480p and ~720p. Use @Video1, @Video2, … in the prompt. |
| input_audios | array of URIs | no | — | Reference audio (MP3, WAV). Up to 3 files. Combined duration ≤ 15 s. Max 15 MB each. Requires at least one reference image or video. Use @Audio1, @Audio2, … in the prompt. |
| resolution | string | no | "720p" | "480p" for faster generation, "720p" for higher quality. |
| duration | string | no | "auto" | Duration in seconds: "auto", or "4" through "15". |
| generate_audio | boolean | no | true | Generate synchronized audio (sound effects, ambient sounds, lip-synced speech). Cost is the same either way. |
| aspect_ratio | string | no | "auto" | "auto", "21:9", "16:9", "4:3", "1:1", "3:4", or "9:16". |
| seed | integer | no | — | Random seed for reproducibility. Results may still vary slightly. |
| response_format | string | no | "url" | "url" returns a hosted URL. "b64_json" returns base64-encoded video bytes inline. |
| target_namespace | string | no | current user | Namespace to save results and bill to. Can be an organization name. |
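As a rough illustration of how these parameters fit together, the sketch below assembles a request body as a plain dictionary and checks that every @ImageN, @VideoN, or @AudioN tag in the prompt points at an entry that was actually supplied. The helper name `build_payload`, the example URL, and the reading that tags are 1-based indexes into the corresponding arrays are assumptions made for illustration, not documented behavior.

```python
import re

MODEL = "bytedance-seedance-2-0-reference-to-video"

# Assumption: @Image1 refers to input_images[0], @Video2 to input_videos[1], etc.
_TAG = re.compile(r"@(Image|Video|Audio)(\d+)")
_FIELD = {"Image": "input_images", "Video": "input_videos", "Audio": "input_audios"}


def build_payload(prompt, input_images=(), input_videos=(), input_audios=(), **options):
    """Return a request body dict; raise ValueError if a prompt tag has no media."""
    media = {
        "input_images": list(input_images),
        "input_videos": list(input_videos),
        "input_audios": list(input_audios),
    }
    for kind, index in _TAG.findall(prompt):
        supplied = len(media[_FIELD[kind]])
        if not 1 <= int(index) <= supplied:
            raise ValueError(f"@{kind}{index} has no matching entry in {_FIELD[kind]}")
    payload = {"model": MODEL, "prompt": prompt}
    # The media arrays are optional, so only non-empty ones are included.
    payload.update({name: items for name, items in media.items() if items})
    # Remaining optional fields (resolution, duration, aspect_ratio, seed, ...).
    payload.update(options)
    return payload


payload = build_payload(
    "@Image1 walks through a rainy street",
    input_images=["https://example.com/person.png"],  # placeholder URL
    resolution="480p",
    duration="5",  # duration is a string in the table above
)
```

The resulting dictionary can be serialized as JSON for whichever endpoint you call.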
Reference Media Limits
| Modality | Max Count | Size Limit | Other Constraints |
|---|---|---|---|
| Images | 9 | 30 MB each | JPEG, PNG, WebP |
| Videos | 3 | 50 MB total | MP4, MOV. Combined duration 2–15 s. Resolution ~480p to ~720p. |
| Audio | 3 | 15 MB each | MP3, WAV. Combined duration ≤ 15 s. Requires ≥ 1 image or video. |
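These limits lend themselves to a client-side check before anything is uploaded. The sketch below assumes you already know each file's size in bytes and, for video and audio, its duration in seconds. `MediaFile` and `check_reference_media` are hypothetical names, and treating 1 MB as 1,000,000 bytes is an assumption, since the unit is not defined here. Format and resolution checks are left out.

```python
from dataclasses import dataclass

MB = 1_000_000  # assumption: decimal megabytes


@dataclass
class MediaFile:
    size_bytes: int
    duration_s: float = 0.0  # ignored for images


def check_reference_media(images=(), videos=(), audios=()):
    """Return a list of problems found; an empty list means the limits are met."""
    problems = []
    if len(images) > 9:
        problems.append("more than 9 images")
    if any(f.size_bytes > 30 * MB for f in images):
        problems.append("an image exceeds 30 MB")
    if len(videos) > 3:
        problems.append("more than 3 videos")
    if videos:
        # Video size and duration limits apply to the combined set.
        if sum(f.size_bytes for f in videos) >= 50 * MB:
            problems.append("videos total 50 MB or more")
        if not 2 <= sum(f.duration_s for f in videos) <= 15:
            problems.append("combined video duration outside 2-15 s")
    if len(audios) > 3:
        problems.append("more than 3 audio files")
    if any(f.size_bytes > 15 * MB for f in audios):
        problems.append("an audio file exceeds 15 MB")
    if sum(f.duration_s for f in audios) > 15:
        problems.append("combined audio duration over 15 s")
    if audios and not (images or videos):
        problems.append("audio requires at least one image or video")
    return problems
```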
Duration
| Value | Behavior |
|---|---|
| "auto" | Model decides based on prompt and references |
| "4" – "15" | Fixed duration in seconds |
Examples
Text-only prompt
With reference images
With reference video and audio
Portrait video at 480p
Response (response_format: "url")
Response (response_format: "b64_json")
Using with /ai/queue
Recommended for video generation. Returns immediately, processes in the background.
Enqueue
Poll
count of 0 means all generations are complete.
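Polling can be sketched as a loop that stops once the reported count reaches 0. Because the poll request and response shapes are not shown here, the sketch takes a caller-supplied `fetch_count` callable, which is assumed to perform the poll request and return the count as an integer. The name `wait_until_done`, the interval, and the timeout are illustrative choices.

```python
import time


def wait_until_done(fetch_count, interval_s=5.0, timeout_s=600.0, sleep=time.sleep):
    """Call fetch_count() until it returns 0; return how many polls were made.

    fetch_count is a hypothetical callable supplied by the caller that issues
    the poll request and returns the remaining count as an int.
    """
    deadline = time.monotonic() + timeout_s
    polls = 0
    while True:
        polls += 1
        if fetch_count() == 0:
            return polls
        if time.monotonic() >= deadline:
            raise TimeoutError(f"generations still pending after {timeout_s} s")
        sleep(interval_s)
```

Passing `sleep` as a parameter keeps the loop easy to exercise in tests without real delays.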
Cancel
Errors
| Error | Cause | Fix |
|---|---|---|
| Field required | Missing prompt | Provide a text prompt |
| Too many input files | Total files across images, videos, audio > 12 | Reduce the number of reference files |
| Audio requires at least one image or video | input_audios provided without input_images or input_videos | Add at least one reference image or video |
| Invalid duration | Duration not "auto" or "4"–"15" | Use a supported duration value |
| Invalid resolution | Resolution not "480p" or "720p" | Use "480p" or "720p" |
| num_generations must be an integer between 1 and 4 | Invalid count (via /ai/queue) | Use 1–4 |
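Several of these errors can be caught before a request is sent. The following sketch mirrors the table's conditions on a payload dictionary. `preflight_errors` is a hypothetical helper, the returned strings reuse the error names from the table rather than the service's exact responses, and passing `num_generations` as a separate argument is an assumption because its place in the queue request is not shown here. Note that the per-modality maximums (9 + 3 + 3) add up to more than the combined cap of 12, so the total needs its own check.

```python
VALID_DURATIONS = {"auto"} | {str(n) for n in range(4, 16)}  # "4" through "15"
VALID_RESOLUTIONS = {"480p", "720p"}


def preflight_errors(payload, num_generations=None):
    """Return the error names from the table that this payload would likely hit."""
    errors = []
    if not payload.get("prompt"):
        errors.append("Field required")
    images = payload.get("input_images") or []
    videos = payload.get("input_videos") or []
    audios = payload.get("input_audios") or []
    if len(images) + len(videos) + len(audios) > 12:
        errors.append("Too many input files")
    if audios and not (images or videos):
        errors.append("Audio requires at least one image or video")
    duration = payload.get("duration", "auto")
    if not (isinstance(duration, str) and duration in VALID_DURATIONS):
        errors.append("Invalid duration")
    resolution = payload.get("resolution", "720p")
    if not (isinstance(resolution, str) and resolution in VALID_RESOLUTIONS):
        errors.append("Invalid resolution")
    if num_generations is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(num_generations, int) and not isinstance(num_generations, bool)
        if not (is_int and 1 <= num_generations <= 4):
            errors.append("num_generations must be an integer between 1 and 4")
    return errors
```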