Try Seedance 2.0 Fast - Reference to Video in the Workbench

Run this model interactively, tune parameters, and compare outputs.
Model ID: bytedance-seedance-2-0-fast-reference-to-video

ByteDance Seedance 2.0 Fast generates video from a text prompt guided by reference images, videos, and audio. Reference media are addressed in the prompt as [Image 1], [Video 1], [Audio 1] (1-based positional index within each media type).

Example request

Use the Workbench as a request builder: configure parameters for this model in the UI, then open the API tab to copy the exact cURL or Python call.
This request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
  "model": "bytedance-seedance-2-0-fast-reference-to-video",
  "prompt": "A lone ox walks down an empty desert highway at golden hour, dust drifting behind it, slow cinematic camera move, warm low sunlight, shallow depth of field."
}'
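
The same call can be sketched in Python with only the standard library. This version also attaches one reference image and refers to it in the prompt as [Image 1]. Several things here are assumptions rather than documented behavior: the helper names `build_request` and `generate` are hypothetical, the image URL is a placeholder, passing references as URL strings is a guess (the parameter table only says array<string>), and the response is assumed to be JSON.

```python
import json
import os
import urllib.request

API_URL = "https://hub.oxen.ai/api/ai/videos/generate"
MODEL_ID = "bytedance-seedance-2-0-fast-reference-to-video"


def build_request(prompt, api_key, **optional):
    """Build (but do not send) the POST request for one video generation."""
    payload = {"model": MODEL_ID, "prompt": prompt}
    # Only include optional fields that were actually supplied.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate(prompt, **optional):
    """Send the request and block until the server responds.

    Assumes the response body is JSON; its exact shape is not described here.
    """
    request = build_request(prompt, os.environ["OXEN_API_KEY"], **optional)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (placeholder URL, blocks for several minutes):
# generate(
#     "The ox from [Image 1] walks down an empty desert highway at golden hour.",
#     input_images=["https://example.com/ox.png"],
#     duration=5,
# )
```

Keeping `build_request` separate from `generate` makes it possible to inspect the payload before committing to a long blocking call.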

Fetch model details

The models endpoint returns the full model object, including its json_request_schema.
curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/bytedance-seedance-2-0-fast-reference-to-video

Request parameters

Required parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | "A lone ox walks down an empty desert highway at golden hour, dust drifting behind it, slow cinematic camera move, warm low sunlight, shallow depth of field." | The text prompt used to generate the video. Use [Image 1], [Video 1], [Audio 1], etc. (1-based positional index within each media type) to reference inputs. Face-containing media is indexed before non-face media within its type. |
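
The indexing rule for prompt references can be illustrated with a small helper that works out which label each reference receives. This is a sketch of the rule as stated (1-based, per media type, face-containing media first), not the service's own logic. The function name `reference_labels` is hypothetical, and it assumes that items keep their array order within each field. The argument names mirror the request's input_* fields.

```python
def reference_labels(face_images=(), images=(), face_videos=(), videos=(), audios=()):
    """Map prompt labels such as "[Image 1]" to the reference they point at."""
    # Face-containing media comes first within its type, then the rest.
    groups = {
        "Image": list(face_images) + list(images),
        "Video": list(face_videos) + list(videos),
        "Audio": list(audios),
    }
    labels = {}
    for kind, items in groups.items():
        # Indices restart at 1 for each media type.
        for index, item in enumerate(items, start=1):
            labels[f"[{kind} {index}]"] = item
    return labels
```

For example, with one face image `face.png` and one ordinary image `bg.png`, the face image would be [Image 1] and the ordinary image [Image 2], regardless of which field appears first in the request body.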

Optional parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| input_face_images | array<string> | | Place reference images here when they show a real human face. Content filters may block the request otherwise. |
| input_images | array<string> | | Reference images that guide the generated video. Up to 9 total reference images and up to 15 total references across all types may be used. |
| input_face_videos | array<string> | | Place reference videos here when they show a real human face. Content filters may block the request otherwise. |
| input_videos | array<string> | | Reference videos that guide the generated video. Up to 3 total reference videos and up to 15 total references across all types may be used. |
| input_audios | array<string> | | Reference audios that guide the generated video. Up to 3 total reference audios and up to 15 total references across all types may be used. |
| resolution | string | "720p" | Video resolution. One of: 480p, 720p. |
| duration | integer | -1 | Duration of the video in seconds. Supports 4 to 15 seconds, or -1 (auto) to let the model decide based on the prompt. One of: -1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. |
| aspect_ratio | string | "16:9" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or adaptive to let the model decide. One of: adaptive, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16. |
| generate_audio | boolean | false | Whether to generate synchronized audio. |
| watermark | boolean | false | Whether to add an 'AI generated' watermark to the output. |
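
Because a rejected request can cost a long wait, it may help to check the optional parameters client-side first. The sketch below encodes the limits and the "One of" lists from the table above. The function name `validate_options` is hypothetical, and it assumes that "total reference images" and "total reference videos" count the face and non-face arrays together; the authoritative rules are in the model's json_request_schema.

```python
ALLOWED_RESOLUTIONS = {"480p", "720p"}
ALLOWED_DURATIONS = {-1, *range(4, 16)}  # -1 lets the model decide
ALLOWED_ASPECT_RATIOS = {"adaptive", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"}
MAX_IMAGES, MAX_VIDEOS, MAX_AUDIOS, MAX_TOTAL = 9, 3, 3, 15


def validate_options(options):
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    def count(*fields):
        return sum(len(options.get(field, [])) for field in fields)

    # Assumption: face and non-face arrays share one per-type limit.
    images = count("input_face_images", "input_images")
    videos = count("input_face_videos", "input_videos")
    audios = count("input_audios")
    if images > MAX_IMAGES:
        problems.append(f"{images} reference images (max {MAX_IMAGES})")
    if videos > MAX_VIDEOS:
        problems.append(f"{videos} reference videos (max {MAX_VIDEOS})")
    if audios > MAX_AUDIOS:
        problems.append(f"{audios} reference audios (max {MAX_AUDIOS})")
    # 9 + 3 + 3 = 15, so this only fires alongside a per-type limit;
    # it is kept to mirror the documented rule.
    if images + videos + audios > MAX_TOTAL:
        problems.append(f"more than {MAX_TOTAL} references in total")

    # Missing fields fall back to the documented defaults.
    if options.get("resolution", "720p") not in ALLOWED_RESOLUTIONS:
        problems.append("resolution must be 480p or 720p")
    duration = options.get("duration", -1)
    if (isinstance(duration, bool) or not isinstance(duration, int)
            or duration not in ALLOWED_DURATIONS):
        problems.append("duration must be -1 or an integer from 4 to 15")
    if options.get("aspect_ratio", "16:9") not in ALLOWED_ASPECT_RATIOS:
        problems.append("aspect_ratio is not one of the allowed values")
    return problems
```

An empty dict passes, since every optional field has a default or may be omitted; `{"duration": 3}` would report a single duration problem.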