Try Seedance 2.0 Fast - Reference to Video in the Workbench

Run this model interactively, tune parameters, and compare outputs.
Model ID: bytedance-seedance-2-0-fast-reference-to-video

ByteDance Seedance 2.0 Fast generates video from a text prompt guided by reference images, videos, and audio. Reference media are addressed in the prompt as [Image 1], [Video 1], [Audio 1] (1-based positional index within each media type).

Example request

Use the Workbench as a request builder: configure parameters for this model in the UI, then open the API tab to copy the exact cURL or Python call.
This request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
  "model": "bytedance-seedance-2-0-fast-reference-to-video",
  "prompt": "<prompt>"
}'
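
The same request can be sketched in Python using only the standard library. This is a minimal sketch, not the call the Workbench generates: the function names `build_generate_request` and `generate_video`, the 20-minute timeout, and the assumption that the response body is JSON are illustrative choices that do not come from this page.

```python
import json
import os
import urllib.request

API_URL = "https://hub.oxen.ai/api/ai/videos/generate"
MODEL_ID = "bytedance-seedance-2-0-fast-reference-to-video"


def build_generate_request(prompt, api_key, **optional):
    """Build (but do not send) the POST request from the cURL example.

    Extra keyword arguments are passed through as optional parameters,
    e.g. resolution="480p".
    """
    payload = {"model": MODEL_ID, "prompt": prompt, **optional}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate_video(prompt, **optional):
    """Send the request and decode the body (assumed here to be JSON)."""
    request = build_generate_request(prompt, os.environ["OXEN_API_KEY"], **optional)
    # The call blocks until the video is ready, so the timeout is generous.
    # 1200 seconds is an assumed value, not a documented limit.
    with urllib.request.urlopen(request, timeout=1200) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect the payload before committing to a call that may block for several minutes.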

Fetch model details

The models endpoint returns the full model object, including its json_request_schema.
curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/bytedance-seedance-2-0-fast-reference-to-video
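
A rough Python equivalent of the lookup above, again as a sketch with the standard library. The helper names and the 30-second timeout are hypothetical, and the response is assumed to be a JSON body.

```python
import json
import urllib.request

MODELS_URL = "https://hub.oxen.ai/api/ai/models/"
MODEL_ID = "bytedance-seedance-2-0-fast-reference-to-video"


def build_model_request(api_key, model_id=MODEL_ID):
    """Build the GET request for a model's details without sending it."""
    return urllib.request.Request(
        MODELS_URL + model_id,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def fetch_model(api_key, model_id=MODEL_ID):
    """Send the request and decode the body (assumed to be JSON)."""
    request = build_model_request(api_key, model_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Where `json_request_schema` sits inside the returned object is not specified here, so inspect the decoded response before relying on a particular key path.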

Request parameters

Required parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | | The text prompt used to generate the video. Use [Image 1], [Video 1], [Audio 1], etc. (1-based positional index within each media type) to reference inputs. Face-containing media is indexed before non-face media within its type. |
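
The indexing rule can be illustrated with a small helper that computes which prompt label each reference receives. This is a sketch under two assumptions: that order within each array is preserved, and that the placeholder URLs below stand in for real reference media. The function `reference_labels` is hypothetical and not part of the API.

```python
def reference_labels(kind, face_items, other_items):
    """Map prompt labels to references for one media type.

    Face-containing media is numbered first, then non-face media,
    with 1-based indices, following the rule described for `prompt`.
    """
    ordered = list(face_items) + list(other_items)
    return {f"[{kind} {i}]": item for i, item in enumerate(ordered, start=1)}


# Placeholder values; substitute real reference media.
labels = reference_labels(
    "Image",
    face_items=["https://example.com/actor.jpg"],   # would go in input_face_images
    other_items=["https://example.com/beach.jpg"],  # would go in input_images
)
```

Under these assumptions the face image is addressed as [Image 1] and the beach image as [Image 2], even though they are sent in different fields.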

Optional parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| input_face_images | array<string> | | Place reference images here when they show a real human face. Content filters may block the request otherwise. |
| input_images | array<string> | | Reference images that guide the generated video. Up to 9 total reference images and up to 15 total references across all types may be used. |
| input_face_videos | array<string> | | Place reference videos here when they show a real human face. Content filters may block the request otherwise. |
| input_videos | array<string> | | Reference videos that guide the generated video. Up to 3 total reference videos and up to 15 total references across all types may be used. |
| input_audios | array<string> | | Reference audios that guide the generated video. Up to 3 total reference audios and up to 15 total references across all types may be used. |
| resolution | string | "720p" | Video resolution. One of: 480p, 720p. |
| duration | integer | -1 | Duration of the video in seconds. Supports 4 to 15 seconds, or -1 to let the model decide based on the prompt. One of: -1, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. |
| aspect_ratio | string | "adaptive" | The aspect ratio of the generated video. Use 16:9 for landscape, 9:16 for portrait/vertical, 1:1 for square, 21:9 for ultrawide cinematic, or adaptive to let the model decide. One of: adaptive, 21:9, 16:9, 4:3, 1:1, 3:4, 9:16. |
| generate_audio | boolean | false | Whether to generate synchronized audio. |
| watermark | boolean | false | Whether to add an ‘AI generated’ watermark to the output. |
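
Because a rejected request can cost minutes, it may help to check a payload against the documented limits before sending it. The following is a minimal client-side sketch, not the server's validation logic. The function `validate_payload` is hypothetical, and it assumes that the "total" limits count face and non-face references of a type together.

```python
MAX_PER_TYPE = {"images": 9, "videos": 3, "audios": 3}
MAX_TOTAL = 15
ALLOWED = {
    "resolution": {"480p", "720p"},
    "duration": {-1, *range(4, 16)},  # -1 lets the model decide
    "aspect_ratio": {"adaptive", "21:9", "16:9", "4:3", "1:1", "3:4", "9:16"},
}


def validate_payload(payload):
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    if not payload.get("prompt"):
        errors.append("prompt is required")

    # Assumption: face and non-face references share one per-type limit.
    counts = {
        "images": len(payload.get("input_face_images", []))
        + len(payload.get("input_images", [])),
        "videos": len(payload.get("input_face_videos", []))
        + len(payload.get("input_videos", [])),
        "audios": len(payload.get("input_audios", [])),
    }
    for kind, count in counts.items():
        if count > MAX_PER_TYPE[kind]:
            errors.append(f"too many reference {kind}: {count} > {MAX_PER_TYPE[kind]}")
    if sum(counts.values()) > MAX_TOTAL:
        errors.append(f"too many references in total: {sum(counts.values())} > {MAX_TOTAL}")

    for field, allowed in ALLOWED.items():
        if field in payload and payload[field] not in allowed:
            errors.append(f"{field} must be one of the documented values")
    return errors
```

For example, `validate_payload({"prompt": "a city at dusk", "duration": 3})` reports a single problem, since 3 is outside the documented duration values.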