
Try LTX 2.3 Quality: Reference Video to Video in the Workbench

Run this model interactively, tune parameters, and compare outputs.
Model ID: ltx-2-3-quality-reference-video-to-video

LTX 2.3 Quality (Reference Video to Video) is the structure-guided preset of Lightricks LTX-2.3 on fal. It generates video with synchronized native audio from a reference video, a text prompt, and an optional reference image. It runs the official Union IC-LoRA workflow, in which the reference video supplies the motion and structure that guide the generated output.

The model offers fine control over how closely the result follows the source. video_strength adjusts how much freedom the model has to change the reference motion/structure. An optional pre-computed control video (depth/edge/pose) can be fed directly into the Union Control conditioning, skipping the built-in control estimation. A true video-to-video mode (preserve_original_video) VAE-encodes the base video as the starting latent, with the denoise strength controlling how much of the original video is kept. An optional reference image anchors style or character via the IC-LoRA Union node.
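The three modes described above differ only in which request fields are set. The following is a minimal sketch of the corresponding request bodies, using the field names documented under Request parameters. The helper function names and any URLs passed to them are hypothetical and exist only for illustration.

```python
MODEL_ID = "ltx-2-3-quality-reference-video-to-video"


def base_payload(prompt, input_video):
    """Fields that every request needs."""
    return {"model": MODEL_ID, "prompt": prompt, "input_video": input_video}


def structure_guided(prompt, input_video, video_strength=0.6):
    """Default mode: the built-in control estimation runs on input_video.

    Lower video_strength gives the model more freedom to depart from the
    reference motion/structure.
    """
    payload = base_payload(prompt, input_video)
    payload["video_strength"] = video_strength
    return payload


def precomputed_control(prompt, input_video, control_video_url):
    """Feed an already-rendered depth/edge/pose video as the control signal."""
    payload = base_payload(prompt, input_video)
    payload.update(
        control_video_url=control_video_url,
        skip_control_preprocess=True,
    )
    return payload


def true_video_to_video(prompt, input_video, control_video_url, strength=0.5):
    """preserve_original_video mode: lower strength keeps more of the source."""
    payload = precomputed_control(prompt, input_video, control_video_url)
    payload.update(preserve_original_video=True, strength=strength)
    return payload
```

Each helper returns a plain dict that can be serialized to JSON and sent as the request body.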
| Metric | Value |
| --- | --- |
| Parameter Count | 22 billion |
| Mixture of Experts | No |
| Context Length | Unknown |
| Multilingual | Unknown |
| Quantized* | Unknown |
*Quantization is specific to the inference provider and the model may be offered with different quantization levels by other providers.

Example request

Use the Workbench as a request builder: configure parameters for this model in the UI, then open the API tab to copy the exact cURL or Python call.
This request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
  "model": "ltx-2-3-quality-reference-video-to-video",
  "prompt": "<prompt>",
  "input_video": "https://hub.oxen.ai/api/repos/ox/Oxen-AI-Assets/file/main/images/winter_summer_ox.mp4"
}'
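For reference, the same blocking call can be sketched in Python with only the standard library. This is an illustrative equivalent of the cURL command above, not the snippet the Workbench API tab produces. The function names and the 30-minute timeout are assumptions, as is the expectation that the response body is JSON.

```python
import json
import urllib.request

API_URL = "https://hub.oxen.ai/api/ai/videos/generate"


def build_generate_request(prompt, input_video, api_key):
    """Build the POST request without sending it."""
    body = {
        "model": "ltx-2-3-quality-reference-video-to-video",
        "prompt": prompt,
        "input_video": input_video,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_request(request, timeout_s=1800):
    """Send the request and parse the response.

    The timeout is an assumed value chosen to exceed the typical
    5-15 minute generation time; the response is assumed to be JSON.
    """
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

To run it, read the key from the environment (for example `os.environ["OXEN_API_KEY"]`), pass it to `build_generate_request`, and hand the result to `send_request`.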

Fetch model details

The models endpoint returns the full model object, including its json_request_schema.
curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ltx-2-3-quality-reference-video-to-video
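A small Python sketch of the same lookup follows. The source only states that the model object includes json_request_schema, so the exact response envelope is not assumed here: the hypothetical helper `find_key` searches the parsed JSON for that key wherever it is nested.

```python
import json
import urllib.request

MODELS_URL = "https://hub.oxen.ai/api/ai/models/"


def build_model_request(model_id, api_key):
    """Build the GET request for a model's details without sending it."""
    return urllib.request.Request(
        MODELS_URL + model_id,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def find_key(obj, key="json_request_schema"):
    """Depth-first search of parsed JSON for the first value stored under key."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = list(obj.values())
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def fetch_request_schema(model_id, api_key, timeout_s=30):
    """Fetch the model object and return its json_request_schema, if present."""
    request = build_model_request(model_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return find_key(json.load(response))
```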

Request parameters

Required parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | | The prompt to guide generation. |
| input_video | string | | The URL of the reference video that supplies motion/structure. Format: uri. |

Optional parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| input_image | string | | Optional reference image used by the IC-LoRA Union node for style/character anchoring. Format: uri. |
| control_video_url | string | | Optional pre-computed control video (e.g. an already-rendered depth / edge / pose composite). When provided together with skip_control_preprocess, the built-in control estimation is skipped and this video is fed directly into the Union Control conditioning, resampled to the output resolution and frame count. Format: uri. |
| skip_control_preprocess | boolean | false | Skip the built-in control estimation (depth / edge / pose) and use control_video_url directly as the control signal. Requires control_video_url; ignored if it is not set. |
| preserve_original_video | boolean | false | True video-to-video: VAE-encode the base video and use it as the sampler’s starting latent, while control_video_url still drives the IC-LoRA control guide. The amount preserved is controlled by strength. Requires skip_control_preprocess, video and control_video_url. |
| video_strength | number | 0.6 | Video conditioning strength. Lower values give the model more freedom to change the reference video motion/structure. |
| strength | number | 1 | Sampler denoise strength. With preserve_original_video on, lower values keep more of the original video’s pixels (e.g. 0.5 = keep ~50%); 1.0 fully regenerates. Must be greater than 0. |
| num_frames | integer | 121 | The number of frames to generate. Range: 9 – 481. |
| resolution | string | "auto" | The size of the generated video. ‘auto’ keeps the official Union Control workflow’s low-resolution control output size. The output is generated at up to ~720p (shorter side capped at 704px); larger requested sizes are scaled down, preserving the aspect ratio. One of: auto, square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9. |
| frames_per_second | number | 24 | Frames per second of the generated video. Range: 1 – 60. |
| generate_audio | boolean | true | Whether to include audio in the returned video. When disabled, the final MP4 is returned without an audio track. |
| video_quality | string | "high" | The quality preset of the generated video. One of: low, medium, high, maximum. |
| negative_prompt | string | "color distortion, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, still, dim overall tone, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards" | The negative prompt to steer generation away from. |
| seed | integer | | Random seed for reproducibility. If None, a random seed is chosen. |
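The ranges, allowed values, and field dependencies listed above can be checked client-side before a long-running request is submitted. The sketch below is a hypothetical helper, not part of any Oxen or fal SDK, and the server may enforce additional rules. It also shows the clip-length arithmetic: the defaults of 121 frames at 24 frames per second give a clip of roughly 5 seconds.

```python
RESOLUTIONS = {
    "auto", "square_hd", "square", "portrait_4_3",
    "portrait_16_9", "landscape_4_3", "landscape_16_9",
}
VIDEO_QUALITIES = {"low", "medium", "high", "maximum"}


def validate_params(params):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for field in ("prompt", "input_video"):
        if not params.get(field):
            problems.append(f"{field} is required")
    if not 9 <= params.get("num_frames", 121) <= 481:
        problems.append("num_frames must be between 9 and 481")
    if not 1 <= params.get("frames_per_second", 24) <= 60:
        problems.append("frames_per_second must be between 1 and 60")
    if params.get("strength", 1) <= 0:
        problems.append("strength must be greater than 0")
    if params.get("resolution", "auto") not in RESOLUTIONS:
        problems.append("resolution is not one of the documented values")
    if params.get("video_quality", "high") not in VIDEO_QUALITIES:
        problems.append("video_quality is not one of the documented values")
    # preserve_original_video depends on the pre-computed control path.
    # skip_control_preprocess without control_video_url is documented as
    # ignored, so it is not reported as a problem on its own.
    if params.get("preserve_original_video", False):
        if not (params.get("skip_control_preprocess") and params.get("control_video_url")):
            problems.append(
                "preserve_original_video requires skip_control_preprocess "
                "and control_video_url"
            )
    return problems


def clip_seconds(params):
    """Clip length in seconds: num_frames divided by frames_per_second."""
    return params.get("num_frames", 121) / params.get("frames_per_second", 24)
```

With the defaults, `clip_seconds` evaluates 121 / 24, and the maximum of 481 frames at 24 frames per second corresponds to about 20 seconds.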