ltx-2-3-quality-reference-video-to-video
LTX 2.3 Quality (Reference Video to Video) is the structure-guided preset of Lightricks LTX-2.3 on fal, generating video with synchronized native audio from a reference video, a text prompt, and an optional reference image. It runs the official Union IC-LoRA workflow, where the reference video supplies motion and structure that guide the generated output.
The model offers fine control over how closely the result follows the source. video_strength sets how strongly the reference video conditions the output: lower values give the model more freedom to change the reference motion and structure. An optional pre-computed control video (depth, edge, or pose) can be fed directly into the Union Control conditioning by setting skip_control_preprocess, which skips the built-in control estimation. A true video-to-video mode (preserve_original_video) VAE-encodes the base video as the starting latent, with the sampler denoise strength (strength) controlling how much of the original video is kept. An optional reference image (input_image) anchors style or character via the IC-LoRA Union node.
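The three ways of steering the output described above can be sketched as request payloads. This is a minimal illustration, not an official client: the field names come from the parameter tables on this page, while the helper function names and the example.com URLs are placeholders invented for the example.

```python
import json
from typing import Any, Dict, Optional


def structure_guided(prompt: str, input_video: str,
                     video_strength: float = 0.6,
                     input_image: Optional[str] = None) -> Dict[str, Any]:
    """Default mode: the built-in control estimation runs on input_video."""
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "input_video": input_video,
        "video_strength": video_strength,
    }
    if input_image is not None:
        # Optional style/character anchor.
        payload["input_image"] = input_image
    return payload


def precomputed_control(prompt: str, input_video: str,
                        control_video_url: str) -> Dict[str, Any]:
    """Feed an already-rendered depth/edge/pose video as the control signal."""
    payload = structure_guided(prompt, input_video)
    payload["control_video_url"] = control_video_url
    payload["skip_control_preprocess"] = True
    return payload


def true_video_to_video(prompt: str, input_video: str,
                        control_video_url: str,
                        strength: float = 0.5) -> Dict[str, Any]:
    """Start from the encoded base video; lower strength keeps more of it."""
    payload = precomputed_control(prompt, input_video, control_video_url)
    payload["preserve_original_video"] = True
    payload["strength"] = strength
    return payload


# Placeholder URLs, for illustration only.
body = true_video_to_video(
    "a dancer in a neon-lit alley",
    "https://example.com/reference.mp4",
    "https://example.com/depth.mp4",
)
print(json.dumps(body, indent=2))
```

Building each mode on top of the previous one mirrors the documented dependencies: preserve_original_video needs skip_control_preprocess, which in turn needs control_video_url.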
| Metric | Value |
|---|---|
| Parameter Count | 22 billion |
| Mixture of Experts | No |
| Context Length | Unknown |
| Multilingual | Unknown |
| Quantized* | Unknown |
Example request
Requests can be made in three modes: Sync, Async, and Async with SSE. A Sync request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
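Because a generation can take several minutes, an asynchronous client usually polls for completion with a timeout. The sketch below shows only that waiting pattern. It does not call any real endpoint: fetch_status and is_done are hypothetical callables you would supply, since this page does not describe the status API or its response shape.

```python
import time
from typing import Any, Callable, Dict


def wait_for_result(
    fetch_status: Callable[[], Dict[str, Any]],   # hypothetical: returns the job status
    is_done: Callable[[Dict[str, Any]], bool],    # hypothetical: decides if it finished
    timeout_s: float = 20 * 60,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until is_done(status) is true, backing off between attempts."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        # Give up if the next wait would run past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(delay)
        # Grow the delay gradually, capped at max_delay_s.
        delay = min(delay * 1.5, max_delay_s)
```

The default timeout of 20 minutes is an assumption chosen to sit above the typical 5-15 minute range quoted above. Adjust it to your workload.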
Fetch model details
The models endpoint returns the full model object, including its json_request_schema.
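Once you have the model object, the schema can be used to discover which fields are required. The sketch below assumes json_request_schema follows the usual JSON Schema layout with "properties" and "required" keys. That layout and the sample_model object are assumptions made for illustration, not a copy of the real response.

```python
from typing import Any, Dict, List, Tuple


def split_fields(schema: Dict[str, Any]) -> Tuple[List[str], List[str]]:
    """Return (required, optional) field names from a JSON Schema style object."""
    required = list(schema.get("required", []))
    properties = schema.get("properties", {})
    optional = [name for name in properties if name not in required]
    return required, optional


# Hypothetical, abbreviated model object for illustration only.
sample_model = {
    "json_request_schema": {
        "type": "object",
        "required": ["prompt", "input_video"],
        "properties": {
            "prompt": {"type": "string"},
            "input_video": {"type": "string", "format": "uri"},
            "video_strength": {"type": "number", "default": 0.6},
            "num_frames": {"type": "integer", "default": 121},
        },
    }
}

required, optional = split_fields(sample_model["json_request_schema"])
print(required)  # ['prompt', 'input_video']
print(optional)  # ['video_strength', 'num_frames']
```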
Request parameters
Required parameters
| Field | Type | Default | Description |
|---|---|---|---|
| prompt | string | — | The prompt to guide generation. |
| input_video | string | — | The URL of the reference video that supplies motion/structure. Format: uri. |
Optional parameters
| Field | Type | Default | Description |
|---|---|---|---|
input_image | string | — | Optional reference image used by the IC-LoRA Union node for style/character anchoring. Format: uri. |
control_video_url | string | — | Optional pre-computed control video (e.g. an already-rendered depth / edge / pose composite). When provided together with skip_control_preprocess, the built-in control estimation is skipped and this video is fed directly into the Union Control conditioning, resampled to the output resolution and frame count. Format: uri. |
skip_control_preprocess | boolean | false | Skip the built-in control estimation (depth / edge / pose) and use control_video_url directly as the control signal. Requires control_video_url; ignored if it is not set. |
| preserve_original_video | boolean | false | True video-to-video: VAE-encode the base video and use it as the sampler’s starting latent, while control_video_url still drives the IC-LoRA control guide. The amount preserved is controlled by strength. Requires skip_control_preprocess, the base video (input_video), and control_video_url. |
video_strength | number | 0.6 | Video conditioning strength. Lower values give the model more freedom to change the reference video motion/structure. |
strength | number | 1 | Sampler denoise strength. With preserve_original_video on, lower values keep more of the original video’s pixels (e.g. 0.5 = keep ~50%), 1.0 fully regenerates. Must be greater than 0. |
num_frames | integer | 121 | The number of frames to generate. Range: 9 – 481. |
resolution | string | "auto" | The size of the generated video. ‘auto’ keeps the official Union Control workflow’s low-resolution control output size. The output is generated at up to ~720p (shorter side capped at 704px); larger requested sizes are scaled down, preserving the aspect ratio. One of: auto, square_hd, square, portrait_4_3, portrait_16_9, landscape_4_3, landscape_16_9. |
frames_per_second | number | 24 | Frames per second of the generated video. Range: 1 – 60. |
generate_audio | boolean | true | Whether to include audio in the returned video. When disabled, the final MP4 is returned without an audio track. |
video_quality | string | "high" | The quality preset of the generated video. One of: low, medium, high, maximum. |
negative_prompt | string | "color distortion, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, still, dim overall tone, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards" | The negative prompt to steer generation away from. |
| seed | integer | — | Random seed for reproducibility. If omitted, a random seed is chosen. |
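The ranges, enumerations, and parameter dependencies in the tables above can be checked on the client before a request is sent. The following is a minimal sketch under that reading of the tables. The function names are invented for the example, and whether the service rejects or silently ignores a given combination is not stated on this page.

```python
from typing import Any, Dict, List

RESOLUTIONS = {"auto", "square_hd", "square", "portrait_4_3",
               "portrait_16_9", "landscape_4_3", "landscape_16_9"}
VIDEO_QUALITIES = {"low", "medium", "high", "maximum"}


def validate_request(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    for field in ("prompt", "input_video"):
        if not payload.get(field):
            problems.append(f"{field} is required")
    if not 9 <= payload.get("num_frames", 121) <= 481:
        problems.append("num_frames must be between 9 and 481")
    if not 1 <= payload.get("frames_per_second", 24) <= 60:
        problems.append("frames_per_second must be between 1 and 60")
    if payload.get("strength", 1) <= 0:
        problems.append("strength must be greater than 0")
    if payload.get("resolution", "auto") not in RESOLUTIONS:
        problems.append("resolution is not one of the listed presets")
    if payload.get("video_quality", "high") not in VIDEO_QUALITIES:
        problems.append("video_quality is not one of the listed presets")
    has_control = bool(payload.get("control_video_url"))
    skip = bool(payload.get("skip_control_preprocess", False))
    if skip and not has_control:
        # Documented as ignored rather than rejected, so this is only a warning.
        problems.append("skip_control_preprocess is ignored without control_video_url")
    if payload.get("preserve_original_video", False) and not (skip and has_control):
        problems.append("preserve_original_video requires "
                        "skip_control_preprocess and control_video_url")
    return problems


def clip_duration_seconds(payload: Dict[str, Any]) -> float:
    """Approximate clip length: frame count divided by frame rate."""
    return payload.get("num_frames", 121) / payload.get("frames_per_second", 24)


request = {
    "prompt": "a dancer in a neon-lit alley",
    "input_video": "https://example.com/reference.mp4",  # placeholder URL
    "num_frames": 500,
    "preserve_original_video": True,
}
for problem in validate_request(request):
    print(problem)
```

With the defaults of 121 frames at 24 frames per second, clip_duration_seconds gives a clip of roughly five seconds.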