LTX 2.3 Quality: Audio to Video
ltx-2-3-quality-audio-to-video
LTX 2.3 Quality (Audio to Video) is the high-quality preset of Lightricks LTX-2.3 on fal. It generates video driven by an input audio track, a text prompt, and an optional starting image. It runs a distilled DiT (diffusion transformer) workflow with a quality-preset control and synchronizes motion, such as lip movement and gestures, to the supplied audio.
When match_audio_length is enabled, the frame count is derived from the audio duration and the frame rate; otherwise, the fixed num_frames value is used. An optional first-frame image can be conditioned with adjustable strength, and when no image is provided the workflow runs from text and audio alone. The model supports up to 481 frames at 1 to 60 FPS and is well suited to singing, talking-head, and performance clips.
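Because the frame count and the frame rate together bound the clip length, it can help to estimate both before sending a request. The sketch below assumes that, with match_audio_length enabled, the frame count is roughly the audio duration times the frame rate, clamped to the documented 9 to 481 range. The exact rounding the service applies is not documented here, so treat the result as an approximation; the function names are hypothetical.

```python
# Documented limits from the parameter table: 9-481 frames, 1-60 FPS.
MIN_FRAMES = 9
MAX_FRAMES = 481
MIN_FPS = 1
MAX_FPS = 60


def estimate_num_frames(audio_seconds: float, fps: float = 24) -> int:
    """Approximate the frame count used when match_audio_length is enabled.

    ASSUMPTION: frames ~= round(duration * fps), clamped to [9, 481].
    The service's actual rounding rule is not documented.
    """
    if not MIN_FPS <= fps <= MAX_FPS:
        raise ValueError(f"fps must be in [{MIN_FPS}, {MAX_FPS}], got {fps}")
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    frames = round(audio_seconds * fps)
    return max(MIN_FRAMES, min(MAX_FRAMES, frames))


def max_audio_seconds(fps: float = 24) -> float:
    """Longest duration that fits in MAX_FRAMES at the given frame rate."""
    return MAX_FRAMES / fps


if __name__ == "__main__":
    print(estimate_num_frames(5.0))         # 120
    print(estimate_num_frames(30.0))        # 481 (clamped to the maximum)
    print(round(max_audio_seconds(24), 2))  # 20.04
```

At the default 24 FPS, the 481-frame cap corresponds to about 20 seconds of audio; at 60 FPS it is about 8 seconds. How the service treats audio longer than the cap is not stated on this page.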
| Metric | Value |
|---|---|
| Parameter Count | 22 billion |
| Mixture of Experts | No |
| Context Length | Unknown |
| Multilingual | Unknown |
| Quantized* | Unknown |
Example request
Requests can be sent in three modes: Sync, Async, and Async with SSE. A Sync request blocks until the video is ready (typically 5-15 minutes), so prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
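Since a generation can take many minutes, an Async client usually polls for the result instead of holding a connection open. The sketch below is a generic polling loop with exponential backoff and an overall timeout. The `check` callable is hypothetical: wire it to whatever status or result call your client exposes, returning None while the job is still queued or running. The 20-minute default timeout is an assumption chosen to leave headroom above the typical 5-15 minutes.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until_ready(
    check: Callable[[], Optional[T]],
    timeout_s: float = 20 * 60,
    initial_delay_s: float = 5.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call `check` until it returns a non-None result or the timeout expires.

    `check` is a HYPOTHETICAL hook into your async client; it should return
    None while the job is pending and the final result once it is done.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        result = check()
        if result is not None:
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job not ready after {timeout_s} s")
        # Never sleep past the deadline, then double the delay up to a cap.
        sleep(min(delay, max(0.0, deadline - clock())))
        delay = min(delay * 2, max_delay_s)
```

The sleep and clock parameters are injectable so the loop can be tested without real waiting; in normal use the defaults apply.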
Fetch model details
The models endpoint returns the full model object, including its json_request_schema.
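Once the model object has been fetched, its request schema can be inspected programmatically, for example to list required fields and defaults. This sketch assumes json_request_schema is a standard JSON Schema object with "properties" and an optional "required" list; the example dictionary is a hand-written, trimmed stand-in that mirrors the tables on this page, not a real response. If the actual schema nests fields or uses "$ref", the lookup would need to follow those references.

```python
import json
from typing import Any, Dict, List, Tuple


def summarize_request_schema(model: Dict[str, Any]) -> Tuple[List[str], Dict[str, Any]]:
    """Return (required field names, {field: default}) from a model object.

    ASSUMPTION: model["json_request_schema"] is a flat JSON Schema object.
    """
    schema = model["json_request_schema"]
    required = list(schema.get("required", []))
    defaults = {
        name: spec["default"]
        for name, spec in schema.get("properties", {}).items()
        if "default" in spec
    }
    return required, defaults


# Hypothetical, trimmed model object for illustration only.
example_model = json.loads("""
{"json_request_schema": {
  "type": "object",
  "required": ["prompt", "input_audio"],
  "properties": {
    "prompt": {"type": "string"},
    "input_audio": {"type": "string", "format": "uri"},
    "num_frames": {"type": "integer", "default": 121, "minimum": 9, "maximum": 481},
    "frames_per_second": {"type": "number", "default": 24}
  }}}
""")

required, defaults = summarize_request_schema(example_model)
print(required)  # ['prompt', 'input_audio']
print(defaults)  # {'num_frames': 121, 'frames_per_second': 24}
```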
Request parameters
Required parameters
| Field | Type | Default | Description |
|---|---|---|---|
| prompt | string | — | The prompt to guide the audio-driven video generation. |
| input_audio | string | — | The URL of the audio track that drives generation. Format: uri. |
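A minimal request needs only the two required fields. The sketch below builds that body and performs a basic sanity check on the audio URL. It deliberately accepts only http(s) URLs, which is stricter than the table's generic "uri" format; the function name and the example URL are hypothetical.

```python
import json
from urllib.parse import urlparse


def minimal_request(prompt: str, input_audio: str) -> dict:
    """Build a request body containing only the required fields."""
    if not prompt.strip():
        raise ValueError("prompt must be a non-empty string")
    parsed = urlparse(input_audio)
    # ASSUMPTION: only http(s) URLs are accepted in this sketch.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"input_audio must be an http(s) URL, got {input_audio!r}")
    return {"prompt": prompt, "input_audio": input_audio}


body = minimal_request(
    "A singer performs a ballad on a dimly lit stage",
    "https://example.com/song.mp3",  # hypothetical URL
)
print(json.dumps(body))
```

Every optional parameter is left out, so the service-side defaults listed below apply.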
Optional parameters
| Field | Type | Default | Description |
|---|---|---|---|
| input_image | string | — | Optional URL of an image to use as the first frame. When omitted, the workflow runs from text and audio only. Format: uri. |
| match_audio_length | boolean | true | When enabled, derives the number of frames from the audio duration and frames_per_second. When disabled, uses num_frames. |
| num_frames | integer | 121 | The number of frames to generate. Range: 9 – 481. |
| resolution | string | "auto" | Final output size. ‘auto’ matches the input image aspect ratio when an image is provided; otherwise, it uses the workflow’s landscape fallback. |
| frames_per_second | number | 24 | Frames per second of the generated video. Range: 1 – 60. |
| image_strength | number | 0.7 | Conditioning strength for the optional first frame. 1.0 keeps the image more strictly; lower values give the model more freedom. Range: 0 – 1. |
| generate_audio | boolean | true | Whether to include audio in the returned video. When disabled, the final MP4 is returned without an audio track. |
| video_quality | string | "high" | The quality preset of the generated video. One of: low, medium, high, maximum. |
| negative_prompt | string | "color distortion, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, still, dim overall tone, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards" | The negative prompt to steer generation away from. |
| seed | integer | — | Random seed for reproducibility. If omitted, a random seed is chosen. |
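The ranges and presets in the optional-parameter table can be checked client-side before a request is sent. The sketch below encodes only what the table states: parameter names, numeric ranges, the video_quality presets, and the rule that num_frames is used only when match_audio_length is disabled. It includes only the options you pass explicitly, so omitted fields fall back to the documented defaults. The function name and example URLs are hypothetical, and the service remains the authority on validation.

```python
import warnings
from typing import Any, Dict

# Names, ranges, and presets taken from the optional-parameter table.
OPTIONAL_FIELDS = {
    "input_image", "match_audio_length", "num_frames", "resolution",
    "frames_per_second", "image_strength", "generate_audio",
    "video_quality", "negative_prompt", "seed",
}
RANGES = {
    "num_frames": (9, 481),
    "frames_per_second": (1, 60),
    "image_strength": (0.0, 1.0),
}
VIDEO_QUALITY_PRESETS = {"low", "medium", "high", "maximum"}


def build_request(prompt: str, input_audio: str, **options: Any) -> Dict[str, Any]:
    """Check options against the documented constraints and build a body."""
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for name, (lo, hi) in RANGES.items():
        if name in options and not lo <= options[name] <= hi:
            raise ValueError(f"{name} must be in [{lo}, {hi}], got {options[name]!r}")
    quality = options.get("video_quality", "high")
    if quality not in VIDEO_QUALITY_PRESETS:
        raise ValueError(f"video_quality must be one of {sorted(VIDEO_QUALITY_PRESETS)}")
    # num_frames only takes effect when match_audio_length is false (default: true).
    if "num_frames" in options and options.get("match_audio_length", True):
        warnings.warn("num_frames is ignored while match_audio_length is enabled")
    return {"prompt": prompt, "input_audio": input_audio, **options}


body = build_request(
    "A street performer sings to a small crowd",
    "https://example.com/vocals.wav",                 # hypothetical URL
    input_image="https://example.com/performer.png",  # hypothetical URL
    match_audio_length=False,
    num_frames=241,          # about 10 s at 24 FPS
    frames_per_second=24,
    video_quality="maximum",
    seed=42,                 # fixed seed for reproducible runs
)
```

Sending only explicit options keeps the request short and avoids hard-coding defaults, such as the long negative_prompt, that the service already applies.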