Try LTX 2.3 Quality: Audio to Video in the Workbench

Run this model interactively, tune parameters, and compare outputs.
Model ID: ltx-2-3-quality-audio-to-video

LTX 2.3 Quality (Audio to Video) is the high-quality preset of Lightricks LTX-2.3 on fal, generating video driven by an input audio track, a text prompt, and an optional starting image. It runs a distilled DiT workflow with a quality preset control, synchronizing motion such as lip movement and gesture to the supplied audio. When match audio length is enabled, the number of frames is derived from the audio duration and frame rate; otherwise a fixed frame count is used. An optional first-frame image can be conditioned with an adjustable strength, and the workflow can run from text and audio alone when no image is provided. It supports up to 481 frames at 1 to 60 FPS and is well suited for singing, talking-head, and performance clips.
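
To make the frame-count behavior concrete, here is a minimal Python sketch of how the frame count relates to audio duration and frame rate. The helper names are hypothetical, and the rounding rule and the clamping to the documented 9 to 481 range are assumptions for illustration; the service may round or truncate differently.

```python
MIN_FRAMES = 9    # documented lower bound for num_frames
MAX_FRAMES = 481  # documented upper bound for num_frames


def estimate_num_frames(audio_seconds, frames_per_second=24.0,
                        match_audio_length=True, num_frames=121):
    """Estimate how many frames a request would produce.

    Rounding to the nearest frame and clamping to the documented range
    are assumptions, not confirmed service behavior.
    """
    if not match_audio_length:
        # Fixed frame count: the audio duration is ignored.
        return num_frames
    derived = round(audio_seconds * frames_per_second)
    return max(MIN_FRAMES, min(MAX_FRAMES, derived))


def max_clip_seconds(frames_per_second=24.0):
    """Longest clip the 481-frame limit allows at a given frame rate."""
    return MAX_FRAMES / frames_per_second


print(estimate_num_frames(5.0))                            # 120
print(estimate_num_frames(5.0, match_audio_length=False))  # 121
print(round(max_clip_seconds(24), 2))                      # 20.04
```

At the default 24 FPS, the 481-frame ceiling corresponds to roughly 20 seconds of video, so longer audio tracks would need a lower frame rate to be covered in full.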
| Metric | Value |
| --- | --- |
| Parameter Count | 22 billion |
| Mixture of Experts | No |
| Context Length | Unknown |
| Multilingual | Unknown |
| Quantized* | Unknown |

*Quantization is specific to the inference provider and the model may be offered with different quantization levels by other providers.

Example request

Use the Workbench as a request builder: configure parameters for this model in the UI, then open the API tab to copy the exact cURL or Python call.
This request blocks until the video is ready (typically 5-15 minutes). Prefer Async or Async with SSE for anything beyond quick experimentation. See the video generation reference for more details.
curl -X POST https://hub.oxen.ai/api/ai/videos/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OXEN_API_KEY" \
  -d '{
  "model": "ltx-2-3-quality-audio-to-video",
  "prompt": "<prompt>",
  "input_audio": "https://example.com/audio.mp3"
}'
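
The same blocking request can be sketched in Python using only the standard library. The function names are hypothetical, the 30-minute timeout is an assumption chosen to exceed the typical 5-15 minute wait, and the code assumes the response body is JSON, since its fields are not described here.

```python
import json
import os
import urllib.request

GENERATE_URL = "https://hub.oxen.ai/api/ai/videos/generate"
MODEL_ID = "ltx-2-3-quality-audio-to-video"


def build_generate_request(prompt, input_audio, api_key, **optional):
    """Build (but do not send) the POST request from the cURL example."""
    payload = {"model": MODEL_ID, "prompt": prompt, "input_audio": input_audio}
    payload.update(optional)  # e.g. frames_per_second=30, video_quality="medium"
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def generate_video(request, timeout_seconds=1800):
    """Send the request and block until the server responds.

    Assumes a JSON response body; the timeout value is an assumption.
    """
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_generate_request(
    prompt="<prompt>",
    input_audio="https://example.com/audio.mp3",
    api_key=os.environ.get("OXEN_API_KEY", ""),
)
print(req.get_method(), req.full_url)
# result = generate_video(req)  # uncomment to actually call the API
```

Keeping request construction separate from sending makes it possible to inspect the payload before committing to a call that may run for several minutes.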

Fetch model details

The models endpoint returns the full model object, including its json_request_schema.
curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ltx-2-3-quality-audio-to-video
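
A Python sketch of the same lookup follows. Because the exact nesting of the returned model object is not shown here, the hypothetical `find_key` helper searches the parsed response for `json_request_schema` rather than assuming it is a top-level field; the JSON response format is likewise an assumption.

```python
import json
import urllib.request

MODELS_URL = "https://hub.oxen.ai/api/ai/models/"


def build_model_request(model_id, api_key):
    """Build (but do not send) the GET request for a model's details."""
    return urllib.request.Request(
        MODELS_URL + model_id,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def find_key(node, key):
    """Depth-first search for `key` in nested dicts and lists.

    Returns the first value found, or None if the key is absent.
    """
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def fetch_request_schema(model_id, api_key, timeout_seconds=30):
    """Fetch the model object and pull out its json_request_schema."""
    request = build_model_request(model_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        model = json.loads(response.read().decode("utf-8"))
    return find_key(model, "json_request_schema")
```

Calling `fetch_request_schema("ltx-2-3-quality-audio-to-video", api_key)` with a valid key would return the schema if the response contains one, and `None` otherwise.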

Request parameters

Required parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| prompt | string | | The prompt to guide the audio-driven video generation. |
| input_audio | string | | The URL of the audio track that drives generation. Format: uri. |

Optional parameters

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| input_image | string | | Optional URL of an image to use as the first frame. When omitted, the workflow runs from text and audio only. Format: uri. |
| match_audio_length | boolean | true | When enabled, derives the number of frames from the audio duration and frames_per_second. When disabled, uses num_frames. |
| num_frames | integer | 121 | The number of frames to generate. Range: 9 – 481. |
| resolution | string | "auto" | Final output size. "auto" matches the input image aspect ratio when an image is provided; otherwise it uses the workflow's landscape fallback. |
| frames_per_second | number | 24 | Frames per second of the generated video. Range: 1 – 60. |
| image_strength | number | 0.7 | Conditioning strength for the optional first frame. 1.0 keeps the image more strictly; lower values give the model more freedom. Range: 0 – 1. |
| generate_audio | boolean | true | Whether to include audio in the returned video. When disabled, the final MP4 is returned without an audio track. |
| video_quality | string | "high" | The quality preset of the generated video. One of: low, medium, high, maximum. |
| negative_prompt | string | "color distortion, overexposure, static, blurry details, subtitles, style, artwork, painting, frame, still, dim overall tone, worst quality, low quality, JPEG compression artifacts, ugly, mutilated, extra fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, malformed limbs, fused fingers, motionless frame, cluttered background, three legs, crowded background, walking backwards" | The negative prompt to steer generation away from. |
| seed | integer | | Random seed for reproducibility. If None, a random seed is chosen. |
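
Since a blocking request can take several minutes, it may be worth checking parameters locally before sending. The sketch below validates a payload against the ranges and allowed values listed in the tables above; `validate_params` is a hypothetical helper, not part of any official client, and the server's own validation may be stricter or differ in detail.

```python
VIDEO_QUALITIES = {"low", "medium", "high", "maximum"}


def _is_number(value, integer=False):
    """True for int/float values (ints only if integer=True); bools excluded."""
    if isinstance(value, bool):
        return False
    return isinstance(value, int) if integer else isinstance(value, (int, float))


def validate_params(params):
    """Return a list of problems found in a request payload (empty if none)."""
    errors = []

    # Required string fields.
    for field in ("prompt", "input_audio"):
        if not isinstance(params.get(field), str) or not params[field]:
            errors.append(f"{field} is required and must be a non-empty string")

    # Documented numeric ranges; fields that are absent are skipped.
    ranges = {
        "num_frames": (9, 481, True),
        "frames_per_second": (1, 60, False),
        "image_strength": (0, 1, False),
    }
    for field, (low, high, integer) in ranges.items():
        if field not in params:
            continue
        value = params[field]
        if not _is_number(value, integer) or not low <= value <= high:
            errors.append(f"{field} must be in the range {low} to {high}")

    # Documented enum for the quality preset.
    if "video_quality" in params and params["video_quality"] not in VIDEO_QUALITIES:
        errors.append("video_quality must be one of: low, medium, high, maximum")

    return errors


payload = {
    "prompt": "<prompt>",
    "input_audio": "https://example.com/audio.mp3",
    "num_frames": 500,
}
print(validate_params(payload))  # ['num_frames must be in the range 9 to 481']
```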