Overview
Fine-tune video generation models to create videos in your specific style. This works for both text-to-video and image-to-video generation.
Your Data
Text-to-Video
Data should have:
- Video column - Paths to your training videos
- Caption column - Text descriptions of each video
videos.parquet:
| video | caption |
|---|---|
| clips/001.mp4 | person walking in cyberpunk city |
| clips/002.mp4 | car driving through neon streets |
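A table like the one above could be assembled as follows. This is a minimal sketch, not an official workflow: `make_rows` and `write_parquet` are hypothetical helper names, and it assumes pandas with a parquet engine such as pyarrow is installed for the write step.

```python
def make_rows(captions):
    """Turn {video_path: caption} into rows with 'video' and 'caption' keys."""
    rows = []
    for video, caption in captions.items():
        if not caption.strip():
            raise ValueError(f"empty caption for {video}")
        rows.append({"video": video, "caption": caption.strip()})
    return rows


def write_parquet(rows, path="videos.parquet"):
    """Write the rows to a parquet file (hypothetical helper)."""
    # Imported lazily so the row-building step works without pandas.
    import pandas as pd

    pd.DataFrame(rows, columns=["video", "caption"]).to_parquet(path, index=False)


rows = make_rows({
    "clips/001.mp4": "person walking in cyberpunk city",
    "clips/002.mp4": "car driving through neon streets",
})
```

Calling `write_parquet(rows)` would then produce `videos.parquet` with the two columns shown above.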
Image-to-Video
Data should have:
- Video column - Output video paths
- Image column - First frame/reference image
- Caption column - Description of the motion/action
img2vid.parquet:
| image | video | caption |
|---|---|---|
| frames/001.jpg | clips/001.mp4 | zoom into the building |
| frames/002.jpg | clips/002.mp4 | camera pan left to right |
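If the reference image is the first frame of each clip, the rows and the frame-extraction command might be derived as sketched below. The helper names and the convention that `frames/<name>.jpg` matches `clips/<name>.mp4` are assumptions taken from the example table. The command uses ffmpeg's standard `-frames:v 1` option and is only built here, not executed.

```python
from pathlib import Path


def first_frame_command(video, image):
    """Build an ffmpeg command that saves the first frame of `video` to `image`."""
    return ["ffmpeg", "-y", "-i", video, "-frames:v", "1", image]


def make_img2vid_row(video, caption, frames_dir="frames"):
    """Derive the image path from the video file name (assumed convention)."""
    image = (Path(frames_dir) / (Path(video).stem + ".jpg")).as_posix()
    return {"image": image, "video": video, "caption": caption}


row = make_img2vid_row("clips/001.mp4", "zoom into the building")
command = first_frame_command(row["video"], row["image"])
```

Each command could be run with `subprocess.run(command, check=True)` before the rows are written to `img2vid.parquet`.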
Minimal Example: Text-to-Video
Minimal Example: Image-to-Video
Key Parameters
Text-to-Video:
| Parameter | Description | Example |
|---|---|---|
| video_column | Video file column | "video", "clip" |
| caption_column | Description column | "caption", "prompt" |
| steps | Training steps | 2000 |
Image-to-Video:
| Parameter | Description | Example |
|---|---|---|
| image_column | First frame/reference image | "image", "frame" |
| video_column | Output video column | "video", "clip" |
| caption_column | Motion description | "caption", "motion" |
| steps | Training steps | 2000 |
Data Requirements
Video fine-tuning is resource-intensive:
- Quantity: 50-200 videos minimum
- Quality: Consistent resolution, frame rate, duration
- Length: 2-10 seconds per clip (shorter is better)
- Format: MP4, WebM, or other common formats
- Captions: Describe motion, camera movement, and key actions
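The requirements above can be checked before training with a small script. This is a sketch under stated assumptions: the per-clip metadata dictionaries (`path`, `width`, `height`, `fps`, `duration`) are a hypothetical format that you would fill in from a tool such as ffprobe, and the thresholds mirror the list above.

```python
def check_dataset(clips, min_count=50, min_seconds=2.0, max_seconds=10.0):
    """Return a list of human-readable problems found in the clip metadata."""
    problems = []
    if len(clips) < min_count:
        problems.append(f"only {len(clips)} clips, at least {min_count} suggested")

    # Consistency checks: every clip should share one resolution and frame rate.
    resolutions = {(c["width"], c["height"]) for c in clips}
    if len(resolutions) > 1:
        problems.append(f"mixed resolutions: {sorted(resolutions)}")
    frame_rates = {c["fps"] for c in clips}
    if len(frame_rates) > 1:
        problems.append(f"mixed frame rates: {sorted(frame_rates)}")

    # Length check: flag clips outside the suggested duration range.
    for c in clips:
        if not min_seconds <= c["duration"] <= max_seconds:
            problems.append(f"{c['path']}: {c['duration']}s is outside the suggested range")
    return problems


example = [
    {"path": "clips/001.mp4", "width": 1280, "height": 720, "fps": 24, "duration": 4.0},
    {"path": "clips/002.mp4", "width": 640, "height": 360, "fps": 30, "duration": 12.0},
]
problems = check_dataset(example)
```

An empty list means the metadata passed every check. Caption quality still needs a manual review.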
Monitor Progress
Next Steps
- Text-to-Video Reference - All parameters
- Image-to-Video Reference - All parameters
- Deploy your model - Generate videos with your fine-tuned model
Common Issues
Videos not loading
Ensure videos are committed to your Oxen repository. Check file paths are correct and relative to repo root.
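A quick local check for broken paths might look like the sketch below. `missing_videos` is a hypothetical helper. It only verifies that each path is relative and exists under the repository root on disk, not that the file has been committed.

```python
import tempfile
from pathlib import Path


def missing_videos(video_paths, repo_root="."):
    """Return the paths that are absolute or do not exist under repo_root."""
    root = Path(repo_root)
    return [
        p for p in video_paths
        if Path(p).is_absolute() or not (root / p).is_file()
    ]


# Demo against a throwaway directory standing in for the repository root.
with tempfile.TemporaryDirectory() as repo:
    (Path(repo) / "clips").mkdir()
    (Path(repo) / "clips" / "001.mp4").touch()
    missing = missing_videos(["clips/001.mp4", "clips/002.mp4"], repo)

print(missing)  # ['clips/002.mp4']
```

In practice you would pass the values of your video column and the path to your local clone of the repository.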
Out of memory error
Video models need significant GPU memory. Reduce batch_size to 1 and consider shorter video clips.
Training very slow
Video fine-tuning takes hours to days. Start with 1000 steps for testing. Use shorter videos (2-5 seconds) for faster iteration.
Low quality output
Ensure training videos have consistent quality, resolution, and frame rate. Increase training steps to 3000-5000.