Overview

Fine-tune video generation models to create videos in your specific style. Fine-tuning works for both text-to-video and image-to-video generation.

Your Data

Text-to-Video

Data should have:
  • Video column - Paths to your training videos
  • Caption column - Text descriptions of each video
Example videos.parquet:

| video | caption |
|-------|---------|
| clips/001.mp4 | person walking in cyberpunk city |
| clips/002.mp4 | car driving through neon streets |
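One way to assemble a dataset like this is to pair clip paths with captions programmatically before writing the Parquet file. The sketch below is illustrative: the caption mapping is sample data, and writing the final file with pandas is an assumption, not part of the Oxen API.

```python
# Build rows for videos.parquet by pairing clip paths with captions.
# The mapping below is illustrative sample data, not a real dataset.

def build_rows(captions):
    """Turn a {video_path: caption} mapping into parquet-ready rows,
    skipping entries whose caption is empty."""
    return [
        {"video": path, "caption": text.strip()}
        for path, text in sorted(captions.items())
        if text.strip()
    ]

rows = build_rows({
    "clips/001.mp4": "person walking in cyberpunk city",
    "clips/002.mp4": "car driving through neon streets",
    "clips/003.mp4": "   ",  # empty caption: row is dropped
})
# Write with pandas (assumed available):
#   pd.DataFrame(rows).to_parquet("videos.parquet")
```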

Image-to-Video

Data should have:
  • Video column - Output video paths
  • Image column - First frame/reference image
  • Caption column - Description of the motion/action
Example img2vid.parquet:

| image | video | caption |
|-------|-------|---------|
| frames/001.jpg | clips/001.mp4 | zoom into the building |
| frames/002.jpg | clips/002.mp4 | camera pan left to right |

Minimal Example: Text-to-Video

import requests

url = "https://hub.oxen.ai/api/repos/YOUR_NAMESPACE/YOUR_REPO/fine_tunes"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Create fine-tune
data = {
    "resource": "main/videos.parquet",
    "base_model": "YOUR_VIDEO_MODEL",  # e.g., a video generation model
    "script_type": "text_to_video",
    "training_params": {
        "video_column": "video",
        "caption_column": "caption",
        "steps": 2000
    }
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()
fine_tune_id = response.json()["fine_tune"]["id"]

# Start training
run_url = f"{url}/{fine_tune_id}/actions/run"
requests.post(run_url, headers=headers)

print(f"Fine-tune started: {fine_tune_id}")

Minimal Example: Image-to-Video

import requests

url = "https://hub.oxen.ai/api/repos/YOUR_NAMESPACE/YOUR_REPO/fine_tunes"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Create fine-tune
data = {
    "resource": "main/img2vid.parquet",
    "base_model": "YOUR_VIDEO_MODEL",
    "script_type": "image_to_video",
    "training_params": {
        "image_column": "image",          # First frame/reference
        "video_column": "video",          # Output video
        "caption_column": "caption",      # Motion description
        "steps": 2000
    }
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()
fine_tune_id = response.json()["fine_tune"]["id"]

# Start training
run_url = f"{url}/{fine_tune_id}/actions/run"
requests.post(run_url, headers=headers)

print(f"Fine-tune started: {fine_tune_id}")

Key Parameters

Text-to-Video:
| Parameter | Description | Example |
|-----------|-------------|---------|
| video_column | Video file column | "video", "clip" |
| caption_column | Description column | "caption", "prompt" |
| steps | Training steps | 2000 |
Image-to-Video:
| Parameter | Description | Example |
|-----------|-------------|---------|
| image_column | First frame/reference image | "image", "frame" |
| video_column | Output video column | "video", "clip" |
| caption_column | Motion description | "caption", "motion" |
| steps | Training steps | 2000 |
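These parameters map directly onto the training_params object in the creation request. As a minimal sketch, a small helper can assemble the request body for either mode; the field names mirror the examples in this guide, and the validation rules are assumptions, not guarantees of the API:

```python
def build_fine_tune_payload(resource, base_model, script_type, training_params):
    """Assemble the JSON body for the fine-tune creation request.
    Validation here is an illustrative assumption, not API behavior."""
    modes = {"text_to_video", "image_to_video"}
    if script_type not in modes:
        raise ValueError(f"script_type must be one of {sorted(modes)}")
    if script_type == "image_to_video" and "image_column" not in training_params:
        raise ValueError("image_to_video requires an image_column")
    return {
        "resource": resource,
        "base_model": base_model,
        "script_type": script_type,
        "training_params": training_params,
    }

payload = build_fine_tune_payload(
    "main/img2vid.parquet",
    "YOUR_VIDEO_MODEL",
    "image_to_video",
    {"image_column": "image", "video_column": "video",
     "caption_column": "caption", "steps": 2000},
)
```

The returned dict can be passed as the json= argument to requests.post exactly as in the examples above.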

Data Requirements

Video fine-tuning is resource-intensive:
  • Quantity: 50-200 videos minimum
  • Quality: Consistent resolution, frame rate, duration
  • Length: 2-10 seconds per clip (shorter is better)
  • Format: MP4, WebM, or other common formats
  • Captions: Describe motion, camera movement, and key actions
Video fine-tuning requires significant compute resources and storage. Expect longer training times compared to image or text models.
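Because training runs are long and expensive, it pays to sanity-check rows against these requirements before uploading. The sketch below checks a single row; the duration_seconds field is a hypothetical column for illustration, not something the API requires:

```python
# Sanity-check dataset rows against the requirements above before uploading.
# Column names match the examples in this guide; duration_seconds is a
# hypothetical extra column used only for this check.

ALLOWED_EXTENSIONS = {".mp4", ".webm"}

def check_row(row, min_seconds=2.0, max_seconds=10.0):
    """Return a list of problems with one dataset row (empty list = OK)."""
    problems = []
    video = row.get("video", "")
    if not any(video.endswith(ext) for ext in ALLOWED_EXTENSIONS):
        problems.append(f"unsupported format: {video}")
    if not row.get("caption", "").strip():
        problems.append("missing caption")
    duration = row.get("duration_seconds")
    if duration is not None and not (min_seconds <= duration <= max_seconds):
        problems.append(f"clip length {duration}s outside {min_seconds}-{max_seconds}s")
    return problems

print(check_row({"video": "clips/001.mp4",
                 "caption": "person walking",
                 "duration_seconds": 4.0}))  # []
```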

Monitor Progress

status_url = f"https://hub.oxen.ai/api/repos/YOUR_NAMESPACE/YOUR_REPO/fine_tunes/{fine_tune_id}"
response = requests.get(status_url, headers=headers)
fine_tune = response.json()["fine_tune"]

print(f"Status: {fine_tune['status']}")
print(f"Current step: {fine_tune.get('current_step', 0)}")

Common Issues

  • Videos not found: Ensure videos are committed to your Oxen repository, and check that file paths are correct and relative to the repo root.
  • Out-of-memory errors: Video models need significant GPU memory. Reduce batch_size to 1 and consider shorter video clips.
  • Slow training: Video fine-tuning takes hours to days. Start with 1000 steps for testing, and use shorter videos (2-5 seconds) for faster iteration.
  • Poor output quality: Ensure training videos have consistent quality, resolution, and frame rate, and increase training steps to 3000-5000.