Skip to main content

Overview

This schema is used for fine-tuning models with image to video capabilities.

Schema Type

When creating a fine-tune with this schema, use:
{
  "resource": "main/your-dataset.parquet",
  "base_model": "<model-canonical-name>",
  "script_type": "image_to_video",
  "training_params": {
    ...
  }
}
Key Parameters:
  • script_type: image_to_video (the fine-tune type)
  • base_model: One of the supported model canonical names below

Supported Models

  • Wan2.1 1.3B - Text to Video (Wan-AI/Wan2.1-T2V-1.3B-Diffusers)
  • Wan2.2 A14B - Text to Video (Wan-AI/Wan2.2-T2V-A14B-Diffusers)
  • Wan2.1 14B - Text to Video (Wan-AI/Wan2.1-T2V-14B-Diffusers)

Request Schema

Required Fields

FieldTypeRequiredDescription
batch_sizeintegerNo(default: 1) (min: 1)
caption_columnstringYescaption_column (DataFrame column name)
gradient_accumulationintegerNo(default: 1) (min: 1)
image_columnstringYesimage_column (DataFrame column name)
learning_ratenumberNo(default: 0.0002)
lora_alphaintegerNo(default: 16) (min: 1)
lora_rankintegerNo(default: 16) (min: 1)
num_framesintegerNo(default: 81) (min: 1)
sample_everyintegerNo(default: 200) (min: 1)
samplesarrayNoSamples (array of object)
stepsintegerNo(default: 3000) (min: 1)
timestep_typestringNo(options: weighted, linear, sigmoid)
use_lorabooleanNouse_lora

Example Request

{
  "resource": "main/your-dataset.parquet",
  "base_model": "<model-canonical-name>",
  "script_type": "image_to_video",
  "training_params": {
    "batch_size": 1,
    "caption_column": "<caption_column>",
    "gradient_accumulation": 1,
    "image_column": "<image_column>",
    "learning_rate": 0.0002,
    "lora_alpha": 16,
    "lora_rank": 16,
    "num_frames": 81,
    "sample_every": 200,
    "samples": [
      {
        "prompt": "an ox holding a sign that says 'Oxen.ai'"
      },
      {
        "prompt": "a herd of oxen running in a field"
      }
    ],
    "steps": 3000,
    "timestep_type": "weighted",
    "use_lora": true
  }
}

Field Details

batch_size

Type: integer Default: 1 Minimum: 1

caption_column

Type: string

gradient_accumulation

Type: integer Default: 1 Minimum: 1

image_column

Type: string

learning_rate

Type: number Default: 0.0002

lora_alpha

Type: integer Default: 16 Minimum: 1

lora_rank

Type: integer Default: 16 Minimum: 1

num_frames

Type: integer Default: 81 Minimum: 1

sample_every

Type: integer Default: 200 Minimum: 1

samples

Samples Type: array Used to show progress during the fine-tuning process Default: [{"prompt": "an ox holding a sign that says 'Oxen.ai'"}, {"prompt": "a herd of oxen running in a field"}]

steps

Type: integer Default: 3000 Minimum: 1

timestep_type

Type: string Default: "weighted" Options: weighted, linear, sigmoid

use_lora

Type: boolean