Fine-Tune: Image To Video

Overview

This schema is used for fine-tuning models with image to video capabilities.

Schema Type

When creating a fine-tune with this schema, use:

{
  "resource": "main/your-dataset.parquet",
  "base_model": "<model-canonical-name>",
  "script_type": "image_to_video",
  "training_params": {
    ...
  }
}

Key Parameters:

script_type: image_to_video (the fine-tune type)
base_model: One of the supported model canonical names below

Supported Models

Wan2.1 1.3B - Text to Video (Wan-AI/Wan2.1-T2V-1.3B-Diffusers)
Wan2.2 A14B - Text to Video (Wan-AI/Wan2.2-T2V-A14B-Diffusers)
Wan2.1 14B - Text to Video (Wan-AI/Wan2.1-T2V-14B-Diffusers)

Request Schema

Required Fields

Field	Type	Required	Description
`batch_size`	integer	No	(default: 1) (min: 1)
`caption_column`	string	Yes	caption_column (DataFrame column name)
`gradient_accumulation`	integer	No	(default: 1) (min: 1)
`image_column`	string	Yes	image_column (DataFrame column name)
`learning_rate`	number	No	(default: 0.0002)
`lora_alpha`	integer	No	(default: 16) (min: 1)
`lora_rank`	integer	No	(default: 16) (min: 1)
`num_frames`	integer	No	(default: 81) (min: 1)
`sample_every`	integer	No	(default: 200) (min: 1)
`samples`	array	No	Samples (array of object)
`steps`	integer	No	(default: 3000) (min: 1)
`timestep_type`	string	No	(options: weighted, linear, sigmoid)
`use_lora`	boolean	No	use_lora

Example Request

{
  "resource": "main/your-dataset.parquet",
  "base_model": "<model-canonical-name>",
  "script_type": "image_to_video",
  "training_params": {
    "batch_size": 1,
    "caption_column": "<caption_column>",
    "gradient_accumulation": 1,
    "image_column": "<image_column>",
    "learning_rate": 0.0002,
    "lora_alpha": 16,
    "lora_rank": 16,
    "num_frames": 81,
    "sample_every": 200,
    "samples": [
      {
        "prompt": "an ox holding a sign that says 'Oxen.ai'"
      },
      {
        "prompt": "a herd of oxen running in a field"
      }
    ],
    "steps": 3000,
    "timestep_type": "weighted",
    "use_lora": true
  }
}

Field Details

`batch_size`

Type: integer Default: 1 Minimum: 1

`caption_column`

Type: string

`gradient_accumulation`

Type: integer Default: 1 Minimum: 1

`image_column`

Type: string

`learning_rate`

Type: number Default: 0.0002

`lora_alpha`

Type: integer Default: 16 Minimum: 1

`lora_rank`

Type: integer Default: 16 Minimum: 1

`num_frames`

Type: integer Default: 81 Minimum: 1

`sample_every`

Type: integer Default: 200 Minimum: 1

`samples`

Samples Type: array Used to show progress during the fine-tuning process Default: [{"prompt": "an ox holding a sign that says 'Oxen.ai'"}, {"prompt": "a herd of oxen running in a field"}]

`steps`

Type: integer Default: 3000 Minimum: 1

`timestep_type`

Type: string Default: "weighted" Options: weighted, linear, sigmoid

`use_lora`

Type: boolean

Fine-Tuning API

fine_tunes

Fine-Tune: Image To Video

Overview

Schema Type

Supported Models

Request Schema

Required Fields

Example Request

Field Details

`batch_size`

`caption_column`

`gradient_accumulation`

`image_column`

`learning_rate`

`lora_alpha`

`lora_rank`

`num_frames`

`sample_every`

`samples`

`steps`

`timestep_type`

`use_lora`

Fine-Tuning API

fine_tunes

​Overview

​Schema Type

​Supported Models

​Request Schema

​Required Fields

​Example Request

​Field Details

​batch_size

​caption_column

​gradient_accumulation

​image_column

​learning_rate

​lora_alpha

​lora_rank

​num_frames

​sample_every

​samples

​steps

​timestep_type

​use_lora

Overview

Schema Type

Supported Models

Request Schema

Required Fields

Example Request

Field Details

`batch_size`

`caption_column`

`gradient_accumulation`

`image_column`

`learning_rate`

`lora_alpha`

`lora_rank`

`num_frames`

`sample_every`

`samples`

`steps`

`timestep_type`

`use_lora`