Overview

Fine-tune image editing models to learn specific transformations: style transfer, object manipulation, background changes, or any image-to-image task.

Your Data

Your training data needs three columns:
  • Control/Input image - The original image
  • Target/Output image - The transformed image
  • Caption - Text describing the transformation
Example data in edits.parquet:
| control_image | edited_image | caption |
|---|---|---|
| inputs/001.jpg | outputs/001.jpg | add sunglasses to the person |
| inputs/002.jpg | outputs/002.jpg | change background to beach |
| inputs/003.jpg | outputs/003.jpg | apply vintage filter |
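
If you build the file locally, a minimal pandas sketch producing the table above looks like this (requires pyarrow or fastparquet for the parquet write):

import pandas as pd

# Sketch: build the three-column training file shown above.
df = pd.DataFrame({
    "control_image": ["inputs/001.jpg", "inputs/002.jpg", "inputs/003.jpg"],
    "edited_image": ["outputs/001.jpg", "outputs/002.jpg", "outputs/003.jpg"],
    "caption": [
        "add sunglasses to the person",
        "change background to beach",
        "apply vintage filter",
    ],
})

df.to_parquet("edits.parquet", index=False)

Commit edits.parquet (and, assuming relative paths like these, the images themselves) to the branch you reference in the resource field, e.g. main.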

Minimal Example

import requests

url = "https://hub.oxen.ai/api/repos/YOUR_NAMESPACE/YOUR_REPO/fine_tunes"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Create fine-tune
data = {
    "resource": "main/edits.parquet",
    "base_model": "black-forest-labs/FLUX.1-Kontext-dev",
    "script_type": "image_editing",
    "training_params": {
        "control_image_column": "control_image",  # Input image column
        "image_column": "edited_image",           # Output image column
        "caption_column": "caption",              # Description column
        "steps": 3000
    }
}

response = requests.post(url, headers=headers, json=data)
fine_tune_id = response.json()["fine_tune"]["id"]

# Start training
run_url = f"{url}/{fine_tune_id}/actions/run"
requests.post(run_url, headers=headers)

print(f"Fine-tune started: {fine_tune_id}")

Key Parameters

Only these fields are required to start:
| Parameter | Description | Example |
|---|---|---|
| control_image_column | Input/original image column | "control_image", "input", "source" |
| image_column | Output/transformed image column | "edited_image", "output", "target" |
| caption_column | Transformation description column | "caption", "prompt", "description" |
| steps | Number of training steps (2000-5000 typical) | 3000 |
All other parameters use sensible defaults.

Supported Models

Popular choices for image editing:
  • black-forest-labs/FLUX.1-Kontext-dev - High quality, versatile
  • Qwen/Qwen-Image-Edit - Fast, good for quick iterations
See the full model list for all available options.
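
Switching models typically just means changing the base_model field in the create request; for example, reusing the data payload from the minimal example:

# Sketch: same request as above, with a different base model.
data["base_model"] = "Qwen/Qwen-Image-Edit"
response = requests.post(url, headers=headers, json=data)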

Data Requirements

For best results:
  • Quantity: 20-100 image pairs minimum
  • Quality: High resolution, aligned transformations
  • Captions: Clear descriptions of what changed between input and output
  • Consistency: Transformations should follow a consistent pattern or style
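
A short local sanity check can catch most of these problems before you upload. This is a sketch only: it assumes edits.parquet and the referenced images sit in your working directory, and it treats 512 px as the minimum acceptable resolution.

from pathlib import Path

import pandas as pd
from PIL import Image

# Sketch: verify every row points at readable images and has a non-empty caption.
df = pd.read_parquet("edits.parquet")

for i, row in df.iterrows():
    for col in ("control_image", "edited_image"):
        path = Path(row[col])
        assert path.exists(), f"row {i}: missing file {path}"
        with Image.open(path) as img:
            w, h = img.size
        assert min(w, h) >= 512, f"row {i}: {path} is low resolution ({w}x{h})"
    caption = row["caption"]
    assert isinstance(caption, str) and caption.strip(), f"row {i}: empty caption"

print(f"Checked {len(df)} image pairs")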

Sample During Training

Add sample prompts to see progress during training:
data = {
    "resource": "main/edits.parquet",
    "base_model": "black-forest-labs/FLUX.1-Kontext-dev",
    "script_type": "image_editing",
    "training_params": {
        "control_image_column": "control_image",
        "image_column": "edited_image",
        "caption_column": "caption",
        "steps": 3000,
        "samples": [
            {
                "ctrl_img_url": "https://your-repo.com/test_image.jpg",
                "prompt": "apply the trained style transformation"
            }
        ],
        "sample_every": 200  # Generate sample every 200 steps
    }
}

Monitor Progress

status_url = f"https://hub.oxen.ai/api/repos/YOUR_NAMESPACE/YOUR_REPO/fine_tunes/{fine_tune_id}"
response = requests.get(status_url, headers=headers)
fine_tune = response.json()["fine_tune"]

print(f"Status: {fine_tune['status']}")
print(f"Current step: {fine_tune.get('current_step', 0)}")

Common Issues

  • Input and output don't correspond: ensure each input/output pair shows the same scene or subject; the model learns the transformation between them.
  • Transformation is too subtle or too strong: adjust learning_rate (lower for subtle edits, higher for stronger ones). The default is 0.0002.
  • Out-of-memory errors: reduce batch_size to 1 and lower sample_height/sample_width to 512 or 768.
  • Model isn't learning the edit: make sure your captions consistently describe the transformation, train for more steps (4000-5000), or increase the dataset size.
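
All of these knobs go in training_params on the create request. A sketch of overriding them, with illustrative starting values rather than tuned recommendations (everything else matches the minimal example payload):

# Sketch: override the defaults discussed above; values are illustrative starting points.
data["training_params"].update({
    "learning_rate": 0.0001,  # lower than the 0.0002 default for subtler edits
    "batch_size": 1,          # drop to 1 if you hit out-of-memory errors
    "sample_height": 768,     # smaller sample dimensions also reduce memory use
    "sample_width": 768,
    "steps": 4000,            # train longer if the transformation isn't being learned
})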