What is Fine-Tuning?

Fine-tuning allows you to customize pre-trained models with your own data, adapting them to your specific use cases. On Oxen.ai, you can fine-tune models for:
  • Text Generation - Chatbots, Q&A systems, content generation
  • Image Generation - Custom image styles, branded content
  • Image Editing - Style transfer, image-to-image transformations
  • Video Generation - Custom video styles and content
  • Vision-Language Tasks - Image captioning, visual Q&A

Getting Started

Prerequisites

Before fine-tuning, you need:
  1. An Oxen.ai account - Sign up if you haven’t already
  2. A repository with your training data
  3. Training data in a supported format (Parquet, CSV, etc.)
  4. An API key for authentication

Authentication

All fine-tuning API requests require authentication using a bearer token:
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://hub.oxen.ai/api/repos/{namespace}/{repo}/fine_tunes
Get your API key from your account settings.
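
If you call the API from scripts, it can be convenient to keep the key in an environment variable (a generic shell pattern, not an Oxen.ai requirement):
export OXEN_API_KEY="YOUR_API_KEY"
curl -H "Authorization: Bearer $OXEN_API_KEY" \
  https://hub.oxen.ai/api/repos/{namespace}/{repo}/fine_tunes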

Request Structure

All fine-tuning requests follow the same base structure:
{
  "resource": "main/your-dataset.parquet",
  "base_model": "<model-canonical-name>",
  "script_type": "<operation-type>",
  "training_params": {
    // Operation-specific parameters
  }
}
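
Putting the pieces together, a submission could look like the sketch below. This assumes the fine_tunes endpoint from the authentication example accepts a POST with this JSON body; the model, dataset path, and parameter values are illustrative:
curl -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "resource": "main/train.parquet",
    "base_model": "meta-llama/Llama-3.2-1B-Instruct",
    "script_type": "text_generation",
    "training_params": {
      "use_lora": true,
      "epochs": 1
    }
  }' \
  https://hub.oxen.ai/api/repos/{namespace}/{repo}/fine_tunes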

Common Fields

Field           | Description                                                      | Required
----------------|------------------------------------------------------------------|---------
resource        | Path to your training data (e.g., main/train.parquet)           | Yes
base_model      | The model to fine-tune (e.g., meta-llama/Llama-3.2-1B-Instruct) | Yes
script_type     | The type of fine-tuning operation                                | Yes
training_params | Operation-specific training parameters                           | Yes

Operation Types (script_type)

The script_type determines what kind of fine-tuning you’re doing (a sample request body follows this list):
  • text_generation - For text-based models (Q&A, chatbots, completion)
  • text_chat_messages - For conversational chat models
  • image_generation - For text-to-image models
  • image_editing - For image-to-image transformation
  • image_to_text - For image captioning and VLMs
  • image_to_video - For image-to-video generation
  • text_to_video - For text-to-video generation
  • multi_image_editing - For multi-image editing models
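
For instance, an image_generation run pairs the script_type with image-specific columns and step-based training duration (see the sections below). This is a sketch that assumes column names are passed inside training_params; all values are illustrative:
{
  "resource": "main/images.parquet",
  "base_model": "<model-canonical-name>",
  "script_type": "image_generation",
  "training_params": {
    "use_lora": true,
    "steps": 1000,
    "image_column": "image",
    "caption_column": "caption"
  }
}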

Common Training Parameters

While each operation type has specific parameters, many share a common training configuration (a combined example follows the lists below):

LoRA Parameters

Most fine-tuning uses LoRA (Low-Rank Adaptation) for efficient training:
  • use_lora - Enable LoRA (typically true)
  • lora_rank - Rank of LoRA matrices (default: 16, lower = faster/less memory)
  • lora_alpha - LoRA scaling factor (default: 16)

Training Configuration

  • batch_size - Number of samples per training step (default: 1)
  • learning_rate - Step size for optimization (typical: 0.0001-0.0002)
  • epochs or steps - Training duration (text models use epochs, image models use steps)
  • gradient_accumulation or grad_accum - Accumulate gradients across multiple steps
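
Taken together, a typical training_params block for a text model might combine the LoRA settings above with this training configuration. The LoRA values and batch_size are the stated defaults; the learning rate, gradient accumulation, and epoch values are illustrative:
{
  "use_lora": true,
  "lora_rank": 16,
  "lora_alpha": 16,
  "batch_size": 1,
  "gradient_accumulation": 4,
  "learning_rate": 0.0002,
  "epochs": 3
}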

Data Configuration

Each operation type requires specific data columns; a mapping example follows these lists.

Text models:
  • question_column - Input text column
  • answer_column - Output/response column
Image models:
  • image_column - Output image column
  • caption_column - Text prompt column
  • control_image_column - Input image column (for editing)
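
For example, if main/train.parquet has columns named prompt and response, a text model request would map them as below. This is a sketch; placing the column fields inside training_params follows the request structure above but is an assumption:
"training_params": {
  "question_column": "prompt",
  "answer_column": "response"
}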

Quick Start Guides

Choose your use case to get started with minimal examples.

Detailed API Reference

The API reference covers complete parameter documentation and advanced configuration.

Parameter Guide

The parameter guide covers common training parameters and how to tune them.

Next Steps

  1. Choose your use case from the Quick Start guides above
  2. Prepare your data in the required format
  3. Start your first fine-tune using the API
  4. Monitor progress and deploy your model
Need help? Join our Discord community or check out the detailed examples.