## What is Fine-Tuning?
Fine-tuning allows you to customize pre-trained models with your own data, adapting them to your specific use cases. On Oxen.ai, you can fine-tune models for:

- Text Generation - Chatbots, Q&A systems, content generation
- Image Generation - Custom image styles, branded content
- Image Editing - Style transfer, image-to-image transformations
- Video Generation - Custom video styles and content
- Vision-Language Tasks - Image captioning, visual Q&A
## Getting Started
### Prerequisites
Before fine-tuning, you need:

- An Oxen.ai account - Sign up if you haven’t already
- A repository with your training data
- Training data in a supported format (Parquet, CSV, etc.)
- An API key for authentication
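If your examples are not yet in a supported format, a small script can write them out. Below is a minimal sketch using pandas (which needs `pyarrow` or `fastparquet` installed to write Parquet); the `question` and `answer` column names are just an example schema that you point the fine-tune at later via the data configuration:

```python
import pandas as pd

# Illustrative Q&A rows; any tabular schema works as long as you tell the
# fine-tune which columns to use (see Data Configuration below).
df = pd.DataFrame({
    "question": ["What is fine-tuning?", "What data formats are supported?"],
    "answer": [
        "Adapting a pre-trained model to your own data.",
        "Parquet, CSV, and other tabular formats.",
    ],
})

# Write the training file you will add to your Oxen.ai repository.
df.to_parquet("train.parquet", index=False)
```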
### Authentication
All fine-tuning API requests require authentication using a bearer token:
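For example, in Python the header looks like this (a sketch; only the standard bearer scheme itself is shown, with a placeholder key):

```python
# Placeholder key - substitute the API key from your Oxen.ai account.
API_KEY = "YOUR_OXEN_API_KEY"

# Headers attached to every fine-tuning API request.
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
```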
## Request Structure

All fine-tuning requests follow the same base structure:
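As a sketch, a request might be assembled like this in Python. The payload fields are the common fields documented in the table below; the endpoint URL is a placeholder, since the exact route depends on the operation (see the per-operation references at the end of this page):

```python
import requests

payload = {
    "resource": "main/train.parquet",                  # your training data
    "base_model": "meta-llama/Llama-3.2-1B-Instruct",  # model to fine-tune
    "script_type": "text_generation",                  # operation type
    "training_params": {},                             # operation-specific, filled in below
}

response = requests.post(
    "https://hub.oxen.ai/api/...",  # placeholder URL - see the API references
    headers={"Authorization": "Bearer YOUR_OXEN_API_KEY"},
    json=payload,
)
print(response.json())
```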
### Common Fields

| Field | Description | Required |
|---|---|---|
| `resource` | Path to your training data (e.g., `main/train.parquet`) | Yes |
| `base_model` | The model to fine-tune (e.g., `meta-llama/Llama-3.2-1B-Instruct`) | Yes |
| `script_type` | The type of fine-tuning operation | Yes |
| `training_params` | Operation-specific training parameters | Yes |
### Operation Types (`script_type`)

The `script_type` determines what kind of fine-tuning you’re doing:

- `text_generation` - For text-based models (Q&A, chatbots, completion)
- `text_chat_messages` - For conversational chat models
- `image_generation` - For text-to-image models
- `image_editing` - For image-to-image transformation
- `image_to_text` - For image captioning and VLMs
- `image_to_video` - For image-to-video generation
- `text_to_video` - For text-to-video generation
- `multi_image_editing` - For multi-image editing models
## Common Training Parameters
While each operation type has specific parameters, many share common training configuration.

### LoRA Parameters

Most fine-tuning uses LoRA (Low-Rank Adaptation) for efficient training:

- `use_lora` - Enable LoRA (typically `true`)
- `lora_rank` - Rank of the LoRA matrices (default: 16; lower = faster/less memory)
- `lora_alpha` - LoRA scaling factor (default: 16)
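As a sketch, the LoRA portion of `training_params` would look like this, using the key names from the list above and the stated defaults:

```python
# LoRA block of training_params, using the defaults described above.
training_params = {
    "use_lora": True,
    "lora_rank": 16,   # lower rank trains faster and uses less memory
    "lora_alpha": 16,  # scaling factor; effective scale is lora_alpha / lora_rank
}
```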
### Training Configuration

- `batch_size` - Number of samples per training step (default: 1)
- `learning_rate` - Step size for optimization (typical: 0.0001-0.0002)
- `epochs` or `steps` - Training duration (text models use epochs, image models use steps)
- `gradient_accumulation` or `grad_accum` - Accumulate gradients across multiple steps
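Continuing the sketch, a training block might look like the following; the values are illustrative choices, not defaults. Note how gradient accumulation grows the effective batch size without increasing memory use:

```python
# Illustrative training configuration (values are example choices).
training_params = {
    "batch_size": 1,             # samples held in memory per step (default: 1)
    "learning_rate": 0.0002,     # within the typical 0.0001-0.0002 range
    "epochs": 3,                 # text models; image models use "steps" instead
    "gradient_accumulation": 8,  # accumulate gradients across 8 steps
}

# The optimizer effectively sees batch_size * gradient_accumulation samples
# per update: 1 * 8 = 8, at the memory cost of batch_size = 1.
effective_batch = (
    training_params["batch_size"] * training_params["gradient_accumulation"]
)
```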
### Data Configuration

Each operation type requires specific data columns.

Text models:

- `question_column` - Input text column
- `answer_column` - Output/response column

Image models:

- `image_column` - Output image column
- `caption_column` - Text prompt column
- `control_image_column` - Input image column (for editing)
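Putting the pieces together, a complete `text_generation` request could look like the sketch below. The endpoint URL is a placeholder, and the placement of the column names inside `training_params` is an assumption here; check the Text Generation Reference for the exact layout:

```python
import requests

payload = {
    "resource": "main/train.parquet",
    "base_model": "meta-llama/Llama-3.2-1B-Instruct",
    "script_type": "text_generation",
    "training_params": {
        # LoRA (defaults)
        "use_lora": True,
        "lora_rank": 16,
        "lora_alpha": 16,
        # Training configuration (illustrative values)
        "batch_size": 1,
        "learning_rate": 0.0002,
        "epochs": 3,
        # Data configuration - which columns hold inputs and outputs
        "question_column": "question",
        "answer_column": "answer",
    },
}

response = requests.post(
    "https://hub.oxen.ai/api/...",  # placeholder URL - see the references below
    headers={"Authorization": "Bearer YOUR_OXEN_API_KEY"},
    json=payload,
)
```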
## Quick Start Guides
Choose your use case to get started with minimal examples:

- Text Generation - Fine-tune chatbots and Q&A models
- Image Generation - Create custom image styles
- Image Editing - Fine-tune image transformation models
- Video Generation - Generate custom videos
## Detailed API Reference
For complete parameter documentation and advanced configuration:

- Text Generation Reference
- Text Chat Messages Reference
- Image Generation Reference
- Image Editing Reference
- Image to Text Reference
- Image to Video Reference
- Text to Video Reference
- Multi-Image Editing Reference
## Parameter Guide
Learn about common training parameters and how to tune them:

- Understanding LoRA
- Learning Rate and Optimization
- Batch Size and Memory Management
- Training Duration
## Next Steps
- Choose your use case from the Quick Start guides above
- Prepare your data in the required format
- Start your first fine-tune using the API
- Monitor progress and deploy your model