
Overview

Fine-tune text generation models for chatbots, Q&A systems, or content generation. This guide shows the minimal setup to get started.

Your Data

Your training data should have two columns:
  • Input column - The user prompt or question
  • Output column - The expected response or answer
Example data in train.parquet:

text                                  sentiment
The product exceeded expectations     positive
Terrible customer service             negative
Average experience, nothing special   neutral

Minimal Example

import requests

url = "https://hub.oxen.ai/api/repos/YOUR_NAMESPACE/YOUR_REPO/fine_tunes"
headers = {
    "Authorization": "Bearer YOUR_API_KEY",
    "Content-Type": "application/json"
}

# Create fine-tune
data = {
    "resource": "main/train.parquet",
    "base_model": "meta-llama/Llama-3.2-1B-Instruct",
    "script_type": "text_generation",
    "training_params": {
        "question_column": "text",        # Your input column name
        "answer_column": "sentiment",      # Your output column name
        "epochs": 1
    }
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # surface auth or validation errors early
fine_tune_id = response.json()["fine_tune"]["id"]

# Start training
run_url = f"{url}/{fine_tune_id}/actions/run"
requests.post(run_url, headers=headers)

print(f"Fine-tune started: {fine_tune_id}")

Key Parameters

Only these fields are required to start:
Parameter        Description                              Example
question_column  Name of your input/prompt column         "text", "question", "prompt"
answer_column    Name of your output/response column      "sentiment", "answer", "response"
epochs           Number of training passes (1-3 typical)  1
All other parameters use sensible defaults.
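Putting that together, the required fields can be combined with optional overrides in `training_params`. A minimal sketch: `batch_size` is referenced in the troubleshooting tips at the end of this guide, and any parameter name beyond those shown in this guide is an assumption about the API.

```python
# Minimal required training_params, plus one optional override.
training_params = {
    "question_column": "text",     # required: input/prompt column
    "answer_column": "sentiment",  # required: output/response column
    "epochs": 1,                   # 1-3 passes is typical
    "batch_size": 1,               # optional: lower it if you run out of memory
}
```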

Supported Models

Popular choices for text generation:
  • meta-llama/Llama-3.2-1B-Instruct - Fast, good for Q&A
  • meta-llama/Llama-3.2-3B-Instruct - Balanced performance
  • meta-llama/Llama-3.1-8B-Instruct - Higher quality, slower
  • Qwen/Qwen3-0.6B - Very fast, lightweight
See the full model list for all available options.

Monitor Progress

Check the status of your fine-tune:
status_url = f"https://hub.oxen.ai/api/repos/YOUR_NAMESPACE/YOUR_REPO/fine_tunes/{fine_tune_id}"
response = requests.get(status_url, headers=headers)
status = response.json()["fine_tune"]["status"]
print(f"Status: {status}")
Status values: created, running, completed, errored
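Rather than checking once, you can poll until the job reaches a terminal state. A sketch using only the documented status values; the fetch callable wraps the status request shown above, and the poll interval and timeout are arbitrary choices.

```python
import time

TERMINAL_STATES = {"completed", "errored"}

def wait_for_fine_tune(fetch_status, poll_interval=30, timeout=3600):
    """Call fetch_status() repeatedly until it returns a terminal state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_interval)
    raise TimeoutError("fine-tune did not reach a terminal state in time")

# Usage with the status check above:
# final = wait_for_fine_tune(
#     lambda: requests.get(status_url, headers=headers).json()["fine_tune"]["status"]
# )
```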

Common Issues

  • Column not found: Double-check that your question_column and answer_column names match your data exactly. Column names are case-sensitive.
  • Out of memory: Reduce batch_size to 1 or try a smaller model like Llama-3.2-1B-Instruct.
  • Poor results: Start with 1 epoch. If results aren't good enough, try 2-3 epochs. More isn't always better.