Overview
This guide explains the common training parameters shared across all fine-tuning operations. Use this reference to understand what each parameter does and how to adjust it for your use case.
LoRA (Low-Rank Adaptation)
LoRA is a technique for efficient fine-tuning that drastically reduces memory requirements and training time.
use_lora
Type: boolean
Default: true
Applies to: All models
Whether to use LoRA for fine-tuning. Almost always recommended.
- true - Use LoRA (faster, less memory, recommended)
- false - Full fine-tuning (slower, more memory, rarely needed)
lora_rank
Type: integer
Default: 16
Range: 1-128 (typical: 8-64)
Applies to: All models when use_lora: true
The rank of LoRA matrices. Lower rank = faster training and less memory, but potentially less expressive.
When to adjust:
- Reduce to 8 if you’re out of memory or want faster training
- Increase to 32-64 if you have a large, complex dataset and need more capacity
lora_alpha
Type: integer
Default: 16
Typical: Same as lora_rank
Applies to: All models when use_lora: true
Scaling factor for LoRA updates. Typically set equal to lora_rank.
When to adjust:
- Keep equal to lora_rank in most cases
- Increase to make LoRA updates stronger (rare)
- Decrease to make updates more subtle (rare)
Learning Rate and Optimization
learning_rate
Type: number
Default: 0.0001 (text), 0.0002 (image)
Typical range: 0.00001-0.001
Applies to: All models
The step size for parameter updates. Too high = unstable training, too low = slow convergence.
When to adjust:
- Decrease by 10x if training is unstable or loss is spiking
- Increase by 2-3x if training is too slow or plateaus early
- Text models: start with 0.0001
- Image models: start with 0.0002
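As a quick sketch of the rules of thumb above, a hypothetical helper could pick a starting learning rate by model type and apply the suggested adjustments. The function name, arguments, and exact factors are illustrative assumptions, not part of any API.

```python
# Hypothetical helper encoding the rules of thumb above; not a real API.
def suggest_learning_rate(model_type: str, unstable: bool = False, plateaued: bool = False) -> float:
    base = {"text": 1e-4, "image": 2e-4}[model_type]  # defaults from this guide
    if unstable:      # loss spiking -> decrease by 10x
        return base / 10
    if plateaued:     # too slow or plateaus early -> increase by ~3x
        return base * 3
    return base

print(suggest_learning_rate("text"))                  # 0.0001
print(suggest_learning_rate("image", unstable=True))  # 2e-05
```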
Batch Size and Memory
batch_size
Type: integer
Default: 1
Typical range: 1-8
Applies to: All models
Number of samples processed together in one training step.
Trade-offs:
- Larger batch size = faster training, more stable, but more memory
- Smaller batch size = slower training, less stable, but less memory
- Reduce to 1 if you get out-of-memory errors
- Increase to 2-4 if you have GPU memory to spare and want faster training
gradient_accumulation / grad_accum
Type: integer
Default: 1
Typical range: 1-16
Applies to: All models
Accumulate gradients over multiple steps before updating parameters. This simulates a larger batch size without using more memory.
When to use:
- Set to 4-8 if you want the stability of larger batches but don’t have the memory
- Effective batch size = batch_size × gradient_accumulation
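For example, a run that only fits 2 samples per step in GPU memory can still get the gradient statistics of a batch of 16 (the numbers here are hypothetical; only the arithmetic matters):

```python
batch_size = 2               # what fits in GPU memory at once
gradient_accumulation = 8    # micro-batches accumulated before each optimizer update

effective_batch_size = batch_size * gradient_accumulation
print(effective_batch_size)  # 16 -- batch-of-16 stability at the memory cost of a batch of 2
```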
Training Duration
epochs (Text Models)
Type: integer
Default: 1
Typical range: 1-5
Applies to: Text generation models
Number of complete passes through the training dataset.
Guidelines:
- 1 epoch - Good starting point, often sufficient
- 2-3 epochs - For better learning on small datasets
- >5 epochs - Risk of overfitting
steps (Image/Video Models)
Type: integer
Default: 2000 (image), 3000 (editing)
Typical range: 1000-5000
Applies to: Image and video generation models
Total number of training steps (optimizer updates).
Guidelines:
- 1000 steps - Quick test runs
- 2000-3000 steps - Standard training
- 4000-5000 steps - Complex styles or large datasets
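Epochs and steps are two views of the same quantity: one optimizer update happens per effective batch. For a hypothetical dataset, the number of steps implied by an epoch count works out as follows (all numbers below are made up for illustration):

```python
import math

# Hypothetical numbers -- only the arithmetic matters here.
dataset_size = 4000          # training samples
batch_size = 1
gradient_accumulation = 4
epochs = 2

steps_per_epoch = math.ceil(dataset_size / (batch_size * gradient_accumulation))
total_steps = epochs * steps_per_epoch
print(total_steps)           # 2000 optimizer updates for this run
```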
Logging and Checkpointing
logging_steps
Type: integer
Default: 10
Applies to: Text models
How often to log training metrics (loss, learning rate, etc.).
save_steps_ratio
Type: number
Default: 0.25
Range: 0.0-1.0
Applies to: Text models
Save a checkpoint every time this fraction of total training completes. For example, 0.25 with 4 epochs saves a checkpoint after each epoch.
save_strategy
Type: string
Default: "epoch"
Options: "epoch", "steps"
Applies to: Text models
When to save checkpoints:
"epoch"- Save at the end of each epoch"steps"- Save based onsave_steps_ratio
sample_every (Image/Video Models)
Type: integer
Default: 200
Applies to: Image and video models
Generate sample outputs every N steps to monitor progress visually.
Model-Specific Parameters
Text Generation
seq_length
Type: integer
Default: 1024
Range: 128-4096
Maximum sequence length for text. Longer = more context, but more memory.
neftune_noise_alpha
Type: number
Default: 0
Range: 0-15
Add noise to token embeddings during training for better generalization (NEFTune). Set to 5-15 to enable.
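For context, the NEFTune paper adds uniform noise to the token embeddings during training, scaled by neftune_noise_alpha / sqrt(sequence_length × embedding_dim). The function below is a rough sketch of that idea, not the exact implementation used here.

```python
import torch

def neftune_noise(embeddings: torch.Tensor, noise_alpha: float) -> torch.Tensor:
    # embeddings: (batch, seq_len, hidden_dim); noise is applied only during training.
    _, seq_len, hidden_dim = embeddings.shape
    scale = noise_alpha / (seq_len * hidden_dim) ** 0.5
    noise = torch.empty_like(embeddings).uniform_(-1, 1) * scale
    return embeddings + noise
```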
Image Generation/Editing
timestep_type
Type: string
Default: "sigmoid" (generation), "weighted" (editing)
Options: "weighted", "sigmoid", "linear"
How to sample timesteps during diffusion training.
"sigmoid"- Focus on mid-range timesteps (balanced)"weighted"- Focus on difficult timesteps"linear"- Uniform sampling (simple)
sample_width / sample_height
Type: integer
Default: 1024
Applies to: Image editing
Resolution for sample generation during training.
cache_text_embeddings
Type: boolean
Default: false
Applies to: Image models
Pre-compute and cache text embeddings for faster training.
Quick Reference Tables
Common Parameter Sets
Fast Iteration (Test Runs):
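As an illustration assembled from the guidance above, a fast-iteration run for a quick test might use settings like the sketch below. The key names follow this guide's parameter names, not any specific SDK, and the values are only one reasonable starting point.

```python
# Illustrative fast-iteration settings drawn from the guidance above;
# key names follow this guide, not any specific SDK.
fast_iteration = {
    "use_lora": True,
    "lora_rank": 8,          # smaller rank for speed and memory
    "lora_alpha": 8,         # keep equal to lora_rank
    "batch_size": 1,
    "learning_rate": 1e-4,   # 2e-4 for image models
    "epochs": 1,             # text models
    "steps": 1000,           # image/video models: quick test run
    "sample_every": 200,
}
```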
Troubleshooting
Out of memory errors
- Reduce batch_size to 1
- Reduce lora_rank to 8
- Reduce seq_length (text) or sample_width / sample_height (image)
- Enable cache_text_embeddings (image models)
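For example, the memory-saving adjustments above could be applied as overrides to an existing configuration. This is a sketch only: the flat-dictionary config pattern, the starting values, and the reduced resolutions are assumptions, not a documented interface.

```python
# Hypothetical memory-saving overrides reflecting the list above.
base_config = {"batch_size": 4, "lora_rank": 16, "seq_length": 1024}  # example starting point

memory_saving_overrides = {
    "batch_size": 1,
    "lora_rank": 8,
    "seq_length": 512,               # text models: shorter context, less memory
    "sample_width": 512,             # image models: smaller sample resolution
    "sample_height": 512,
    "cache_text_embeddings": True,   # image models: precompute text embeddings
}

config = {**base_config, **memory_saving_overrides}
print(config)
```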
Training loss not decreasing
- Increase learning_rate by 2-3x
- Check your data quality and column mappings
- Increase training duration (epochs or steps)
Loss spiking or unstable
- Decrease learning_rate by 10x
- Increase gradient_accumulation to smooth updates
- Reduce batch_size to 1
Results not matching my style
- Increase training duration (more epochs or steps)
- Increase lora_rank to 32 or 64
- Ensure your captions clearly describe the unique aspects of your style
- Add more training data
Next Steps
- Quick Start Guides - Apply these parameters
- API Reference - Complete parameter lists
- Examples - See parameters in action