How to Fine-Tune LLMs

A step-by-step guide to adapting large language models to your domain with code samples, best practices, and troubleshooting tips.

1. Introduction to Fine-Tuning

Fine-tuning takes a pre-trained large language model and continues training it on your specific dataset, aligning it to your task, style, or domain.

  • Why It Matters: Achieve higher accuracy and consistency than prompt-only methods.
  • When to Use: Specialty domains like legal, medical, or internal policies.

2. Preparing Your Dataset

2.1 Data Collection

  • Identify representative examples: FAQs, support tickets, code snippets.
  • Include edge cases and failure scenarios to improve robustness.
  • Balance positive and negative examples for classification tasks.

2.2 Data Formatting

Use JSONL with clear prompt and completion fields:

{"prompt": "Summarize the following meeting notes:\n...", "completion": "Action items: ..."}

Tip: Tokenize and inspect lengths to avoid truncation.
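
A quick length check before training catches examples that would be silently truncated. A minimal sketch, assuming a train.jsonl in the format above and the GPT-2 tokenizer used later in this guide (swap in your own model's tokenizer and length limit):

from transformers import AutoTokenizer
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained('gpt2-medium')
examples = load_dataset('json', data_files='train.jsonl')['train']

# Token count per example (prompt + completion together)
lengths = [len(tokenizer(ex['prompt'] + ex['completion'])['input_ids']) for ex in examples]

max_len = 512  # assumed training max_length; match what you pass to the tokenizer later
print(f"longest example: {max(lengths)} tokens; over limit: {sum(l > max_len for l in lengths)}")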

3. Choosing a Fine-Tuning Method

3.1 Full-Parameter Tuning

Updates all model weights for maximum flexibility.

Tradeoff: High GPU/TPU cost and memory usage.

3.2 Parameter-Efficient Methods

  • LoRA (Low-Rank Adapters): Insert small trainable matrices. Minimal overhead.
  • Prefix Tuning: Learn soft prompt tokens; no model weight changes (see the sketch after this list).
  • Adapters: Lightweight modules between layers. Easy to switch.
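
LoRA gets a full walkthrough in Section 6. For prefix tuning, the peft library exposes a similar config; a minimal sketch, where num_virtual_tokens=20 is an illustrative value to tune for your task:

from transformers import AutoModelForCausalLM
from peft import PrefixTuningConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained('gpt2-medium')

# Learn 20 soft prompt tokens; the base model weights stay frozen
prefix_config = PrefixTuningConfig(task_type='CAUSAL_LM', num_virtual_tokens=20)
model = get_peft_model(base_model, prefix_config)
model.print_trainable_parameters()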

4. Environment Setup

  • Python: ≥3.8
  • Libraries: transformers, datasets, accelerate, peft
  • Hardware: NVIDIA GPUs with ≥16 GB VRAM or TPU v3.
  • Version Control: Track data and code in Git.

Install:

pip install transformers datasets accelerate peft

5. Full-Parameter Fine-Tuning Example

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from datasets import load_dataset

# 1. Load model & tokenizer
model_name = 'gpt2-medium'
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

dataset = load_dataset('json', data_files='train.jsonl')

def preprocess(batch):
    # For causal LM training, prompt and completion form one continuous sequence
    texts = [p + c for p, c in zip(batch['prompt'], batch['completion'])]
    inputs = tokenizer(texts, truncation=True, padding='max_length', max_length=512)
    # Labels mirror input_ids; mask padding with -100 so it is ignored by the loss
    inputs['labels'] = [
        [tok if tok != tokenizer.pad_token_id else -100 for tok in ids]
        for ids in inputs['input_ids']
    ]
    return inputs

dataset = dataset.map(preprocess, batched=True, remove_columns=['prompt', 'completion'])

# 2. Training config
training_args = TrainingArguments(
    output_dir='out_full', num_train_epochs=3,
    per_device_train_batch_size=2, gradient_accumulation_steps=4,
    fp16=True, logging_steps=100, save_total_limit=2
)

# 3. Train
trainer = Trainer(model=model, args=training_args, train_dataset=dataset['train'])
trainer.train()
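
A quick generation check after training catches obvious problems before running a full evaluation. A minimal sketch reusing the model and tokenizer from above:

# Sanity-check the fine-tuned model on a single prompt
prompt = 'Summarize the following meeting notes:\n...'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))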

6. LoRA Fine-Tuning Example

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments
from peft import LoraConfig, get_peft_model

# 1. Base model & tokenizer
base_model = AutoModelForCausalLM.from_pretrained('gpt2-medium')
tokenizer = AutoTokenizer.from_pretrained('gpt2-medium')
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# 2. LoRA config: low-rank adapters on GPT-2's fused attention projection ('c_attn')
lora_config = LoraConfig(
    r=4, lora_alpha=16, target_modules=['c_attn'],
    lora_dropout=0.05, task_type='CAUSAL_LM'
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights should be trainable

# 3. Reuse training_args and the preprocessed dataset from Section 5

# 4. Trainer & train
trainer = Trainer(model=model, args=training_args, train_dataset=dataset['train'])
trainer.train()
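
After training you can either save just the adapter weights (a few megabytes) or merge them into the base model for a standalone checkpoint. A minimal sketch; the output paths are assumptions:

# Save only the trained LoRA adapter
model.save_pretrained('out_lora')

# Or merge the adapter into the base weights and save a regular transformers model
merged = model.merge_and_unload()
merged.save_pretrained('out_lora_merged')
tokenizer.save_pretrained('out_lora_merged')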

7. Hyperparameter Tuning

  • Learning Rate: LoRA tolerates higher rates than full tuning; common starting points are around 2e-5 for full fine-tuning and 1e-4–2e-4 for LoRA.
  • Batch Size: Maximize GPU usage without OOM.
  • Epochs: 2–5 depending on dataset size.
  • Warmup Steps: 5–10% of total steps to stabilize training.
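
These knobs all map onto TrainingArguments. A sketch with illustrative values for a LoRA run; adjust to your dataset and hardware:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='out_lora',
    learning_rate=2e-4,               # LoRA; try ~2e-5 for full-parameter tuning
    num_train_epochs=3,
    per_device_train_batch_size=4,    # raise until you approach the memory limit
    gradient_accumulation_steps=4,    # effective batch size = 4 * 4
    warmup_ratio=0.05,                # ~5% of total steps as warmup
    fp16=True,
    logging_steps=100,
)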

8. Evaluation & Metrics

  • Use a held-out validation set (e.g., 10%).
  • Automatic Metrics: Perplexity, BLEU, ROUGE, exact-match (EM) accuracy.
  • Human Review: Randomly sample outputs for quality checks.
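
Perplexity falls straight out of the Trainer's evaluation loss. A minimal sketch that reuses the preprocessed dataset, model, and training_args from the earlier examples and holds out 10% for validation:

import math

# Split off a 10% validation set (seed chosen for reproducibility)
split = dataset['train'].train_test_split(test_size=0.1, seed=42)

trainer = Trainer(
    model=model, args=training_args,
    train_dataset=split['train'], eval_dataset=split['test']
)
metrics = trainer.evaluate()
print(f"perplexity: {math.exp(metrics['eval_loss']):.2f}")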

9. Deployment & Monitoring

  • Export: Save model with model.save_pretrained().
  • Inference: Use transformers.pipeline or FastAPI.
  • Logging: Track invocation latency and errors.
  • Drift Detection: Periodically evaluate on fresh data.
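
A minimal export-and-serve sketch with transformers.pipeline, reusing the fine-tuned model and tokenizer from above; the output path is an assumption:

from transformers import pipeline

# Export the fine-tuned model and tokenizer together
model.save_pretrained('out_full/final')
tokenizer.save_pretrained('out_full/final')

# Reload for inference; wrap this call in FastAPI (or similar) to serve over HTTP
generator = pipeline('text-generation', model='out_full/final')
result = generator('Summarize the following meeting notes:\n...', max_new_tokens=100)
print(result[0]['generated_text'])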

10. Troubleshooting Tips

  • OOM Errors: Reduce batch size or use gradient accumulation.
  • Unstable Loss: Lower learning rate or increase warmup.
  • Poor Quality: Add more diverse examples or augment data.
  • Adapter Issues: Check target_modules and dropout settings.

11. Example Use Cases

11.1 Customer Support

Automate responses using past tickets as training data, reducing response time by 40%.

11.2 Code Completion

Fine-tune on internal repositories to suggest functions that follow your team's idioms and coding standards.

11.3 Medical Summaries

Summarize patient notes with 95% accuracy on key medical entities.
