What is Fine-tuning?

Reading time: About 5 minutes

Audience: Those who want to understand how to customize LLMs, and those deciding when to use fine-tuning versus RAG

Prerequisites: Basic knowledge of Generative AI and LLMs

Fine-tuning is the process of taking a large pre-trained model and applying additional training to adapt it to a specific task or domain. Rather than training from scratch, fine-tuning starts from a model that already contains rich general knowledge, making it possible to achieve high performance with less data and computational cost.

Learning Path for Fine-tuning

This section covers fine-tuning comprehensively — from selecting the right method to data preparation, knowledge distillation, and cost management.

This page: Understand the core concept of fine-tuning and when to use it versus RAG or prompt engineering
Fine-tuning Methods: Learn the mechanics and tradeoffs of Full Fine-tuning, LoRA, QLoRA, and Adapters
Data Preparation: Practical guidance on collecting, formatting, and quality-managing training data
Knowledge Distillation: Understand how to transfer knowledge from large models to smaller ones

Why Fine-tuning Is Needed

LLMs have broad general capabilities, but in the following situations, a standard model may be insufficient:

Handling specific industry terminology or internal jargon accurately
Producing output with a consistent tone, style, or format
Achieving stable, high accuracy on specific tasks (classification, extraction, summarization)
Embedding task-specific behavior without long prompts
Reducing API costs or inference latency by using a smaller specialized model

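Prompting and RAG change what the model sees at inference time, whereas fine-tuning changes the model’s weights themselves. In that sense it is analogous to surgery that alters the model’s “personality” or “expertise.”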

Choosing Between Fine-tuning, RAG, and Prompting

Each approach serves different purposes:

Approach	Best For	Update Cost	Data Requirements
Prompt Engineering	General tasks, prototyping, short-term use	Near zero	None
RAG	Referencing current information and internal documents	Low (just add documents)	Documents to search
Fine-tuning	Behavior, style, and specialized task improvement	Re-training cost	Training data (hundreds to tens of thousands of examples)

In practice, the standard approach is to try prompt engineering first, then consider RAG, and only move to fine-tuning when those are insufficient.

flowchart TD
    A["Define the task requirements"] --> B{"Do you need\ncurrent info or\ninternal documents?"}
    B -- Yes --> C["Consider RAG"]
    B -- No --> D{"Is prompt-only\naccuracy sufficient?"}
    D -- Yes --> E["Prompt Engineering"]
    D -- No --> F{"Is the goal to improve\nbehavior, style,\nor specialized tasks?"}
    F -- Yes --> G["Consider Fine-tuning"]
    F -- No --> H["Revisit requirements"]
    C --> I["RAG + FT combination is also possible"]
    G --> I
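
The decision flow above can also be written as a small function. This is only a sketch: the name `choose_approach` and its three boolean parameters are made up for illustration, and real decisions usually weigh cost and data availability as well.

```python
def choose_approach(needs_current_or_internal_docs: bool,
                    prompt_only_is_accurate_enough: bool,
                    goal_is_behavior_style_or_specialized_task: bool) -> str:
    """Mirror the flowchart: check RAG first, then prompting, then fine-tuning."""
    if needs_current_or_internal_docs:
        return "Consider RAG"
    if prompt_only_is_accurate_enough:
        return "Prompt Engineering"
    if goal_is_behavior_style_or_specialized_task:
        return "Consider Fine-tuning"
    return "Revisit requirements"


# No document lookup needed, prompting alone falls short, and the goal is style.
print(choose_approach(False, False, True))  # prints "Consider Fine-tuning"
```

As the flowchart notes, the RAG and fine-tuning branches are not mutually exclusive, so a real project may end up combining both.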

Overview of Key Methods

Fine-tuning methods differ in computational cost and performance tradeoffs:

Method	Description	Parameters Updated	Cost	Best For
Full Fine-tuning	Updates all model parameters	All parameters	High (large GPU usage)	Sufficient GPU resources, major specialization needed
LoRA	Adds low-rank matrices, trains a small number of parameters	Added matrices only	Low–Medium	General domain adaptation, cost-sensitive use cases
QLoRA	Combines LoRA with 4-bit quantization	Added matrices only	Low (consumer GPU possible)	Running on local GPU, minimizing cost
Adapter	Inserts small modules between model layers	Adapters only	Low–Medium	Modular switching between multiple tasks
Knowledge Distillation	Trains a small model using a large model’s outputs as teacher	Full small model	Medium (teacher inference cost)	Need for a small, fast model
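
To make the Cost column more concrete, the rough arithmetic below estimates the memory needed just to hold a model's weights at different precisions. The 7-billion-parameter size is a hypothetical example, and the estimate deliberately ignores activations, gradients, optimizer state, and quantization overhead, all of which add to real usage.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 7_000_000_000  # hypothetical 7B-parameter base model

for label, bits in [("16-bit", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(params, bits):.1f} GB")
# prints:
# 16-bit: 14.0 GB
# 4-bit: 3.5 GB
```

The roughly fourfold reduction in weight memory is one reason a 4-bit quantized base model can fit on hardware where the 16-bit version would not.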

LoRA was proposed as a method that adds low-rank matrices to reduce the number of trainable parameters, while QLoRA applies LoRA to a 4-bit quantized base model.[1][2] Adapters are another representative PEFT method that inserts small modules into Transformer layers.[3] Libraries such as Hugging Face PEFT provide implementations of these methods.[4] Detailed implementation for each method is covered in Fine-tuning Methods.
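
As a rough illustration of the low-rank idea, the sketch below uses plain Python lists instead of a deep learning framework. It follows the formulation in the LoRA paper, where a frozen weight matrix W is augmented as W x + (alpha / r) · B A x, with B initialized to zero so that training starts from the base model's behavior. The dimensions, the `alpha` value, and the helper names are illustrative assumptions and do not come from any library.

```python
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    r = len(A)                          # rank = number of rows in A
    base = matvec(W, x)                 # frozen pre-trained path
    update = matvec(B, matvec(A, x))    # low-rank path
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Return (full fine-tuning count, LoRA count) for one weight matrix."""
    return d_out * d_in, r * (d_in + d_out)


random.seed(0)
d_out, d_in, r = 4, 6, 2
W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]                                  # d_out x r, zeros
x = [1.0] * d_in

# With B at zero, the adapted layer reproduces the base layer exactly.
assert lora_forward(W, A, B, x, alpha=16) == matvec(W, x)

print(trainable_params(4096, 4096, 8))  # prints (16777216, 65536)
```

For a single 4096 × 4096 matrix at rank 8, the low-rank pair holds 65,536 trainable values against roughly 16.8 million for the full matrix, which is where the cost savings in the table come from.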

The Fine-tuning Workflow

graph LR
    A["Define objectives\ntask & metrics"] --> B["Collect & format\ntraining data"]
    B --> C["Select base model"]
    C --> D["Choose method\nLoRA / QLoRA / Full"]
    D --> E["Run training"]
    E --> F["Evaluate & validate"]
    F -->|Needs improvement| B
    F -->|Sufficient performance| G["Deploy & monitor"]

Fine-tuning is not a one-time process — it’s a cycle of improving data quality, tuning hyperparameters, and iterating on evaluation.
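
The loop in the diagram can be sketched as a small driver function. Everything here is a placeholder: `train_fn`, `eval_fn`, and `improve_data_fn` are hypothetical callables that you would supply, and the target score and round limit are arbitrary.

```python
def fine_tune_cycle(data, train_fn, eval_fn, improve_data_fn,
                    target_score, max_rounds=5):
    """Repeat train -> evaluate -> improve data until the target is met."""
    history = []
    model = None
    for round_no in range(1, max_rounds + 1):
        model = train_fn(data)
        score = eval_fn(model)
        history.append((round_no, score))
        if score >= target_score:
            break                      # sufficient performance: deploy and monitor
        data = improve_data_fn(data)   # needs improvement: back to the data step
    return model, history


# Toy stand-ins so the sketch runs: the "model" is just the dataset size,
# and the score grows as more examples are added.
model, history = fine_tune_cycle(
    data=list(range(100)),
    train_fn=lambda d: len(d),
    eval_fn=lambda m: min(1.0, m / 400),
    improve_data_fn=lambda d: d + list(range(100)),
    target_score=0.7,
)
print(history)  # prints [(1, 0.25), (2, 0.5), (3, 0.75)]
```

Keeping the per-round scores in `history` makes it easier to see whether additional data or tuning is still paying off.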

Key Pitfalls

Overfitting

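When the model fits the training data too closely, it scores well on those examples but generalizes poorly to inputs it has not seen. Always hold out a separate evaluation set that does not overlap with the training data, and compare training and evaluation metrics to spot the gap.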
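
One way to keep the evaluation set isolated is to deduplicate examples and assign each one to a split by hashing its content, so the same example can never land on both sides. This is a minimal sketch; the 10% evaluation fraction and the `split_examples` helper are assumptions for illustration.

```python
import hashlib


def split_examples(examples, eval_fraction=0.1):
    """Deterministically split unique text examples into train and eval sets."""
    train, evaluation = [], []
    seen = set()
    for text in examples:
        key = " ".join(text.lower().split())   # normalize case and whitespace
        if key in seen:
            continue                            # drop duplicates so they cannot leak
        seen.add(key)
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100          # stable bucket in [0, 100)
        if bucket < eval_fraction * 100:
            evaluation.append(text)
        else:
            train.append(text)
    return train, evaluation


# 1000 unique examples plus one near-duplicate that differs only in case and spacing.
examples = [f"example sentence {i}" for i in range(1000)] + ["Example  sentence 1"]
train, evaluation = split_examples(examples)

assert not set(train) & set(evaluation)   # no example appears in both splits
print(len(train), len(evaluation))
```

Because the assignment depends only on the example's content, re-running the split after adding new data keeps existing examples in the same set.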

Catastrophic Forgetting

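Fine-tuning can cause the model to lose general capabilities it had before training. Methods like LoRA, which freeze the base weights and train only a small set of added parameters, are generally considered less susceptible to this risk, though not immune to it.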

Data Quality Is Critical

Poor-quality training data will produce poor results regardless of which method you choose. Noisy data, contradictory data, and biased data require special attention. See Data Preparation for details.
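
Some of these problems can be caught with simple checks before training. The sketch below assumes each example is a dict with hypothetical `prompt` and `completion` keys, and flags exact duplicates and prompts that appear with conflicting completions. It does not detect bias or subtle noise, which usually need manual review.

```python
from collections import defaultdict


def audit_examples(examples):
    """Return (duplicates, contradictions) found in prompt/completion pairs."""
    seen_pairs = set()
    duplicates = []
    completions_by_prompt = defaultdict(set)
    for ex in examples:
        pair = (ex["prompt"].strip(), ex["completion"].strip())
        if pair in seen_pairs:
            duplicates.append(pair)             # same pair seen before
        seen_pairs.add(pair)
        completions_by_prompt[pair[0]].add(pair[1])
    # A prompt mapped to more than one completion is a potential contradiction.
    contradictions = {p: sorted(c)
                      for p, c in completions_by_prompt.items() if len(c) > 1}
    return duplicates, contradictions


data = [
    {"prompt": "Classify: refund not received", "completion": "billing"},
    {"prompt": "Classify: refund not received", "completion": "billing"},
    {"prompt": "Classify: app crashes on login", "completion": "bug"},
    {"prompt": "Classify: app crashes on login", "completion": "account"},
]
duplicates, contradictions = audit_examples(data)
print(duplicates)       # [('Classify: refund not received', 'billing')]
print(contradictions)   # {'Classify: app crashes on login': ['account', 'bug']}
```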

Licenses and Terms of Use

Some base models have restrictions on commercial use or redistribution. Always review the license before fine-tuning.

Summary

Fine-tuning applies additional training to a pre-trained model to adapt it to specific tasks or domains
The general approach is: prompt engineering → RAG → fine-tuning
LoRA and QLoRA are widely used methods that enable fine-tuning with lower computational cost
Data quality is the key to success, and building an evaluation set is essential

FAQ

Q: How much data is needed for fine-tuning?

A: It depends on the task and method. For simple classification tasks with LoRA, you can start with a few hundred examples. Complex generation tasks or broad behavioral changes may require thousands to tens of thousands of examples. Quality matters more than quantity.

Q: Can I fine-tune ChatGPT or Claude?

A: OpenAI provides model optimization features such as supervised fine-tuning, direct preference optimization, and reinforcement fine-tuning.[5] For Claude or open-source models, check each provider’s current documentation and license before planning fine-tuning.

Q: Can RAG and fine-tuning be combined?

A: Yes. A common production pattern is to use fine-tuning to improve the model’s output style and handling of specialized terminology, while using RAG to reference current information and internal documents.
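
A minimal sketch of this pattern follows. The retriever is a naive word-overlap ranking and `fine_tuned_model` is a stand-in function; both are hypothetical placeholders for a real search index and a real model endpoint.

```python
def retrieve(query, documents, top_k=2):
    """Rank documents by how many query words they share (toy retriever)."""
    query_words = set(query.lower().split())
    ranked = sorted(documents,
                    key=lambda d: len(query_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:top_k]


def answer(query, documents, model):
    """RAG supplies the facts; the fine-tuned model supplies style and terminology."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return model(prompt)


docs = [
    "Expense reports are due on the 5th of each month.",
    "The VPN client must be updated every quarter.",
    "Travel expense limits were raised in the latest policy.",
]


def fine_tuned_model(prompt):
    # Placeholder for a call to a real fine-tuned model.
    return f"[model output for a prompt of {len(prompt)} characters]"


print(answer("When are expense reports due?", docs, fine_tuned_model))
```

Keeping retrieval and generation separate like this means documents can be updated without retraining, while the model's tone and vocabulary stay fixed by the fine-tune.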

Q: Are there ways to customize a model without fine-tuning?

A: System prompts, few-shot examples, and prompt templates are the most accessible form of customization. Anthropic’s Context Engineering techniques are also worth exploring.
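
As a small illustration of prompt-based customization, the sketch below assembles a system instruction and a few labeled examples into a single prompt string. The `build_few_shot_prompt` helper and the example texts are invented for illustration and are not tied to any provider's API.

```python
def build_few_shot_prompt(system, examples, query):
    """Combine an instruction, input/output examples, and a new input."""
    parts = [system, ""]
    for source, target in examples:
        parts += [f"Input: {source}", f"Output: {target}", ""]
    parts += [f"Input: {query}", "Output:"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    system="Classify each support ticket as billing, bug, or account.",
    examples=[("I was charged twice.", "billing"),
              ("The app crashes on login.", "bug")],
    query="I cannot reset my password.",
)
print(prompt)
```

Changing the behavior here only requires editing the examples, which is why this kind of customization is usually worth trying before any training run.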

References
