Skip to content
LinkedInX

What is Fine-tuning?

About 5 minutes

Target audience: Those who want to understand how to customize LLMs, and those looking to understand when to use fine-tuning versus RAG
Prerequisites: Basic knowledge of Generative AI and LLMs

Fine-tuning is the process of taking a large pre-trained model and applying additional training to adapt it to a specific task or domain. Rather than training from scratch, fine-tuning starts from a model that already contains rich general knowledge, making it possible to achieve high performance with less data and computational cost.

This section covers fine-tuning comprehensively — from selecting the right method to data preparation, knowledge distillation, and cost management.

  1. This page: Understand the core concept of fine-tuning and when to use it versus RAG or prompt engineering
  2. Fine-tuning Methods: Learn the mechanics and tradeoffs of Full Fine-tuning, LoRA, QLoRA, and Adapters
  3. Data Preparation: Practical guidance on collecting, formatting, and quality-managing training data
  4. Knowledge Distillation: Understand how to transfer knowledge from large models to smaller ones

LLMs have broad general capabilities, but in the following situations, a standard model may be insufficient:

  • Handling specific industry terminology or internal jargon accurately
  • Producing output with a consistent tone, style, or format
  • Achieving stable, high accuracy on specific tasks (classification, extraction, summarization)
  • Embedding task-specific behavior without long prompts
  • Reducing API costs or inference latency by using a smaller specialized model

Fine-tuning is analogous to surgery that changes the model’s “personality” or “expertise.”

Choosing Between Fine-tuning, RAG, and Prompting

Section titled “Choosing Between Fine-tuning, RAG, and Prompting”

Each approach serves different purposes:

ApproachBest ForUpdate CostData Requirements
Prompt EngineeringGeneral tasks, prototyping, short-term useNear zeroNone
RAGReferencing current information and internal documentsLow (just add documents)Documents to search
Fine-tuningBehavior, style, and specialized task improvementRe-training costTraining data (hundreds to tens of thousands of examples)

In practice, the standard approach is to try prompt engineering first, then consider RAG, and only move to fine-tuning when those are insufficient.

flowchart TD
    A["Define the task requirements"] --> B{"Do you need\ncurrent info or\ninternal documents?"}
    B -- Yes --> C["Consider RAG"]
    B -- No --> D{"Is prompt-only\naccuracy sufficient?"}
    D -- Yes --> E["Prompt Engineering"]
    D -- No --> F{"Is the goal to improve\nbehavior, style,\nor specialized tasks?"}
    F -- Yes --> G["Consider Fine-tuning"]
    F -- No --> H["Revisit requirements"]
    C --> I["RAG + FT combination is also possible"]
    G --> I

Fine-tuning methods differ in computational cost and performance tradeoffs:

MethodDescriptionParameters UpdatedCostBest For
Full Fine-tuningUpdates all model parametersAll parametersHigh (large GPU usage)Sufficient GPU resources, major specialization needed
LoRAAdds low-rank matrices, trains a small number of parametersAdded matrices onlyLow–MediumGeneral domain adaptation, cost-sensitive use cases
QLoRACombines LoRA with 4-bit quantizationAdded matrices onlyLow (consumer GPU possible)Running on local GPU, minimizing cost
AdapterInserts small modules between model layersAdapters onlyLow–MediumModular switching between multiple tasks
Knowledge DistillationTrains a small model using a large model’s outputs as teacherFull small modelMedium (teacher inference cost)Need for a small, fast model

LoRA was proposed as a method that adds low-rank matrices to reduce the number of trainable parameters, while QLoRA applies LoRA to a 4-bit quantized base model.[1][2] Adapters are another representative PEFT method that inserts small modules into Transformer layers.[3] Libraries such as Hugging Face PEFT provide implementations of these methods.[4] Detailed implementation for each method is covered in Fine-tuning Methods.

graph LR
    A["Define objectives\ntask & metrics"] --> B["Collect & format\ntraining data"]
    B --> C["Select base model"]
    C --> D["Choose method\nLoRA / QLoRA / Full"]
    D --> E["Run training"]
    E --> F["Evaluate & validate"]
    F -->|Needs improvement| B
    F -->|Sufficient performance| G["Deploy & monitor"]

Fine-tuning is not a one-time process — it’s a cycle of improving data quality, tuning hyperparameters, and iterating on evaluation.

When the model becomes too specialized on training data, its general capabilities degrade. Always prepare a separate evaluation set that is isolated from training data.

Fine-tuning can cause the model to lose general capabilities it had before. Methods like LoRA are considered to be less susceptible to this risk.

Poor-quality training data will produce poor results regardless of which method you choose. Noisy data, contradictory data, and biased data require special attention. See Data Preparation for details.

Some base models have restrictions on commercial use or redistribution. Always review the license before fine-tuning.

  • Fine-tuning applies additional training to a pre-trained model to adapt it to specific tasks or domains
  • The general approach is: prompt engineering → RAG → fine-tuning
  • LoRA and QLoRA are widely used methods that enable fine-tuning with lower computational cost
  • Data quality is the key to success, and building an evaluation set is essential

Q: How much data is needed for fine-tuning?

A: It depends on the task and method. For simple classification tasks with LoRA, you can start with a few hundred examples. Complex generation tasks or broad behavioral changes may require thousands to tens of thousands of examples. Quality matters more than quantity.

Q: Can I fine-tune ChatGPT or Claude?

A: OpenAI provides model optimization features such as supervised fine-tuning, direct preference optimization, and reinforcement fine-tuning.[5] For Claude or open-source models, check each provider’s current documentation and license before planning fine-tuning.

Q: Can RAG and fine-tuning be combined?

A: Yes. A common production pattern is to use fine-tuning to improve the model’s output style and handling of specialized terminology, while using RAG to reference current information and internal documents.

Q: Are there ways to customize a model without fine-tuning?

A: System prompts, few-shot examples, and prompt templates are the most accessible form of customization. Anthropic’s Context Engineering techniques are also worth exploring.

  1. LoRA: Low-Rank Adaptation of Large Language Models
  2. QLoRA: Efficient Finetuning of Quantized LLMs
  3. Parameter-Efficient Transfer Learning for NLP
  4. Hugging Face PEFT Library
  5. OpenAI Fine-tuning Guide