Fine-Tuning

The process of taking a pre-trained language model and training it further on a smaller, targeted dataset to teach it specific behaviours — following instructions, answering questions, refusing harmful requests, or specialising in a domain.


What is it?

After pre-training, a language model can predict the next token with impressive accuracy. But it has a problem: it does not know how to be useful. Ask it “What is the capital of France?” and it might continue with “What is the capital of Germany? What is the capital of Spain?” — because it learned to complete text patterns, not to answer questions.1

Fine-tuning solves this. It takes the pre-trained model (with its billions of learned weights) and continues training on a smaller, carefully curated dataset of examples that demonstrate the desired behaviour: instruction-response pairs, question-answer pairs, or domain-specific text. The model’s weights shift slightly to favour the new patterns while retaining the broad knowledge from pre-training.2

The result is the difference between a model that can write anything and a model that writes what you need. Modern LLMs go through two stages of post-training: supervised fine-tuning (SFT), which teaches the model to follow instructions, and alignment (RLHF/RLAIF), which teaches it to produce responses that humans prefer.3

In plain terms

Pre-training is reading every book in the library. Fine-tuning is job training. A new graduate (base model) has broad knowledge but does not know how your company works. On-the-job training (fine-tuning) teaches them to apply their knowledge to your specific needs — without making them forget everything they learned in school.



How does it work?

1. Supervised Fine-Tuning (SFT) — learning from examples

SFT presents the model with thousands of high-quality instruction-response pairs:2

| Instruction | Expected response |
| --- | --- |
| “Summarise this article in three bullet points.” | [A three-bullet summary] |
| “Translate this sentence to French.” | [The French translation] |
| “Write a Python function that sorts a list.” | [Working Python code] |

The training process is the same as pre-training (predict the next token, compute loss, adjust weights) but the dataset is different: instead of raw web text, it consists of curated examples of helpful, well-formatted responses.1
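
To make that concrete, here is a minimal sketch of one SFT step using PyTorch and the Hugging Face Transformers library. The model name (“gpt2”) and the instruction-response pair are purely illustrative, and masking the prompt so that only response tokens contribute to the loss is a common convention rather than a universal rule.

```python
# One supervised fine-tuning step: same next-token objective as pre-training,
# but on a curated instruction-response pair ("gpt2" stands in for any base model).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

instruction = "Translate this sentence to French: The cat sleeps."
response = " Le chat dort."

# Tokenise prompt and response separately so the prompt can be masked out of the loss.
prompt_ids = tokenizer(instruction, return_tensors="pt").input_ids
response_ids = tokenizer(response + tokenizer.eos_token, return_tensors="pt").input_ids

input_ids = torch.cat([prompt_ids, response_ids], dim=1)
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # -100 is ignored by the loss: learn only the response

loss = model(input_ids=input_ids, labels=labels).loss  # next-token prediction loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice this loop runs over thousands of such pairs, with batching, padding, and a learning-rate schedule, but the objective is exactly the one described above.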

The model learns patterns like:

  • Questions expect answers, not more questions
  • Instructions expect completion, not commentary
  • Conversations follow turn-taking structure
  • Responses should be relevant to the instruction

Think of it like...

Teaching by example. Instead of explaining the rules of good customer service, you show a new employee 10,000 recordings of excellent customer service interactions. They learn the patterns: greet the customer, listen to the problem, provide a solution, confirm satisfaction.

2. RLHF — learning from preferences

Supervised fine-tuning teaches the model what to do. Reinforcement Learning from Human Feedback (RLHF) teaches it how well to do it.3

The process:

  1. Generate multiple responses. The SFT model generates 2-4 responses to the same prompt.
  2. Human ranking. Human evaluators rank the responses from best to worst, considering helpfulness, accuracy, safety, and tone.
  3. Train a reward model. The rankings are used to train a separate model that predicts how humans would rate any given response. This reward model converts human preferences into a numerical score.3
  4. Optimise the language model. Using a reinforcement learning algorithm such as PPO (or a direct preference method such as DPO, which skips the separate reward model), the language model is adjusted to produce responses that score higher against the learned preferences.4

graph TD
    A[Same prompt] --> B[Response A]
    A --> C[Response B]
    A --> D[Response C]
    B --> E[Human ranks: B best, A second, C worst]
    C --> E
    D --> E
    E --> F[Train reward model]
    F --> G[Optimise LLM to score higher]

    style E fill:#4a9ede,color:#fff
    style G fill:#5cb85c,color:#fff

RLAIF (Reinforcement Learning from AI Feedback) is a variant where another AI model provides the rankings instead of human evaluators, reducing cost while maintaining quality for many tasks.4
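
The reward model in step 3 is typically trained with a pairwise ranking loss: for each prompt, the response humans preferred should receive a higher score than the one they rejected. The sketch below (plain PyTorch) uses random embeddings as stand-ins for the real inputs; in a real system the scores would come from a transformer reading the full prompt and response, and PPO would then maximise this score while a KL penalty keeps the model close to its SFT starting point.

```python
# Pairwise (Bradley-Terry style) preference loss for a reward model -- a sketch.
# Embeddings here are random placeholders for encoded (prompt, response) pairs.
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)  # maps an embedding to a scalar reward

    def forward(self, embedding: torch.Tensor) -> torch.Tensor:
        return self.score(embedding).squeeze(-1)

reward_model = RewardHead()
optimizer = torch.optim.AdamW(reward_model.parameters(), lr=1e-4)

# A batch of ranked pairs: "chosen" responses were preferred over "rejected" ones.
chosen_emb = torch.randn(4, 768)    # placeholder embeddings of the preferred responses
rejected_emb = torch.randn(4, 768)  # placeholder embeddings of the rejected responses

r_chosen = reward_model(chosen_emb)
r_rejected = reward_model(rejected_emb)

# Push the preferred response's score above the rejected one's.
loss = -torch.nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```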

3. What RLHF actually changes

RLHF produces subtle but important shifts in model behaviour:3

| Before RLHF | After RLHF |
| --- | --- |
| Answers may be technically correct but unhelpful | Answers are structured to be useful |
| May produce harmful or offensive content | Learns to refuse harmful requests |
| Verbose and unfocused | Calibrates response length to the question |
| No sense of uncertainty | Expresses uncertainty when appropriate |
| Treats all questions equally | Prioritises safety for sensitive topics |

4. Domain fine-tuning — specialising for a field

Beyond instruction-following and alignment, fine-tuning can adapt a model to a specific domain:2

  • Medical models fine-tuned on clinical notes and medical literature
  • Legal models fine-tuned on case law and regulatory text
  • Code models fine-tuned on high-quality codebases with test coverage

Domain fine-tuning does not replace pre-training. It nudges the model’s probability distributions toward domain-specific patterns while retaining general capabilities.

5. Parameter-efficient fine-tuning — making it affordable

Full fine-tuning adjusts all of a model’s parameters, which for a 70B model means updating 70 billion numbers and keeping their gradients and optimiser states in memory as well. This requires enormous GPU memory. LoRA (Low-Rank Adaptation) avoids this by freezing the original weights and training small adapter matrices that modify the model’s behaviour with a fraction of the parameters:5

| Method | Parameters updated | GPU memory |
| --- | --- | --- |
| Full fine-tuning | All (70B) | Hundreds of GB |
| LoRA | 0.1–1% of total | A single high-end GPU |
| QLoRA | 0.1–1% (quantised) | A consumer GPU (24 GB) |

This makes fine-tuning accessible to individual researchers and small teams, not just large labs.5
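
The core trick is easy to see in a single layer. In the sketch below (plain PyTorch, illustrative sizes), the pre-trained weight matrix is frozen and the only trainable parameters are two small matrices A and B whose product forms a low-rank correction; the rank, scaling factor, and layer sizes are assumptions chosen for illustration.

```python
# LoRA in one layer: freeze the pre-trained weight, learn a low-rank update B @ A.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                        # frozen pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))   # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the small trainable correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(4096, 4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.2%}")  # about 0.39% for these sizes
```

Libraries such as Hugging Face PEFT apply the same idea to the attention projections of a full transformer, and QLoRA additionally stores the frozen base weights in 4-bit precision so the whole setup fits on consumer hardware.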


Why do we use it?

Key reasons

1. Behaviour. Pre-training gives the model knowledge. Fine-tuning gives it behaviour. Without fine-tuning, the model is a text completion engine, not an assistant.1

2. Alignment. RLHF aligns the model with human values and preferences — making it helpful, honest, and safe. This alignment does not emerge from pre-training alone.3

3. Specialisation. Domain fine-tuning transforms a general-purpose model into a specialist, improving accuracy and relevance for specific fields without the cost of training from scratch.2


When do we use it?

  • When building a custom model for a specific domain or task — fine-tuning adapts a pre-trained model to your data
  • When understanding model behaviour — knowing that the model was fine-tuned to be helpful explains why it answers questions instead of just completing text
  • When evaluating model quality — differences between models often come from fine-tuning and alignment choices, not from pre-training
  • When choosing between models — some models are fine-tuned for conversation, others for code, others for specific domains

Rule of thumb

If a model follows your instructions well, thank fine-tuning. If it refuses to help with something dangerous, thank RLHF. If it knows a lot but behaves oddly, it might be a base model without fine-tuning.


How can I think about it?

The medical school pipeline

Medical training follows a sequence remarkably similar to LLM training.

  • Pre-training = undergraduate education (broad knowledge across many subjects)
  • Supervised fine-tuning = medical school (learning from curated examples of good medical practice: patient histories, diagnoses, treatment plans)
  • RLHF = residency (experienced doctors evaluate the resident’s work and provide feedback, and the resident adjusts their approach based on this supervision)
  • Domain fine-tuning = specialisation (a cardiologist focuses further on heart-related cases)
  • LoRA = a weekend workshop (you don’t redo your entire education; you add a small, focused skill on top of everything you already know)

The raw material to finished product

Think of manufacturing.

  • Pre-training = mining and refining raw metal (expensive, creates a versatile material)
  • Fine-tuning = forging and shaping (turning the metal into a specific tool: a hammer, a scalpel, a wrench)
  • RLHF = quality control (testing the tool against user expectations and adjusting until it meets standards)
  • The finished tool = the deployed model (shaped for a purpose, tested against standards, built from quality material)
  • LoRA = attaching a specialised head to an existing tool (swap the drill bit, not the whole drill)

Concepts to explore next

| Concept | What it covers | Status |
| --- | --- | --- |
| pre-training | The first training stage that gives the model its broad knowledge | complete |
| evaluator-optimiser | A pattern where one model evaluates and another optimises, related to the RLHF reward model dynamic | complete |
| guardrails | Constraints that prevent models from acting outside intended scope, built on top of alignment | complete |
| human-in-the-loop | Keeping humans involved in AI decision-making, the principle behind RLHF | complete |


Where this concept fits

Position in the knowledge graph

graph TD
    AI[AI and Machine Learning] --> PT[Pre-Training]
    AI --> FT[Fine-Tuning]
    PT -->|produces base model| FT
    FT --> SFT[Supervised Fine-Tuning]
    FT --> RLHF[RLHF / Alignment]
    style FT fill:#4a9ede,color:#fff

Related concepts:

  • pre-training — fine-tuning builds on the foundation created during pre-training; you cannot fine-tune a model that has not been pre-trained
  • evaluator-optimiser — RLHF uses a reward model (evaluator) to improve the language model (optimiser), following the same pattern
  • human-in-the-loop — RLHF is a systematic application of human-in-the-loop design: human judgement shapes model behaviour
  • guardrails — alignment through RLHF is a training-time guardrail; runtime guardrails add additional safety layers on top


Footnotes

  1. DataSci Ocean. (2025). The 3 Stages of LLM Training: A Deep Dive into RLHF. DataSci Ocean.

  2. Turing. (2026). What is Fine-Tuning LLM? Methods and Step-by-Step Guide. Turing.

  3. AWS. (2025). Fine-tune large language models with reinforcement learning from human or AI feedback. AWS Machine Learning Blog.

  4. Red Hat. (2025). Post-training methods for language models. Red Hat Developer.

  5. SolGuruz. (2026). Fine-Tuning LLMs: Complete Guide for 2026. SolGuruz.