Fine-Tuning
The process of taking a pre-trained language model and training it further on a smaller, targeted dataset to teach it specific behaviours — following instructions, answering questions, refusing harmful requests, or specialising in a domain.
What is it?
After pre-training, a language model can predict the next token with impressive accuracy. But it has a problem: it does not know how to be useful. Ask it “What is the capital of France?” and it might continue with “What is the capital of Germany? What is the capital of Spain?” — because it learned to complete text patterns, not to answer questions.1
Fine-tuning solves this. It takes the pre-trained model (with its billions of learned weights) and continues training on a smaller, carefully curated dataset of examples that demonstrate the desired behaviour: instruction-response pairs, question-answer pairs, or domain-specific text. The model’s weights shift slightly to favour the new patterns while retaining the broad knowledge from pre-training.2
The result is the difference between a model that can write anything and a model that writes what you need. Modern LLMs go through two stages of post-training: supervised fine-tuning (SFT), which teaches the model to follow instructions, and alignment (RLHF/RLAIF), which teaches it to produce responses that humans prefer.3
In plain terms
Pre-training is reading every book in the library. Fine-tuning is job training. A new graduate (base model) has broad knowledge but does not know how your company works. On-the-job training (fine-tuning) teaches them to apply their knowledge to your specific needs — without making them forget everything they learned in school.
At a glance
From base model to deployed model
```mermaid
graph LR
    A[Base model] --> B[Supervised Fine-Tuning]
    B --> C[Instruction model]
    C --> D[RLHF / RLAIF]
    D --> E[Aligned model]
    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style D fill:#5cb85c,color:#fff
    style E fill:#9b59b6,color:#fff
```

Key: The base model from pre-training goes through two stages. SFT teaches it to follow instructions. RLHF teaches it to produce responses humans prefer. The aligned model is what you interact with as a user.
How does it work?
1. Supervised Fine-Tuning (SFT) — learning from examples
SFT presents the model with thousands of high-quality instruction-response pairs:2
| Instruction | Expected response |
|---|---|
| “Summarise this article in three bullet points.” | [A three-bullet summary] |
| “Translate this sentence to French.” | [The French translation] |
| “Write a Python function that sorts a list.” | [Working Python code] |
The training process is the same as pre-training (predict the next token, compute loss, adjust weights) but the dataset is different: instead of raw web text, it consists of curated examples of helpful, well-formatted responses.1
The model learns patterns like:
- Questions expect answers, not more questions
- Instructions expect completion, not commentary
- Conversations follow turn-taking structure
- Responses should be relevant to the instruction
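As a minimal sketch of this idea in plain PyTorch: the objective is ordinary next-token prediction, exactly as in pre-training, but the labels for the instruction tokens are masked out so the model is only graded on the response. The tiny stand-in model and the token ids below are illustrative, not from any real tokeniser or checkpoint.

```python
# Minimal SFT sketch: same next-token objective as pre-training,
# but loss is only computed on the response tokens.
import torch
import torch.nn.functional as F

vocab_size, hidden = 100, 32
# Toy stand-in for a pre-trained causal language model
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, hidden),
    torch.nn.Linear(hidden, vocab_size),
)

# One instruction-response pair, already tokenised (token ids are made up)
prompt_ids   = torch.tensor([12, 47, 3, 88])    # e.g. "Summarise this article:"
response_ids = torch.tensor([5, 19, 63, 7, 2])  # the target summary
input_ids = torch.cat([prompt_ids, response_ids]).unsqueeze(0)

# Labels mirror the inputs, but prompt positions are masked with -100
labels = input_ids.clone()
labels[0, : len(prompt_ids)] = -100             # no loss on the instruction itself

logits = model(input_ids)                       # (1, seq_len, vocab_size)
# Shift by one so position t predicts token t+1, as in standard causal LM training
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
loss.backward()                                 # gradients nudge the weights slightly
```

Real SFT pipelines typically wrap this same pattern around a full pre-trained model and a chat template, but the label masking shown here is the essential difference from pre-training on raw text.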
Think of it like...
Teaching by example. Instead of explaining the rules of good customer service, you show a new employee 10,000 recordings of excellent customer service interactions. They learn the patterns: greet the customer, listen to the problem, provide a solution, confirm satisfaction.
2. RLHF — learning from preferences
Supervised fine-tuning teaches the model what to do. Reinforcement Learning from Human Feedback (RLHF) teaches it how well to do it.3
The process:
- Generate multiple responses. The SFT model generates 2-4 responses to the same prompt.
- Human ranking. Human evaluators rank the responses from best to worst, considering helpfulness, accuracy, safety, and tone.
- Train a reward model. The rankings are used to train a separate model that predicts how humans would rate any given response. This reward model converts human preferences into a numerical score.3
- Optimise the language model. Using reinforcement learning (typically PPO) or a direct preference method such as DPO, the language model is adjusted to produce responses that score higher according to the reward model.4
```mermaid
graph TD
    A[Same prompt] --> B[Response A]
    A --> C[Response B]
    A --> D[Response C]
    B --> E[Human ranks: B best, A second, C worst]
    C --> E
    D --> E
    E --> F[Train reward model]
    F --> G[Optimise LLM to score higher]
    style E fill:#4a9ede,color:#fff
    style G fill:#5cb85c,color:#fff
```
RLAIF (Reinforcement Learning from AI Feedback) is a variant where another AI model provides the rankings instead of human evaluators, reducing cost while maintaining quality for many tasks.4
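As a rough sketch of step 3 above (training the reward model): the standard approach is a pairwise, Bradley-Terry-style loss that pushes the score of the human-preferred response above the rejected one. The toy tensors below stand in for the pooled hidden states a real reward model would compute from the prompt and each response; the names and dimensions are illustrative, not any particular library's API.

```python
# Minimal sketch of the reward-model objective used in RLHF.
# A reward model maps a (prompt, response) sequence to a single score and is
# trained so the human-preferred response scores higher than the rejected one.
import torch
import torch.nn.functional as F

hidden = 32
reward_head = torch.nn.Linear(hidden, 1)    # scalar score per sequence

# Stand-ins for the hidden states of two responses to the same prompt
# (in practice, the last-token hidden state of an LLM encoding prompt + response)
chosen_repr   = torch.randn(1, hidden)      # response humans ranked higher
rejected_repr = torch.randn(1, hidden)      # response humans ranked lower

r_chosen   = reward_head(chosen_repr)
r_rejected = reward_head(rejected_repr)

# Bradley-Terry pairwise loss: prefer a large positive margin between the scores
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```

DPO takes a shortcut: it optimises the language model directly on these preference pairs using its own log-probabilities, removing the separate reward model and the reinforcement-learning loop.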
3. What RLHF actually changes
RLHF produces subtle but important shifts in model behaviour:3
| Before RLHF | After RLHF |
|---|---|
| Answers may be technically correct but unhelpful | Answers are structured to be useful |
| May produce harmful or offensive content | Learns to refuse harmful requests |
| Verbose and unfocused | Calibrates response length to the question |
| No sense of uncertainty | Expresses uncertainty when appropriate |
| Treats all questions equally | Prioritises safety for sensitive topics |
Example: SFT vs RLHF response
Prompt: “How do I pick a lock?”
SFT model: Provides step-by-step lock-picking instructions (it learned this pattern from training data).
RLHF model: “Lock picking is a legitimate skill used by locksmiths and security professionals. If you’re locked out of your own property, I’d recommend contacting a licensed locksmith. If you’re interested in the skill professionally, consider a locksmith training program.”
The RLHF model learned that human evaluators preferred responses that consider context, safety, and intent.
4. Domain fine-tuning — specialising for a field
Beyond instruction-following and alignment, fine-tuning can adapt a model to a specific domain:2
- Medical models fine-tuned on clinical notes and medical literature
- Legal models fine-tuned on case law and regulatory text
- Code models fine-tuned on high-quality codebases with test coverage
Domain fine-tuning does not replace pre-training. It nudges the model’s probability distributions toward domain-specific patterns while retaining general capabilities.
5. Parameter-efficient fine-tuning — making it affordable
Full fine-tuning adjusts all of a model’s parameters, which for a 70B model means updating 70 billion numbers. This requires enormous GPU memory. LoRA (Low-Rank Adaptation) solves this by freezing the original weights and training small adapter matrices that modify the model’s behaviour with a fraction of the parameters:5
| Method | Parameters updated | GPU memory |
|---|---|---|
| Full fine-tuning | All (70B) | Hundreds of GB |
| LoRA | 0.1-1% of total | A single high-end GPU |
| QLoRA | 0.1-1% (quantised) | A consumer GPU (24GB) |
This makes fine-tuning accessible to individual researchers and small teams, not just large labs.5
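A minimal sketch of the LoRA idea on a single linear layer, assuming a hidden size of 4096 and rank 8 (both illustrative): the pre-trained weight W is frozen and only two small matrices A and B train, so the effective weight becomes W + (α/r)·BA.

```python
# Minimal LoRA sketch: freeze the pre-trained linear layer, train two
# low-rank matrices whose product is added to its output.
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = torch.nn.Linear(in_features, out_features)
        self.base.requires_grad_(False)                    # freeze pre-trained weights
        self.lora_A = torch.nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = torch.nn.Parameter(torch.zeros(out_features, r))  # starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus the trainable low-rank correction
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")
# prints: trainable: 65,536 of 16,846,848 (0.39%) for this single layer
```

In practice, libraries such as Hugging Face PEFT apply adapters like this to the attention projections of every layer, but the arithmetic above is why the trainable fraction in the table stays well below one percent.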
Why do we use it?
Key reasons
1. Behaviour. Pre-training gives the model knowledge. Fine-tuning gives it behaviour. Without fine-tuning, the model is a text completion engine, not an assistant.1
2. Alignment. RLHF aligns the model with human values and preferences — making it helpful, honest, and safe. This alignment does not emerge from pre-training alone.3
3. Specialisation. Domain fine-tuning transforms a general-purpose model into a specialist, improving accuracy and relevance for specific fields without the cost of training from scratch.2
When do we use it?
- When building a custom model for a specific domain or task — fine-tuning adapts a pre-trained model to your data
- When understanding model behaviour — knowing that the model was fine-tuned to be helpful explains why it answers questions instead of just completing text
- When evaluating model quality — differences between models often come from fine-tuning and alignment choices, not from pre-training
- When choosing between models — some models are fine-tuned for conversation, others for code, others for specific domains
Rule of thumb
If a model follows your instructions well, thank fine-tuning. If it refuses to help with something dangerous, thank RLHF. If it knows a lot but behaves oddly, it might be a base model without fine-tuning.
How can I think about it?
The medical school pipeline
Medical training follows a sequence remarkably similar to LLM training.
- Pre-training = undergraduate education (broad knowledge across many subjects)
- Supervised fine-tuning = medical school (learning from curated examples of good medical practice: patient histories, diagnoses, treatment plans)
- RLHF = residency (experienced doctors evaluate the resident’s work and provide feedback; the resident adjusts their approach based on that supervision)
- Domain fine-tuning = specialisation (a cardiologist focuses further on heart-related cases)
- LoRA = a weekend workshop (you don’t redo your entire education; you add a small, focused skill on top of everything you already know)
From raw material to finished product
Think of manufacturing.
- Pre-training = mining and refining raw metal (expensive, creates a versatile material)
- Fine-tuning = forging and shaping (turning the metal into a specific tool: a hammer, a scalpel, a wrench)
- RLHF = quality control (testing the tool against user expectations and adjusting until it meets standards)
- The finished tool = the deployed model (shaped for a purpose, tested against standards, built from quality material)
- LoRA = attaching a specialised head to an existing tool (swap the drill bit, not the whole drill)
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| pre-training | The first training stage that gives the model its broad knowledge | complete |
| evaluator-optimiser | A pattern where one model evaluates and another optimises, related to the RLHF reward model dynamic | complete |
| guardrails | Constraints that prevent models from acting outside intended scope, built on top of alignment | complete |
| human-in-the-loop | Keeping humans involved in AI decision-making, the principle behind RLHF | complete |
Check your understanding
Test yourself
- Explain why a pre-trained model needs fine-tuning before it can be a useful assistant. What is it missing?
- Describe the difference between supervised fine-tuning and RLHF. What does each stage teach the model?
- Distinguish between full fine-tuning and LoRA. When would you choose one over the other?
- Interpret this scenario: two models are based on the same pre-trained base but produce very different responses to the same prompt. What is the most likely explanation?
- Connect fine-tuning to the concept of guardrails. How does RLHF-based alignment relate to runtime guardrails in a deployed system?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AI[AI and Machine Learning] --> PT[Pre-Training]
    AI --> FT[Fine-Tuning]
    PT -->|produces base model| FT
    FT --> SFT[Supervised Fine-Tuning]
    FT --> RLHF[RLHF / Alignment]
    style FT fill:#4a9ede,color:#fff
```

Related concepts:
- pre-training — fine-tuning builds on the foundation created during pre-training; you cannot fine-tune a model that has not been pre-trained
- evaluator-optimiser — RLHF uses a reward model (evaluator) to improve the language model (optimiser), following the same pattern
- human-in-the-loop — RLHF is a systematic application of human-in-the-loop design: human judgement shapes model behaviour
- guardrails — alignment through RLHF is a training-time guardrail; runtime guardrails add additional safety layers on top
Sources
Further reading
Resources
- The 3 Stages of LLM Training (DataSci Ocean) — Clear overview of the full training pipeline from pre-training through RLHF
- Fine-tune LLMs with RLHF (AWS) — Technical walkthrough of RLHF and RLAIF with practical implementation details
- Post-training methods for language models (Red Hat) — Comprehensive overview of SFT, RLHF, DPO, and modern alignment techniques
- Fine-Tuning LLMs: Complete Guide (Turing) — Step-by-step guide covering full fine-tuning, LoRA, and QLoRA with practical advice
- A Comprehensive Guide to Fine-Tuning LLMs using RLHF (Ionio AI) — Deep technical reference on the RLHF process and reward model training
Footnotes
1. DataSci Ocean. (2025). The 3 Stages of LLM Training: A Deep Dive into RLHF. DataSci Ocean.
2. Turing. (2026). What is Fine-Tuning LLM? Methods and Step-by-Step Guide. Turing.
3. AWS. (2025). Fine-tune large language models with reinforcement learning from human or AI feedback. AWS Machine Learning Blog.
4. Red Hat. (2025). Post-training methods for language models. Red Hat Developer.
5. SolGuruz. (2026). Fine-Tuning LLMs: Complete Guide for 2026. SolGuruz.
