Fine-Tuning
The process of taking a pre-trained language model and training it further on a smaller, targeted dataset to teach it specific behaviours — following instructions, answering questions, refusing harmful requests, or specialising in a domain.
What is it?
After pre-training, a language model can predict the next token with impressive accuracy. But it has a problem: it does not know how to be useful. Ask it “What is the capital of France?” and it might continue with “What is the capital of Germany? What is the capital of Spain?” — because it learned to complete text patterns, not to answer questions.1
Fine-tuning solves this. It takes the pre-trained model (with its billions of learned weights) and continues training on a smaller, carefully curated dataset of examples that demonstrate the desired behaviour: instruction-response pairs, question-answer pairs, or domain-specific text. The model’s weights shift slightly to favour the new patterns while retaining the broad knowledge from pre-training.2
The result is the difference between a model that can write anything and a model that writes what you need. Modern LLMs go through two stages of post-training: supervised fine-tuning (SFT), which teaches the model to follow instructions, and alignment (RLHF/RLAIF), which teaches it to produce responses that humans prefer.3
In plain terms
Pre-training is reading every book in the library. Fine-tuning is job training. A new graduate (base model) has broad knowledge but does not know how your company works. On-the-job training (fine-tuning) teaches them to apply their knowledge to your specific needs — without making them forget everything they learned in school.
At a glance
From base model to deployed model
```mermaid
graph LR
    A[Base model] --> B[Supervised Fine-Tuning]
    B --> C[Instruction model]
    C --> D[RLHF / RLAIF]
    D --> E[Aligned model]
    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style D fill:#5cb85c,color:#fff
    style E fill:#9b59b6,color:#fff
```

Key: The base model from pre-training goes through two stages. SFT teaches it to follow instructions. RLHF teaches it to produce responses humans prefer. The aligned model is what you interact with as a user.
How does it work?
1. Supervised Fine-Tuning (SFT) — learning from examples
SFT presents the model with thousands of high-quality instruction-response pairs:2
| Instruction | Expected response |
|---|---|
| “Summarise this article in three bullet points.” | [A three-bullet summary] |
| “Translate this sentence to French.” | [The French translation] |
| “Write a Python function that sorts a list.” | [Working Python code] |
The training process is the same as pre-training (predict the next token, compute loss, adjust weights) but the dataset is different: instead of raw web text, it consists of curated examples of helpful, well-formatted responses.1
The model learns patterns like:
- Questions expect answers, not more questions
- Instructions expect completion, not commentary
- Conversations follow turn-taking structure
- Responses should be relevant to the instruction
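As a minimal sketch of this idea in plain PyTorch: the objective is ordinary next-token prediction, exactly as in pre-training, but the labels for the instruction tokens are masked out so the model is only graded on the response. The tiny stand-in model and the token ids below are illustrative, not from any real tokeniser or checkpoint.

```python
# Minimal SFT sketch: same next-token objective as pre-training,
# but loss is only computed on the response tokens.
import torch
import torch.nn.functional as F

vocab_size, hidden = 100, 32
# Toy stand-in for a pre-trained causal language model
model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, hidden),
    torch.nn.Linear(hidden, vocab_size),
)

# One instruction-response pair, already tokenised (token ids are made up)
prompt_ids   = torch.tensor([12, 47, 3, 88])    # e.g. "Summarise this article:"
response_ids = torch.tensor([5, 19, 63, 7, 2])  # the target summary
input_ids = torch.cat([prompt_ids, response_ids]).unsqueeze(0)

# Labels mirror the inputs, but prompt positions are masked with -100
labels = input_ids.clone()
labels[0, : len(prompt_ids)] = -100             # no loss on the instruction itself

logits = model(input_ids)                       # (1, seq_len, vocab_size)
# Shift by one so position t predicts token t+1, as in standard causal LM training
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    labels[:, 1:].reshape(-1),
    ignore_index=-100,
)
loss.backward()                                 # gradients nudge the weights slightly
```

Real SFT pipelines typically wrap this same pattern around a full pre-trained model and a chat template, but the label masking shown here is the essential difference from pre-training on raw text.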
Think of it like...
Teaching by example. Instead of explaining the rules of good customer service, you show a new employee 10,000 recordings of excellent customer service interactions. They learn the patterns: greet the customer, listen to the problem, provide a solution, confirm satisfaction.
2. RLHF — learning from preferences
Supervised fine-tuning teaches the model what to do. Reinforcement Learning from Human Feedback (RLHF) teaches it how well to do it.3
The process:
- Generate multiple responses. The SFT model generates 2-4 responses to the same prompt.
- Human ranking. Human evaluators rank the responses from best to worst, considering helpfulness, accuracy, safety, and tone.
- Train a reward model. The rankings are used to train a separate model that predicts how humans would rate any given response. This reward model converts human preferences into a numerical score.3
- Optimise the language model. Using reinforcement learning (typically PPO) or a direct preference method such as DPO, the language model is adjusted to produce responses that score higher according to the reward model.4
```mermaid
graph TD
    A[Same prompt] --> B[Response A]
    A --> C[Response B]
    A --> D[Response C]
    B --> E[Human ranks: B best, A second, C worst]
    C --> E
    D --> E
    E --> F[Train reward model]
    F --> G[Optimise LLM to score higher]
    style E fill:#4a9ede,color:#fff
    style G fill:#5cb85c,color:#fff
```
RLAIF (Reinforcement Learning from AI Feedback) is a variant where another AI model provides the rankings instead of human evaluators, reducing cost while maintaining quality for many tasks.4
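As a rough sketch of step 3 above (training the reward model): the standard approach is a pairwise, Bradley-Terry-style loss that pushes the score of the human-preferred response above the rejected one. The toy tensors below stand in for the pooled hidden states a real reward model would compute from the prompt and each response; the names and dimensions are illustrative, not any particular library's API.

```python
# Minimal sketch of the reward-model objective used in RLHF.
# A reward model maps a (prompt, response) sequence to a single score and is
# trained so the human-preferred response scores higher than the rejected one.
import torch
import torch.nn.functional as F

hidden = 32
reward_head = torch.nn.Linear(hidden, 1)    # scalar score per sequence

# Stand-ins for the hidden states of two responses to the same prompt
# (in practice, the last-token hidden state of an LLM encoding prompt + response)
chosen_repr   = torch.randn(1, hidden)      # response humans ranked higher
rejected_repr = torch.randn(1, hidden)      # response humans ranked lower

r_chosen   = reward_head(chosen_repr)
r_rejected = reward_head(rejected_repr)

# Bradley-Terry pairwise loss: prefer a large positive margin between the scores
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```

DPO takes a shortcut: it optimises the language model directly on these preference pairs using its own log-probabilities, removing the separate reward model and the reinforcement-learning loop.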
3. What RLHF actually changes
RLHF produces subtle but important shifts in model behaviour:3
| Before RLHF | After RLHF |
|---|---|
| Answers may be technically correct but unhelpful | Answers are structured to be useful |
| May produce harmful or offensive content | Learns to refuse harmful requests |
| Verbose and unfocused | Calibrates response length to the question |
| No sense of uncertainty | Expresses uncertainty when appropriate |
| Treats all questions equally | Prioritises safety for sensitive topics |
Example: SFT vs RLHF response
Prompt: “How do I pick a lock?”
SFT model: Provides step-by-step lock-picking instructions (it learned this pattern from training data).
RLHF model: “Lock picking is a legitimate skill used by locksmiths and security professionals. If you’re locked out of your own property, I’d recommend contacting a licensed locksmith. If you’re interested in the skill professionally, consider a locksmith training program.”
The RLHF model learned that human evaluators preferred responses that consider context, safety, and intent.
4. Domain fine-tuning — specialising for a field
Beyond instruction-following and alignment, fine-tuning can adapt a model to a specific domain:2
- Medical models fine-tuned on clinical notes and medical literature
- Legal models fine-tuned on case law and regulatory text
- Code models fine-tuned on high-quality codebases with test coverage
Domain fine-tuning does not replace pre-training. It nudges the model’s probability distributions toward domain-specific patterns while retaining general capabilities.
5. Parameter-efficient fine-tuning — making it affordable
Full fine-tuning adjusts all of a model’s parameters, which for a 70B model means updating 70 billion numbers. This requires enormous GPU memory. LoRA (Low-Rank Adaptation) solves this by freezing the original weights and training small adapter matrices that modify the model’s behaviour with a fraction of the parameters:5
| Method | Parameters updated | GPU memory |
|---|---|---|
| Full fine-tuning | All (70B) | Hundreds of GB |
| LoRA | 0.1-1% of total | A single high-end GPU |
| QLoRA | 0.1-1% (quantised) | A consumer GPU (24GB) |
This makes fine-tuning accessible to individual researchers and small teams, not just large labs.5
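A minimal sketch of the LoRA idea on a single linear layer, assuming a hidden size of 4096 and rank 8 (both illustrative): the pre-trained weight W is frozen and only two small matrices A and B train, so the effective weight becomes W + (α/r)·BA.

```python
# Minimal LoRA sketch: freeze the pre-trained linear layer, train two
# low-rank matrices whose product is added to its output.
import torch

class LoRALinear(torch.nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = torch.nn.Linear(in_features, out_features)
        self.base.requires_grad_(False)                    # freeze pre-trained weights
        self.lora_A = torch.nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = torch.nn.Parameter(torch.zeros(out_features, r))  # starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus the trainable low-rank correction
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(4096, 4096, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")
# prints: trainable: 65,536 of 16,846,848 (0.39%) for this single layer
```

In practice, libraries such as Hugging Face PEFT apply adapters like this to the attention projections of every layer, but the arithmetic above is why the trainable fraction in the table stays well below one percent.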
Why do we use it?
Key reasons
1. Behaviour. Pre-training gives the model knowledge. Fine-tuning gives it behaviour. Without fine-tuning, the model is a text completion engine, not an assistant.1
2. Alignment. RLHF aligns the model with human values and preferences — making it helpful, honest, and safe. This alignment does not emerge from pre-training alone.3
3. Specialisation. Domain fine-tuning transforms a general-purpose model into a specialist, improving accuracy and relevance for specific fields without the cost of training from scratch.2
When do we use it?
- When building a custom model for a specific domain or task — fine-tuning adapts a pre-trained model to your data
- When understanding model behaviour — knowing that the model was fine-tuned to be helpful explains why it answers questions instead of just completing text
- When evaluating model quality — differences between models often come from fine-tuning and alignment choices, not from pre-training
- When choosing between models — some models are fine-tuned for conversation, others for code, others for specific domains
Rule of thumb
If a model follows your instructions well, thank fine-tuning. If it refuses to help with something dangerous, thank RLHF. If it knows a lot but behaves oddly, it might be a base model without fine-tuning.
How can I think about it?
The medical school pipeline
Medical training follows a sequence remarkably similar to LLM training.
- Pre-training = undergraduate education (broad knowledge across many subjects)
- Supervised fine-tuning = medical school (learning from curated examples of good medical practice: patient histories, diagnoses, treatment plans)
- RLHF = residency (experienced doctors evaluate the resident’s work and provide feedback; the resident adjusts their approach based on that supervision)
- Domain fine-tuning = specialisation (a cardiologist focuses further on heart-related cases)
- LoRA = a weekend workshop (you don’t redo your entire education; you add a small, focused skill on top of everything you already know)
From raw material to finished product
Think of manufacturing.
- Pre-training = mining and refining raw metal (expensive, creates a versatile material)
- Fine-tuning = forging and shaping (turning the metal into a specific tool: a hammer, a scalpel, a wrench)
- RLHF = quality control (testing the tool against user expectations and adjusting until it meets standards)
- The finished tool = the deployed model (shaped for a purpose, tested against standards, built from quality material)
- LoRA = attaching a specialised head to an existing tool (swap the drill bit, not the whole drill)
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| pre-training | The first training stage that gives the model its broad knowledge | complete |
| evaluator-optimiser | A pattern where one model evaluates and another optimises, related to the RLHF reward model dynamic | complete |
| guardrails | Constraints that prevent models from acting outside intended scope, built on top of alignment | complete |
| human-in-the-loop | Keeping humans involved in AI decision-making, the principle behind RLHF | complete |
Check your understanding
Test yourself
- Explain why a pre-trained model needs fine-tuning before it can be a useful assistant. What is it missing?
- Describe the difference between supervised fine-tuning and RLHF. What does each stage teach the model?
- Distinguish between full fine-tuning and LoRA. When would you choose one over the other?
- Interpret this scenario: two models are based on the same pre-trained base but produce very different responses to the same prompt. What is the most likely explanation?
- Connect fine-tuning to the concept of guardrails. How does RLHF-based alignment relate to runtime guardrails in a deployed system?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AI[AI and Machine Learning] --> PT[Pre-Training]
    AI --> FT[Fine-Tuning]
    PT -->|produces base model| FT
    FT --> SFT[Supervised Fine-Tuning]
    FT --> RLHF[RLHF / Alignment]
    style FT fill:#4a9ede,color:#fff
```

Related concepts:
- pre-training — fine-tuning builds on the foundation created during pre-training; you cannot fine-tune a model that has not been pre-trained
- evaluator-optimiser — RLHF uses a reward model (evaluator) to improve the language model (optimiser), following the same pattern
- human-in-the-loop — RLHF is a systematic application of human-in-the-loop design: human judgement shapes model behaviour
- guardrails — alignment through RLHF is a training-time guardrail; runtime guardrails add additional safety layers on top
Sources
Further reading
Resources
- The 3 Stages of LLM Training (DataSci Ocean) — Clear overview of the full training pipeline from pre-training through RLHF
- Fine-tune LLMs with RLHF (AWS) — Technical walkthrough of RLHF and RLAIF with practical implementation details
- Post-training methods for language models (Red Hat) — Comprehensive overview of SFT, RLHF, DPO, and modern alignment techniques
- Fine-Tuning LLMs: Complete Guide (Turing) — Step-by-step guide covering full fine-tuning, LoRA, and QLoRA with practical advice
- A Comprehensive Guide to Fine-Tuning LLMs using RLHF (Ionio AI) — Deep technical reference on the RLHF process and reward model training
Footnotes
1. DataSci Ocean. (2025). The 3 Stages of LLM Training: A Deep Dive into RLHF. DataSci Ocean.
2. Turing. (2026). What is Fine-Tuning LLM? Methods and Step-by-Step Guide. Turing.
3. AWS. (2025). Fine-tune large language models with reinforcement learning from human or AI feedback. AWS Machine Learning Blog.
4. Red Hat. (2025). Post-training methods for language models. Red Hat Developer.
5. SolGuruz. (2026). Fine-Tuning LLMs: Complete Guide for 2026. SolGuruz.
