Prompt Chaining

Breaking a complex task into a strict sequence of LLM calls where each step consumes the previous step’s output and performs a single, focused transformation.


What is it?

When you ask a language model to do something simple — summarise a paragraph, translate a sentence, extract a date — a single prompt works well. But the moment you need the model to research, then analyse, then write, then format, a single prompt starts to buckle. It has too many objectives competing for attention, no checkpoints, and no way to tell where things went wrong when the output is off.1

Prompt chaining is the fix. Instead of one massive prompt that says “do everything,” you write a short sequence of prompts, each with one clear job. The output of step 1 becomes the input of step 2, and so on until the final result emerges. Between steps, you can insert programmatic checks — called gates — that verify the intermediate output before passing it forward.2
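In code, a chain is just ordinary sequential function calls. A minimal sketch in Python, where `call_llm` is a hypothetical stand-in for whatever LLM client you actually use:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"[model output for: {prompt[:40]}...]"

def summarise(text: str) -> str:
    # Step 1: one focused job, condense the input.
    return call_llm(f"Summarise the following in two sentences:\n\n{text}")

def extract_keywords(summary: str) -> str:
    # Step 2: one focused job, consumes step 1's output.
    return call_llm(f"List the five most important keywords in:\n\n{summary}")

def chain(text: str) -> str:
    summary = summarise(text)
    if not summary.strip():  # a trivial gate between the steps
        raise ValueError("Step 1 produced no output")
    return extract_keywords(summary)
```

The structure, not the stub, is the point: each function owns one prompt, and data flows in one direction.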

Anthropic’s Building Effective Agents guide identifies prompt chaining as the foundational workflow pattern — the simplest and most common way to structure an LLM pipeline. It sits at the base of a complexity ladder: you start with prompt chaining and only reach for more advanced patterns (routing, parallelisation, orchestrator-workers) when sequential decomposition is not enough.2

The parent concept, llm-pipelines, covers the full landscape of pipeline patterns. Prompt chaining is the entry point — the pattern you should learn first and use most often.

In plain terms

A single prompt is like asking someone to bake a cake by saying “make me a cake.” Prompt chaining is like handing them a recipe card for each stage: first measure the ingredients, then mix the batter, then bake, then decorate. Each card has one instruction. If the batter is wrong, you catch it before you waste an hour baking.



How does it work?

1. Decomposition — one job per step

The core principle is that each step in the chain should have exactly one clear job.2 If you find yourself writing a step that does two things — say, “extract the key facts and then evaluate their reliability” — split it into two steps.

This matters because LLMs perform better when they focus on a single reasoning task at a time. Packing multiple objectives into one prompt forces the model to juggle competing concerns, and important details get lost or deprioritised.1 Research on how language models use long contexts shows that information buried in the middle of a large prompt is often missed — the “lost-in-the-middle” effect.3
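As a concrete illustration of the split (prompt wording invented for the example):

```python
# One overloaded prompt: two competing objectives.
combined = (
    "Extract the key facts from the article below and evaluate "
    "the reliability of each fact.\n\n{article}"
)

# Split into two steps, each with exactly one job.
extract_prompt = (
    "Extract the key facts from the article below, one per line.\n\n{article}"
)
evaluate_prompt = (
    "For each fact below, rate its reliability as high, medium, or low.\n\n{facts}"
)
```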

Think of it like...

A relay race. Each runner covers one leg. No runner tries to run the entire distance. The baton (the output) passes cleanly at each handoff point. If one runner stumbles, only that leg needs to be re-run.

2. The input-output contract

Every step in a chain has an implicit contract: it expects input in a specific shape and produces output in a specific shape. This contract is what makes chaining reliable.4

For example, if step 1 outputs a JSON list of facts and step 2 expects a JSON list of facts, the handoff is clean. If step 1 outputs free-form prose and step 2 tries to parse it as structured data, the chain breaks.

Best practice is to specify the output format explicitly in each step’s prompt — “Output: a numbered list of facts, one per line” or “Output: a JSON object with keys title, summary, tags.” Structured outputs reduce ambiguity and make gates easier to implement.4
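A sketch of enforcing such a contract in Python. The key names mirror the example above; `parse_step_output` is an illustrative helper, not a library function:

```python
import json

def parse_step_output(raw: str, required_keys: set) -> dict:
    """Enforce the handoff contract: output must be JSON with the agreed keys."""
    data = json.loads(raw)  # raises a ValueError subclass on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"Contract violation, missing keys: {missing}")
    return data

# A conforming handoff from step 1 to step 2:
payload = parse_step_output(
    '{"title": "Q3 report", "summary": "Revenue up 8%.", "tags": ["finance"]}',
    required_keys={"title", "summary", "tags"},
)
```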

3. Validation gates

A gate is a programmatic check inserted between steps. It inspects the output of the previous step and decides whether to pass it forward, retry the step, or halt the chain.2

Gates are not LLM calls — they are deterministic code. Examples include:

  • Length check: Does the output have the expected number of items?
  • Format check: Is the output valid JSON? Does it contain the required keys?
  • Content check: Does the output mention all required topics? Are there any prohibited terms?
  • Threshold check: Does a confidence score exceed the minimum?

Gates are what give prompt chaining its reliability advantage over monolithic prompts. Without gates, errors in early steps silently propagate through the entire chain. With gates, errors are caught at their source.5
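The checks listed above are plain deterministic functions. A sketch of the first three in Python, with illustrative names:

```python
import json

def length_gate(items, expected_count):
    # Does the output have the expected number of items?
    return len(items) == expected_count

def format_gate(raw, required_keys):
    # Is the output valid JSON with the required keys?
    try:
        data = json.loads(raw)
    except ValueError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

def content_gate(text, required_terms, prohibited_terms):
    # Are all required topics mentioned, and no prohibited terms present?
    lower = text.lower()
    return (all(term in lower for term in required_terms)
            and not any(term in lower for term in prohibited_terms))
```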

Key distinction

A gate is binary — pass or fail. It catches problems. A feedback loop (the evaluator-optimiser pattern) is iterative — it provides detailed feedback that guides revision. Gates are checkpoints; feedback loops are coaching sessions. Start with gates. Add feedback loops only when gate failures are frequent and the fix requires nuanced judgement.

4. Error handling and retry logic

When a gate fails, you have three options:4

  1. Retry the step with the same input (useful when failures are stochastic — the model might succeed on a second attempt)
  2. Retry with modified input — add clarifying instructions or rephrase the prompt
  3. Halt the chain and surface the error for human review

A common pattern is to allow one automatic retry before halting. This keeps the chain moving without entering infinite loops.
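A sketch of that retry pattern in Python, assuming a step function and a gate function as described above:

```python
def run_step_with_retry(step, gate, step_input, max_retries=1):
    """Run one chain step, retrying on gate failure, then halting."""
    attempts = 0
    while True:
        output = step(step_input)
        if gate(output):
            return output  # gate passed: hand the output forward
        attempts += 1
        if attempts > max_retries:
            # Halt and surface for human review rather than loop forever.
            raise RuntimeError(f"Gate still failing after {max_retries} retry attempt(s)")
```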

Think of it like...

A factory quality inspector at the end of an assembly station. If a part fails inspection, the inspector does not redesign the entire factory. They send the part back to that one station for rework. If it fails again, they pull it off the line for a human to examine.


Why do we use it?

Key reasons

1. Accuracy through focus. Each step asks the model to do one thing well rather than juggling multiple concerns. One analysis reports that prompt chaining can improve task accuracy by 10 to 30 percent on complex multi-step operations compared with equivalent single-prompt approaches.1

2. Debuggability. Every intermediate output is an inspectable artifact. When the final result is wrong, you can trace back through the chain to find exactly which step produced the error — turning “the AI gave a bad answer” into “step 2 missed a key fact.”5

3. Independent optimisation. Each step can use a different prompt, a different model, or different settings. A lightweight model can handle extraction (fast, cheap) while a powerful model handles generation (slower, more capable). You pay for capability only where you need it.2

4. Reusability. A well-designed extraction step can be reused across multiple chains. This is the same composability principle that makes Unix pipes powerful — small tools that do one thing well, assembled into larger workflows.5
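In practice, independent optimisation often reduces to a per-step configuration table. A sketch (model names are placeholders, not recommendations):

```python
# Cheaper model for mechanical steps, stronger model for generation.
CHAIN_CONFIG = {
    "extract":  {"model": "small-fast-model",    "temperature": 0.0},
    "outline":  {"model": "small-fast-model",    "temperature": 0.3},
    "generate": {"model": "large-capable-model", "temperature": 0.7},
}
```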


When do we use it?

  • When a task has multiple distinct phases — research, then analyse, then write, then review
  • When you need auditable checkpoints between phases to verify correctness
  • When output quality matters more than speed and you are willing to trade latency for accuracy
  • When a single prompt keeps drifting or forgetting instructions partway through its response
  • When you want to reuse individual steps across different workflows

Rule of thumb

If you can describe the task as a single, clear instruction (“translate this sentence to French”), a single prompt is fine. If you find yourself writing a prompt with numbered steps, conditionals, or multiple output sections, you are describing a chain — build one.2


How can I think about it?

The assembly line

Prompt chaining works like a factory assembly line.

  • Raw materials arrive (input) — an article, a dataset, a user request
  • Station 1 cuts the raw material to size (extraction — pull out key facts)
  • Quality inspector checks the cut pieces (gate — are all required facts present?)
  • Station 2 shapes the pieces (transformation — organise facts into an outline)
  • Station 3 assembles the final product (generation — write the finished output)
  • Final inspection checks the product before shipping (output validation)

No station tries to do everything. Each one has specialised tools and a narrow job. The assembly line’s quality comes from the structure, not from any single station being perfect. And if Station 2 produces a defective piece, only Station 2 needs to redo its work.

The bucket brigade

Before fire engines, towns fought fires with bucket brigades — a line of people passing buckets of water from a well to the fire.

  • Each person has one job: receive a bucket, pass it to the next person
  • The bucket (data) moves through the line in one direction
  • If someone drops a bucket (gate failure), only that handoff needs to be repeated
  • The brigade works because no single person has to run from the well to the fire and back — the task is distributed across the chain
  • Adding more people (steps) makes the brigade longer but each person’s job stays simple

Prompt chaining distributes cognitive work the same way a bucket brigade distributes physical work.


Concepts to explore next

  • parallelisation — Running independent sub-tasks simultaneously instead of sequentially (stub)
  • evaluator-optimiser — Using one LLM to generate and another to critique in an iterative loop (stub)
  • structured-output — Constraining LLM output to a defined schema for reliable parsing (stub)
  • context-cascading — Layering context from general to specific across pipeline stages (complete)

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.



Where this concept fits

Position in the knowledge graph

graph TD
    AS[Agentic Systems] --> LP[LLM Pipelines]
    LP --> PC[Prompt Chaining]
    LP --> PR[Prompt Routing]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    style PC fill:#4a9ede,color:#fff

Related concepts:

  • parallelisation — where chaining processes steps sequentially, parallelisation runs independent steps simultaneously for speed
  • evaluator-optimiser — extends chaining with an iterative critique loop, adding a feedback mechanism beyond simple pass/fail gates
  • structured-output — defining output schemas is what makes input-output contracts between chain steps reliable
  • context-cascading — controls what information each step in a chain receives, preventing context overload


Footnotes

  1. Conbersa Team. (2026). What Is Prompt Chaining? Conbersa.

  2. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.

  3. Liu, N. F., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12.

  4. PromptBuilder. (2025). Prompt Chaining in 2026 - Reliable Agent Workflows and Templates. PromptBuilder.

  5. TheLinuxCode. (2026). Prompt Chaining: Building Reliable Multi-Step LLM Workflows. TheLinuxCode.