Prompt Chaining

Breaking a complex task into a strict sequence of LLM calls where each step consumes the previous step’s output and performs a single, focused transformation.


What is it?

When you ask a language model to do something simple — summarise a paragraph, translate a sentence, extract a date — a single prompt works well. But the moment you need the model to research, then analyse, then write, then format, a single prompt starts to buckle. It has too many objectives competing for attention, no checkpoints, and no way to tell where things went wrong when the output is off.1

Prompt chaining is the fix. Instead of one massive prompt that says “do everything,” you write a short sequence of prompts, each with one clear job. The output of step 1 becomes the input of step 2, and so on until the final result emerges. Between steps, you can insert programmatic checks — called gates — that verify the intermediate output before passing it forward.2
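In code, a chain is just ordinary sequential function calls. A minimal sketch in Python, where `call_llm` is a hypothetical stand-in for whatever LLM client you actually use:

```python
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return f"[model output for: {prompt[:40]}...]"

def summarise(text: str) -> str:
    # Step 1: one focused job, condense the input.
    return call_llm(f"Summarise the following in two sentences:\n\n{text}")

def extract_keywords(summary: str) -> str:
    # Step 2: one focused job, consumes step 1's output.
    return call_llm(f"List the five most important keywords in:\n\n{summary}")

def chain(text: str) -> str:
    summary = summarise(text)
    if not summary.strip():  # a trivial gate between the steps
        raise ValueError("Step 1 produced no output")
    return extract_keywords(summary)
```

The structure, not the stub, is the point: each function owns one prompt, and data flows in one direction.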

Anthropic’s Building Effective Agents guide identifies prompt chaining as the foundational workflow pattern — the simplest and most common way to structure an LLM pipeline. It sits at the base of a complexity ladder: you start with prompt chaining and only reach for more advanced patterns (routing, parallelisation, orchestrator-workers) when sequential decomposition is not enough.2

The parent concept, llm-pipelines, covers the full landscape of pipeline patterns. Prompt chaining is the entry point — the pattern you should learn first and use most often.

In plain terms

A single prompt is like asking someone to bake a cake by saying “make me a cake.” Prompt chaining is like handing them a recipe card for each stage: first measure the ingredients, then mix the batter, then bake, then decorate. Each card has one instruction. If the batter is wrong, you catch it before you waste an hour baking.



How does it work?

1. Decomposition — one job per step

The core principle is that each step in the chain should have exactly one clear job.2 If you find yourself writing a step that does two things — say, “extract the key facts and then evaluate their reliability” — split it into two steps.

This matters because LLMs perform better when they focus on a single reasoning task at a time. Packing multiple objectives into one prompt forces the model to juggle competing concerns, and important details get lost or deprioritised.1 Research on how language models use long contexts shows that information buried in the middle of a large prompt is often missed — the “lost-in-the-middle” effect.3
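As a concrete illustration of the split (prompt wording invented for the example):

```python
# One overloaded prompt: two competing objectives.
combined = (
    "Extract the key facts from the article below and evaluate "
    "the reliability of each fact.\n\n{article}"
)

# Split into two steps, each with exactly one job.
extract_prompt = (
    "Extract the key facts from the article below, one per line.\n\n{article}"
)
evaluate_prompt = (
    "For each fact below, rate its reliability as high, medium, or low.\n\n{facts}"
)
```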

Think of it like...

A relay race. Each runner covers one leg. No runner tries to run the entire distance. The baton (the output) passes cleanly at each handoff point. If one runner stumbles, only that leg needs to be re-run.

2. The input-output contract

Every step in a chain has an implicit contract: it expects input in a specific shape and produces output in a specific shape. This contract is what makes chaining reliable.4

For example, if step 1 outputs a JSON list of facts and step 2 expects a JSON list of facts, the handoff is clean. If step 1 outputs free-form prose and step 2 tries to parse it as structured data, the chain breaks.

Best practice is to specify the output format explicitly in each step’s prompt — “Output: a numbered list of facts, one per line” or “Output: a JSON object with keys title, summary, tags.” Structured outputs reduce ambiguity and make gates easier to implement.4
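A sketch of enforcing such a contract in Python. The key names mirror the example above; `parse_step_output` is an illustrative helper, not a library function:

```python
import json

def parse_step_output(raw: str, required_keys: set) -> dict:
    """Enforce the handoff contract: output must be JSON with the agreed keys."""
    data = json.loads(raw)  # raises a ValueError subclass on non-JSON output
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"Contract violation, missing keys: {missing}")
    return data

# A conforming handoff from step 1 to step 2:
payload = parse_step_output(
    '{"title": "Q3 report", "summary": "Revenue up 8%.", "tags": ["finance"]}',
    required_keys={"title", "summary", "tags"},
)
```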

3. Validation gates

A gate is a programmatic check inserted between steps. It inspects the output of the previous step and decides whether to pass it forward, retry the step, or halt the chain.2

Gates are not LLM calls — they are deterministic code. Examples include:

  • Length check: Does the output have the expected number of items?
  • Format check: Is the output valid JSON? Does it contain the required keys?
  • Content check: Does the output mention all required topics? Are there any prohibited terms?
  • Threshold check: Does a confidence score exceed the minimum?

Gates are what give prompt chaining its reliability advantage over monolithic prompts. Without gates, errors in early steps silently propagate through the entire chain. With gates, errors are caught at their source.5
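The checks listed above are plain deterministic functions. A sketch of the first three in Python, with illustrative names:

```python
import json

def length_gate(items, expected_count):
    # Does the output have the expected number of items?
    return len(items) == expected_count

def format_gate(raw, required_keys):
    # Is the output valid JSON with the required keys?
    try:
        data = json.loads(raw)
    except ValueError:
        return False
    return isinstance(data, dict) and required_keys <= data.keys()

def content_gate(text, required_terms, prohibited_terms):
    # Are all required topics mentioned, and no prohibited terms present?
    lower = text.lower()
    return (all(term in lower for term in required_terms)
            and not any(term in lower for term in prohibited_terms))
```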

Key distinction

A gate is binary — pass or fail. It catches problems. A feedback loop (the evaluator-optimiser pattern) is iterative — it provides detailed feedback that guides revision. Gates are checkpoints; feedback loops are coaching sessions. Start with gates. Add feedback loops only when gate failures are frequent and the fix requires nuanced judgement.

4. Error handling and retry logic

When a gate fails, you have three options:4

  1. Retry the step with the same input (useful when failures are stochastic — the model might succeed on a second attempt)
  2. Retry with modified input — add clarifying instructions or rephrase the prompt
  3. Halt the chain and surface the error for human review

A common pattern is to allow one automatic retry before halting. This keeps the chain moving without entering infinite loops.
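A sketch of that retry pattern in Python, assuming a step function and a gate function as described above:

```python
def run_step_with_retry(step, gate, step_input, max_retries=1):
    """Run one chain step, retrying on gate failure, then halting."""
    attempts = 0
    while True:
        output = step(step_input)
        if gate(output):
            return output  # gate passed: hand the output forward
        attempts += 1
        if attempts > max_retries:
            # Halt and surface for human review rather than loop forever.
            raise RuntimeError(f"Gate still failing after {max_retries} retry attempt(s)")
```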

Think of it like...

A factory quality inspector at the end of an assembly station. If a part fails inspection, the inspector does not redesign the entire factory. They send the part back to that one station for rework. If it fails again, they pull it off the line for a human to examine.


Why do we use it?

Key reasons

1. Accuracy through focus. Each step asks the model to do one thing well rather than juggling multiple concerns. One analysis reports that prompt chaining can improve task accuracy by 10 to 30 percent on complex multi-step operations compared with equivalent single-prompt approaches.1

2. Debuggability. Every intermediate output is an inspectable artifact. When the final result is wrong, you can trace back through the chain to find exactly which step produced the error — turning “the AI gave a bad answer” into “step 2 missed a key fact.”5

3. Independent optimisation. Each step can use a different prompt, a different model, or different settings. A lightweight model can handle extraction (fast, cheap) while a powerful model handles generation (slower, more capable). You pay for capability only where you need it.2

4. Reusability. A well-designed extraction step can be reused across multiple chains. This is the same composability principle that makes Unix pipes powerful — small tools that do one thing well, assembled into larger workflows.5
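In practice, independent optimisation often reduces to a per-step configuration table. A sketch (model names are placeholders, not recommendations):

```python
# Cheaper model for mechanical steps, stronger model for generation.
CHAIN_CONFIG = {
    "extract":  {"model": "small-fast-model",    "temperature": 0.0},
    "outline":  {"model": "small-fast-model",    "temperature": 0.3},
    "generate": {"model": "large-capable-model", "temperature": 0.7},
}
```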


When do we use it?

  • When a task has multiple distinct phases — research, then analyse, then write, then review
  • When you need auditable checkpoints between phases to verify correctness
  • When output quality matters more than speed and you are willing to trade latency for accuracy
  • When a single prompt keeps drifting or forgetting instructions partway through its response
  • When you want to reuse individual steps across different workflows

Rule of thumb

If you can describe the task as a single, clear instruction (“translate this sentence to French”), a single prompt is fine. If you find yourself writing a prompt with numbered steps, conditionals, or multiple output sections, you are describing a chain — build one.2


How can I think about it?

The assembly line

Prompt chaining works like a factory assembly line.

  • Raw materials arrive (input) — an article, a dataset, a user request
  • Station 1 cuts the raw material to size (extraction — pull out key facts)
  • Quality inspector checks the cut pieces (gate — are all required facts present?)
  • Station 2 shapes the pieces (transformation — organise facts into an outline)
  • Station 3 assembles the final product (generation — write the finished output)
  • Final inspection checks the product before shipping (output validation)

No station tries to do everything. Each one has specialised tools and a narrow job. The assembly line’s quality comes from the structure, not from any single station being perfect. And if Station 2 produces a defective piece, only Station 2 needs to redo its work.

The bucket brigade

Before fire engines, towns fought fires with bucket brigades — a line of people passing buckets of water from a well to the fire.

  • Each person has one job: receive a bucket, pass it to the next person
  • The bucket (data) moves through the line in one direction
  • If someone drops a bucket (gate failure), only that handoff needs to be repeated
  • The brigade works because no single person has to run from the well to the fire and back — the task is distributed across the chain
  • Adding more people (steps) makes the brigade longer but each person’s job stays simple

Prompt chaining distributes cognitive work the same way a bucket brigade distributes physical work.


Concepts to explore next

  • parallelisation — Running independent sub-tasks simultaneously instead of sequentially (stub)
  • evaluator-optimiser — Using one LLM to generate and another to critique in an iterative loop (stub)
  • structured-output — Constraining LLM output to a defined schema for reliable parsing (stub)
  • context-cascading — Layering context from general to specific across pipeline stages (complete)

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.



Where this concept fits

Position in the knowledge graph

graph TD
    AS[Agentic Systems] --> LP[LLM Pipelines]
    LP --> PC[Prompt Chaining]
    LP --> PR[Prompt Routing]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    style PC fill:#4a9ede,color:#fff

Related concepts:

  • parallelisation — where chaining processes steps sequentially, parallelisation runs independent steps simultaneously for speed
  • evaluator-optimiser — extends chaining with an iterative critique loop, adding a feedback mechanism beyond simple pass/fail gates
  • structured-output — defining output schemas is what makes input-output contracts between chain steps reliable
  • context-cascading — controls what information each step in a chain receives, preventing context overload


Footnotes

  1. Conbersa Team. (2026). What Is Prompt Chaining? Conbersa.

  2. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.

  3. Liu, N. F., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12.

  4. PromptBuilder. (2025). Prompt Chaining in 2026 - Reliable Agent Workflows and Templates. PromptBuilder.

  5. TheLinuxCode. (2026). Prompt Chaining: Building Reliable Multi-Step LLM Workflows. TheLinuxCode.