Parallelisation
Running independent subtasks at the same time rather than one after another, then merging the results — trading compute cost for speed and robustness.
What is it?
Prompt-chaining solves complex tasks by processing them in a strict sequence: step 1, then step 2, then step 3. That works when each step depends on the previous one. But many real tasks contain parts that are completely independent of each other — checking a document for factual accuracy has nothing to do with checking it for tone, and neither needs to wait for the other.
Parallelisation is the pipeline pattern that exploits this independence. Instead of processing subtasks one after another, you fan them out to run simultaneously and then gather the results into a single output.1 The total time drops from the sum of all subtasks to the duration of the slowest one.
Anthropic’s Building Effective Agents guide identifies two distinct sub-patterns within parallelisation:1
- Sectioning — splitting a task into independent subtasks that each handle a separate concern, running them in parallel, and merging the results.
- Voting — running the same task multiple times (often with different prompts or model configurations) to get diverse outputs, then aggregating them for higher confidence.
Both sub-patterns share the same fan-out/gather structure, but they serve different purposes. Sectioning increases speed and specialisation. Voting increases reliability and confidence.
In plain terms
Imagine you need to clean your house before guests arrive. You could vacuum every room, then dust every room, then clean every bathroom — doing one task at a time across the whole house. Or you could ask three people to each take one room and do everything in that room simultaneously. The total work is the same, but the clock time drops dramatically because the rooms are independent.
At a glance
Fan-out and gather structure (click to expand)
```mermaid
graph TD
    A[Input Task] --> F[Fan-Out]
    F --> B1[Subtask A]
    F --> B2[Subtask B]
    F --> B3[Subtask C]
    B1 --> G[Gather / Merge]
    B2 --> G
    B3 --> G
    G --> O[Combined Output]
```

Key: The fan-out step splits the input into independent branches. Each branch runs its own LLM call concurrently. The gather step merges all branch outputs into a single result. No branch waits for any other branch — they run simultaneously.
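The fan-out/gather structure can be sketched with the Python standard library's thread pool. This is a minimal illustration, not a production implementation; `run_subtask` is a hypothetical stand-in for a real LLM call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subtask(name: str, task: str) -> str:
    # Placeholder for a real LLM call; each branch works independently.
    return f"[{name}] result for: {task}"

def fan_out_gather(task: str, branches: list[str]) -> str:
    # Fan-out: submit every branch at once; no branch waits for another.
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(run_subtask, b, task) for b in branches]
        # Gather: collect all branch outputs.
        results = [f.result() for f in futures]
    # Merge: here a simple concatenation; could equally be another LLM call.
    return "\n".join(results)

print(fan_out_gather("Review this document", ["Subtask A", "Subtask B", "Subtask C"]))
```

Wall-clock time is bounded by the slowest `run_subtask` call, because the pool starts all three branches before waiting on any result.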
How does it work?
1. Sectioning — different jobs in parallel
Sectioning breaks a task into independent concerns, assigns each concern to a separate LLM call with a specialised prompt, and runs all calls concurrently.1 Each branch focuses on one aspect of the problem, which means its prompt can be optimised for that specific concern without competing objectives.
For example, when reviewing a piece of writing, you might run three parallel branches:
| Branch | Focus | Prompt optimised for |
|---|---|---|
| Branch A | Factual accuracy | Verifying claims against source material |
| Branch B | Tone and style | Checking consistency with a style guide |
| Branch C | Structure and completeness | Ensuring all required sections are present |
Each branch produces its own assessment. An aggregation step then merges the three assessments into a single review.2
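As a concrete sketch, the three-branch review above might be wired up like this — `call_llm` and the branch prompts are illustrative placeholders, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor

# Each branch gets a prompt optimised for one concern only.
BRANCH_PROMPTS = {
    "accuracy": "Verify every claim in the text against the source material.",
    "tone": "Check the text for consistency with the style guide.",
    "structure": "Confirm all required sections are present.",
}

def call_llm(prompt: str, text: str) -> str:
    # Stand-in for a real model call.
    return f"Assessment for: {prompt}"

def review(text: str) -> dict[str, str]:
    # Fan out one call per concern; gather one assessment per branch.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(call_llm, p, text)
                   for name, p in BRANCH_PROMPTS.items()}
        return {name: f.result() for name, f in futures.items()}
```

The returned dict is the raw input to the aggregation step, which can then merge the three assessments into a single report.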
Think of it like...
A panel of specialists at a medical consultation. The cardiologist examines the heart, the neurologist examines the nervous system, and the radiologist reads the scans — all at the same time. No specialist waits for another to finish. At the end, they convene to combine their findings into a single diagnosis.
Anthropic specifically highlights guardrail implementation as a strong use case for sectioning: one model instance processes the user query while another simultaneously screens it for inappropriate content. This tends to perform better than having a single LLM call handle both the guardrail check and the core response, because the concerns compete for attention in a single prompt.1
Example: parallel code review (click to expand)
Consider a code review pipeline where quality matters across multiple dimensions:
| Branch | Focus | Output |
|---|---|---|
| Branch A | Security vulnerabilities | List of security findings with severity ratings |
| Branch B | Performance issues | List of performance bottlenecks and suggestions |
| Branch C | Code style and readability | List of style violations and readability improvements |

Aggregator: Merges all findings into a single review report, de-duplicates overlapping issues, and prioritises by severity.
Running these in parallel is faster than sequentially. More importantly, each branch uses a prompt tailored to its specific concern — a security-focused prompt does not need to worry about style, and vice versa. This specialisation improves the quality of each individual assessment.2
2. Voting — same job, multiple attempts
Voting runs the same task multiple times — often with different prompts, different temperature settings, or even different models — and aggregates the outputs to reach a more confident result.1 Where sectioning divides the work, voting multiplies it for robustness.
The aggregation method depends on the task:
| Aggregation method | When to use | Example |
|---|---|---|
| Majority vote | Binary or categorical decisions | 3 out of 5 reviewers flag content as inappropriate |
| Threshold vote | Balancing false positives and negatives | Require 4 out of 5 votes to block content |
| Best-of-N selection | Generative tasks with quality variation | Generate 3 drafts, score each, keep the best |
| Weighted average | When some evaluators are more reliable | Weight the specialist model’s vote higher than the generalist |
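Each aggregation method in the table reduces to a small pure function over the branch outputs. A sketch, with votes simplified to booleans:

```python
def majority_vote(votes: list[bool]) -> bool:
    # True if more than half the branches voted True.
    return sum(votes) > len(votes) / 2

def threshold_vote(votes: list[bool], threshold: int) -> bool:
    # Require at least `threshold` positive votes, e.g. 4 of 5 to block content.
    return sum(votes) >= threshold

def best_of_n(drafts: list[str], score) -> str:
    # Generate N drafts, score each, keep the highest-scoring one.
    return max(drafts, key=score)

votes = [True, True, False, True, False]
print(majority_vote(votes))                 # → True  (3 of 5)
print(threshold_vote(votes, threshold=4))   # → False (3 < 4)
```

Note how the same five votes pass a majority vote but fail a stricter threshold — the choice of gather function, not the branches, sets the false-positive/false-negative balance.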
Anthropic highlights two voting examples: reviewing code for vulnerabilities with several different prompts that each flag problems independently, and evaluating content appropriateness with multiple prompts that use different vote thresholds to balance false positives and negatives.1
Think of it like...
A panel of judges scoring a gymnastics routine. Each judge scores independently, without seeing the others’ scores. The final score is an aggregate — typically dropping the highest and lowest and averaging the rest. No single judge’s bias can dominate the outcome, and the aggregate is more reliable than any individual score.
3. The fan-out/gather mechanic
Both sectioning and voting follow the same structural pattern:3
- Fan-out: The input is distributed to multiple parallel branches. In sectioning, each branch gets a different instruction. In voting, each branch gets the same (or similar) instruction.
- Parallel execution: All branches run concurrently. No branch depends on any other branch’s output.
- Gather: An aggregation step collects all branch outputs and combines them into a single result.
The gather step can be implemented as deterministic code (concatenation, majority vote, de-duplication) or as another LLM call that synthesises the branch outputs into a coherent whole. The choice depends on how much judgement the merging requires.2
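The deterministic end of that spectrum is straightforward. A sketch of a gather step that flattens branch findings, de-duplicates them, and preserves first-seen order:

```python
def gather(branch_outputs: list[list[str]]) -> list[str]:
    # Deterministic merge: flatten all branch findings and drop
    # duplicates, keeping the order in which findings first appeared.
    seen: set[str] = set()
    merged: list[str] = []
    for findings in branch_outputs:
        for finding in findings:
            if finding not in seen:
                seen.add(finding)
                merged.append(finding)
    return merged

print(gather([["SQL injection", "Slow loop"], ["Slow loop", "Bad naming"]]))
# → ['SQL injection', 'Slow loop', 'Bad naming']
```

When the merge needs judgement rather than mechanics — synthesising three prose assessments into one coherent review, say — the gather step becomes another LLM call instead.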
Key distinction
Sectioning fans out different tasks to different branches. Voting fans out the same task to multiple branches. Both use the same fan-out/gather structure, but the fan-out logic and the gather logic differ.
4. The latency vs cost trade-off
Parallelisation is not free. Running three branches in parallel means three simultaneous LLM calls — tripling the compute cost compared to a single call. The benefit is that wall-clock time drops to the duration of the slowest branch rather than the sum of all branches.1
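The arithmetic is worth making explicit. With hypothetical branch durations, parallel wall-clock time is the maximum over the branches, sequential time is their sum, and either way the pipeline makes one call per branch — three calls where a single-prompt approach would make one:

```python
# Hypothetical branch durations in seconds (illustrative numbers only).
durations = {"branch_a": 6.0, "branch_b": 2.0, "branch_c": 4.0}

sequential_time = sum(durations.values())  # one after another
parallel_time = max(durations.values())    # all at once
num_calls = len(durations)                 # compute cost scales with branch count

print(sequential_time, parallel_time, num_calls)  # → 12.0 6.0 3
```

Halving the latency here costs three calls instead of one — worthwhile only under the conditions below.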
This trade-off means parallelisation makes sense when:
- Latency matters more than cost — the user is waiting for a response and you need to deliver faster
- Quality matters more than cost — voting produces more reliable results than a single attempt, and the stakes justify the extra spend
- Subtasks are genuinely independent — if branches depend on each other, you cannot run them in parallel (use prompt-chaining instead)
It does not make sense when:
- Subtasks have sequential dependencies — step 2 needs step 1’s output
- The task is simple enough for a single LLM call to handle reliably
- Cost constraints are tight and the quality or speed gain does not justify multiplying the number of calls
Yiuno example: parallel quality review (click to expand)
When this knowledge system reviews a concept card, it could run parallel checks:
| Branch | Check |
|---|---|
| Branch A | Are all technical terms explained or linked? |
| Branch B | Are all footnote citations valid and properly formatted? |
| Branch C | Does the Mermaid diagram render correctly? |

Each check is independent. Running them in parallel produces a complete review faster than running them sequentially, and each branch can use a prompt optimised for its specific concern.
Why do we use it?
Key reasons
1. Speed. When subtasks are independent, parallelisation reduces total processing time from the sum of all subtasks to the duration of the slowest one. For pipelines with multiple independent checks or analyses, this can cut latency dramatically.1
2. Specialisation. Each parallel branch can use a prompt tailored to its specific concern. A security review prompt does not need to share attention with a style review prompt. This focused attention improves the quality of each individual assessment.2
3. Robustness through diversity. Voting produces more reliable results than a single attempt by aggregating multiple independent assessments. A single LLM call might miss a vulnerability; three independent calls with different prompts are far less likely to all miss the same issue.1
4. Graceful degradation. If one parallel branch fails or times out, the other branches still produce their results. The system can deliver a partial result rather than failing entirely — the code review loses its style assessment but still reports security and performance findings.3
When do we use it?
- When a task has multiple independent concerns that do not depend on each other (review dimensions, guard rails, evaluation criteria)
- When latency is a constraint and you need results faster than sequential processing allows
- When you need higher confidence in a decision and can afford to run the task multiple times (voting)
- When different expertise is needed for different aspects of the same input (specialised prompts per branch)
- When building guardrails that should run alongside the main task rather than before or after it
Rule of thumb
If you can describe the subtasks as “check A and check B and check C” where none depends on the others, parallelise. If it is “do A, then use A’s result to do B,” that is a chain, not a parallel task.1
How can I think about it?
The newspaper desk
A newspaper editor receives a breaking story and needs it ready for print fast.
- Fact-checker verifies every claim against sources
- Copy editor fixes grammar, spelling, and style
- Photo editor selects and crops the accompanying images
- Layout designer prepares the page template
All four work simultaneously on different aspects of the same story. None needs to wait for the others. When all four are done, their work is merged into the final page. The story reaches print in the time it takes the slowest editor, not the sum of all four.
This is sectioning: different specialists, same input, parallel execution, merged output.
The taste-test panel
A food company testing a new recipe does not rely on one taster’s opinion. They assemble a panel of 10 tasters who each evaluate the recipe independently.
- Each taster scores the same dish on flavour, texture, and appearance
- No taster sees the others’ scores until all have submitted
- The final assessment is the aggregate of all scores
- Outlier scores (one person hates cilantro) are diluted by the majority
This is voting: same task, multiple independent attempts, aggregated result. The panel’s collective judgement is more reliable than any single taster, and the process guards against individual bias.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| evaluator-optimiser | Using one LLM to generate and another to critique in an iterative refinement loop | complete |
| orchestration | How agents and pipeline stages are coordinated and managed | stub |
| multi-agent-systems | Architectures where multiple specialised agents collaborate on a task | stub |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself (click to expand)
- Explain the difference between sectioning and voting. When would you choose one over the other?
- Name the three stages of the fan-out/gather mechanic and describe what happens at each stage.
- Distinguish between parallelisation and prompt chaining. What property of the subtasks determines which pattern to use?
- Interpret this scenario: a document review pipeline runs three parallel branches (accuracy, tone, structure). The accuracy branch takes 8 seconds, tone takes 3 seconds, and structure takes 5 seconds. What is the total wall-clock time, and how does it compare to sequential execution?
- Connect parallelisation to the concept of guardrails. Why does Anthropic recommend running guardrail checks in parallel with the main task rather than before it?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AS[Agentic Systems] --> LP[LLM Pipelines]
    LP --> PC[Prompt Chaining]
    LP --> PR[Prompt Routing]
    LP --> PAR[Parallelisation]
    LP --> EO[Evaluator-Optimiser]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    style PAR fill:#4a9ede,color:#fff
```

Related concepts:
- orchestration — orchestration manages which agents run and when; parallelisation is one execution strategy an orchestrator might use for independent subtasks
- evaluator-optimiser — where parallelisation runs tasks simultaneously for speed or confidence, the evaluator-optimiser runs tasks iteratively for quality refinement
- multi-agent-systems — multi-agent architectures often use parallelisation internally, assigning independent sub-problems to specialised agents that work concurrently
Sources
Further reading
Resources
- Building Effective Agents (Anthropic) — The foundational reference on workflow patterns including parallelisation, with clear definitions of sectioning and voting sub-patterns
- Design Patterns for Building Agentic Workflows (Hugging Face) — Comprehensive catalogue of six design patterns with architecture diagrams showing how parallelisation relates to other pipeline patterns
- Parallel Fan-Out Fan-In Patterns (CallSphere) — Practical walkthrough of fan-out/gather mechanics with implementation guidance for multi-agent systems
- Stop Building AI Agents: Use These 5 Patterns Instead (Decoding AI) — Practitioner-oriented overview of when parallelisation makes sense versus simpler patterns, with cost-benefit analysis
Footnotes

1. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.
2. Carpintero, D. (2025). Design Patterns for Building Agentic Workflows. Hugging Face.
3. CallSphere. (2026). Parallel Fan-Out Fan-In Patterns: Processing Multiple Sub-Tasks Simultaneously. CallSphere.