Parallelisation
Running independent subtasks at the same time rather than one after another, then merging the results — trading compute cost for speed and robustness.
What is it?
Prompt-chaining solves complex tasks by processing them in a strict sequence: step 1, then step 2, then step 3. That works when each step depends on the previous one. But many real tasks contain parts that are completely independent of each other — checking a document for factual accuracy has nothing to do with checking it for tone, and neither needs to wait for the other.
Parallelisation is the pipeline pattern that exploits this independence. Instead of processing subtasks one after another, you fan them out to run simultaneously and then gather the results into a single output.1 The total time drops from the sum of all subtasks to the duration of the slowest one.
Anthropic’s Building Effective Agents guide identifies two distinct sub-patterns within parallelisation:1
- Sectioning — splitting a task into independent subtasks that each handle a separate concern, running them in parallel, and merging the results.
- Voting — running the same task multiple times (often with different prompts or model configurations) to get diverse outputs, then aggregating them for higher confidence.
Both sub-patterns share the same fan-out/gather structure, but they serve different purposes. Sectioning increases speed and specialisation. Voting increases reliability and confidence.
In plain terms
Imagine you need to clean your house before guests arrive. You could vacuum every room, then dust every room, then clean every bathroom — doing one task at a time across the whole house. Or you could ask three people to each take one room and do everything in that room simultaneously. The total work is the same, but the clock time drops dramatically because the rooms are independent.
At a glance
Fan-out and gather structure (click to expand)
```mermaid
graph TD
    A[Input Task] --> F[Fan-Out]
    F --> B1[Subtask A]
    F --> B2[Subtask B]
    F --> B3[Subtask C]
    B1 --> G[Gather / Merge]
    B2 --> G
    B3 --> G
    G --> O[Combined Output]
```

Key: The fan-out step splits the input into independent branches. Each branch runs its own LLM call concurrently. The gather step merges all branch outputs into a single result. No branch waits for any other branch — they run simultaneously.
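The fan-out/gather structure can be sketched with the Python standard library's thread pool. This is a minimal illustration, not a production implementation; `run_subtask` is a hypothetical stand-in for a real LLM call.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subtask(name: str, task: str) -> str:
    # Placeholder for a real LLM call; each branch works independently.
    return f"[{name}] result for: {task}"

def fan_out_gather(task: str, branches: list[str]) -> str:
    # Fan-out: submit every branch at once; no branch waits for another.
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = [pool.submit(run_subtask, b, task) for b in branches]
        # Gather: collect all branch outputs.
        results = [f.result() for f in futures]
    # Merge: here a simple concatenation; could equally be another LLM call.
    return "\n".join(results)

print(fan_out_gather("Review this document", ["Subtask A", "Subtask B", "Subtask C"]))
```

Wall-clock time is bounded by the slowest `run_subtask` call, because the pool starts all three branches before waiting on any result.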
How does it work?
1. Sectioning — different jobs in parallel
Sectioning breaks a task into independent concerns, assigns each concern to a separate LLM call with a specialised prompt, and runs all calls concurrently.1 Each branch focuses on one aspect of the problem, which means its prompt can be optimised for that specific concern without competing objectives.
For example, when reviewing a piece of writing, you might run three parallel branches:
| Branch | Focus | Prompt optimised for |
|---|---|---|
| Branch A | Factual accuracy | Verifying claims against source material |
| Branch B | Tone and style | Checking consistency with a style guide |
| Branch C | Structure and completeness | Ensuring all required sections are present |
Each branch produces its own assessment. An aggregation step then merges the three assessments into a single review.2
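As a concrete sketch, the three-branch review above might be wired up like this — `call_llm` and the branch prompts are illustrative placeholders, not a real API:

```python
from concurrent.futures import ThreadPoolExecutor

# Each branch gets a prompt optimised for one concern only.
BRANCH_PROMPTS = {
    "accuracy": "Verify every claim in the text against the source material.",
    "tone": "Check the text for consistency with the style guide.",
    "structure": "Confirm all required sections are present.",
}

def call_llm(prompt: str, text: str) -> str:
    # Stand-in for a real model call.
    return f"Assessment for: {prompt}"

def review(text: str) -> dict[str, str]:
    # Fan out one call per concern; gather one assessment per branch.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(call_llm, p, text)
                   for name, p in BRANCH_PROMPTS.items()}
        return {name: f.result() for name, f in futures.items()}
```

The returned dict is the raw input to the aggregation step, which can then merge the three assessments into a single report.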
Think of it like...
A panel of specialists at a medical consultation. The cardiologist examines the heart, the neurologist examines the nervous system, and the radiologist reads the scans — all at the same time. No specialist waits for another to finish. At the end, they convene to combine their findings into a single diagnosis.
Anthropic specifically highlights guardrail implementation as a strong use case for sectioning: one model instance processes the user query while another simultaneously screens it for inappropriate content. This tends to perform better than having a single LLM call handle both the guardrail check and the core response, because the concerns compete for attention in a single prompt.1
Example: parallel code review (click to expand)
Consider a code review pipeline where quality matters across multiple dimensions:
| Branch | Focus | Output |
|---|---|---|
| Branch A | Security vulnerabilities | List of security findings with severity ratings |
| Branch B | Performance issues | List of performance bottlenecks and suggestions |
| Branch C | Code style and readability | List of style violations and readability improvements |

Aggregator: Merges all findings into a single review report, de-duplicates overlapping issues, and prioritises by severity.
Running these in parallel is faster than sequentially. More importantly, each branch uses a prompt tailored to its specific concern — a security-focused prompt does not need to worry about style, and vice versa. This specialisation improves the quality of each individual assessment.2
2. Voting — same job, multiple attempts
Voting runs the same task multiple times — often with different prompts, different temperature settings, or even different models — and aggregates the outputs to reach a more confident result.1 Where sectioning divides the work, voting multiplies it for robustness.
The aggregation method depends on the task:
| Aggregation method | When to use | Example |
|---|---|---|
| Majority vote | Binary or categorical decisions | 3 out of 5 reviewers flag content as inappropriate |
| Threshold vote | Balancing false positives and negatives | Require 4 out of 5 votes to block content |
| Best-of-N selection | Generative tasks with quality variation | Generate 3 drafts, score each, keep the best |
| Weighted average | When some evaluators are more reliable | Weight the specialist model’s vote higher than the generalist |
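Each aggregation method in the table reduces to a small pure function over the branch outputs. A sketch, with votes simplified to booleans:

```python
def majority_vote(votes: list[bool]) -> bool:
    # True if more than half the branches voted True.
    return sum(votes) > len(votes) / 2

def threshold_vote(votes: list[bool], threshold: int) -> bool:
    # Require at least `threshold` positive votes, e.g. 4 of 5 to block content.
    return sum(votes) >= threshold

def best_of_n(drafts: list[str], score) -> str:
    # Generate N drafts, score each, keep the highest-scoring one.
    return max(drafts, key=score)

votes = [True, True, False, True, False]
print(majority_vote(votes))                 # → True  (3 of 5)
print(threshold_vote(votes, threshold=4))   # → False (3 < 4)
```

Note how the same five votes pass a majority vote but fail a stricter threshold — the choice of gather function, not the branches, sets the false-positive/false-negative balance.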
Anthropic highlights two voting examples: reviewing code for vulnerabilities with several different prompts that each flag problems independently, and evaluating content appropriateness with multiple prompts that use different vote thresholds to balance false positives and negatives.1
Think of it like...
A panel of judges scoring a gymnastics routine. Each judge scores independently, without seeing the others’ scores. The final score is an aggregate — typically dropping the highest and lowest and averaging the rest. No single judge’s bias can dominate the outcome, and the aggregate is more reliable than any individual score.
3. The fan-out/gather mechanic
Both sectioning and voting follow the same structural pattern:3
- Fan-out: The input is distributed to multiple parallel branches. In sectioning, each branch gets a different instruction. In voting, each branch gets the same (or similar) instruction.
- Parallel execution: All branches run concurrently. No branch depends on any other branch’s output.
- Gather: An aggregation step collects all branch outputs and combines them into a single result.
The gather step can be implemented as deterministic code (concatenation, majority vote, de-duplication) or as another LLM call that synthesises the branch outputs into a coherent whole. The choice depends on how much judgement the merging requires.2
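The deterministic end of that spectrum is straightforward. A sketch of a gather step that flattens branch findings, de-duplicates them, and preserves first-seen order:

```python
def gather(branch_outputs: list[list[str]]) -> list[str]:
    # Deterministic merge: flatten all branch findings and drop
    # duplicates, keeping the order in which findings first appeared.
    seen: set[str] = set()
    merged: list[str] = []
    for findings in branch_outputs:
        for finding in findings:
            if finding not in seen:
                seen.add(finding)
                merged.append(finding)
    return merged

print(gather([["SQL injection", "Slow loop"], ["Slow loop", "Bad naming"]]))
# → ['SQL injection', 'Slow loop', 'Bad naming']
```

When the merge needs judgement rather than mechanics — synthesising three prose assessments into one coherent review, say — the gather step becomes another LLM call instead.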
Key distinction
Sectioning fans out different tasks to different branches. Voting fans out the same task to multiple branches. Both use the same fan-out/gather structure, but the fan-out logic and the gather logic differ.
4. The latency vs cost trade-off
Parallelisation is not free. Running three branches in parallel means three simultaneous LLM calls — tripling the compute cost compared to a single call. The benefit is that wall-clock time drops to the duration of the slowest branch rather than the sum of all branches.1
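The arithmetic is worth making explicit. With hypothetical branch durations, parallel wall-clock time is the maximum over the branches, sequential time is their sum, and either way the pipeline makes one call per branch — three calls where a single-prompt approach would make one:

```python
# Hypothetical branch durations in seconds (illustrative numbers only).
durations = {"branch_a": 6.0, "branch_b": 2.0, "branch_c": 4.0}

sequential_time = sum(durations.values())  # one after another
parallel_time = max(durations.values())    # all at once
num_calls = len(durations)                 # compute cost scales with branch count

print(sequential_time, parallel_time, num_calls)  # → 12.0 6.0 3
```

Halving the latency here costs three calls instead of one — worthwhile only under the conditions below.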
This trade-off means parallelisation makes sense when:
- Latency matters more than cost — the user is waiting for a response and you need to deliver faster
- Quality matters more than cost — voting produces more reliable results than a single attempt, and the stakes justify the extra spend
- Subtasks are genuinely independent — if branches depend on each other, you cannot run them in parallel (use prompt-chaining instead)
It does not make sense when:
- Subtasks have sequential dependencies — step 2 needs step 1’s output
- The task is simple enough for a single LLM call to handle reliably
- Cost constraints are tight and the quality or speed gain does not justify multiplying the number of calls
Yiuno example: parallel quality review (click to expand)
When this knowledge system reviews a concept card, it could run parallel checks:
| Branch | Check |
|---|---|
| Branch A | Are all technical terms explained or linked? |
| Branch B | Are all footnote citations valid and properly formatted? |
| Branch C | Does the Mermaid diagram render correctly? |

Each check is independent. Running them in parallel produces a complete review faster than running them sequentially, and each branch can use a prompt optimised for its specific concern.
Why do we use it?
Key reasons
1. Speed. When subtasks are independent, parallelisation reduces total processing time from the sum of all subtasks to the duration of the slowest one. For pipelines with multiple independent checks or analyses, this can cut latency dramatically.1
2. Specialisation. Each parallel branch can use a prompt tailored to its specific concern. A security review prompt does not need to share attention with a style review prompt. This focused attention improves the quality of each individual assessment.2
3. Robustness through diversity. Voting produces more reliable results than a single attempt by aggregating multiple independent assessments. A single LLM call might miss a vulnerability; three independent calls with different prompts are far less likely to all miss the same issue.1
4. Graceful degradation. If one parallel branch fails or times out, the other branches still produce their results. The system can deliver a partial result rather than failing entirely — the code review loses its style assessment but still reports security and performance findings.3
When do we use it?
- When a task has multiple independent concerns that do not depend on each other (review dimensions, guard rails, evaluation criteria)
- When latency is a constraint and you need results faster than sequential processing allows
- When you need higher confidence in a decision and can afford to run the task multiple times (voting)
- When different expertise is needed for different aspects of the same input (specialised prompts per branch)
- When building guardrails that should run alongside the main task rather than before or after it
Rule of thumb
If you can describe the subtasks as “check A and check B and check C” where none depends on the others, parallelise. If it is “do A, then use A’s result to do B,” that is a chain, not a parallel task.1
How can I think about it?
The newspaper desk
A newspaper editor receives a breaking story and needs it ready for print fast.
- Fact-checker verifies every claim against sources
- Copy editor fixes grammar, spelling, and style
- Photo editor selects and crops the accompanying images
- Layout designer prepares the page template
All four work simultaneously on different aspects of the same story. None needs to wait for the others. When all four are done, their work is merged into the final page. The story reaches print in the time it takes the slowest editor, not the sum of all four.
This is sectioning: different specialists, same input, parallel execution, merged output.
The taste-test panel
A food company testing a new recipe does not rely on one taster’s opinion. They assemble a panel of 10 tasters who each evaluate the recipe independently.
- Each taster scores the same dish on flavour, texture, and appearance
- No taster sees the others’ scores until all have submitted
- The final assessment is the aggregate of all scores
- Outlier scores (one person hates cilantro) are diluted by the majority
This is voting: same task, multiple independent attempts, aggregated result. The panel’s collective judgement is more reliable than any single taster, and the process guards against individual bias.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| evaluator-optimiser | Using one LLM to generate and another to critique in an iterative refinement loop | complete |
| orchestration | How agents and pipeline stages are coordinated and managed | stub |
| multi-agent-systems | Architectures where multiple specialised agents collaborate on a task | stub |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself (click to expand)
- Explain the difference between sectioning and voting. When would you choose one over the other?
- Name the three stages of the fan-out/gather mechanic and describe what happens at each stage.
- Distinguish between parallelisation and prompt chaining. What property of the subtasks determines which pattern to use?
- Interpret this scenario: a document review pipeline runs three parallel branches (accuracy, tone, structure). The accuracy branch takes 8 seconds, tone takes 3 seconds, and structure takes 5 seconds. What is the total wall-clock time, and how does it compare to sequential execution?
- Connect parallelisation to the concept of guardrails. Why does Anthropic recommend running guardrail checks in parallel with the main task rather than before it?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AS[Agentic Systems] --> LP[LLM Pipelines]
    LP --> PC[Prompt Chaining]
    LP --> PR[Prompt Routing]
    LP --> PAR[Parallelisation]
    LP --> EO[Evaluator-Optimiser]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    style PAR fill:#4a9ede,color:#fff
```

Related concepts:
- orchestration — orchestration manages which agents run and when; parallelisation is one execution strategy an orchestrator might use for independent subtasks
- evaluator-optimiser — where parallelisation runs tasks simultaneously for speed or confidence, the evaluator-optimiser runs tasks iteratively for quality refinement
- multi-agent-systems — multi-agent architectures often use parallelisation internally, assigning independent sub-problems to specialised agents that work concurrently
Sources
Further reading
Resources
- Building Effective Agents (Anthropic) — The foundational reference on workflow patterns including parallelisation, with clear definitions of sectioning and voting sub-patterns
- Design Patterns for Building Agentic Workflows (Hugging Face) — Comprehensive catalogue of six design patterns with architecture diagrams showing how parallelisation relates to other pipeline patterns
- Parallel Fan-Out Fan-In Patterns (CallSphere) — Practical walkthrough of fan-out/gather mechanics with implementation guidance for multi-agent systems
- Stop Building AI Agents: Use These 5 Patterns Instead (Decoding AI) — Practitioner-oriented overview of when parallelisation makes sense versus simpler patterns, with cost-benefit analysis
Footnotes

1. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.
2. Carpintero, D. (2025). Design Patterns for Building Agentic Workflows. Hugging Face.
3. CallSphere. (2026). Parallel Fan-Out Fan-In Patterns: Processing Multiple Sub-Tasks Simultaneously. CallSphere.