Prompt Routing
Directing an LLM to the right instruction set based on what the user is asking for — classifying input and selecting the correct handler before any generation begins.
What is it?
When an LLM-based system handles more than one type of task, it faces a fundamental question before it can do anything useful: what kind of request is this? Prompt routing is the mechanism that answers that question. It classifies the incoming input and directs it to a specialised handler — a specific prompt, a dedicated agent, or a particular workflow — that is optimised for that category of request.1
The parent concept, llm-pipelines, introduces routing as one of the core pipeline patterns: the branching pattern. Where a sequential pipeline processes every input through the same stages, a routing step creates divergent paths. Different types of input receive different treatment, and no single handler has to be good at everything.2
Routing is not unique to AI systems. Telephone switchboards routed calls to the right department. Web servers route HTTP requests to the right handler based on the URL path. Email filters route messages to folders based on rules. Prompt routing applies the same principle to LLM interactions: inspect the input, classify it, and send it to the right place.3
The critical insight is that routing is a separate step from generation. The router does not answer the question — it decides who answers the question. This separation of concerns means each downstream handler can be narrow, focused, and highly effective at its specific task.1
In plain terms
Prompt routing is like a receptionist at a large office. You walk in and say what you need. The receptionist does not do the work themselves — they figure out which department can help you and send you there. A good receptionist gets you to the right place quickly; a bad one sends you to the wrong floor.
At a glance
How prompt routing works (click to expand)
```mermaid
graph TD
    INPUT[User Input] --> ROUTER[Router - Classify Intent]
    ROUTER -->|billing| H1[Billing Handler]
    ROUTER -->|technical| H2[Technical Handler]
    ROUTER -->|general| H3[General Handler]
    ROUTER -->|unclear| FB[Fallback - Ask for Clarification]
    H1 --> OUT[Response]
    H2 --> OUT
    H3 --> OUT
    FB -->|clarified| ROUTER
```

Key: The router sits at the entry point of the system. It inspects every input, classifies it into a category, and directs it to the appropriate handler. Each handler has its own specialised instructions and tools. If the input is ambiguous, a fallback path asks for clarification before re-routing.
How does it work?
Prompt routing operates through three components: the classifier (which determines intent), the routing table (which maps intents to handlers), and the fallback mechanism (which handles ambiguity). The sophistication of each component determines how well the system handles diverse and unpredictable input.
1. Hard-coded routing — rules and keywords
The simplest form of routing uses deterministic rules: keyword matching, regular expressions, or explicit conditional logic. If the input contains “invoice” or “payment”, route to the billing handler. If it contains “error” or “crash”, route to technical support.3
This approach is fast, predictable, and easy to debug. You can look at the routing rules and understand exactly why any given input was routed where it was. There are no surprises.
The trade-off is rigidity. Hard-coded routing fails on paraphrases (“I need help with my bill” vs “payment issue”), multilingual input, and any phrasing the rule author did not anticipate. As the number of categories grows, the rule set becomes brittle and difficult to maintain.4
Think of it like...
A vending machine with labelled buttons. Press “A1” and you always get the same item. Fast and reliable, but you can only choose from what is on the menu — and if you want something the buttons do not cover, the machine cannot help you.
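The rule-based approach can be sketched in a few lines. This is a minimal illustration, not a production router — the category names and keyword patterns are hypothetical, and the word-boundary matching deliberately shows the brittleness described above (a paraphrase or an inflected form the rule author did not anticipate falls through to the default).

```python
import re

# Hypothetical categories and keyword patterns for illustration.
# Order matters: the first matching rule wins.
RULES = [
    ("billing", re.compile(r"\b(invoice|payment|bill|refund)\b", re.I)),
    ("technical", re.compile(r"\b(error|crash|bug|broken)\b", re.I)),
]

def route(message: str) -> str:
    """Return the first category whose pattern matches, else a default."""
    for category, pattern in RULES:
        if pattern.search(message):
            return category
    # Note the rigidity: "crashing" does not match \bcrash\b,
    # and "facture" (French for invoice) matches nothing at all.
    return "general"
```

Every routing decision here is fully explainable: you can point at the exact rule that fired.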
2. Dynamic routing — LLM-based classification
Instead of writing rules by hand, you can ask an LLM to classify the input. The router prompt describes the available categories and asks the model to select the best match. This approach handles paraphrases, ambiguity, and natural language variation far better than keyword matching.1
For example, a routing prompt might say: “Classify the following user message into one of these categories: billing, technical, account, or general. Return only the category name.” The model reads the message, interprets the intent, and returns a classification that the system uses to select the next handler.
Dynamic routing is more flexible but introduces latency (an extra LLM call before the main task begins) and non-determinism (the same input might occasionally be classified differently). Production systems often use a small, fast model for the classification step to minimise cost and latency.2
Think of it like...
A human receptionist who listens to what you say, understands your intent even if you phrase it oddly, and directs you accordingly. More flexible than the vending machine, but occasionally makes a judgement call you might disagree with.
Example: LLM-based classification (click to expand)
Consider a customer service system with four departments:
| User message | Classified intent | Routed to |
|---|---|---|
| "My last invoice was wrong" | billing | Billing handler |
| "The app crashes when I open settings" | technical | Technical handler |
| "How do I change my password?" | account | Account handler |
| "I love your product!" | general | General handler |
| "Je voudrais annuler mon abonnement" | billing | Billing handler |

The LLM-based router handles the French input, the informal phrasing, and the compliment correctly — all of which would challenge a keyword-based system.
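A dynamic router of this kind can be sketched as below. The LLM call itself is abstracted behind a `call_llm` callable (an assumption — wire in whichever client you use); the important part is validating and normalising the model's reply, because a probabilistic classifier can return extra text or an unknown label.

```python
CATEGORIES = {"billing", "technical", "account", "general"}

ROUTER_PROMPT = (
    "Classify the following user message into one of these categories: "
    "billing, technical, account, or general. "
    "Return only the category name.\n\nMessage: {message}"
)

def classify(message, call_llm):
    """Ask an LLM to pick a category; validate the reply before trusting it."""
    reply = call_llm(ROUTER_PROMPT.format(message=message))
    category = reply.strip().lower()
    # Guard against chatter or unknown labels: anything outside the
    # routing table's categories is treated as ambiguous.
    return category if category in CATEGORIES else "unclear"
```

Routing the reply through a validation step like this is what lets the system fall back gracefully instead of dispatching to a non-existent handler.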
3. Hybrid routing — rules first, LLM as fallback
Many production systems combine both approaches. Deterministic rules handle clear-cut cases (exact commands, structured input, known patterns) quickly and cheaply. When the rules cannot make a confident classification, the input is escalated to an LLM classifier for a more nuanced decision.4
This hybrid approach captures the best of both worlds: the speed and predictability of rules for common cases, and the flexibility of LLM classification for edge cases. It also reduces cost, because the expensive LLM call only happens when the cheap rules are insufficient.
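The escalation logic can be sketched as a thin wrapper: deterministic rules first, and only on a miss does the (hypothetical) LLM classifier get called. The keyword patterns here are illustrative assumptions.

```python
import re

# Cheap first-pass rules for clear-cut cases (illustrative).
KEYWORD_RULES = {
    "billing": re.compile(r"invoice|payment|refund", re.I),
    "technical": re.compile(r"error|crash|bug", re.I),
}

def hybrid_route(message, llm_classify):
    """Rules first; escalate to the LLM classifier only when no rule fires.

    Returns (category, path) so the routing decision stays inspectable.
    """
    for category, pattern in KEYWORD_RULES.items():
        if pattern.search(message):
            return category, "rules"      # fast, free, deterministic
    return llm_classify(message), "llm"   # nuanced fallback, costs a call
```

Returning which path made the decision is a small design choice that pays off in debugging: logs show whether a misroute came from a bad rule or a bad classification.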
Key distinction
Hard-coded routing is deterministic — the same input always produces the same route. Dynamic routing is probabilistic — the LLM interprets intent, which means edge cases may be classified differently on different runs. Hybrid routing uses deterministic rules as the first pass and reserves probabilistic classification for ambiguous inputs.
4. Routing tables — mapping intents to handlers
A routing table is the data structure that connects classifications to actions. It maps each recognised intent to a specific handler: a prompt template, an agent, a workflow, or a tool. The table is the “menu” that the router selects from.1
A well-designed routing table has these properties:
- Exhaustive — every expected intent maps to a handler
- Mutually exclusive — categories do not overlap (or overlap is resolved by priority)
- Fallback-aware — unrecognised intents have a default path
- Maintainable — adding a new category means adding one row, not rewriting the router
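In code, a routing table is often nothing more than a dictionary from intent to handler, with a default for the fallback path. The handlers below are hypothetical stand-ins — in practice each entry would point at a prompt template, agent, or workflow.

```python
# Hypothetical handlers for illustration.
def handle_billing(msg): return f"[billing] {msg}"
def handle_technical(msg): return f"[technical] {msg}"
def handle_general(msg): return f"[general] {msg}"

ROUTING_TABLE = {
    "billing": handle_billing,
    "technical": handle_technical,
    "general": handle_general,
}

def dispatch(intent, message):
    """Look the intent up in the table; unrecognised intents take the default."""
    handler = ROUTING_TABLE.get(intent, handle_general)
    return handler(message)
```

Adding a new capability is literally adding one row to `ROUTING_TABLE` — the maintainability property from the list above falls out of the data structure.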
Yiuno example: the AGENTS.md routing table (click to expand)
The yiuno vault uses a routing table in its AGENTS.md file to direct tasks to the correct playbook or template:
| Task | Go to |
|---|---|
| Publish an article | _ai/playbooks/publish.md |
| Create a page | _ai/templates/page.md |
| Deploy the site | _ai/playbooks/deploy.md |
| Spawn a subdomain | _ai/playbooks/subdomain.md |
| Create a concept card | _ai/playbooks/concept-card.md |
| Create a learning path | _ai/playbooks/learning-path.md |
| Start a project | _ai/playbooks/learning-pipeline.md |

The agent reads the user's first message, matches it against the table, and loads the corresponding playbook. If the intent is unclear, a fallback menu is presented. This is hard-coded routing — the table is explicit and deterministic, which makes it easy to maintain and debug.
5. Fallback handling and ambiguity resolution
Every routing system encounters inputs it cannot confidently classify. The fallback mechanism determines what happens in these cases. Common strategies include:4
- Ask for clarification — present the user with options and let them self-route
- Default handler — route to a general-purpose handler that can address broad queries
- Confidence thresholds — if the classifier’s confidence is below a threshold, escalate to a human or a more capable model
- Multi-label routing — when an input matches multiple categories, run handlers in parallel or prioritise by relevance
The quality of fallback handling often determines the difference between a system that feels intelligent and one that feels frustrating. A system that says “I’m not sure what you mean — could you choose from these options?” is far better than one that silently routes to the wrong handler.3
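The confidence-threshold strategy can be sketched as a small gate between the classifier and the dispatcher. The threshold value and the `"ask_clarification"` sentinel are illustrative assumptions; a real system would tune the threshold against observed misroutes.

```python
def route_with_confidence(classification, threshold=0.75):
    """Gate a (intent, confidence) pair from a classifier.

    Above the threshold, trust the classification; below it, refuse to
    guess and send the user a clarification menu instead.
    """
    intent, confidence = classification
    if confidence >= threshold:
        return intent
    return "ask_clarification"  # self-route via the fallback path
```

The point of the gate is exactly the behaviour described above: "I'm not sure — could you choose?" beats silently picking the wrong handler.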
Why do we use it?
Key reasons
1. Specialisation. Each handler can be optimised for a narrow task with focused instructions, relevant context, and appropriate tools. A billing handler does not need to know about technical debugging, and vice versa. This produces better results than a single “do everything” prompt.2
2. Scalability. Adding a new capability means adding a new handler and a new row in the routing table. The existing handlers remain unchanged. This modular architecture scales as the system grows.1
3. Cost efficiency. Different routes can use different models. Simple classifications might use a small, fast model; complex generation might use a large, capable one. Routing enables matching model capability to task complexity, which reduces cost significantly.5
4. Debuggability. When something goes wrong, the routing decision is an inspectable artefact. You can see what the input was, how it was classified, and which handler processed it. This makes errors traceable rather than mysterious.2
When do we use it?
- When a system handles multiple types of requests that require different treatment
- When different request types need different tools, context, or instructions
- When you want to add new capabilities without rewriting existing ones
- When cost matters and you want to match model size to task complexity
- When building multi-agent systems where specialised agents handle different domains
Rule of thumb
If you find yourself writing a single prompt with lots of “if the user asks about X, do Y; if they ask about Z, do W” conditionals, you are describing a routing problem — and you should build a router.
How can I think about it?
The hospital triage desk
Prompt routing works like a hospital triage system.
- A patient arrives (user input enters the system)
- The triage nurse (router) quickly assesses the situation — not to treat, but to classify
- Based on the assessment, the patient is directed to emergency, outpatient, or a specialist (specialised handlers)
- Each department has its own staff, equipment, and procedures (handler-specific prompts, tools, and context)
- If the nurse cannot determine the right department, they ask clarifying questions (fallback mechanism)
- The nurse does not need to know how to perform surgery — they need to know how to recognise who needs a surgeon (classification, not generation)
The hospital works because the triage desk separates classification from treatment. Without it, every patient would wait in a single queue for a single doctor who tries to handle everything.
The mail sorting centre
Prompt routing works like a postal sorting facility.
- Every piece of mail arrives at a central point (all user inputs enter through a single interface)
- Sorting machines read the address (the router classifies the input)
- Mail is directed to different bins based on destination (routed to different handlers)
- Local mail goes to nearby carriers, international mail goes to the airport, parcels go to a different conveyor (each handler has different infrastructure)
- Unreadable addresses go to a special desk where a human examines them (fallback handling for ambiguous input)
- The sorting machine does not deliver the mail — it ensures every piece reaches the right next step
Speed comes from having a fast, focused classifier at the front that prevents everything from going through a single, slow path.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| context-cascading | How context layers are loaded progressively — routing decides which layers to load | complete |
| playbooks-as-programs | Structured procedures that serve as routing destinations | stub |
| orchestration | Managing multiple agents, including how a router hands off to specialised agents | stub |
| intent-classification | The mechanism for determining what type of request the system is handling | complete |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself (click to expand)
- Explain what prompt routing does and why it is a separate step from generation. What problem does it solve?
- Name the three main approaches to routing (hard-coded, dynamic, hybrid) and describe one advantage and one disadvantage of each.
- Distinguish between a routing table and a fallback mechanism. How do they work together?
- Interpret this scenario: a customer support system routes “I want to cancel my subscription” to the billing handler, but the user actually has a technical issue preventing them from accessing the cancellation page. What went wrong, and how could the routing system be improved?
- Connect prompt routing to context cascading. How does a routing decision determine which context layers get loaded for a given task?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    LP[LLM Pipelines] --> PR[Prompt Routing]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    PR --> IC[Intent Classification]
    CC -.->|prerequisite| PR
    style PR fill:#4a9ede,color:#fff
```

Related concepts:
- playbooks-as-programs — playbooks are a common routing destination; the router selects which playbook to execute
- orchestration — in multi-agent systems, routing determines which agent handles a request
- knowledge-graphs — graph structures can inform routing by mapping user intents to concept domains
Sources
Further reading
Resources
- Building Effective Agents (Anthropic) — The foundational reference on workflow patterns including routing, from the team behind Claude
- AI Agent Routing: A Practical Guide (BSWEN) — Hands-on implementation guide for intent classification and routing with code examples
- Top 5 LLM Routing Techniques (Maxim AI) — Comprehensive overview of routing strategies from keyword matching to semantic classification
- Router-Based Agents: The Architecture Pattern That Makes AI Systems Scale (Towards AI) — Architecture-focused deep dive into how routing enables scalable multi-agent systems
- Level 2 Agents: Router Pattern Deep Dive (AI Skill Market) — Detailed walkthrough of the router pattern with progressive complexity levels
Footnotes
1. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.
2. Carpintero, D. (2025). Design Patterns for Building Agentic Workflows. Hugging Face.
3. BSWEN. (2026). AI Agent Routing: A Practical Guide to Intent Classification and Routing Implementation. BSWEN.
4. Maxim AI. (2026). Top 5 LLM Routing Techniques. Maxim AI.
5. Markaicode. (2026). The LLM Router Pattern: Dynamically Switching Models by Task Complexity and Cost. Markaicode.