Autonomy Spectrum

A framework for classifying AI systems by how much they can do on their own — from simple chatbots that only respond when asked, to autonomous agents that plan and act independently.


What is it?

When people talk about AI agents, they often treat autonomy as a binary: a system is either a chatbot or an autonomous agent. In practice, there is a spectrum between those extremes, with at least four distinct levels of capability.1 Understanding where a system sits on this spectrum — and where it should sit — is one of the most important design decisions in AI engineering.

The idea is borrowed from a familiar domain: self-driving cars. The SAE (Society of Automotive Engineers) defined six levels of driving automation, from “no automation” to “full automation.”2 The AI industry has adopted a similar approach. Sean Falconer’s autonomy levels, for instance, map a progression from simple prompt-response systems through tool-augmented assistants, workflow agents, and fully autonomous agents.3 Each level adds a capability — memory, tool access, planning, or multi-agent coordination — and each capability brings new design challenges.

The critical insight, emphasised by Anthropic’s guide to building effective agents, is that you should start with the simplest architecture that solves the problem.4 More autonomy means more complexity, more failure modes, and more cost. A tool-augmented LLM that reliably completes a task is far better than an autonomous agent that fails unpredictably. The spectrum is not a ladder to climb — it is a menu to choose from.

In plain terms

Think of the autonomy spectrum like a car’s cruise control settings. Basic cruise control holds a set speed (reactive chatbot). Adaptive cruise control adjusts speed based on traffic (tool-augmented LLM). A highway autopilot handles steering and lane changes (workflow agent). Full self-driving handles the entire journey (autonomous agent). Each level is useful — the right choice depends on the road, not on which level sounds most impressive.


How does it work?

Level 1: Reactive chatbot

The simplest form. The system receives a prompt, generates a response, and stops. It has no memory between turns (or very limited memory), no access to external tools, and no ability to take actions in the world.1

For example: a basic customer FAQ bot that matches your question to a pre-written answer, or a vanilla LLM chat interface.

Think of it like...

A receptionist who can answer questions from a script but cannot look anything up, make calls, or take action on your behalf. Useful for simple, predictable interactions.
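
The pattern can be sketched in a few lines of Python. This is an illustrative stand-in, not a real product: the FAQ table, the substring matching, and the fallback message are all invented for the example. The key property is that every call is independent — no state, no tools, no side effects.

```python
# Level 1 sketch: a stateless FAQ bot. Prompt in, response out, nothing else.
# The FAQ entries and matching rule are illustrative only.
FAQ = {
    "opening hours": "We are open 9am-5pm, Monday to Friday.",
    "returns": "Items can be returned within 30 days with a receipt.",
}

def reactive_bot(prompt: str) -> str:
    """No memory between calls, no tool access, no actions in the world."""
    text = prompt.lower()
    for topic, answer in FAQ.items():
        if topic in text:
            return answer
    return "Sorry, I don't have an answer for that."

print(reactive_bot("What are your opening hours?"))
# -> We are open 9am-5pm, Monday to Friday.
```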

Level 2: Tool-augmented LLM

The system can call external tools — search the web, query a database, run code, call an API — and incorporate the results into its response. This is a major capability jump because the system can now access current information and perform actions beyond text generation.3

For example: an LLM that can search documentation, retrieve customer records, or execute a calculation before responding.

Think of it like...

A receptionist who has a phone and a computer. They can still only respond when you ask, but now they can look things up, check schedules, and give you accurate, real-time information instead of relying on a static script.
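
A minimal sketch of the jump from Level 1 to Level 2: the system can now route part of a request to a tool and fold the result into its reply. The tool registry, the routing rules, and the toy calculator here are all hypothetical; in a real system the model itself would decide when to call a tool.

```python
# Level 2 sketch: same one-shot prompt/response shape, but with tool access.
# Tool names and routing logic are invented for illustration.
from datetime import date

TOOLS = {
    "today": lambda: date.today().isoformat(),
    # Toy calculator; eval is restricted here but still not production-safe.
    "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def tool_augmented(prompt: str) -> str:
    if prompt.strip().lower().startswith("calc:"):
        return f"The result is {TOOLS['calc'](prompt.split(':', 1)[1])}."
    if "date" in prompt.lower():
        return f"Today's date is {TOOLS['today']()}."
    return "I can only answer from what I already know."

print(tool_augmented("calc: 6 * 7"))  # -> The result is 42.
```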

Level 3: Workflow agent

The system can decompose a goal into multiple steps, execute them in sequence, and adapt based on intermediate results. It follows a plan — sometimes a pre-defined workflow, sometimes one it generates itself — and makes decisions at each step without asking the user.2

For example: a research agent that searches multiple sources, compares findings, resolves contradictions, and produces a synthesis. Or a coding agent that reads an error, hypothesises a fix, edits the code, runs tests, and iterates.
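
One way to picture the Level 3 loop: generate a plan, execute each step, and branch on intermediate results without asking the user. The plan, the step functions, and the contradiction check below are placeholders for real search and synthesis calls.

```python
# Level 3 sketch: decompose a goal into steps, execute in sequence, adapt.
# Steps are stand-ins; a real agent would call search/LLM tools at each one.

def plan(goal: str) -> list[str]:
    return ["search", "compare", "synthesise"]

def execute(step: str, context: dict) -> dict:
    context.setdefault("log", []).append(step)
    return context

def workflow_agent(goal: str) -> dict:
    context = {"goal": goal}
    for step in plan(goal):
        context = execute(step, context)
        # Adapt to intermediate results without asking the user.
        if step == "compare" and context.get("contradiction"):
            context = execute("resolve_contradiction", context)
    return context

print(workflow_agent("summarise recent findings")["log"])
# -> ['search', 'compare', 'synthesise']
```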

Level 4: Autonomous agent

The system operates with minimal human oversight over extended periods. It sets sub-goals, manages its own resources, handles errors and edge cases, and may coordinate with other agents. Human involvement is limited to setting the initial goal and reviewing outcomes.3

For example: an agent that continuously monitors a codebase for security vulnerabilities, triages them by severity, generates fixes for low-risk issues, and escalates high-risk ones to a human reviewer.
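
The vulnerability-monitoring example reduces to a triage policy: the agent acts on low-risk findings itself and escalates the rest. The severity scale, threshold, and finding names below are invented for the example.

```python
# Level 4 sketch: triage findings by severity, auto-fix low-risk ones,
# escalate high-risk ones to a human reviewer. Threshold is hypothetical.
LOW_RISK_MAX = 3  # severities above this always go to a human

def triage(findings: list[tuple[str, int]]) -> tuple[list[str], list[str]]:
    auto_fixed, escalated = [], []
    for name, severity in findings:
        if severity <= LOW_RISK_MAX:
            auto_fixed.append(name)   # the agent acts on its own
        else:
            escalated.append(name)    # human-in-the-loop checkpoint
    return auto_fixed, escalated

findings = [("outdated-dependency", 2), ("sql-injection", 9), ("typo", 1)]
fixed, escalated = triage(findings)
print(fixed)      # -> ['outdated-dependency', 'typo']
print(escalated)  # -> ['sql-injection']
```

A real Level 4 agent would wrap this policy in a continuous monitoring loop rather than a single pass.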

Key distinction

The difference between Level 3 and Level 4 is not just capability — it is duration and oversight. A Level 3 agent completes a bounded task and returns. A Level 4 agent operates continuously, making ongoing decisions about what to do next. This is where guardrails, monitoring, and human-in-the-loop design become critical.4
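
The bounded-vs-continuous distinction can be made concrete with a sketch: a Level 3 run executes a finite plan and returns, while a Level 4 loop keeps consuming events under an action budget and an approval hook. All names here are illustrative, not from any real framework.

```python
# Level 3 vs Level 4: a bounded run terminates; a continuous loop needs
# guardrails (an action budget) and human-in-the-loop checkpoints.

def bounded_run(steps: list[str]) -> list[str]:
    """Level 3: execute a finite plan, then hand control back."""
    return [f"done:{s}" for s in steps]

def continuous_run(events, needs_approval, max_actions: int = 100) -> list[str]:
    """Level 4: act on an ongoing stream, pausing for oversight."""
    actions = []
    for event in events[:max_actions]:        # guardrail: hard action budget
        if needs_approval(event):
            actions.append(f"escalated:{event}")  # human decides
        else:
            actions.append(f"handled:{event}")
    return actions

print(bounded_run(["fetch", "summarise"]))
# -> ['done:fetch', 'done:summarise']
print(continuous_run(["routine", "risky"], lambda e: e == "risky"))
# -> ['handled:routine', 'escalated:risky']
```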


Why do we use it?

Key reasons

1. Right-sizing complexity. The spectrum prevents over-engineering. If a Level 2 system solves the problem, building a Level 4 system wastes time and money and introduces unnecessary failure modes.4

2. Managing risk. Higher autonomy means higher stakes. The spectrum gives designers a shared vocabulary for discussing how much control to hand over — and where to place guardrails.3

3. Setting expectations. Stakeholders, users, and developers all benefit from a clear classification. Saying “this is a Level 2 tool-augmented assistant” is far more precise than “this is an AI agent.”


When do we use it?

  • When designing a new AI system and deciding how much autonomy it needs
  • When evaluating whether an existing system is over- or under-built for its task
  • When communicating with stakeholders about what an AI system can and cannot do
  • When planning a roadmap — starting at Level 2 and progressively adding autonomy as you validate each layer
  • When deciding where to place human-in-the-loop checkpoints

Rule of thumb

Start at the lowest level that solves the problem. Move up the spectrum only when you have evidence that more autonomy will deliver value — and when you have the guardrails to manage the added risk.4


How can I think about it?

The restaurant kitchen

Imagine a restaurant kitchen with different levels of staff responsibility:

  • Level 1 (Reactive): A line cook who follows a recipe card exactly. They wait for an order, execute it, and stop. No improvisation.
  • Level 2 (Tool-augmented): A cook who can check the pantry, substitute ingredients, and adjust quantities based on what is available. Still follows the recipe, but can adapt to context.
  • Level 3 (Workflow): A sous chef who receives “make tonight’s special” and independently plans the dish, sources ingredients, delegates prep tasks, and assembles the result.
  • Level 4 (Autonomous): The head chef who plans the entire menu, manages the team, adjusts to seasonal ingredients, handles unexpected situations (a supplier cancellation, a VIP dietary restriction), and runs the kitchen night after night.

Each level is valuable. You do not need a head chef to boil pasta — but you do need one to run a kitchen.

The email spectrum

Think about how you manage email at different levels:

  • Level 1: You read and reply to every email yourself. No automation.
  • Level 2: Your email client auto-sorts messages into folders using rules. It uses tools (filters, labels) but only when you set them up.
  • Level 3: An AI assistant drafts replies for routine messages, flags urgent ones, and archives newsletters — but you review and approve before sending.
  • Level 4: A fully autonomous email manager that handles routine correspondence, schedules meetings, follows up on unanswered threads, and only escalates truly ambiguous situations to you.

Most people are comfortable at Level 2 or 3 for email. Level 4 feels risky because emails are high-stakes communication. This intuition — that the right level depends on the consequences of errors — applies to all agentic design.


Concepts to explore next

| Concept | What it covers | Status |
| --- | --- | --- |
| human-in-the-loop | When and how to involve humans in agent decision-making | complete |
| orchestration | Coordinating multiple agents and managing complex workflows | stub |
| guardrails | Constraints that prevent agents from taking harmful actions | stub |

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.


Where this concept fits

Position in the knowledge graph

graph TD
    AIML[AI and Machine Learning] --> AS[Agentic Systems]
    AS --> AuS[Autonomy Spectrum]
    AS --> LLM[LLM Pipelines]
    AS --> ORCH[Orchestration]
    style AuS fill:#4a9ede,color:#fff

Related concepts:

  • human-in-the-loop — the design pattern for keeping humans involved at critical decision points, especially as autonomy increases
  • orchestration — how multiple agents at different autonomy levels are coordinated in a single system
  • guardrails — the constraints that make higher autonomy levels safe and reliable

Footnotes

  1. Vellum AI. (2025). LLM Agents: The Six Levels of Agentic Behavior. Vellum AI.

  2. Kiran, T. (2026). Agent Autonomy Is a Spectrum: A Practical Maturity Model (L1 to L5). Medium.

  3. Falconer, S. (2025). The Practical Guide to the Levels of AI Agent Autonomy. Medium.

  4. Anthropic. (2024). Building Effective Agents. Anthropic.