Agent Memory

How AI agents remember information across conversation turns and sessions — the architectural systems that give stateless language models the ability to retain and recall context.


What is it?

Large language models are stateless by default.1 Every time you start a new conversation, the model has no memory of anything you discussed before. It does not know your name, your preferences, or what it told you yesterday. Each interaction begins from a blank slate.

This is a fundamental problem for agentic systems. An agent that forgets everything between sessions cannot build on previous work, learn from past mistakes, or maintain a coherent relationship with its user. Memory is what transforms a stateless text generator into something that feels like a persistent collaborator.2

Agent memory is not a single mechanism — it is an architectural concern that spans at least three distinct types: short-term memory (the current conversation), working memory (a scratchpad for active reasoning), and long-term memory (persistent knowledge that survives across sessions).3 Each type solves a different problem, and most production agents need all three.

The fundamental constraint underlying all agent memory is the context window — the maximum amount of text a language model can process in a single call.4 Everything the model “knows” during a given interaction must fit inside this window. Memory engineering is, at its core, the art of deciding what goes into that limited space and what stays out.
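This budgeting can be sketched as a simple check. A toy sketch: the four-characters-per-token estimate and the 8,000-token default are illustrative assumptions, not any particular model's figures.

```python
# Toy sketch of a context-window budget. The token estimate and the
# window size are illustrative assumptions, not real model parameters.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly four characters per token."""
    return max(1, len(text) // 4)

def fits_in_window(sections: list[str], window_limit: int = 8000) -> bool:
    """Check whether candidate context sections fit inside the window."""
    return sum(estimate_tokens(s) for s in sections) <= window_limit
```

Everything a memory system wants to surface (history, scratchpad, retrieved facts) competes for the same budget this check enforces.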

In plain terms

A language model without memory is like a brilliant expert with amnesia. Every time you walk into their office, they have no idea who you are or what you discussed last time. Agent memory is the system of notes, files, and records that lets this expert pick up where they left off — the conversation summary pinned to their desk, the working notes on their whiteboard, and the client file in their cabinet.



How does it work?

1. Short-term memory: conversation history

The most basic form of memory. The system keeps a record of previous messages in the current conversation and includes them in each new request to the model.1 This is how a chatbot “remembers” what you said three messages ago.

The limitation is the context window. As conversations grow longer, the history eventually exceeds what the model can process. At that point, the system must decide what to keep and what to discard — typically using strategies like truncation (drop the oldest messages), summarisation (compress earlier messages into a summary), or sliding windows (keep only the most recent N turns).3
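These trimming strategies can be sketched in a few lines of Python. A toy sketch: messages are assumed to be `(role, text)` tuples and tokens are estimated at roughly four characters each; a production system would use the model's actual tokenizer.

```python
# Toy sketches of two short-term memory strategies: a sliding window
# and budget-based truncation. Messages are (role, text) tuples.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly four characters per token."""
    return max(1, len(text) // 4)

def sliding_window(history, max_turns):
    """Keep only the most recent N messages."""
    return history[-max_turns:]

def truncate_to_budget(history, budget):
    """Drop the oldest messages until the remainder fits the token budget."""
    kept = list(history)
    while kept and sum(estimate_tokens(t) for _, t in kept) > budget:
        kept.pop(0)  # discard the oldest message first
    return kept
```

Both strategies answer the whiteboard question below in the bluntest possible way: the oldest notes are always the ones erased.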

Think of it like...

A whiteboard in a meeting room. Everyone can see what has been written so far. But the whiteboard has a fixed size — once it is full, someone has to erase the oldest notes to make room for new ones. The question is always: what is safe to erase?

2. Working memory: the scratchpad

Working memory is a dedicated space where the agent stores intermediate results, plans, and reasoning notes during a complex task.2 Unlike conversation history (which records what was said), working memory records what the agent is thinking and doing.

For example, a research agent working through a multi-step analysis might maintain a scratchpad with: the current plan, which steps are complete, key findings so far, and open questions. This scratchpad is injected into the context window alongside the conversation history, giving the model access to its own prior reasoning.
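One way to picture such a scratchpad is as a small structured object that the agent updates as it works and renders into its prompt. A hypothetical sketch: the field names and the `render()` format are illustrative, not a standard API.

```python
# Hypothetical working-memory scratchpad for a research agent. The
# render() output would be injected into the context window alongside
# the conversation history.

from dataclasses import dataclass, field

@dataclass
class Scratchpad:
    plan: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)
    findings: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)

    def complete_step(self, step: str) -> None:
        """Move a step from the plan to the completed list."""
        if step in self.plan:
            self.plan.remove(step)
        self.completed.append(step)

    def render(self) -> str:
        """Format the scratchpad as text for the context window."""
        sections = [
            ("Plan", self.plan),
            ("Completed", self.completed),
            ("Findings", self.findings),
            ("Open questions", self.open_questions),
        ]
        lines = []
        for title, items in sections:
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)
```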

Think of it like...

A detective’s case board — the wall covered in photos, notes, and red string connecting clues. It is not a record of conversations; it is a record of active thinking. The detective refers to it constantly while working, and it helps maintain coherence across a long investigation.

3. Long-term memory: persistent knowledge

Long-term memory stores information that persists across sessions — user preferences, past project context, accumulated knowledge, learned facts.4 This is the most architecturally complex type because it requires a storage system external to the model itself.

Common implementations include:

| Approach | How it works | Best for |
| --- | --- | --- |
| Vector databases | Store text as embeddings, retrieve by semantic similarity | Finding relevant past context from large collections |
| Key-value stores | Store structured facts (user preferences, settings) | Quick lookup of specific known information |
| Knowledge graphs | Store entities and relationships | Complex queries about how things connect |
| File-based memory | Store summaries and notes as files the agent can read | Simple persistence in file-based workflows |

The retrieval challenge is central: the agent must decide which long-term memories are relevant to the current task and pull only those into the context window.3 Retrieving too much wastes precious context space. Retrieving too little means the agent lacks critical information.
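Relevance filtering is commonly implemented with embedding similarity. The sketch below assumes memories have already been embedded as plain vectors; a real system would call an embedding model and query a vector database rather than scoring every memory in a loop.

```python
# Toy sketch of relevance-based retrieval from long-term memory.
# Memories are dicts with pre-computed embedding vectors (here, toy
# 3-dimensional lists rather than real embeddings).

import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, memories, top_k=2):
    """Return the top-k memories most similar to the query vector."""
    scored = sorted(memories, key=lambda m: cosine(query_vec, m["vec"]),
                    reverse=True)
    return [m["text"] for m in scored[:top_k]]
```

The `top_k` parameter is exactly the too-much/too-little dial described above: raise it and retrieval crowds the context window; lower it and critical memories may be left behind.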

Concept to explore

See rag for the architectural pattern of retrieving external knowledge and injecting it into the context window at inference time.

4. The compression trade-off

Every memory system faces the same fundamental trade-off: remembering everything vs keeping context focused.2 A model that receives its entire history — every conversation, every fact, every intermediate thought — will drown in irrelevant information and perform poorly. A model that receives too little will miss critical context.

Effective memory engineering uses several compression strategies:

  • Summarisation: Replace a 50-message conversation with a 5-sentence summary
  • Relevance filtering: Only retrieve memories that match the current query
  • Hierarchical memory: Store detailed records but surface only summaries, expanding to detail on demand
  • Forgetting: Deliberately discard information that is outdated or unlikely to be needed again
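The summarisation strategy above can be sketched as a simple threshold rule. Here `summarise()` is a stand-in for a real LLM summarisation call; the threshold and message format are illustrative.

```python
# Toy sketch of history compression: once the conversation grows past a
# threshold, older messages are collapsed into a single summary entry.

def summarise(messages):
    """Placeholder for an LLM call that compresses messages into a summary."""
    return f"[Summary of {len(messages)} earlier messages]"

def compress_history(history, keep_recent=3):
    """Replace all but the most recent messages with a single summary."""
    if len(history) <= keep_recent:
        return list(history)
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarise(older)] + recent
```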

Key distinction

Context engineering is the broader practice of designing what goes into a model’s context window. Memory engineering is the subset focused specifically on persisting and retrieving information across turns and sessions.5 Both are about the same finite resource — the context window — but memory engineering adds the dimension of time.


Why do we use it?

Key reasons

1. Continuity. Without memory, every interaction is isolated. Memory lets an agent build on prior work, maintain relationships with users, and accumulate knowledge over time — turning disconnected exchanges into a coherent, evolving collaboration.2

2. Efficiency. An agent that remembers a user’s preferences, past decisions, and project context does not need to ask the same questions repeatedly. Memory eliminates redundant work and reduces the cognitive load on the user.4

3. Quality. Decisions informed by historical context are better decisions. An agent that remembers what approaches failed last time, what the user explicitly rejected, or what constraints were discovered mid-project produces higher-quality output.3

4. Trust. Users trust systems that remember them. An agent that forgets your name between sessions feels broken. One that recalls your preferences and builds on prior conversations feels like a genuine collaborator.


When do we use it?

  • When an agent needs to maintain context across multiple sessions (not just within a single conversation)
  • When tasks are long-running and span days or weeks of iterative work
  • When the agent serves repeat users who expect personalisation and continuity
  • When the agent must learn from past interactions — avoiding repeated mistakes or building on accumulated knowledge
  • When the context window is too small to hold all relevant information at once

Rule of thumb

If the agent only needs to handle a single question-and-answer exchange, memory is unnecessary. If the agent needs to remember anything from a previous turn, session, or user interaction, memory is an architectural requirement — not a nice-to-have feature.


How can I think about it?

The three notebooks

Imagine a consultant who carries three notebooks:

  • The conversation notebook (short-term memory): Notes from the current meeting. Written in real time, reviewed during the discussion, and eventually archived or discarded. If the meeting runs very long, the consultant flips back through the notebook but can only hold so many pages in their active attention.
  • The project notebook (working memory): A dedicated scratchpad for the active project. Contains the current plan, open questions, decisions made, and next steps. Carried to every meeting about this project. Thrown away when the project ends.
  • The client file (long-term memory): A folder in the filing cabinet with everything the consultant knows about this client — past projects, preferences, org chart, lessons learned. Pulled out before each meeting, but only the relevant sections are reviewed.

The consultant’s effectiveness depends on all three working together. Miss the conversation notebook and you lose the thread. Miss the project notebook and you lose the plan. Miss the client file and you start from zero every engagement.

The restaurant regular

Think about what happens when a regular customer walks into a restaurant:

  • Short-term memory: The waiter remembers what you ordered tonight and that you asked for no onions (current conversation).
  • Working memory: The kitchen’s order board tracks which courses are prepared, which are pending, and any special modifications (active task state).
  • Long-term memory: The reservation system notes that you are a regular, you prefer the corner table, you are allergic to shellfish, and you always order the house red (persistent profile).

A restaurant that only has short-term memory treats every visit as a first visit. A restaurant with all three memory types makes you feel known and valued. The same principle applies to AI agents — memory is what transforms a transactional tool into a trusted assistant.


Concepts to explore next

| Concept | What it covers | Status |
| --- | --- | --- |
| context-cascading | How to structure and layer context for AI systems | complete |
| rag | Retrieving external knowledge and injecting it at inference time | stub |
| knowledge-graphs | Storing entities and relationships for structured retrieval | stub |

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.


Where this concept fits

Position in the knowledge graph

graph TD
    AIML[AI and Machine Learning] --> AS[Agentic Systems]
    AS --> AMEM[Agent Memory]
    AS --> LLM[LLM Pipelines]
    AS --> ORCH[Orchestration]
    style AMEM fill:#4a9ede,color:#fff

Related concepts:

  • context-cascading — a pattern for structuring what information is loaded into the context window and in what order
  • rag — retrieval-augmented generation is a primary implementation pattern for long-term agent memory
  • knowledge-graphs — a structured storage approach for long-term memory that preserves relationships between facts

Footnotes

  1. Mem0. (2026). State of AI Agent Memory 2026. Mem0.

  2. Seah, N. (2026). Memory for AI Agents: A New Paradigm of Context Engineering. The New Stack.

  3. Alake, R. (2025). Architecting Agent Memory: Principles, Patterns, and Best Practices. AI Engineer Conference / MongoDB.

  4. MongoDB. (2025). Don’t Just Build Agents, Build Memory-Augmented AI Agents. MongoDB.

  5. Bazeley, M. (2026). Why Multi-Agent Systems Need Memory Engineering. O’Reilly Radar.