Context Engineering
The practice of curating, structuring, and managing all the information a language model can see when it generates a response — not just the prompt, but everything around it.
What is it?
Most people think working with AI is about writing good prompts. It is not. The prompt — the instruction you type — is a fraction of what the model actually sees. The model also sees a system prompt, retrieved documents, conversation history, tool definitions, memory from prior sessions, and metadata. All of it. At once. In a single context window. The quality of that entire context determines the quality of the output far more than the wording of any single instruction.1
In June 2025, Tobi Lütke, CEO of Shopify, proposed a reframe that caught fire across the industry: “I really like the term ‘context engineering’ over ‘prompt engineering.’ It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.”2 Andrej Karpathy, former head of AI at Tesla, agreed: “People associate prompts with short task descriptions. In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step.”3
The distinction matters because it shifts the focus from what you say to what the model knows. A mediocre prompt surrounded by excellent context produces better results than a brilliant prompt surrounded by noise. Prompt engineering asks “how should I phrase this?” Context engineering asks “what does the model need to see before I ask anything at all?”4
Philipp Schmid, formerly a technical lead at Hugging Face, put it concisely: think of an LLM like a CPU, and its context window as RAM. Your job is akin to an operating system — load that working memory with just the right code and data for the task.5
In plain terms
Context engineering is the skill of preparing everything an AI model will read before it answers you. It is the difference between asking a colleague a question cold and asking them the same question after handing them the relevant documents, the project history, and a clear brief. The question is the same. The answer is vastly different.
At a glance
What fills the context window
```mermaid
graph TD
  subgraph "Context Window"
    SP[System Prompt] --> I[Task Instruction]
    RD[Retrieved Documents] --> I
    CH[Conversation History] --> I
    TD[Tool Definitions] --> I
    MM[Memory and State] --> I
  end
  I --> O[Model Output]
  style I fill:#4a9ede,color:#fff
  style SP fill:#5cb85c,color:#fff
```

Key: Everything in the context window is processed together. The model does not distinguish between “prompt” and “background” — it attends to all of it simultaneously. Context engineering is the practice of controlling what fills this window.
How does it work?
The context window — finite and competitive
Every language model has a context window — the maximum amount of text it can process at once. Claude’s window is up to 200,000 tokens. GPT-4o supports 128,000. Gemini goes to 2 million. These numbers sound enormous, but they are smaller than they appear, because every token competes for the model’s attention.1
Research from Chroma demonstrates that every frontier model exhibits context rot — performance degrades as the context window fills up. Accuracy drops start around 32,000 tokens for most models. The model does not “forget” information in the window; it spreads its attention thinner across more tokens, making it more likely to miss or misweight critical information.6
Stanford’s “Lost in the Middle” research shows the pattern is not uniform: LLMs attend most to information at the beginning and end of the context, with a significant accuracy dip for information placed in the middle — a U-shaped attention curve.7
Think of it like...
A desk. You can pile as many documents on it as you like, but if the desk is covered in irrelevant papers, you will struggle to find the one fact you need. Context engineering is keeping the desk clear except for exactly the documents this task requires — and placing the most important ones where your eyes naturally land.
What goes in — the five sources of context
Anthropic’s engineering guide identifies five categories of information that fill a context window:1
| Source | What it is | Example |
|---|---|---|
| System prompt | Standing instructions loaded every session | “You are a senior data analyst. Always cite sources.” |
| Retrieved documents | Information pulled from external sources at runtime | Database records, API responses, RAG results |
| Conversation history | Prior messages in the current session | User’s earlier questions and the model’s responses |
| Tool definitions | Descriptions of tools the model can call | File read, web search, database query schemas |
| Memory and state | Persistent information from prior sessions | User preferences, project decisions, past outputs |
The challenge is that all five compete for the same finite window. Loading a 50-page document as context leaves less room for conversation history. Adding 20 tool definitions reduces space for retrieved data. Context engineering is the discipline of making these trade-offs deliberately rather than by accident.1
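These trade-offs can be made explicit in code. The sketch below assembles a context window from the five sources under a fixed token budget, adding sources in a priority order and skipping any that would not fit. The priority order, source names, and the rough 4-characters-per-token estimate are illustrative assumptions, not a real API.

```python
# A minimal sketch of budgeting a context window across the five sources.
# All names and heuristics here are illustrative, not a real library.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def assemble_context(sources: dict[str, str], budget: int) -> str:
    """Add sources in priority order; skip any that would exceed the budget."""
    priority = ["system_prompt", "tool_definitions", "retrieved_documents",
                "memory", "conversation_history"]
    parts, used = [], 0
    for name in priority:
        text = sources.get(name, "")
        cost = estimate_tokens(text)
        if text and used + cost <= budget:
            parts.append(text)
            used += cost
    return "\n\n".join(parts)
```

The point of the sketch is the deliberate ordering: when the budget runs out, it is the lowest-priority source that gets dropped, not whichever one happened to be loaded last.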
How to manage it — four core strategies
Anthropic’s guide documents four production-tested strategies for keeping context healthy:1
1. Compaction
As conversations grow, older messages accumulate and consume the window. Compaction summarises the conversation while preserving key decisions, creating a compressed representation that retains meaning with fewer tokens. The model “reads its own notes” instead of re-processing every prior message.
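A compaction step can be sketched in a few lines: keep the most recent turns verbatim and replace everything older with a summary. In a real system the summary would come from an LLM call; here a placeholder function stands in for it, and the `keep_recent` cutoff is an arbitrary illustrative choice.

```python
# Illustrative compaction: older messages collapse into a summary stub,
# recent messages survive verbatim. `summarise` is a placeholder for an
# LLM summarisation call.

def summarise(messages: list[str]) -> str:
    # Stand-in for a model-generated summary of the older turns.
    return f"[Summary of {len(messages)} earlier messages]"

def compact(history: list[str], keep_recent: int = 4) -> list[str]:
    """Compress everything except the last `keep_recent` messages."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarise(older)] + recent
```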
2. Structured note-taking
Rather than keeping everything in the conversation, the agent writes persistent notes to files outside the context window. These notes can be retrieved later when relevant, but they do not consume context when they are not needed. This is how agent-memory works in practice — the model externalises information to a persistent store and retrieves it on demand.
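The externalised-notes idea can be sketched as a small store that persists notes to files and reloads them only when a topic becomes relevant. The file layout and JSON note format are assumptions for illustration; the essential property is that nothing here occupies the context window until `read` is called.

```python
# A sketch of structured note-taking: notes live on disk, outside the
# context window, and are pulled back in on demand.
import json
from pathlib import Path

class NoteStore:
    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def write(self, topic: str, note: str) -> None:
        """Append a note under a topic, persisted outside the context."""
        path = self.root / f"{topic}.json"
        notes = json.loads(path.read_text()) if path.exists() else []
        notes.append(note)
        path.write_text(json.dumps(notes))

    def read(self, topic: str) -> list[str]:
        """Retrieve notes for a topic only when the task needs them."""
        path = self.root / f"{topic}.json"
        return json.loads(path.read_text()) if path.exists() else []
```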
3. Sub-agent isolation
Complex tasks are decomposed into sub-tasks, each handled by a specialised agent with a clean context window. The sub-agent processes its task in isolation and returns only a distilled summary to the parent agent. This prevents context pollution — the phenomenon where irrelevant information from one task degrades performance on another.1
```mermaid
graph TD
  O[Orchestrator Agent] --> S1["Sub-Agent: Research"]
  O --> S2["Sub-Agent: Analysis"]
  O --> S3["Sub-Agent: Writing"]
  S1 -->|summary only| O
  S2 -->|summary only| O
  S3 -->|summary only| O
  style O fill:#4a9ede,color:#fff
```
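The isolation pattern can be sketched as a toy orchestrator: each sub-task runs against a fresh, empty context, and only a short summary flows back to the parent. `run_subagent` is a stand-in for a real LLM-backed agent; the important detail is that no parent history leaks into it and no raw working material leaks back out.

```python
# Toy sub-agent isolation: fresh context per task, summary-only returns.
# `run_subagent` is a placeholder for an LLM-backed agent.

def run_subagent(task: str, context: list[str]) -> str:
    context.append(task)            # the sub-agent works in its own window
    return f"summary({task})"       # only the distilled result is returned

def orchestrate(tasks: list[str]) -> list[str]:
    summaries = []
    for task in tasks:
        fresh_context: list[str] = []   # isolation: nothing carries over
        summaries.append(run_subagent(task, fresh_context))
    return summaries
```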
4. Just-in-time loading
Instead of pre-loading all potentially relevant information, the agent maintains lightweight references (file paths, URLs, query identifiers) and loads data into context only when a specific task requires it. This keeps the context lean until the moment the information is actually needed.1
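Just-in-time loading amounts to carrying lazy references instead of content. The sketch below wraps an identifier and a loader function, resolving to the actual data only on first access; the class and its names are hypothetical, standing in for file reads, URL fetches, or database queries.

```python
# A lazy reference: the agent holds a lightweight identifier and resolves
# it to content only when a task actually needs it. `loader` stands in for
# a real file read, URL fetch, or database query.

class LazyRef:
    def __init__(self, ref: str, loader):
        self.ref = ref              # cheap to hold in context (a path, URL, id)
        self._loader = loader
        self._content = None

    def load(self) -> str:
        if self._content is None:   # fetch only on first access, then cache
            self._content = self._loader(self.ref)
        return self._content
```

Until `load` is called, the context holds only the reference string, not the document behind it.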
The fundamental principle
Anthropic’s guide frames it as an optimisation problem: find “the smallest set of high-signal tokens that maximise the likelihood of your desired outcome.” Every irrelevant token in context actively degrades performance. Less is more — but the right less.1
How to structure it — placement and ordering
What goes in matters. Where it goes matters just as much.1
| Principle | Why it works |
|---|---|
| Reference material first, query last | Exploits the model’s recency bias; queries placed at the end improve response quality |
| Static content before dynamic content | Enables prompt caching (up to 90% cost reduction) and gives stable context higher positional priority |
| Most important at the edges | Mitigates the “Lost in the Middle” effect — critical information performs best at the beginning or end of context |
| Layer from general to specific | context-cascading — global rules, then domain context, then task instructions. Each layer narrows the next |
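The placement principles in the table above can be sketched as a single ordering function: static material first (cache-friendly and positionally stable), retrieved documents next, dynamic history after that, and the query last to exploit recency bias. The section names and flat string concatenation are illustrative simplifications.

```python
# A sketch of context ordering: static-first, query-last.
# Section names are illustrative; real systems use structured messages.

def order_context(system: str, documents: list[str],
                  history: str, query: str) -> str:
    return "\n\n".join(filter(None, [
        system,          # static: a stable prefix enables prompt caching
        *documents,      # reference material before the question
        history,         # dynamic conversation state
        query,           # query last: highest recency weight
    ]))
```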
Context engineering vs prompt engineering
The two are not opposites — they are layers. Prompt engineering is about crafting the instruction. Context engineering is about curating everything around it. In practice, most failures attributed to “bad prompts” are actually context failures: the model had the right instruction but the wrong (or insufficient) surrounding information.4
| | Prompt engineering | Context engineering |
|---|---|---|
| Focus | What you say to the model | What the model knows when you say it |
| Scope | The instruction or query | System prompt + documents + history + tools + memory |
| Typical fix | Rewrite the prompt | Restructure, filter, or augment the context |
| Analogy | Writing a better question on an exam | Studying the right material before the exam |
Think of it like...
Prompt engineering is learning how to ask good questions in a meeting. Context engineering is preparing the pre-read documents, setting the agenda, and inviting the right people to the room. You need both, but the pre-work determines whether the meeting is productive regardless of how well you phrase your questions.
Why do we use it?
Key reasons
1. The same prompt produces wildly different results depending on context. Ask “write a validation function” with no context and you get generic code. Ask it with your codebase’s patterns, conventions, and test framework loaded, and you get code that fits. The prompt is identical. The context changes everything.4
2. Context rot is real and measurable. Every frontier model degrades as context fills up. Context engineering is the discipline of keeping the window lean, relevant, and well-structured so the model can reason effectively.6
3. It unlocks autonomous agents. An agent that manages its own context — compacting, externalising notes, loading information just in time — can operate over long sessions without degrading. An agent with no context strategy runs out of window or drowns in noise.1
4. It is the highest-leverage skill for AI practitioners. Karpathy’s framing is widely cited because it is true: in production AI systems, the context around the prompt matters more than the prompt itself.3
When do we use it?
- When building any AI-assisted workflow that involves more than a single chat exchange
- When an AI system needs access to external knowledge (documents, databases, codebases)
- When running long conversations where earlier context risks being lost
- When deploying autonomous agents that must manage their own information across steps
- When AI outputs are inconsistent despite identical prompts — the problem is almost always context, not the prompt
- When costs are high — better context curation reduces token usage and enables caching
Rule of thumb
If you have tried improving the prompt three times and the output is still wrong, stop rewriting the prompt. Look at what else is in the context window. Nine times out of ten, that is where the problem lives.
How can I think about it?
The briefing room analogy
Imagine you are a general about to make a critical decision. You walk into a briefing room. Context engineering determines what is on the walls and tables when you arrive:
- System prompt = the standing orders posted on the wall. Always there, always applicable.
- Retrieved documents = the intelligence reports an aide has selected for this specific decision.
- Conversation history = the notes from prior meetings, summarised (compacted) to the key decisions.
- Tool definitions = the communication equipment available — radio, satellite, drone feeds.
- Memory = the classified files in the secure cabinet, pulled out only when relevant.
A well-prepared briefing room makes the decision obvious. A poorly prepared one — cluttered with irrelevant reports, missing key intelligence — leads to bad decisions regardless of how smart the general is.
The chef analogy
A chef (the model) receives an order (the prompt). But the quality of the dish depends far more on what is in the kitchen:
- Mise en place (context curation) — the right ingredients, pre-measured, at the right temperature, within arm’s reach.
- Recipe card (system prompt) — the standing technique for this type of dish.
- Pantry (retrieval) — the full inventory, but you only pull what this dish needs.
- Fridge (memory) — ingredients prepared yesterday, stored for today’s service.
A mediocre chef with perfect mise en place produces a better dish than a great chef rummaging through a disorganised kitchen. Context engineering is the mise en place of AI.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| context-cascading | Layering context from general to specific for progressive understanding | complete |
| rag | Retrieving external documents to inject into context at runtime | complete |
| agent-memory | Persistent storage outside the context window, retrieved on demand | complete |
| context-rot | How model performance degrades as context fills up | stub |
| context-compaction | Summarising conversation history to free context space | stub |
| just-in-time-loading | Loading data into context only when a task requires it | stub |
| harness-engineering | The system layer that manages context alongside evals and guardrails | complete |
Some of these cards don’t exist yet
They’ll be created as the knowledge system grows. A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself
- Explain why context engineering matters more than prompt engineering for production AI systems. What changes when you shift focus from “what you say” to “what the model sees”?
- Name the five sources of information that fill a context window and give an example of each.
- Distinguish between compaction and just-in-time loading. When would you use each strategy?
- Interpret this scenario: an AI assistant produces excellent summaries in short conversations but gives increasingly vague answers as the conversation exceeds 30 messages. The prompts have not changed. Using what you have learned, diagnose the problem and propose a solution.
- Connect context engineering to context-cascading. How does the cascading pattern implement the principle of “less is more” in practice?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
  AS[Agentic Systems] --> CE[Context Engineering]
  AS --> HE[Harness Engineering]
  AS --> LP[LLM Pipelines]
  CE --> CC[Context Cascading]
  CE --> RAG[RAG]
  CE --> AM[Agent Memory]
  CE -.->|managed by| HE
  style CE fill:#4a9ede,color:#fff
```

Related concepts:
- context-cascading — the specific pattern of layering context from general to specific, implementing context engineering’s “right information, right order” principle
- rag — retrieval-augmented generation loads external knowledge into context at runtime, a core context engineering strategy
- agent-memory — externalising information outside the context window so it can be retrieved when needed
- harness-engineering — the system layer that manages context engineering alongside evaluations and guardrails
Sources
Further reading
Resources
- Effective Context Engineering for AI Agents (Anthropic) — The definitive guide to context curation, compaction, memory, and sub-agent architectures
- Context Engineering (Simon Willison) — Clear, concise explanation of why the shift from prompt to context engineering matters
- The New Skill in AI is Not Prompting, It’s Context Engineering (Philipp Schmid) — Practical overview with the CPU/RAM mental model
- Context Engineering: Bringing Engineering Discipline to Prompts (O’Reilly) — Enterprise perspective on operationalising context engineering
- Context Engineering Best Practices (Redis) — Production best practices from a data infrastructure perspective
Footnotes
1. Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic Engineering. The canonical guide to context curation, compaction, sub-agent isolation, and just-in-time loading.
2. Lütke, T. (2025). Tweet on context engineering. June 19, 2025. The tweet that popularised “context engineering” as a term.
3. Karpathy, A. (2025). Tweet on context engineering. June 2025. Endorsed the term and distinguished it from casual prompting.
4. Willison, S. (2025). Context Engineering. simonwillison.net. Why “context engineering” is a better frame than “prompt engineering” and what the shift means in practice.
5. Schmid, P. (2025). The New Skill in AI is Not Prompting, It’s Context Engineering. philschmid.de. Practical introduction with the CPU/RAM analogy and context window management strategies.
6. Chroma Research. (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma. Empirical evidence that every frontier model degrades as context length increases.
7. Liu, N.F. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL 2024 / Stanford. Demonstrates the U-shaped attention curve in long-context LLMs.