Context Engineering

The practice of curating, structuring, and managing all the information a language model can see when it generates a response — not just the prompt, but everything around it.


What is it?

Most people think working with AI is about writing good prompts. It is not. The prompt — the instruction you type — is a fraction of what the model actually sees. The model also sees a system prompt, retrieved documents, conversation history, tool definitions, memory from prior sessions, and metadata. All of it. At once. In a single context window. The quality of that entire context determines the quality of the output far more than the wording of any single instruction.1

In June 2025, Tobi Lütke, CEO of Shopify, proposed a reframe that caught fire across the industry: “I really like the term ‘context engineering’ over ‘prompt engineering.’ It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.”2 Andrej Karpathy, former head of AI at Tesla, agreed: “People associate prompts with short task descriptions. In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step.”3

The distinction matters because it shifts the focus from what you say to what the model knows. A mediocre prompt surrounded by excellent context produces better results than a brilliant prompt surrounded by noise. Prompt engineering asks “how should I phrase this?” Context engineering asks “what does the model need to see before I ask anything at all?”4

Philipp Schmid, a technical lead at Hugging Face, put it concisely: think of an LLM as a CPU and its context window as RAM. Your job is that of the operating system: loading that working memory with just the right code and data for the task.5

In plain terms

Context engineering is the skill of preparing everything an AI model will read before it answers you. It is the difference between asking a colleague a question cold and asking them the same question after handing them the relevant documents, the project history, and a clear brief. The question is the same. The answer is vastly different.


How does it work?

The context window — finite and competitive

Every language model has a context window — the maximum amount of text it can process at once. Claude’s window is up to 200,000 tokens. GPT-4o supports 128,000. Gemini goes to 2 million. These numbers sound enormous, but they are smaller than they appear, because every token competes for the model’s attention.1
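For a concrete sense of scale, you can measure how much of a window a piece of text consumes with a tokenizer. This is a minimal sketch assuming OpenAI’s open-source tiktoken library; the encoding choice is illustrative, since tokenizers differ by model:

```python
# A minimal sketch of token budgeting, assuming the tiktoken library
# (pip install tiktoken). Encodings differ by model; cl100k_base is
# chosen purely for illustration.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def tokens(text: str) -> int:
    """How many tokens of the context window a string consumes."""
    return len(enc.encode(text))

document = "All of it. At once. In a single context window. " * 2_000
window = 128_000  # e.g. a 128K-token model

used = tokens(document)
print(f"{used} tokens used, {window - used} remaining")
```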

Research from Chroma demonstrates that every frontier model exhibits context rot — performance degrades as the context window fills up. Accuracy drops start around 32,000 tokens for most models. The model does not “forget” information in the window; it spreads its attention thinner across more tokens, making it more likely to miss or misweight critical information.6

Stanford’s “Lost in the Middle” research shows the pattern is not uniform: LLMs attend most to information at the beginning and end of the context, with a significant accuracy dip for information placed in the middle — a U-shaped attention curve.7

Think of it like...

A desk. You can pile as many documents on it as you like, but if the desk is covered in irrelevant papers, you will struggle to find the one fact you need. Context engineering is keeping the desk clear except for exactly the documents this task requires — and placing the most important ones where your eyes naturally land.

What goes in — the five sources of context

Anthropic’s engineering guide identifies five categories of information that fill a context window:1

| Source | What it is | Example |
| --- | --- | --- |
| System prompt | Standing instructions loaded every session | “You are a senior data analyst. Always cite sources.” |
| Retrieved documents | Information pulled from external sources at runtime | Database records, API responses, RAG results |
| Conversation history | Prior messages in the current session | User’s earlier questions and the model’s responses |
| Tool definitions | Descriptions of tools the model can call | File read, web search, database query schemas |
| Memory and state | Persistent information from prior sessions | User preferences, project decisions, past outputs |

The challenge is that all five compete for the same finite window. Loading a 50-page document as context leaves less room for conversation history. Adding 20 tool definitions reduces space for retrieved data. Context engineering is the discipline of making these trade-offs deliberately rather than by accident.1
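To make the trade-off concrete, here is a minimal sketch of a context assembler that fills the window from the five sources in a fixed priority order and drops whatever no longer fits. Everything in it (the Block type, the priority order, the whitespace token approximation) is a hypothetical illustration, not an API from Anthropic’s guide:

```python
# A hypothetical context assembler. Token counts are approximated by a
# whitespace word count to keep the sketch dependency-free; a real
# system would use the model's own tokenizer.
from dataclasses import dataclass

@dataclass
class Block:
    source: str  # one of: "system", "tools", "memory", "retrieved", "history"
    text: str

PRIORITY = {"system": 0, "tools": 1, "memory": 2, "retrieved": 3, "history": 4}

def approx_tokens(text: str) -> int:
    return len(text.split())

def assemble(blocks: list[Block], budget: int) -> str:
    """Fill the window in priority order; skip anything that no longer fits."""
    used, kept = 0, []
    for block in sorted(blocks, key=lambda b: PRIORITY[b.source]):
        cost = approx_tokens(block.text)
        if used + cost > budget:
            continue  # the trade-off is made deliberately, not by accident
        kept.append(block)
        used += cost
    return "\n\n".join(b.text for b in kept)
```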

How to manage it — four core strategies

Anthropic’s guide documents four production-tested strategies for keeping context healthy:1

1. Compaction

As conversations grow, older messages accumulate and consume the window. Compaction summarises the conversation while preserving key decisions, creating a compressed representation that retains meaning with fewer tokens. The model “reads its own notes” instead of re-processing every prior message.
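A minimal sketch of what a compaction step can look like, assuming the summarisation itself is delegated back to the model; the summarise placeholder below is hypothetical, not a real API:

```python
# A hypothetical compaction step. `summarise` stands in for a call back
# to the model ("read these messages, keep the key decisions"); the
# one-liner below is a placeholder, not a real summariser.
def summarise(messages: list[str]) -> str:
    return "Summary of earlier turns: " + " / ".join(m[:40] for m in messages)

def compact(history: list[str], keep_recent: int = 10) -> list[str]:
    """Replace older messages with one summary; keep recent turns verbatim."""
    if len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarise(older)] + recent
```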

2. Structured note-taking

Rather than keeping everything in the conversation, the agent writes persistent notes to files outside the context window. These notes can be retrieved later when relevant, but they do not consume context when they are not needed. This is how agent-memory works in practice — the model externalises information to a persistent store and retrieves it on demand.
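A minimal sketch of a note store, assuming notes live as files on disk; the directory layout, topic names, and helper functions are illustrative only:

```python
# A hypothetical note store: the agent writes durable notes to disk and
# pulls one back only when a task needs it, so notes consume no context
# tokens in between. Paths and topic names are illustrative.
from pathlib import Path

NOTES = Path("agent_notes")
NOTES.mkdir(exist_ok=True)

def write_note(topic: str, content: str) -> None:
    (NOTES / f"{topic}.md").write_text(content)

def read_note(topic: str) -> str | None:
    path = NOTES / f"{topic}.md"
    return path.read_text() if path.exists() else None

write_note("project-decisions", "2025-06-19: chose PostgreSQL over SQLite.")
# ...many turns later, loaded only when the topic becomes relevant:
snippet = read_note("project-decisions")
```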

3. Sub-agent isolation

Complex tasks are decomposed into sub-tasks, each handled by a specialised agent with a clean context window. The sub-agent processes its task in isolation and returns only a distilled summary to the parent agent. This prevents context pollution — the phenomenon where irrelevant information from one task degrades performance on another.1

```mermaid
graph TD
    O[Orchestrator Agent] --> S1[Sub-Agent: Research]
    O --> S2[Sub-Agent: Analysis]
    O --> S3[Sub-Agent: Writing]
    S1 -->|summary only| O
    S2 -->|summary only| O
    S3 -->|summary only| O

    style O fill:#4a9ede,color:#fff
```
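A minimal sketch of the pattern above, where run_llm is a hypothetical stand-in for a model call; the point is that each sub-agent starts from a clean context and hands back only a distilled summary:

```python
# A hypothetical orchestrator. `run_llm` stands in for a model call;
# each sub-agent starts from a fresh context containing only its own
# task and returns a distilled summary, never its full transcript.
def run_llm(context: str) -> str:
    return f"<model output for: {context[:40]}...>"  # placeholder

def sub_agent(task: str) -> str:
    clean_context = f"Task: {task}"  # no pollution from sibling tasks
    full_result = run_llm(clean_context)
    return run_llm(f"Summarise the result in three bullets: {full_result}")

def orchestrate(tasks: list[str]) -> str:
    summaries = [sub_agent(t) for t in tasks]  # each runs in isolation
    return run_llm("Combine these summaries:\n" + "\n".join(summaries))

print(orchestrate(["research sources", "analyse data", "draft report"]))
```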

4. Just-in-time loading

Instead of pre-loading all potentially relevant information, the agent maintains lightweight references (file paths, URLs, query identifiers) and loads data into context only when a specific task requires it. This keeps the context lean until the moment the information is actually needed.1
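A minimal sketch, assuming references are simple file paths held in a dict; the agent carries the cheap references and resolves one into full content only at the step that needs it:

```python
# A hypothetical just-in-time loader. The agent carries only lightweight
# references (a few tokens each); a file's full content enters context
# only at the moment a step needs it. Paths are illustrative.
from pathlib import Path

references = {
    "quarterly-report": Path("data/q2_report.txt"),
    "db-schema": Path("db/schema.sql"),
}

def load(ref: str) -> str:
    """Resolve a reference into full content only when a task requires it."""
    path = references[ref]
    return path.read_text() if path.exists() else f"<missing: {path}>"

# Until this line runs, the report costs nothing but its dict entry:
report_text = load("quarterly-report")
```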

The fundamental principle

Anthropic’s guide frames it as an optimisation problem: find “the smallest set of high-signal tokens that maximise the likelihood of your desired outcome.” Every irrelevant token in context actively degrades performance. Less is more — but the right less.1
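One way to read that optimisation is as a greedy value-per-token selection. This sketch assumes relevance scores are supplied from elsewhere (for example, an embedding similarity search); nothing in it comes from the guide itself:

```python
# A sketch of the principle as a greedy value-per-token selection.
# Relevance scores are assumed inputs (e.g. from an embedding search);
# token costs are approximated by word count for simplicity.
def select(snippets: list[tuple[str, float]], budget: int) -> list[str]:
    """snippets: (text, relevance) pairs. Keep the densest signal first."""
    ranked = sorted(
        snippets,
        key=lambda s: s[1] / max(len(s[0].split()), 1),  # signal per token
        reverse=True,
    )
    chosen, used = [], 0
    for text, _score in ranked:
        cost = len(text.split())
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen
```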

How to structure it — placement and ordering

What goes in matters. Where it goes matters just as much.1

| Principle | Why it works |
| --- | --- |
| Reference material first, query last | Exploits the model’s recency bias; queries placed at the end improve response quality |
| Static content before dynamic content | Enables prompt caching (up to 90% cost reduction) and gives stable context higher positional priority |
| Most important at the edges | Mitigates the “Lost in the Middle” effect: critical information performs best at the beginning or end of context |
| Layer from general to specific | context-cascading: global rules, then domain context, then task instructions. Each layer narrows the next |
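Putting those placement rules together, a minimal (hypothetical) prompt builder might order its sections like this:

```python
# A hypothetical prompt builder that applies the placement rules above:
# static, cacheable content first; reference material next; the query
# last, where recency bias gives it the most attention.
def build_prompt(system: str, documents: list[str], history: str, query: str) -> str:
    return "\n\n".join([
        system,                  # static first: stable prefix enables caching
        "\n\n".join(documents),  # reference material before the question
        history,                 # dynamic content, changes every turn
        f"Question: {query}",    # query last: the high-attention position
    ])
```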

Context engineering vs prompt engineering

The two are not opposites — they are layers. Prompt engineering is about crafting the instruction. Context engineering is about curating everything around it. In practice, most failures attributed to “bad prompts” are actually context failures: the model had the right instruction but the wrong (or insufficient) surrounding information.4

|  | Prompt engineering | Context engineering |
| --- | --- | --- |
| Focus | What you say to the model | What the model knows when you say it |
| Scope | The instruction or query | System prompt + documents + history + tools + memory |
| Typical fix | Rewrite the prompt | Restructure, filter, or augment the context |
| Analogy | Writing a better question on an exam | Studying the right material before the exam |

Think of it like...

Prompt engineering is learning how to ask good questions in a meeting. Context engineering is preparing the pre-read documents, setting the agenda, and inviting the right people to the room. You need both, but the pre-work determines whether the meeting is productive regardless of how well you phrase your questions.


Why do we use it?

Key reasons

1. The same prompt produces wildly different results depending on context. Ask “write a validation function” with no context and you get generic code. Ask it with your codebase’s patterns, conventions, and test framework loaded, and you get code that fits. The prompt is identical. The context changes everything.4

2. Context rot is real and measurable. Every frontier model degrades as context fills up. Context engineering is the discipline of keeping the window lean, relevant, and well-structured so the model can reason effectively.6

3. It unlocks autonomous agents. An agent that manages its own context — compacting, externalising notes, loading information just in time — can operate over long sessions without degrading. An agent with no context strategy runs out of window or drowns in noise.1

4. It is the highest-leverage skill for AI practitioners. Karpathy’s framing is widely cited because it is true: in production AI systems, the context around the prompt matters more than the prompt itself.3


When do we use it?

  • When building any AI-assisted workflow that involves more than a single chat exchange
  • When an AI system needs access to external knowledge (documents, databases, codebases)
  • When running long conversations where earlier context risks being lost
  • When deploying autonomous agents that must manage their own information across steps
  • When AI outputs are inconsistent despite identical prompts — the problem is almost always context, not the prompt
  • When costs are high — better context curation reduces token usage and enables caching

Rule of thumb

If you have tried improving the prompt three times and the output is still wrong, stop rewriting the prompt. Look at what else is in the context window. Nine times out of ten, that is where the problem lives.


How can I think about it?

The briefing room analogy

Imagine you are a general about to make a critical decision. You walk into a briefing room. Context engineering determines what is on the walls and tables when you arrive:

  • System prompt = the standing orders posted on the wall. Always there, always applicable.
  • Retrieved documents = the intelligence reports an aide has selected for this specific decision.
  • Conversation history = the notes from prior meetings, summarised (compacted) to the key decisions.
  • Tool definitions = the communication equipment available — radio, satellite, drone feeds.
  • Memory = the classified files in the secure cabinet, pulled out only when relevant.

A well-prepared briefing room makes the decision obvious. A poorly prepared one — cluttered with irrelevant reports, missing key intelligence — leads to bad decisions regardless of how smart the general is.

The chef analogy

A chef (the model) receives an order (the prompt). But the quality of the dish depends far more on what is in the kitchen:

  • Mise en place (context curation) — the right ingredients, pre-measured, at the right temperature, within arm’s reach.
  • Recipe card (system prompt) — the standing technique for this type of dish.
  • Pantry (retrieval) — the full inventory, but you only pull what this dish needs.
  • Fridge (memory) — ingredients prepared yesterday, stored for today’s service.

A mediocre chef with perfect mise en place produces a better dish than a great chef rummaging through a disorganised kitchen. Context engineering is the mise en place of AI.


Concepts to explore next

| Concept | What it covers | Status |
| --- | --- | --- |
| context-cascading | Layering context from general to specific for progressive understanding | complete |
| rag | Retrieving external documents to inject into context at runtime | complete |
| agent-memory | Persistent storage outside the context window, retrieved on demand | complete |
| context-rot | How model performance degrades as context fills up | stub |
| context-compaction | Summarising conversation history to free context space | stub |
| just-in-time-loading | Loading data into context only when a task requires it | stub |
| harness-engineering | The system layer that manages context alongside evals and guardrails | complete |

Some of these cards don't exist yet

They’ll be created as the knowledge system grows. A broken link is a placeholder for future learning, not an error.


Where this concept fits

Position in the knowledge graph

```mermaid
graph TD
    AS[Agentic Systems] --> CE[Context Engineering]
    AS --> HE[Harness Engineering]
    AS --> LP[LLM Pipelines]
    CE --> CC[Context Cascading]
    CE --> RAG[RAG]
    CE --> AM[Agent Memory]
    CE -.->|managed by| HE
    style CE fill:#4a9ede,color:#fff
```

Related concepts:

  • context-cascading — the specific pattern of layering context from general to specific, implementing context engineering’s “right information, right order” principle
  • rag — retrieval-augmented generation loads external knowledge into context at runtime, a core context engineering strategy
  • agent-memory — externalising information outside the context window so it can be retrieved when needed
  • harness-engineering — the system layer that manages context engineering alongside evaluations and guardrails

Sources


Footnotes

  1. Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic Engineering. The canonical guide to context curation, compaction, sub-agent isolation, and just-in-time loading.

  2. Lütke, T. (2025). Tweet on context engineering. June 19, 2025. The tweet that popularised “context engineering” as a term.

  3. Karpathy, A. (2025). Tweet on context engineering. June 2025. Endorsed the term and distinguished it from casual prompting.

  4. Willison, S. (2025). Context Engineering. simonwillison.net. Why “context engineering” is a better frame than “prompt engineering” and what the shift means in practice.

  5. Schmid, P. (2025). The New Skill in AI is Not Prompting, It’s Context Engineering. philschmid.de. Practical introduction with the CPU/RAM analogy and context window management strategies.

  6. Chroma Research. (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma. Empirical evidence that every frontier model degrades as context length increases.

  7. Liu, N.F. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL 2024 / Stanford. Demonstrates the U-shaped attention curve in long-context LLMs.