How to Talk to AI: From Prompts to Context to Harnesses
Most people think the skill of working with AI is knowing what to type. It is not. The skill is knowing what the model needs to see, how to verify what it produces, and how to build a system that stays reliable when you are not watching.
Who this is for
You have used ChatGPT, Claude, or Copilot. You have typed prompts and received outputs. Sometimes the output was brilliant. Sometimes it was confidently wrong. You want to understand what makes the difference — and how to make the good results consistent.
This path is for you if:
- You use AI tools regularly but feel like results are hit-or-miss
- You want to move from casual prompting to building reliable AI-assisted workflows
- You want to understand the full stack of LLM management, not just “prompt tricks”
What this article is NOT
This is not a list of prompt templates. This is a thinking framework for understanding the three layers of LLM management — and knowing which layer to invest in for your situation.
Part 1 — Intent, not incantation
The most common misconception about working with AI: that there are magic words. That if you phrase your prompt just right — add “think step by step,” say “you are an expert,” use the right XML tags — the model will produce perfect output.
This is backwards. The single most important factor in LLM output quality is not phrasing. It is intent clarity — how precisely you have communicated what you want, why you want it, and what “good” looks like.1
```mermaid
graph LR
    A[Vague intent] -->|any prompt format| B[Unpredictable output]
    C[Clear intent] -->|any prompt format| D[Useful output]
    style B fill:#e74c3c,color:#fff
    style D fill:#5cb85c,color:#fff
```
Anthropic’s prompting guide puts it directly: “Think of Claude as a brilliant but new employee who lacks context about your norms, preferences, and workflows. The more precisely you communicate what you want, the better the output.”1
The test is simple: show your prompt to a colleague who knows nothing about your project. If they would be confused about what you want, the model will be too. Format tricks cannot compensate for unclear intent.
This does not mean format is irrelevant. Structure helps. XML tags reduce misinterpretation. Examples anchor expectations. But these are amplifiers of clear intent, not substitutes for it.
Why this matters for you
Stop searching for the perfect prompt template. Start by writing down — in plain language — exactly what you want, why you want it, what constraints apply, and what the output should look like. That document is the prompt. Everything else is formatting.
Part 2 — Prompt engineering: the instruction layer
prompt-engineering is the practice of crafting the instruction you give to a language model. It is the first layer of LLM management, and despite the hype, what actually matters is surprisingly simple.1
What works
Anthropic, OpenAI, and Google converge on the same fundamentals:12
| Principle | What it means | Why it works |
|---|---|---|
| Be specific | State exactly what you want, not roughly | The model has no way to infer unstated requirements |
| Explain why | Give the motivation behind constraints | “Never use ellipses — this will be read by text-to-speech” is clearer than “never use ellipses” |
| Show examples | 3-5 diverse input/output pairs | Examples anchor format and quality expectations better than any description |
| Assign a role | “You are a senior data analyst” | Focuses tone, vocabulary, and depth |
| Structure the output | Specify format, length, sections | Removes ambiguity about what “a good response” looks like |
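To make the table concrete, here is a minimal Python sketch of a prompt that applies all five principles at once. The task, the data, and the `build_prompt` helper are invented for illustration — this is not any particular library's API.

```python
# A minimal sketch of the five principles combined into one prompt.
# The task and data are illustrative.

ROLE = "You are a senior data analyst."  # assign a role

TASK = (
    "Summarise the quarterly sales figures below in exactly three bullet "
    "points. Never use ellipses: the summary will be read aloud by a "
    "text-to-speech system."  # be specific, and explain why
)

EXAMPLES = """\
Example input: "Q1 revenue $1.2M, up 8% QoQ, churn 3%"
Example output:
- Revenue reached $1.2M in Q1.
- Growth was 8% quarter over quarter.
- Churn held at 3%.
"""  # show examples: they anchor format and quality expectations

OUTPUT_SPEC = "Output: exactly three bullets, each under 20 words."  # structure the output

def build_prompt(data: str) -> str:
    # Static content first, dynamic data and the task last.
    return "\n\n".join([ROLE, EXAMPLES, OUTPUT_SPEC, f"Data:\n{data}", TASK])

print(build_prompt("Q2 revenue $1.4M, up 17% QoQ, churn 2.5%"))
```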
What does not work
- Magic phrases. “Think step by step” helps reasoning models slightly but is not a universal fix. On reasoning-native models like o3 and DeepSeek-R1, chain-of-thought instructions actually degrade output because those models already reason step by step internally.3
- Over-engineering format. Spending hours on XML tag structure when the underlying intent is unclear produces beautifully formatted wrong answers.
- Portable prompts. The same prompt behaves differently across models. Claude responds well to XML tags. GPT-5 responds well to structured persona adoption. Treat prompts as model-specific.3
Prompt chaining
Complex tasks often cannot be handled well in a single prompt. prompt-chaining breaks a task into sequential steps where each output feeds the next.4
```mermaid
graph LR
    A[Step 1: Research] -->|output| B[Step 2: Outline]
    B -->|output| C[Step 3: Draft]
    C -->|output| D[Step 4: Review]
    D -->|feedback| C
    style A fill:#e8b84b,color:#fff
    style D fill:#4a9ede,color:#fff
```
Three patterns:
| Pattern | How it works | When to use |
|---|---|---|
| Linear | Each step feeds the next | Tasks with clear sequential dependencies |
| Branching | Output triggers different paths | Tasks where the next step depends on the result |
| Self-correction loop | Generate, review against criteria, refine | Any task where quality matters |
The self-correction loop is now the most common production pattern: draft something, have the model evaluate it against explicit criteria, then revise based on the evaluation. This catches errors that a single pass misses.
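A minimal sketch of that loop, assuming a hypothetical `call_model` helper standing in for whatever LLM client you use:

```python
# Self-correction loop sketch: draft, evaluate against explicit criteria,
# revise. `call_model` is a placeholder -- swap in your real LLM client.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

CRITERIA = "Under 100 words, factually grounded in the source, no jargon."

def self_correcting_summary(text: str, max_rounds: int = 3) -> str:
    draft = call_model(f"Summarise:\n{text}")
    for _ in range(max_rounds):
        review = call_model(
            f"Evaluate this summary against the criteria.\n"
            f"Criteria: {CRITERIA}\nSummary: {draft}\n"
            f"Reply PASS if all criteria are met, otherwise list the problems."
        )
        if review.strip().startswith("PASS"):
            break
        # Feed the critique back in: this catches errors a single pass misses.
        draft = call_model(
            f"Revise the summary to fix these problems.\n"
            f"Problems: {review}\nSummary: {draft}"
        )
    return draft
```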
Chunking
Chunking is different from chaining. Where chaining breaks tasks into steps, chunking breaks inputs into manageable pieces. A 50-page document exceeds what a model can process well in one pass — splitting it into sections and processing each individually produces better results than dumping everything in at once.5
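A minimal chunking sketch, again assuming a hypothetical `call_model` helper; the paragraph-based splitter and the character budget are illustrative choices, not fixed rules:

```python
# Chunking sketch: split a long document on paragraph boundaries, process
# each piece, then combine. `call_model` is a placeholder.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def chunk_paragraphs(text: str, max_chars: int = 8000) -> list[str]:
    """Greedily pack whole paragraphs into chunks below the size limit."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = ""
        current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def summarise_document(text: str) -> str:
    # Process each chunk individually, then merge the partial results.
    partials = [call_model(f"Summarise this section:\n{chunk}")
                for chunk in chunk_paragraphs(text)]
    return call_model("Combine these section summaries:\n" + "\n".join(partials))
```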
The prompt engineering principle
Prompt engineering is necessary but not sufficient. It governs what you ask the model. But the quality of the answer depends at least as much on what the model knows when it answers. That is context engineering.
Part 3 — Context engineering: the information layer
In June 2025, Tobi Lütke, CEO of Shopify, proposed a reframe: “I really like the term ‘context engineering’ over ‘prompt engineering.’ It describes the core skill better: the art of providing all the context for the task to be plausibly solvable by the LLM.”6
Andrej Karpathy agreed: “People associate prompts with short task descriptions. In every industrial-strength LLM app, context engineering is the delicate art and science of filling the context window with just the right information for the next step.”7
context-engineering is the practice of curating and structuring all the information a model can see when it generates a response — not just your instruction, but the system prompt, the retrieved documents, the conversation history, the tool definitions, and the memory from prior sessions.5
Why context beats prompting
The same prompt produces radically different outputs depending on what context surrounds it. Ask “write a function to validate email” with no context and you get a generic regex. Ask it with your codebase’s existing validation patterns, your error handling conventions, and your test framework loaded into context, and you get code that fits your project.
```mermaid
graph TD
    subgraph Context Window
        SP[System Prompt] --> TI[Task Instruction]
        RD[Retrieved Documents] --> TI
        CH[Conversation History] --> TI
        TD[Tool Definitions] --> TI
        MM[Memory and State] --> TI
    end
    TI --> O[Model Output]
    style TI fill:#4a9ede,color:#fff
```
What goes in, and in what order
Anthropic’s research shows that placement matters:5
- Put reference material first, query last. Queries placed at the end of context improve response quality by up to 30%.
- Static content before dynamic content. This also enables prompt caching, reducing costs by up to 90%.
- Less is more. Every token in context competes for the model’s attention. Irrelevant information degrades output — a phenomenon researchers call context rot.8
Stanford’s “Lost in the Middle” research confirms the risk: LLMs exhibit a U-shaped attention curve, with highest accuracy when relevant information is at the beginning or end of context, and 30%+ degradation when critical information sits in the middle.9
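Put together, the placement rules might look like the sketch below. The tag names and the ten-turn history cutoff are illustrative, not a prescribed format:

```python
# Context assembly sketch following the placement rules above:
# static reference material first (cache-friendly), dynamic history in
# the middle, the query last.

def assemble_context(system_prompt: str,
                     documents: list[str],
                     history: list[str],
                     query: str) -> str:
    parts = [
        system_prompt,            # static content first: enables prompt caching
        "<documents>",
        *documents,               # reference material before the query
        "</documents>",
        "<history>",
        *history[-10:],           # recent turns only: less is more
        "</history>",
        f"Question: {query}",     # query last: end-of-context placement improves quality
    ]
    return "\n".join(parts)
```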
Structuring context for quality
The key insight from Anthropic’s context engineering guide: treat context as a precious, finite resource and find “the smallest set of high-signal tokens that maximise the likelihood of your desired outcome.”5
Practical strategies:
| Strategy | What it does |
|---|---|
| Compaction | Summarise conversation history while preserving key decisions |
| Structured note-taking | Agents write notes persisted outside the context, retrievable later |
| The AGENTS.md pattern | Preload architectural context; use tools for just-in-time retrieval |
| Sub-agent isolation | Specialised agents handle focused tasks with clean contexts, returning distilled summaries |
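As an example of the first strategy, here is a minimal compaction sketch, assuming a hypothetical `call_model` helper; the five-turn cutoff and character budget are arbitrary illustrative values:

```python
# Compaction sketch: when history grows past a budget, replace older turns
# with a model-written summary that preserves key decisions.
# `call_model` is a placeholder.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def compact_history(history: list[str], budget_chars: int = 20000) -> list[str]:
    if len(history) <= 5 or sum(len(turn) for turn in history) <= budget_chars:
        return history
    old, recent = history[:-5], history[-5:]  # keep the latest turns verbatim
    summary = call_model(
        "Summarise this conversation, preserving every decision, open "
        "question, and constraint:\n" + "\n".join(old)
    )
    return [f"[Summary of earlier conversation]\n{summary}", *recent]
```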
Avoiding slop
“Slop” — generic, verbose, low-signal output — is almost always a context problem, not a model problem. The model produces slop when it lacks the specific context needed to produce something precise. The fix is not a better prompt. The fix is better context: real examples, specific constraints, domain knowledge, and a clear definition of what “good” looks like.
context-cascading formalises this: instructions are organised in layers from broad to specific — global rules, then domain context, then task instructions, then output format — loaded in sequence so each layer narrows the scope of the next.
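A minimal sketch of such a cascade; the four layers and their contents are invented for illustration:

```python
# Context cascading sketch: layers loaded broad to specific, each
# narrowing the scope of the next. Layer contents are illustrative.

GLOBAL_RULES = "Write British English. Never invent figures."              # layer 1: global rules
DOMAIN = "Domain: B2B SaaS billing. Revenue is reported net of refunds."   # layer 2: domain context
TASK = "Task: explain the Q3 churn spike to a non-technical audience."     # layer 3: task instruction
OUTPUT = "Output: one paragraph, under 120 words."                         # layer 4: output format

def cascade(*layers: str) -> str:
    # Broad rules first, specific instructions last.
    return "\n\n".join(layers)

prompt = cascade(GLOBAL_RULES, DOMAIN, TASK, OUTPUT)
```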
The context engineering principle
The quality of an LLM’s output is bounded by the quality of its context. A mediocre prompt with excellent context beats an excellent prompt with poor context — every time.
Part 4 — Harness engineering: the system layer
In February 2026, Mitchell Hashimoto (co-founder of HashiCorp) published an account of his AI adoption journey and gave a name to a practice that production teams had been developing independently: harness engineering.10
The core equation: Agent = Model + Harness.
The harness is everything in an AI system except the model itself: the tools, the permissions, the state management, the testing, the logging, the guardrails, the feedback loops, and the verification systems that make the model’s output reliable.11
```mermaid
graph TD
    subgraph Harness
        G[Guides] --> M[Model]
        M --> S[Sensors]
        S -->|feedback| M
        G -.->|AGENTS.md, specs, rules| G
        S -.->|tests, linters, evals| S
    end
    style M fill:#4a9ede,color:#fff
    style G fill:#5cb85c,color:#fff
    style S fill:#e8b84b,color:#fff
```
Why the harness matters more than the model
LangChain’s DeepAgent moved from outside the Top 30 to the Top 5 on Terminal Bench 2.0 — by changing only the harness while keeping the same model (GPT-5.2-Codex). Key improvements: self-verification loops, loop detection, context management, and time budgeting.12
The insight: a mediocre model in an excellent harness outperforms an excellent model with no harness. This is why Ethan Mollick observes that “the same model can behave very differently depending on what harness it’s operating in.”13
Guides and sensors
Birgitta Böckeler, Distinguished Engineer at Thoughtworks, published the most comprehensive treatment of harness engineering, proposing two categories of controls:11
Guides (feedforward controls) steer the agent before it acts:
- Documentation — AGENTS.md files, architecture specs, coding standards that the agent reads before starting work
- Bootstrap scripts — automated setup that ensures the agent starts in a known-good state
- Constraints — explicit boundaries on what the agent can and cannot modify
Sensors (feedback controls) observe after the agent acts and enable self-correction:
- Computational sensors — tests, linters, type checkers. Fast, deterministic, and cheap.
- Inferential sensors — AI-powered code review, semantic analysis. Slower but capable of richer judgement.
- Custom feedback — linter messages that include instructions for self-correction, so the agent can fix its own mistakes.
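As a concrete example of a computational sensor with custom feedback, here is a minimal sketch that runs pytest after the agent edits code and turns failures into a self-correction instruction. The feedback wording is illustrative:

```python
# Computational sensor sketch: run the test suite, capture the result,
# and wrap failures in an instruction the agent can act on.

import subprocess

def run_tests() -> tuple[bool, str]:
    """Fast, deterministic, cheap: exit code plus captured output."""
    result = subprocess.run(
        ["pytest", "-q", "--maxfail=5"],
        capture_output=True, text=True,
    )
    return result.returncode == 0, result.stdout + result.stderr

def sensor_feedback() -> str | None:
    ok, output = run_tests()
    if ok:
        return None  # nothing to correct
    # Include instructions for self-correction, not just the raw failure.
    return (
        "The test suite failed. Fix the failures before doing anything else. "
        f"Test output:\n{output}"
    )
```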
Think of it like...
A harness is to an AI agent what a cockpit is to a pilot. The pilot (model) has the skill to fly, but the cockpit provides the instruments (sensors), the checklists (guides), the autopilot constraints (guardrails), and the black box (logging) that make flying safe and reliable. A pilot without a cockpit can still fly — but you would not board that plane.
Evaluations: the foundation of harness engineering
evaluations (evals) are systematic tests that measure whether an LLM system produces correct, useful, safe output. They are the single most important component of a harness because without them, you cannot know whether anything else is working.14
| Eval type | What it tests | Example |
|---|---|---|
| Correctness | Does the output match expected answers? | Compare against gold-standard test cases |
| Consistency | Does the same input produce similar outputs? | Run the same prompt 10 times and measure variance |
| Safety | Does the output respect boundaries? | Test for hallucination, data leakage, harmful content |
| Regression | Did a change break something that worked before? | Run the full eval suite after every system change |
Anthropic’s guidance on long-running agents emphasises that effective harnesses “build verification into the workflow itself” rather than checking outputs only at the end.15
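A minimal correctness-and-regression eval might look like the sketch below, assuming a hypothetical `call_model` helper. The gold cases and the substring grading rule are deliberately simple illustrations; real evals usually grade more carefully:

```python
# Eval sketch: gold-standard cases run after every system change.
# `call_model` is a placeholder.

def call_model(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

GOLD_CASES = [
    {"input": "What is our return window?", "must_contain": "30 days"},
    {"input": "Do you ship to Canada?",     "must_contain": "yes"},
]

def run_evals() -> float:
    passed = 0
    for case in GOLD_CASES:
        output = call_model(case["input"]).lower()
        if case["must_contain"] in output:
            passed += 1
        else:
            print(f"FAIL: {case['input']!r} -> {output[:80]!r}")
    score = passed / len(GOLD_CASES)
    print(f"Eval score: {score:.0%}")
    return score  # gate deployments on this score in CI
```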
Guardrails
guardrails are constraints that prevent the agent from taking actions outside its intended scope. They operate at multiple levels:
- Input guardrails — validate and sanitise what goes into the model
- Output guardrails — check what comes out before it reaches the user or executes an action
- Action guardrails — restrict what tools the agent can use, what files it can modify, what APIs it can call
- Escalation guardrails — human-in-the-loop checkpoints for high-stakes or ambiguous situations
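A minimal sketch combining an output guardrail with an escalation guardrail, for a customer-support bot; the policy list and the escalation hook are invented for illustration:

```python
# Guardrail sketch: check the model's answer before it reaches the user,
# and escalate to a human when the check fails.

KNOWN_POLICIES = ["30-day return window", "free exchanges", "store credit"]

def output_guardrail(answer: str) -> str:
    # Output guardrail: any claim about returns must quote a real policy,
    # blocking fabricated ones before they reach the user.
    if "return" in answer.lower() and not any(
        policy in answer.lower() for policy in KNOWN_POLICIES
    ):
        return escalate_to_human(answer)
    return answer

def escalate_to_human(answer: str) -> str:
    # Escalation guardrail: hold the reply for review instead of sending it.
    print(f"[escalation] held for review: {answer[:80]!r}")
    return "Let me check that with a colleague and get back to you."
```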
The harness engineering principle
You do not make AI reliable by finding a better model. You make it reliable by building a system that catches errors, enforces boundaries, and improves with every failure. The model is the engine. The harness is everything else.
Part 5 — The three layers in practice
Prompt engineering, context engineering, and harness engineering are not competing approaches. They are concentric layers, each encompassing the one before:16
```mermaid
graph TD
    subgraph Harness Engineering
        subgraph Context Engineering
            subgraph Prompt Engineering
                PE[What you ask]
            end
            CE[What the model sees]
        end
        HE[How the system works]
    end
    style PE fill:#e8b84b,color:#fff
    style CE fill:#4a9ede,color:#fff
    style HE fill:#5cb85c,color:#fff
```
| Layer | Core question | What you build | When to invest |
|---|---|---|---|
| Prompt | How do I ask? | Instructions, examples, role definitions | Always — this is the foundation |
| Context | What does the model know? | Context structures, retrieval, memory, cascading | When prompts alone produce inconsistent results |
| Harness | How does the system work? | Evals, guardrails, sensors, guides, observability | When you ship to production or run agents autonomously |
The progression maps to maturity:
- Exploring — you prompt directly in a chat interface. Prompt engineering is enough.
- Building — you integrate AI into a workflow. Context engineering becomes essential.
- Shipping — you deploy AI that runs without you watching. Harness engineering is mandatory.
In plain terms
Prompt engineering is writing a good email. Context engineering is making sure the recipient has all the background documents they need. Harness engineering is building the entire office system — email, filing, review process, quality checks — that ensures the work gets done correctly even when you are not in the room.
Part 6 — The spectrum: chatting to shipping
Not every use of AI requires all three layers. The investment should match the stakes:
```mermaid
graph LR
    A[Chat] -->|add structure| B[Workflow]
    B -->|add verification| C[Product]
    C -->|add autonomy| D[Agent System]
    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff
    style D fill:#9b59b6,color:#fff
```
| Level | Example | Layers needed |
|---|---|---|
| Chat | Asking Claude to explain a concept | Prompt only |
| Workflow | Using AI to draft, then reviewing yourself | Prompt + basic context |
| Product | AI feature in an app that users interact with | Prompt + context + evals |
| Agent system | AI that acts autonomously across sessions | All three layers, fully |
The key question
Ask yourself: “If this AI produces wrong output and I don’t notice for an hour, what happens?” If the answer is “nothing serious,” prompt engineering is enough. If the answer makes you nervous, you need a harness.
Part 7 — The map so far
```mermaid
graph TD
    LLM[LLM Engineering] --> PE[Prompt Engineering]
    PE --> IC[Intent Clarity]
    PE --> PC[Prompt Chaining]
    PE --> CH[Chunking]
    PE --> EX[Examples and Role]
    LLM --> CE[Context Engineering]
    CE --> CC[Context Cascading]
    CE --> GR[Granularity]
    CE --> RT[Retrieval and RAG]
    CE --> MM[Memory and State]
    LLM --> HE[Harness Engineering]
    HE --> EV[Evaluations]
    HE --> GU[Guardrails]
    HE --> GD[Guides - Feedforward]
    HE --> SN[Sensors - Feedback]
    HE --> OB[Observability]
    style LLM fill:#4a9ede,color:#fff
```
What you now understand
Mental models you have gained
- Intent over incantation — clarity of what you want matters more than how you phrase it
- Prompt engineering — the instruction layer: be specific, explain why, show examples, chain complex tasks
- Context engineering — the information layer: curate what the model sees; less is more; structure and placement matter
- Harness engineering — the system layer: Agent = Model + Harness; guides steer before, sensors correct after
- The three layers are concentric — each encompasses the one before; invest in the layer that matches your stakes
- Evaluations are non-negotiable — without evals, you cannot know if anything else works
- The harness beats the model — a good system around a mediocre model outperforms a great model with no system
Check your understanding
Test yourself before moving on:
- Explain the difference between prompt engineering, context engineering, and harness engineering. What question does each layer answer?
- Describe three strategies for improving context quality and explain why context placement matters.
- Distinguish between guides and sensors in a harness. Give a concrete example of each for an AI coding assistant.
- Interpret this scenario: an AI-powered customer support bot gives correct answers 80% of the time but occasionally fabricates return policies. The team’s instinct is to rewrite the prompt. Using what you have learned, identify which layer is most likely the problem and propose a solution.
- Design a simple harness for an AI tool that summarises meeting notes. Identify at least two guides, two sensors, and one escalation guardrail.
Where to go next
I want to understand how LLMs actually work inside
This path covers how to use LLMs. If you want to understand the machinery — tokenization, embeddings, next-token prediction, parameters, training, and the transformer architecture — read inside-the-machine.
Best for: People who want accurate mental models of the technology they are working with.
I want to understand the AI architecture underneath
This path covers how to work with LLMs. If you want to understand how agentic systems are structured — routing, orchestration, knowledge, and pipelines — read agentic-design.
Best for: People building AI systems, not just using AI tools.
I want to understand the software fundamentals
LLM engineering sits on top of software architecture. If you have not read it yet, from-zero-to-building covers the base layer: frontend, backend, APIs, databases, and the document chain from intent to code.
Best for: People who want the full stack picture.
I want to build something now
Take these principles and apply them. Start a project through the learning pipeline, define your intent, and the system will match relevant concepts and generate a learning path tailored to your project.
Best for: People who learn by building.
Sources
Further reading
Resources
- Effective Context Engineering for AI Agents (Anthropic) — The definitive guide to context curation, compaction, memory, and sub-agent architectures
- Harness Engineering for Coding Agent Users (Böckeler / Thoughtworks) — The most comprehensive technical treatment of guides, sensors, and harness design
- Effective Harnesses for Long-Running Agents (Anthropic) — How to build verification into autonomous agent workflows
- Prompting Best Practices (Anthropic) — The official guide to prompting Claude effectively
- Context Engineering (Simon Willison) — Why “context engineering” is a better frame than “prompt engineering” and what the shift means in practice
- A Guide to Which AI to Use in the Agentic Era (Mollick) — Ethan Mollick’s accessible framework for understanding models, apps, and harnesses
Footnotes
1. Anthropic. (2026). Prompting Best Practices. Anthropic. The official prompting guide covering clarity, motivation, examples, and structure.
2. OpenAI. (2026). Prompt Engineering Guide. OpenAI. Official best practices for GPT models.
3. IBM. (2026). The 2026 Guide to Prompt Engineering. IBM. Covers model-specific prompting and the death of portable prompts.
4. Maxim AI. (2026). Prompt Chaining for AI Engineers. Maxim AI. Practical guide to linear, branching, and self-correction chain patterns.
5. Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic. The canonical reference on context curation, compaction, and sub-agent architectures.
6. Lütke, T. (2025). Tweet on context engineering. June 19, 2025. The tweet that popularised “context engineering” as a term.
7. Karpathy, A. (2025). Tweet on context engineering. June 2025. Endorsed the term and distinguished it from casual prompting.
8. Chroma Research. (2025). Context Rot: How Increasing Input Tokens Impacts LLM Performance. Chroma. Empirical evidence that every frontier model degrades as context length increases.
9. Liu, N.F. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL 2024 / Stanford. Demonstrates the U-shaped attention curve in long-context LLMs.
10. Hashimoto, M. (2026). My AI Adoption Journey. mitchellh.com. The blog post that named harness engineering as a discipline.
11. Böckeler, B. (2026). Harness Engineering for Coding Agent Users. Martin Fowler / Thoughtworks. The most comprehensive technical treatment of guides, sensors, and harness categories.
12. LangChain. (2026). Improving Deep Agents with Harness Engineering. LangChain Blog. Demonstrates a jump from outside the Top 30 to the Top 5 on Terminal Bench by improving only the harness.
13. Mollick, E. (2026). A Guide to Which AI to Use in the Agentic Era. One Useful Thing. Frames the Model/App/Harness distinction for a general audience.
14. arXiv. (2026). LLM Readiness Harness: Evaluation, Observability, and CI Gates. arXiv. Technical framework for evaluation harnesses in production.
15. Anthropic. (2025). Effective Harnesses for Long-Running Agents. Anthropic. Guidance on building verification into agent workflows.
16. Bouchard, L-F. (2026). Harness Engineering: The Missing Layer Behind AI Agents. louisbouchard.ai. Contextualises harness engineering within the prompt-to-harness progression.