What is an agent harness in simple terms?

An agent harness is everything wrapped around a language model that turns it from a text predictor into something that does work: the loop, context, guardrails, tools, verification, recovery, and orchestration. The model is the engine; the harness is the rest of the car.

Is an agent harness the same as MCP or a framework?

No. MCP is a standard way to plug tools into the tool layer, one layer of seven. A framework is a harness someone else built and you adopted. The harness is the whole machine; MCP is one connector inside it.

Why does the harness matter more than the model?

Two engineers using the same model ship wildly different agents because the harness decides the outcome. One teardown of Claude Code estimated roughly 98% of it was harness, not model. The model is what you buy; the harness is what you get good at.

What Is an Agent Harness? The 7 Layers

Summary:

An agent harness is the seven layers wrapped around a language model that turn it into a working agent.

The layers, in build order: loop, context, guardrails, tools, verification, recovery, orchestration.

Each layer has one job and one specific failure it prevents.

Once you know the seven, you can take any agent apart and name what it built and what it skipped.

What is an agent harness? An agent harness is everything wrapped around a language model that turns it from a thing that predicts text into a thing that does the work. Strip all of it away and you are holding an API that autocompletes. The model is the small core. The harness is the body around it, and it is roughly 98% of what makes a tool like Claude Code feel smart. The model came in at under 2%.

The people who maintain the curated best-of-Agent-Harnesses list put the distinction cleanly:

A model answers; an agent acts. An agent harness is the runtime that turns one into the other: the model thinks; the harness decides what that thinking is allowed to touch.

The seven layers

The seven harness layers wrapped around the model core in build order: 1 loop, 2 context and state, 3 guardrails, 4 tools, 5 verification, 6 recovery, 7 orchestration, each with its one job and the failure it prevents

A working harness has seven layers. Read the third column as the symptom you will actually see when that layer is missing, because that is how you use this table in practice: a symptom shows up, and you flip back here to name the layer that caused it.

#	Layer	The one job it does	What breaks without it
1	Loop	Take a turn, run the tool it asks for, feed the result back, repeat until done	No progress. A chatbot that describes the work instead of doing it
2	Context + state	Put the right information in the window each turn, and remember across a long task	Blind and forgetful. Loses the plot by turn three, re-reads the same file
3	Guardrails	Keep the model inside safe boundaries; gate the dangerous actions	Out of bounds. One confident-but-wrong turn deletes your work
4	Tools	Give the model real capabilities: narrow, typed, safe to retry	All talk. The smartest model flails on a vague error or a mega-tool
5	Verification	Check the answer before you trust it: tests, types, an independent look	Wrong but confident. A green-tests merged PR still ships a bug
6	Recovery	Get back on track when a check fails: roll back, retry, or escalate	Dead ends. One failed step poisons the whole run
7	Orchestration	Coordinate more than one agent on a job and merge their results	Chaos at scale. Every agent works alone, blind to the others

Two rows carry the most weight. Guardrails are where most people are wrong about agent security: the real attack surface is not a cleverly worded prompt, it is the plain list of things the tools are allowed to touch. And tools decide whether the model ever gets anywhere, because a bad tool is the single most common reason an agent gets stuck.

One thing is not a layer but runs through all seven: evaluation. It is the discipline that measures whether any layer earns its place. Add a layer because a video told you to and you are guessing. Add a layer, measure, and rip it out if the number does not move, and you are engineering.

The vocabulary, resolved

People throw these words around as synonyms and then wonder why nothing they read fits together. They stack. Each one contains the one below it.

Prompt. A single message you send the model. One turn of text. The smallest unit.
Context. Everything the model can see on one call: the prompt, the system instructions, the history, any files you injected. The model knows only what is in here.
Tool call. The model asking, in text, to run something. Your code does the actual running.
Loop. The model and its tools cycling: call, run a tool, feed the result back, repeat until done. This is the exact line where a chatbot becomes an agent.
Orchestration. Coordinating more than one harness on a single job.

Two words people expect on that ladder are not rungs. RAG is a technique that fills the context layer. MCP is a standard connector that plugs tools into the tool layer. Both are tactics applied to layers you already have names for, not new kinds of thing.

Drop any tool into the seven slots

The anatomy is universal, which is the proof it is real. Take three tools you have probably used and watch them fall into the same seven slots.

Layer	Claude Code	Codex CLI	Cursor
Loop	Polished CLI loop	Same loop, terminal	Loop inside the editor
Context	Aggressive context mgmt	Per-session context	Managed as you move through files
Guardrails	Permission system gates calls	Sandbox flag + approval policy	Edit/approve prompts
Tools	Curated tool set	Curated tool set	File + terminal + checks
Verification	Verification hooks	Test/lint on request	Runs checks
Recovery	Retries on failure	Retries on failure	Retries on failure
Orchestration	Sub-agents	Sub-agents	Editor-side coordination

The category is real and crowded. Here are named harnesses people actually run, read off the curated ranking:

Harness	What it is
Codex	OpenAI’s terminal coding agent; sandboxed tool-call loop, multi-provider
opencode	Model-agnostic terminal coding agent; tool-calling loop in the shell
Gemini CLI	Google’s terminal coding agent
crush	Charm’s terminal coding agent; tool-calling loop with session persistence
goose	Block’s open-source terminal agent harness with recipes
OpenHands	Dockerized software-engineering agent; bash/editor/browser toolset

Score any harness against the seven layers

The fastest way to understand a tool is to take it apart. One point per layer it implements, then let the total predict which setup wins on the same model.

LAYERS = ["loop", "context", "guardrails", "tools",
          "verification", "recovery", "orchestration"]

def score(have):                       # have = set of layers the tool implements
    got = sum(layer in have for layer in LAYERS)
    missing = [layer for layer in LAYERS if layer not in have]
    return got, missing

# your first-night while-loop vs a full harness:
print(score({"loop"}))                 # (1, ['context', 'guardrails', ...])
print(score(set(LAYERS)))              # (7, [])

Run it on the tool you actually use. Set the layers you can point at, leave the rest out, and read the gap:

# the tool you use today. Set a layer only where you can name HOW it does it.
my_tool = {"loop", "context", "guardrails", "tools", "verification", "recovery"}
got, missing = score(my_tool)
print(f"{got}/7 layers; missing: {missing}")   # 6/7 layers; missing: ['orchestration']

The layer a tool leaves out is the lesson. Either the tool is closed and you have found the limit of what you can know, or it genuinely lacks that layer and you have spotted a real weakness on purpose. This is also the answer to “why is every agent just a worse Claude Code?” They wired up the same model and built one or two layers. Claude Code built all seven.

You don’t have a model problem

The most useful thing the anatomy buys you is a way to diagnose a failing agent instead of shrugging and blaming the model. Almost every complaint that sounds like “the model is dumb” is a missing layer, and now you can name it.

forgets what it was doing      -> missing context layer
overwrote a file it shouldn't  -> missing guardrail
repeats the same broken call   -> bad tool design
reports success on broken code -> no verification
one failure derails the run    -> no recovery

Swap in a model twice as smart and it makes the same mistakes a little more eloquently, because none of them was an intelligence failure. They were missing machinery, and machinery is something you build.

What should you actually do?

If you can’t define a harness in one sentence yet → memorize the seven-layer table. Draw it from memory with the model in the middle. That alone puts you ahead of most people making videos about it.
If your agent keeps failing → stop reaching for a bigger model. Run the symptom list above and name the missing layer instead.
If you are choosing a tool → score it against the seven layers before you commit. The one with more layers built well wins on the same model, not the one with the bigger model.
If someone tells you MCP or a framework “is” a harness → it isn’t. MCP is one connector; a framework is a harness you adopted. Know the difference before you build on either.

The bottom line

A harness is seven layers wrapped around a model: loop, context, guardrails, tools, verification, recovery, orchestration.
The model is something you buy. The harness is something you get good at, and it decides the outcome.
Learn the anatomy and you stop cargo-culting tools. You start judging them.

What Is an Agent Harness? The 7 Layers

Agentic Coding: Build the Harness

The seven layers

The vocabulary, resolved

Drop any tool into the seven slots

Score any harness against the seven layers

You don’t have a model problem

What should you actually do?

The bottom line

Frequently Asked Questions

What Is an Agent Harness? The 7 Layers

Agentic Coding: Build the Harness

The seven layers

The vocabulary, resolved

Drop any tool into the seven slots

Score any harness against the seven layers

You don’t have a model problem

What should you actually do?

The bottom line

Frequently Asked Questions

More from this book

AI Agent Guardrails: Gate Every Tool Call

Build an AI Coding Agent From Scratch in Python

Context Engineering for AI Agents

Why AI Agents Report False Success