Why do Claude Code subagents use so many tokens?

Each subagent runs its own API requests in a fresh context window, so a three-subagent task costs roughly four times a single-thread session. The in-product /usage message reports 85% of usage coming from subagent-heavy sessions.

How do I make Claude Code subagents cheaper?

Route worker subagents to a cheaper model. Set model: haiku or model: sonnet in each worker's frontmatter and keep the planner on opus. Haiku is 5x cheaper than Opus per token, so a worker making 30 calls a session saves fast.

What is CLAUDE_CODE_SUBAGENT_MODEL?

An environment variable that silently overrides every subagent's frontmatter model field. If it is set in your shell or dotfiles, your careful routing is preempted with no signal in the transcript. Audit and unset it before trusting your cost numbers.

Why Claude Code Subagents Burn So Many Tokens

Summary:

Subagents run their own API requests, so subagent-heavy sessions can be 85% of your token bill.

The biggest lever is model routing: planner on Opus, workers on Haiku or Sonnet. Haiku is 5x cheaper than Opus.

One env var, CLAUDE_CODE_SUBAGENT_MODEL, silently undoes all of it. Audit it first.

Bonus: the live Anthropic pricing table, the per-call math, and the cost-control settings.json.

Claude Code subagents cost more than most operators realize, and the receipt is in your own session. A developer on Reddit posted their /usage breakdown after burning through Max 20 unexpectedly fast: “85% of your usage came from subagent-heavy sessions.” That number is not a bug. It is structural. The question is whether you, the operator, are routing those calls deliberately or paying Opus prices for work a cheaper model handles fine.

Why do Claude Code subagents burn so many tokens?

Because each subagent runs its own API requests in a fresh context window. A subagent is an isolated delegation: it does not inherit your session, so it re-reads what it needs and bills its own calls. Three subagents on one task is roughly four times the token spend of a single-thread session. That is the mechanism behind the 85%.

The evidence is consistent across operators and plan tiers. One Reddit poster: “I just used 20% of my weekly Max 20 subscription in my 5hr window.” A GitHub issue (anthropics/claude-code #55051): “I just used up 50% of my 5x limits in like 30 seconds by using subagents with the prompt ‘do a deep security review of this codebase. You may need multiple subagents.’” Three operators, three tiers, same shape: the spend concentrates in subagent fan-out, and nobody had walked them through the controls.

What does each model actually cost?

Here is the live per-model API pricing, from the Anthropic pricing page, per million tokens:

Model	Input $/MTok	Output $/MTok
Claude Opus 4.5	$5	$25
Claude Sonnet 4.5	$3	$15
Claude Haiku 4.5	$1	$5

Haiku is 5x cheaper than Opus on both input and output. That is the whole lever. Most operators have every subagent inheriting the main session’s model, which is usually Opus, so every worker pays Opus prices on tasks that do not need Opus quality. Route each worker explicitly with the model field in its frontmatter:

# planner stays on the strategic model
---
name: planner
model: opus
---
# workers drop to cheaper engines
---
name: coder
model: haiku
---
---
name: reviewer
model: sonnet
---

Walk the math with the table above. Assume a worker call of 20,000 input and 5,000 output tokens. On Opus: $5 × 0.02 + $25 × 0.005 = $0.225 per call. On Haiku: $1 × 0.02 + $5 × 0.005 = $0.045 per call. One-fifth the cost. Run three Haiku workers in parallel and you spend $0.135 to finish three tasks, still less than one serial Opus call’s $0.225 for one task. That is the image’s compound effect: 5x cheaper per call times 3x parallel throughput nets roughly 2x billable work per dollar.

The env var that silently undoes your routing

You can route every worker perfectly and see no change, because one environment variable preempts all of it. CLAUDE_CODE_SUBAGENT_MODEL overrides the frontmatter model field with no signal in tool_result and no warning in the transcript. GitHub issue #57718 documents it with a test matrix: three of three silent clamps when the env var is set, two of two honored when unset.

Audit before you trust your numbers:

echo "CLAUDE_CODE_SUBAGENT_MODEL: ${CLAUDE_CODE_SUBAGENT_MODEL:-(unset)}"
unset CLAUDE_CODE_SUBAGENT_MODEL   # then restart your session

If that printed a value you did not set, your routing was being preempted and your cost analysis was wrong. The variable was lying, not your math. This is two minutes that prevents the most demoralizing version of “I tuned my models and saw nothing change.”

Then lock a cost-control default in ~/.claude/settings.json:

{
  "model": "opus",
  "availableModels": ["opus", "sonnet", "haiku"]
}

The model: opus line is your planner’s main-session default. availableModels keeps an operator from accidentally landing on a model your agent files do not route to.

The honest read: this is a throughput claim, not a 50% cost cut

Here is the reality check most cost guides skip. The levers do not cut your bill in half on the same workload. Model routing alone cuts the subagent line item roughly 30%. The “2x” is a throughput number: three cheap workers in parallel finish three tasks in about the time one Opus worker finishes one. If you read “2x faster” as “save 50% on tokens,” that is not what ships.

That distinction matters because the payoff is real either way, and you choose which form to take it. Either you ship more output per month at the same bill, or you bank the savings at the same output. A reasonable target after a week of routing is a 30 to 50% drop on the subagent line, measured by snapshotting /usage, running normally for seven days, and snapshotting again. Your number depends on how aggressively you route to Haiku. The honest claim is “noticeable drop on the subagent line,” not a fixed percentage.

What should you actually do?

If every subagent inherits your main model → set model: haiku on simple workers and model: sonnet on the reviewer today. Keep the planner on Opus.
If you routed models and saw no change → run echo $CLAUDE_CODE_SUBAGENT_MODEL. If it is set, that is your culprit. Unset and restart.
If you have overnight bulk work → it can wait, so it belongs on the Batch API at half price, not in your interactive session.

The bottom line

Subagents are not theoretical cost. They are the 85% line item on your /usage receipt right now.
Model routing is the biggest lever, and Haiku at 5x cheaper than Opus is where the savings live. Scope the model like you scope the tools.
The cost is operator control, not model taste. Audit the env var, route deliberately, and measure the drop instead of guessing at it.

Why Claude Code Subagents Burn So Many Tokens

Claude Code Subagents and Hooks

Why do Claude Code subagents burn so many tokens?

What does each model actually cost?

The env var that silently undoes your routing

The honest read: this is a throughput claim, not a 50% cost cut

What should you actually do?

The bottom line

Frequently Asked Questions

Why Claude Code Subagents Burn So Many Tokens

Claude Code Subagents and Hooks

Why do Claude Code subagents burn so many tokens?

What does each model actually cost?

The env var that silently undoes your routing

The honest read: this is a throughput claim, not a 50% cost cut

What should you actually do?

The bottom line

Frequently Asked Questions

More from this book

Claude Code Hooks: Wire a SubagentStop Hook

Run Claude Code Subagents in Parallel Without Clobbering

Set Up Claude Code Subagents: Three, Not Forty-Seven

Tool, Skill, or Subagent? The Claude Code Decision