How much does it cost to run AI agents in a company?

A six-agent production company costs about $28.50 a month in cloud tokens, producing roughly 150 deliverables. One premium agent on Claude Sonnet runs $18; four Qwen workers run $1.50 to $3 each; a local Ops agent is free.

Which model should each AI agent use?

Use premium models like Claude Sonnet for judgment and final review, cheap cloud models like Qwen 2.5 72B for routine data work, and a local Ollama model for formatting and ops. The mix is what keeps the bill near $30.

Is Qwen really cheaper than Claude Sonnet for agent work?

Yes. Qwen 2.5 72B is $0.36 input and $0.40 output per million tokens versus Sonnet's $3 / $15, roughly 8x cheaper on input and about 37x cheaper on output. For routine agent work with clear skill files it is hard to tell the difference.

What It Costs to Run an Autonomous AI Company

Summary:

A six-agent autonomous company costs about $28.50/month in cloud tokens.

The trick is the model mix: one premium agent, four cheap workers, one local.

Real per-model prices, scraped live, so you can run the math yourself.

Copy-paste cost calculator plus the model-selection rule that keeps the bill flat.

[Figure: six-agent monthly cost breakdown. A dominant Writer on Sonnet at $18 sits over four near-zero Qwen workers and a free local Ops agent, totaling $28.50 a month against $7,500 in revenue.]

How much does it cost to run AI agents in a real autonomous company? About $28.50 a month for six of them. That is not a typo and it is not a teaser number. It is the cloud-token bill for a six-agent production company shipping 150 deliverables a month, and the whole reason it is that low is the model mix. Spend on the one agent that needs judgment, go cheap on the four that do routine work, and run the formatting agent locally for free.

What does a six-agent company actually cost?

Here is the production budget, agent by agent, for 150 deliverables a month:

Agent	Model	Monthly cost
Writer	Claude Sonnet 4.5	$18.00
Researcher	Qwen 2.5 72B	$3.00
Analyst	Qwen 2.5 72B	$3.00
Marketer	Qwen 2.5 72B	$3.00
Editor	Qwen 2.5 72B	$1.50
Ops	Local (Ollama, Llama 3 8B)	$0.00
Total cloud cost		$28.50/mo

One agent, the Writer, is 63% of the bill, because it is the only one on a premium model. The four Qwen workers together cost $10.50. The Ops agent costs nothing because it runs locally. At even $50 per deliverable, 150 deliverables is $7,500/month in revenue against $28.50/month in cloud costs. The margin is absurd. It is also real.
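The shares quoted above follow directly from the table. Here is a minimal sketch that recomputes them, assuming `awk` is available. The dollar figures and the 150-deliverable count are copied from this article; the variable names are illustrative only.

```shell
#!/bin/sh
# Recompute the total, the Writer's share, and the cost per deliverable
# from the budget table. All figures are copied from the table above.
budget="Writer 18.00
Researcher 3.00
Analyst 3.00
Marketer 3.00
Editor 1.50
Ops 0.00"

summary=$(printf '%s\n' "$budget" | awk -v deliverables=150 '
  { total += $2; if ($1 == "Writer") writer = $2 }
  END {
    printf "total=%.2f writer_share=%.0f%% per_deliverable=%.2f\n",
           total, 100 * writer / total, total / deliverables
  }')
echo "$summary"
```

Swap in your own agent lines and deliverable count to see how the premium agent's share moves.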

Why is the cheap model good enough?

Because most agent work is routine, and routine work does not need a frontier model. The prices that make this work, scraped from the provider:

Model	Input /1M	Output /1M	Context
qwen/qwen-2.5-72b-instruct	$0.36	$0.40	131K
nousresearch/hermes-4-405b	$1.00	$3.00	131K

Source: OpenRouter (list prices). Against Claude Sonnet's $3 input / $15 output per million tokens, Qwen 2.5 72B at $0.36 input is roughly 8x cheaper on input tokens. For research, extraction, and structured output driven by a clear skill file, you cannot tell it apart from Sonnet at about 12% of the input cost and under 3% of the output cost. It struggles on genuinely ambiguous reasoning, which is exactly why you keep one premium agent for the judgment role.
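To check the price gap yourself, a small sketch along these lines works, assuming `awk` is installed. The Qwen prices come from the table above and the Sonnet prices ($3 input, $15 output per million tokens) are the ones quoted in the FAQ at the top of this article; re-enter current list prices before relying on the result.

```shell
#!/bin/sh
# Compare list prices per 1M tokens. Qwen prices are from the table above;
# the Sonnet figures ($3 input / $15 output) are the ones quoted in this article.
qwen_in=0.36;   qwen_out=0.40
sonnet_in=3.00; sonnet_out=15.00

ratios=$(awk -v qi="$qwen_in" -v qo="$qwen_out" -v si="$sonnet_in" -v so="$sonnet_out" 'BEGIN {
  printf "input: %.1fx cheaper (%.0f%% of Sonnet)\n", si / qi, 100 * qi / si
  printf "output: %.1fx cheaper (%.1f%% of Sonnet)\n", so / qo, 100 * qo / so
}')
echo "$ratios"
```

The gap is much wider on output tokens than on input tokens, so agents that generate long outputs benefit most from the cheap model.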

How do you decide which model goes where?

The rule is simple and it is the whole game. Put the premium model only where judgment lives:

Premium (Claude Sonnet 4.5): agents that make judgment calls, decompose ambiguous tasks, or do final quality review before a client sees it.
Worker (Qwen 2.5 72B): agents with a clear skill file: data gathering, extraction, transformation, structured output.
Local (Llama 3 8B on Ollama): formatting, file ops, and anything you would feel silly paying for.
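The three tiers above can be written down as a lookup. This is only a sketch of the rule, not part of Hermes or Paperclip: the function name `pick_model` and the work-type labels are hypothetical, chosen for illustration, while the model names are the ones used in this article.

```shell
#!/bin/sh
# Map the kind of work an agent does to a model tier, following the rule above.
# The work-type labels are hypothetical names chosen for this sketch;
# the model names are the ones given in the article.
pick_model() {
  case "$1" in
    judgment|decomposition|review)
      echo "Claude Sonnet 4.5" ;;
    gathering|extraction|transformation|structured-output)
      echo "qwen/qwen-2.5-72b-instruct" ;;
    formatting|file-ops)
      echo "llama3:8b (local, Ollama)" ;;
    *)
      echo "unknown work type: $1" >&2
      return 1 ;;
  esac
}

pick_model review
pick_model extraction
pick_model formatting
```

Failing on an unknown work type is deliberate: it forces you to decide which tier a new role belongs to instead of silently defaulting to the premium model.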

Switching a worker to the cheap model is two lines:

hermes config set model.provider openrouter
hermes config set model.name qwen/qwen-2.5-72b-instruct

Or override it per agent in the Paperclip adapter config:

"adapterConfig": { "model": "openrouter/qwen/qwen-2.5-72b-instruct" }

Spending Sonnet money on a formatting agent is a waste. Spending Qwen money on an agent that makes judgment calls wrecks your output. The mix is the discipline.

Can you run it for free?

Yes, with local models. Ollama costs zero dollars per token; you pay in hardware and latency. Pull a small model and confirm it is installed before you point Hermes at it:

ollama pull llama3:8b
ollama list

An 8B model on a laptop handles formatting, summarization, and simple extraction fine. It struggles on multi-step reasoning, so the fully-local path is great for the Ops role and for learning, rough for the judgment roles. The honest budget thresholds: under $5/month is local-only, $5 to $15 is Qwen via OpenRouter, $15 to $40 is the mixed stack that runs a real company, and over $100 means something is misconfigured. The six-agent setup should stay well under $40 if you configured it right.
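The thresholds above are easy to turn into a quick check on a monthly bill. A minimal sketch, assuming `awk` is available; the function name `budget_tier` is hypothetical, and because the article does not label the $40 to $100 range, the sketch reports that range as unclassified rather than guessing.

```shell
#!/bin/sh
# Classify a monthly cloud bill (in dollars) using the thresholds above.
# The article does not label the $40-$100 range, so this sketch says so.
budget_tier() {
  awk -v bill="$1" 'BEGIN {
    if      (bill < 5)    print "local-only"
    else if (bill <= 15)  print "Qwen via OpenRouter"
    else if (bill <= 40)  print "mixed stack"
    else if (bill <= 100) print "not classified by the article"
    else                  print "likely misconfigured"
  }'
}

budget_tier 28.50
budget_tier 140
```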

Run the math on your own setup

You do not need a spreadsheet. Monthly cost for one agent is cost-per-run times runs per day times thirty. Plug in your own numbers:

# monthly cost = cost-per-run * runs/day * 30
echo "scale=2; 0.12 * 5 * 30" | bc   # Writer on Sonnet, 5 runs/day -> 18.00
echo "scale=2; 0.02 * 5 * 30" | bc   # a Qwen worker, 5 runs/day  ->  3.00

Change the per-run cost and the run count to match your workload, sum the agents, and you have your monthly bill before you spend a cent. For batch-friendly work, Anthropic’s Batch API halves Sonnet token costs, so a batch-heavy company drops the premium line further.
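If you want to model the batch discount as well, here is one way to sketch it, assuming `awk` is available. The per-run costs and run counts are the article's examples; the function names and the 40% batch share are hypothetical inputs for illustration, and the 50% discount is applied only to the batched fraction.

```shell
#!/bin/sh
# Monthly cost for one agent: cost-per-run * runs/day * 30.
# Per-run costs and run counts are the article's examples; the batch share
# used at the bottom is a hypothetical input for illustration.
monthly_cost() {  # usage: monthly_cost COST_PER_RUN RUNS_PER_DAY
  awk -v c="$1" -v r="$2" 'BEGIN { printf "%.2f\n", c * r * 30 }'
}

# Apply a 50% discount to the fraction of spend that can run as batch jobs.
with_batch() {    # usage: with_batch MONTHLY_COST BATCH_SHARE (0..1)
  awk -v m="$1" -v s="$2" 'BEGIN { printf "%.2f\n", m * (1 - 0.5 * s) }'
}

writer=$(monthly_cost 0.12 5)
worker=$(monthly_cost 0.02 5)
echo "Writer: $writer"
echo "Worker: $worker"
echo "Writer with 40% batched: $(with_batch "$writer" 0.4)"
```

Batch jobs complete asynchronously, so the discount only applies to work that can wait for its result.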

What should you actually do?

If you are learning → go local or Qwen-only. Do not spend Sonnet money while you are still figuring out the tool.
If you are running a real company → one premium agent, the rest on Qwen, Ops local. That is the $28.50 setup.
If your bill is over $100/month → something is wrong. Check for retry loops, agents loading too many skills, and a worker accidentally left on a premium model.
If you want lower variance → keep the Writer on Sonnet permanently instead of trying a cheaper model there, and accept the higher line; the total is still under $30.

The bottom line

Production cost is not the bottleneck in an agent company. At $28.50 against $7,500, the constraint is client acquisition and quality review, not tokens.
The entire budget lives or dies on the model mix. One premium agent, cheap workers, local ops. Get that right and the bill stays flat as you scale deliverables.
Re-check the prices at the source before you commit. They drift, and the cheap-model advantage is the foundation the whole margin sits on.

Zero-Human Companies

More from this book

Write a Hermes SKILL.md the Agent Actually Uses

Where Hermes + Paperclip Agents Break (and the Fixes)

Install Paperclip AI and Launch Your First AI Company

Wire Hermes Into Paperclip with the hermes_local Adapter