How Much Does a Hermes + Paperclip Company Cost to Run?
>This covers the cost math. Zero-Human Companies walks through five full business configurations with agent configs, skill libraries, and revenue projections.

Zero-Human Companies
Build an Autonomous AI Business with Hermes Agent + Paperclip
Summary:
- A 6-agent production company costs ~$28.50/month in API fees and ships 150 deliverables.
- The trick is model tiering: premium models on judgment roles, cheap models on structured tasks, local models on formatting.
- Provider pricing verified on April 10, 2026 is included. Copy-paste the cost calculator to run your own numbers.
- If you’re spending over $100/month on AI agent costs, something is broken.
How much does an AI agent company cost to run? Short answer: $28.50/month for a full production setup. Real answer: it depends on how you pick your models.
Most people overspend by running everything on a premium model, or underspend by running everything locally and wondering why the output reads like garbage. The right move: use expensive models where quality matters, cheap models where it doesn’t.
Here are the real numbers.
What are the actual API prices for AI agents?

Every price below was pulled from provider pricing pages on April 10, 2026. If you’re reading this later, re-verify before committing to a budget.
| Model | Input / 1M tokens | Output / 1M tokens | Best for |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | Complex reasoning, final review |
| Claude Sonnet 4 | $3.00 | $15.00 | Writing, judgment calls |
| Claude Haiku 4.5 | $1.00 | $5.00 | Fast derivative tasks |
| Qwen 2.5 72B (OpenRouter) | $0.12 | $0.39 | Routine agent work |
| Ollama local | $0.00 | $0.00 | Formatting, ops, testing |
Sources: platform.claude.com/docs/en/about-claude/pricing and openrouter.ai model pages, retrieved 2026-04-10.
The gap between Sonnet 4 and Qwen is massive. Sonnet costs $3/$15 per million tokens. Qwen costs $0.12/$0.39. That’s 25x cheaper on input and roughly 38x cheaper on output. For tasks with clear instructions, Qwen performs about as well as Sonnet. For judgment calls, Sonnet wins.
Opinion: Qwen 2.5 72B is the most underpriced model for agent work right now. It handles research, data extraction, and structured writing at about 4% of Sonnet’s input price and under 3% of its output price. Most builders default to Sonnet for every role. That’s at least a 25x overpay on work Qwen handles well.
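To sanity-check those ratios, here is a minimal sketch that divides the list prices from the table above. The PRICES dict and the price_ratio helper are illustrative names, not part of any provider SDK.

```python
# USD per 1M tokens, copied from the pricing table above.
PRICES = {
    "Claude Sonnet 4": {"input": 3.00, "output": 15.00},
    "Qwen 2.5 72B": {"input": 0.12, "output": 0.39},
}


def price_ratio(expensive: str, cheap: str) -> dict:
    """Return how many times more the expensive model costs, per token type."""
    return {
        kind: PRICES[expensive][kind] / PRICES[cheap][kind]
        for kind in ("input", "output")
    }


ratios = price_ratio("Claude Sonnet 4", "Qwen 2.5 72B")
print(f"Input:  {ratios['input']:.1f}x")
print(f"Output: {ratios['output']:.1f}x")
# Input:  25.0x
# Output: 38.5x
```

The output gap is wider than the input gap, so the savings grow on roles that generate a lot of text.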
How much does each tier actually cost per month?
Three realistic operating modes. Pick the one that matches where you are.
Tier 1: Learning ($0-15/month)
Single agent, 5-10 tasks per day. You’re experimenting.

    def tier1_cost(tasks_per_day=10, days=30):
        """Tier 1: single agent learning costs."""
        # Qwen 2.5 72B via OpenRouter.
        # Per-task budget in USD. Note: one 5K-input + 1K-output call at the
        # listed Qwen rates is about $0.001, so this figure leaves room for
        # tasks that take many calls.
        cost_per_task = 0.035
        monthly = tasks_per_day * days * cost_per_task
        print(f"Qwen cloud: ${monthly:.2f}/month ({tasks_per_day} tasks/day)")
        print("Ollama local: $0.00/month (same tasks, your hardware)")
        return monthly

    tier1_cost()
    # Qwen cloud: $10.50/month (10 tasks/day)
    # Ollama local: $0.00/month (same tasks, your hardware)

Start with Qwen or local. You’re learning the tool, not shipping client work. Sonnet at this stage is burning money on capability you don’t need yet.
Tier 2: Small Company ($15-40/month)
Three to four agents, daily runs. One premium model for your best agent, cheap workers for the rest.
| Role | Model | Cost per run | Runs/day | Monthly |
|---|---|---|---|---|
| Research | Qwen 2.5 72B | $0.003 | 5 | $0.45 |
| Writer | Claude Sonnet 4 | $0.12 | 5 | $18.00 |
| Editor | Qwen 2.5 72B | $0.002 | 5 | $0.30 |
| Total | | | | $18.75 |
Under $20/month for three agents running five times daily. The writer gets Sonnet because prose quality matters. Research and editing are procedural. Qwen handles them fine.
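The Tier 2 arithmetic is easy to reproduce. This is a rough sketch using the per-run costs from the table and the 30-day month used elsewhere in this article. The monthly_cost helper and the tier2 dict are hypothetical names for illustration.

```python
DAYS_PER_MONTH = 30  # same 30-day month the other examples use


def monthly_cost(cost_per_run: float, runs_per_day: int, days: int = DAYS_PER_MONTH) -> float:
    """Monthly spend for one agent: cost per run x runs per day x days."""
    return cost_per_run * runs_per_day * days


# role -> (cost per run in USD, runs per day), taken from the Tier 2 table
tier2 = {"Research": (0.003, 5), "Writer": (0.12, 5), "Editor": (0.002, 5)}

by_role = {role: monthly_cost(cost, runs) for role, (cost, runs) in tier2.items()}
total = sum(by_role.values())
writer_share = by_role["Writer"] / total

print(f"Total: ${total:.2f}/month")
print(f"Writer share: {writer_share:.0%}")
# Total: $18.75/month
# Writer share: 96%
```

The one premium agent accounts for nearly the whole bill. That is the first place to look if the total needs to come down.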
Tier 3: Full Production ($28.50/month)
Six agents, 150 deliverables per month. This is the setup from the book.
| Agent | Model | Cost/run | Runs/day | Monthly |
|---|---|---|---|---|
| Writer | Claude Sonnet 4 | $0.08 | 5 | $12.00 |
| Researcher | Qwen 2.5 72B | $0.03 | 5 | $4.50 |
| Analyst | Qwen 2.5 72B | $0.03 | 5 | $4.50 |
| Editor | Qwen 2.5 72B | $0.02 | 5 | $3.00 |
| Marketer | Qwen 2.5 72B | $0.03 | 5 | $4.50 |
| Ops | Ollama local | $0.00 | 5 | $0.00 |
| Total | | | | $28.50 |
Six agents. Five deliverables per day. 150 per month. $28.50 in API costs.
At about $0.19 per deliverable ($28.50 / 150) and a $350 client price, that’s a margin above 99% on production. The bottleneck is never the API bill. It’s client acquisition and quality review.
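A quick way to test the margin claim is to ask how much a deliverable could cost before the margin drops below 99%. This sketch covers API fees only. Review time and client acquisition are not modeled, and both function names are illustrative.

```python
def margin(cost_per_deliverable: float, client_price: float) -> float:
    """Gross margin on API costs alone, as a fraction of the client price."""
    return (client_price - cost_per_deliverable) / client_price


def max_cost_for_margin(target_margin: float, client_price: float) -> float:
    """Highest per-deliverable cost that still hits the target margin."""
    return client_price * (1 - target_margin)


CLIENT_PRICE = 350.00          # client price used in this article
tier3_cost = 28.50 / 150       # Tier 3 monthly bill / monthly deliverables

print(f"Tier 3 margin: {margin(tier3_cost, CLIENT_PRICE):.2%}")
print(f"99% margin holds up to ${max_cost_for_margin(0.99, CLIENT_PRICE):.2f} per deliverable")
# Tier 3 margin: 99.95%
# 99% margin holds up to $3.50 per deliverable
```

At a $350 price, API costs per deliverable could rise more than tenfold and the margin would still round to 99%.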
How do you calculate your own agent costs?
Copy this function. Plug in your models and usage patterns.

    def agent_cost_calculator(agents: list[dict]) -> dict:
        """Calculate monthly AI agent costs for any configuration.

        Each agent dict needs: name, model (a key in MODEL_PRICES),
        input_tokens, output_tokens, runs_per_day.
        """
        # USD per 1M tokens.
        MODEL_PRICES = {
            "sonnet-4": {"input": 3.00, "output": 15.00},
            "opus-4.6": {"input": 5.00, "output": 25.00},
            "haiku-4.5": {"input": 1.00, "output": 5.00},
            "qwen-72b": {"input": 0.12, "output": 0.39},
            "ollama": {"input": 0.00, "output": 0.00},
        }
        total_monthly = 0
        total_deliverables = 0
        print(f"{'Agent':<12} {'Model':<12} {'$/run':>8} {'$/month':>10}")
        print("-" * 44)
        for a in agents:
            prices = MODEL_PRICES[a["model"]]
            cost_per_run = (
                (a["input_tokens"] / 1_000_000) * prices["input"] +
                (a["output_tokens"] / 1_000_000) * prices["output"]
            )
            monthly = cost_per_run * a["runs_per_day"] * 30
            total_monthly += monthly
            # Every agent run is counted as one deliverable.
            total_deliverables += a["runs_per_day"] * 30
            print(f"{a['name']:<12} {a['model']:<12} ${cost_per_run:>7.4f} ${monthly:>9.2f}")
        cost_per_deliverable = total_monthly / total_deliverables if total_deliverables else 0
        print("-" * 44)
        print(f"{'TOTAL':<12} {'':12} {'':>8} ${total_monthly:>9.2f}")
        print(f"\nDeliverables/month: {total_deliverables}")
        print(f"Cost/deliverable: ${cost_per_deliverable:.4f}")
        return {"monthly": total_monthly, "per_deliverable": cost_per_deliverable}

    # Example: full production setup
    production = [
        {"name": "Writer", "model": "sonnet-4", "input_tokens": 20000, "output_tokens": 4000, "runs_per_day": 5},
        {"name": "Researcher", "model": "qwen-72b", "input_tokens": 15000, "output_tokens": 3000, "runs_per_day": 5},
        {"name": "Analyst", "model": "qwen-72b", "input_tokens": 15000, "output_tokens": 3000, "runs_per_day": 5},
        {"name": "Editor", "model": "qwen-72b", "input_tokens": 8000, "output_tokens": 1500, "runs_per_day": 5},
        {"name": "Marketer", "model": "qwen-72b", "input_tokens": 15000, "output_tokens": 3000, "runs_per_day": 5},
        {"name": "Ops", "model": "ollama", "input_tokens": 5000, "output_tokens": 1000, "runs_per_day": 5},
    ]
    agent_cost_calculator(production)
    # Agent        Model           $/run    $/month
    # --------------------------------------------
    # Writer       sonnet-4     $ 0.1200 $    18.00
    # Researcher   qwen-72b     $ 0.0030 $     0.45
    # Analyst      qwen-72b     $ 0.0030 $     0.45
    # Editor       qwen-72b     $ 0.0015 $     0.23
    # Marketer     qwen-72b     $ 0.0030 $     0.45
    # Ops          ollama       $ 0.0000 $     0.00
    # --------------------------------------------
    # TOTAL                                $    19.57
    #
    # Deliverables/month: 900
    # Cost/deliverable: $0.0217

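Swap the model strings and token counts for your setup. The MODEL_PRICES dict uses the provider prices verified on April 10, 2026. Update them when providers change their rates. Two things to keep in mind when reading the output. The example token counts price the Qwen roles at about $0.003 per run, well below the $0.02-0.03 per-run figures in the Tier 3 table, so the calculator total lands under the $28.50 budget. And the calculator counts every agent run as a deliverable, which is why six agents at five runs a day report 900. If a deliverable is one pass through the whole pipeline, divide the monthly total by 150 instead.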
Should you use OpenRouter or direct Anthropic API?
Opinion: start with OpenRouter. Switch to direct API only when volume justifies it.
| Factor | OpenRouter | Direct Anthropic |
|---|---|---|
| Setup | One account, one credit pool | Separate account per provider |
| Model switching | Config change | Config change + new API key |
| Pricing | Small markup over provider rates | Lowest base rates |
| Batch API (50% off) | Not available | Available |
| Fallback routing | Automatic | You build it yourself |
| Makes sense at | Under ~50K requests/day | Over ~50K requests/day |
For most people, OpenRouter wins on convenience.
The exception: if your workload is batch-friendly (not real-time), the direct Anthropic Batch API cuts costs in half. Sonnet 4 drops to $1.50/$7.50. Opus 4.6 drops to $2.50/$12.50. That discount stacks with prompt caching for big contexts.

    # Batch API savings for Sonnet 4
    standard = {"input": 3.00, "output": 15.00}
    batch = {"input": 1.50, "output": 7.50}
    monthly_standard = (20000 / 1e6 * standard["input"] + 4000 / 1e6 * standard["output"]) * 150
    monthly_batch = (20000 / 1e6 * batch["input"] + 4000 / 1e6 * batch["output"]) * 150
    print(f"Standard: ${monthly_standard:.2f}/month")
    print(f"Batch: ${monthly_batch:.2f}/month")
    print(f"Savings: ${monthly_standard - monthly_batch:.2f}/month")
    # Standard: $18.00/month
    # Batch: $9.00/month
    # Savings: $9.00/month

What does over $100/month mean?
Something is wrong. At the Tier 3 production setup, six agents running 150 deliverables per month cost $28.50. Even doubling the volume, you should stay under $60.
If you’re over $100, check these three things:
- Retry loops. An agent that retries failed tasks burns tokens on every attempt. Five attempts on a Sonnet task cost 5x what one clean run costs. Set maxIterations to a sane limit and log when agents hit it.
- Context stuffing. Loading a 50,000-token skill file on every run when the agent only needs 2,000 tokens of instructions. Trim your skills to the minimum viable context.
- Wrong model on the wrong role. Running Opus 4.6 ($5/$25) on a formatting agent that Qwen ($0.12/$0.39) handles perfectly. Audit your model assignments against the heuristic: premium for judgment, worker for structured tasks, local for formatting.
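To put rough numbers on the first two red flags, here is a sketch built on the Writer example from the calculator (20K input, 4K output, 150 runs a month on Sonnet 4). It assumes every retry resends the full context and generates a full response, and that the 20K baseline already includes the 2,000 tokens of instructions. Both are simplifications, and run_cost is a hypothetical helper.

```python
SONNET = {"input": 3.00, "output": 15.00}  # USD per 1M tokens, from the pricing table
RUNS_PER_MONTH = 150


def run_cost(input_tokens: int, output_tokens: int, prices: dict, attempts: int = 1) -> float:
    """Cost of one task, assuming each attempt pays for the full input and output."""
    per_attempt = (
        input_tokens / 1e6 * prices["input"]
        + output_tokens / 1e6 * prices["output"]
    )
    return per_attempt * attempts


clean = run_cost(20_000, 4_000, SONNET) * RUNS_PER_MONTH
# Retry loop: every task takes five attempts instead of one.
retried = run_cost(20_000, 4_000, SONNET, attempts=5) * RUNS_PER_MONTH
# Context stuffing: a 50K-token skill file replaces 2K of instructions (+48K input).
stuffed = run_cost(20_000 + 48_000, 4_000, SONNET) * RUNS_PER_MONTH

print(f"Clean runs:      ${clean:.2f}/month")
print(f"Five attempts:   ${retried:.2f}/month")
print(f"Stuffed context: ${stuffed:.2f}/month")
# Clean runs:      $18.00/month
# Five attempts:   $90.00/month
# Stuffed context: $39.60/month
```

Under these assumptions, a retry loop on a single premium agent is enough to push the bill close to $100 on its own.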
Opinion: if you can’t explain where every dollar goes, your agents are misconfigured, not underfunded. Frontier model pricing has dropped enough that cost problems are almost always config problems.
What should you actually do?
Start with the cheapest option that works for your current stage. Don’t pre-optimize.
- Learning? Install Ollama, pull an 8B model, point your agent framework at it. Total cost: $0. Upgrade to Qwen via OpenRouter ($10 of credits) when local quality gets frustrating.
- First client? Mixed stack. Sonnet 4 on the role that touches client deliverables, Qwen on everything else. Budget $20-30/month.
- Production? Run the Tier 3 setup from this article. Six agents, mixed models, ~$28.50/month. Monitor with the cost calculator above and watch for the three red flags.
The tiering decision matters more than any individual price. Expensive model where judgment happens. Cheap model where procedures happen. Free model where formatting happens. Get this right once and cost stays flat while output scales.
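The tiering rule is simple enough to write down as a lookup. The sketch below is one possible way to encode it, not a Hermes or Paperclip config format. The model keys come from the calculator’s MODEL_PRICES dict, the role labels follow the Tier 3 table, and the "judgment" / "structured" / "formatting" categories are illustrative.

```python
# Tier heuristic: premium for judgment, worker for structured tasks, local for formatting.
TIER_MODELS = {
    "judgment": "sonnet-4",
    "structured": "qwen-72b",
    "formatting": "ollama",
}

# Hypothetical classification of the six Tier 3 roles.
ROLE_KINDS = {
    "Writer": "judgment",
    "Researcher": "structured",
    "Analyst": "structured",
    "Editor": "structured",
    "Marketer": "structured",
    "Ops": "formatting",
}


def assign_models(role_kinds: dict) -> dict:
    """Map each role to the model its tier calls for."""
    return {role: TIER_MODELS[kind] for role, kind in role_kinds.items()}


def misassigned(current: dict, role_kinds: dict) -> list:
    """Roles whose current model differs from what the heuristic suggests."""
    expected = assign_models(role_kinds)
    return sorted(role for role, model in current.items() if model != expected[role])


# Audit the common default: every role on Sonnet.
all_sonnet = {role: "sonnet-4" for role in ROLE_KINDS}
print(misassigned(all_sonnet, ROLE_KINDS))
# ['Analyst', 'Editor', 'Marketer', 'Ops', 'Researcher']
```

Run an audit like this whenever you add a role. The classification is a judgment call, so revisit it if a cheap model’s output starts needing heavy review.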
Bottom line
- A 6-agent AI company costs $28.50/month in API fees. Not $500. Not $200. Twenty-eight fifty. The margin on agent-produced deliverables is 99% at typical client pricing. The bottleneck is never the API bill.
- Model tiering is the whole strategy. Sonnet 4 on judgment roles, Qwen 2.5 72B on structured tasks, Ollama on formatting. One premium agent, four cheap workers, one free. That ratio is what keeps the bill under $30.
- Over $100/month means you have a bug, not a cost problem. Check for retry loops, bloated context, and premium models doing worker tasks. The pricing on frontier AI has dropped enough that cost overruns are configuration errors, not market reality.
Frequently Asked Questions
How much does it cost per month to run an AI agent company?
A full production setup with 6 agents producing 150 deliverables per month costs about $28.50 in API fees. Learning setups run $0-15. The key is putting cheap models on routine work and reserving premium models for judgment calls.
What is the cheapest way to run AI agents?
Run Ollama locally for $0/token on formatting and ops tasks. For cloud work, Qwen 2.5 72B via OpenRouter costs $0.12 per million input tokens, roughly 25x cheaper than Claude Sonnet 4. Mix both for under $15/month.
How do you calculate cost per deliverable for AI agents?
Add up input and output token costs across every agent in the pipeline. A typical 5-agent research report costs about $0.09 total. At a $350 client price, that is 99% margin on production costs.