> youcanbuildthings.com
analysis from: Zero-Human Companies

How Much Does a Hermes + Paperclip Company Cost to Run?

by J Cook · 8 min read

Summary:

  1. A 6-agent production company costs ~$28.50/month in API fees and ships 150 deliverables.
  2. The trick is model tiering: premium models on judgment roles, cheap models on structured tasks, local models on formatting.
  3. Verified current provider pricing included. Copy-paste the cost calculator to run your own numbers.
  4. If you’re spending over $100/month on AI agent costs, something is broken.

How much does an AI agent company cost to run? Short answer: $28.50/month for a full production setup. Real answer: it depends on how you pick your models.

Most people overspend by running everything on a premium model, or underspend by running everything locally and wondering why the output reads like garbage. The right move: use expensive models where quality matters, cheap models where it doesn’t.

Here are the real numbers.

What are the actual API prices for AI agents?

[Figure: Three-tier AI agent cost comparison showing learning, small company, and production setups]

Every price below was pulled from provider pricing pages on April 10, 2026. If you’re reading this later, re-verify before committing to a budget.

| Model | Input / 1M tokens | Output / 1M tokens | Best for |
|---|---|---|---|
| Claude Opus 4.6 | $5.00 | $25.00 | Complex reasoning, final review |
| Claude Sonnet 4 | $3.00 | $15.00 | Writing, judgment calls |
| Claude Haiku 4.5 | $1.00 | $5.00 | Fast derivative tasks |
| Qwen 2.5 72B (OpenRouter) | $0.12 | $0.39 | Routine agent work |
| Ollama local | $0.00 | $0.00 | Formatting, ops, testing |

Sources: platform.claude.com/docs/en/about-claude/pricing and openrouter.ai model pages, retrieved 2026-04-10.

The gap between Sonnet 4 and Qwen is massive. Sonnet costs $3/$15 per million input/output tokens. Qwen costs $0.12/$0.39. That’s 25x cheaper on input and roughly 38x cheaper on output. For tasks with clear instructions, Qwen performs about as well as Sonnet. For judgment calls, Sonnet wins.

Opinion: Qwen 2.5 72B is the most underpriced model for agent work right now. It handles research, data extraction, and structured writing at 4% of Sonnet’s cost. Most builders default to Sonnet for every role. That’s a 25x overpay on work Qwen does just as well.
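The ratios quoted above are easy to re-derive when prices change. A minimal sketch, assuming only the Sonnet 4 and Qwen prices from the pricing table; the helper name `price_ratio` is made up for illustration:

```python
# Prices in USD per 1M tokens, copied from the pricing table above.
SONNET = {"input": 3.00, "output": 15.00}
QWEN = {"input": 0.12, "output": 0.39}


def price_ratio(premium: dict, cheap: dict, kind: str) -> float:
    """How many times more the premium model costs for one token kind."""
    return premium[kind] / cheap[kind]


input_ratio = price_ratio(SONNET, QWEN, "input")
output_ratio = price_ratio(SONNET, QWEN, "output")
qwen_share = QWEN["input"] / SONNET["input"]

print(f"Input:  {input_ratio:.1f}x")
print(f"Output: {output_ratio:.1f}x")
print(f"Qwen input price as a share of Sonnet: {qwen_share:.0%}")
# Input:  25.0x
# Output: 38.5x
# Qwen input price as a share of Sonnet: 4%
```

The output-side gap is even wider than the input-side gap, which matters for agents that write more than they read.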

How much does each tier actually cost per month?

Three realistic operating modes. Pick the one that matches where you are.

Tier 1: Learning ($0-15/month)

Single agent, 5-10 tasks per day. You’re experimenting.

def tier1_cost(tasks_per_day=10, days=30):
    """Tier 1: single agent learning costs."""
    # Qwen 2.5 72B via OpenRouter
    cost_per_task = 0.035  # flat per-task budget; 5K input + 1K output tokens alone is ~$0.001 at the listed Qwen rates
    monthly = tasks_per_day * days * cost_per_task
    print(f"Qwen cloud:  ${monthly:.2f}/month ({tasks_per_day} tasks/day)")
    print(f"Ollama local: $0.00/month (same tasks, your hardware)")
    return monthly

tier1_cost()
# Qwen cloud:  $10.50/month (10 tasks/day)
# Ollama local: $0.00/month (same tasks, your hardware)

Start with Qwen or local. You’re learning the tool, not shipping client work. Sonnet at this stage is burning money on capability you don’t need yet.

Tier 2: Small Company ($15-40/month)

Three to four agents, daily runs. One premium model for your best agent, cheap workers for the rest.

| Role | Model | Cost per run | Runs/day | Monthly |
|---|---|---|---|---|
| Research | Qwen 2.5 72B | $0.003 | 5 | $0.45 |
| Writer | Claude Sonnet 4 | $0.12 | 5 | $18.00 |
| Editor | Qwen 2.5 72B | $0.002 | 5 | $0.30 |
| Total | | | | $18.75 |

Under $20/month for three agents running five times daily. The writer gets Sonnet because prose quality matters. Research and editing are procedural. Qwen handles them fine.
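Each monthly figure in the Tier 2 table is cost per run × runs per day × 30 days. The sketch below reproduces the table's totals; the per-run costs and run counts are copied from the table, and the names `tier2` and `monthly_cost` are illustrative only:

```python
DAYS_PER_MONTH = 30  # the article's 30-day month

# (role, cost per run in USD, runs per day), copied from the Tier 2 table.
tier2 = [
    ("Research", 0.003, 5),
    ("Writer", 0.12, 5),
    ("Editor", 0.002, 5),
]


def monthly_cost(cost_per_run: float, runs_per_day: int) -> float:
    """Monthly spend for one agent at a fixed per-run cost."""
    return cost_per_run * runs_per_day * DAYS_PER_MONTH


tier2_total = sum(monthly_cost(cost, runs) for _, cost, runs in tier2)

for role, cost, runs in tier2:
    print(f"{role:<9} ${monthly_cost(cost, runs):.2f}")
print(f"{'Total':<9} ${tier2_total:.2f}")
# Research  $0.45
# Writer    $18.00
# Editor    $0.30
# Total     $18.75
```

The writer accounts for 96% of the bill, so that is the only row where a model change moves the total.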

Tier 3: Full Production ($28.50/month)

Six agents, 150 deliverables per month. This is the setup from the book.

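| Agent | Model | Cost/run | Runs/day | Monthly |
|---|---|---|---|---|
| Writer | Claude Sonnet 4 | $0.08 | 5 | $12.00 |
| Researcher | Qwen 2.5 72B | $0.03 | 5 | $4.50 |
| Analyst | Qwen 2.5 72B | $0.03 | 5 | $4.50 |
| Editor | Qwen 2.5 72B | $0.02 | 5 | $3.00 |
| Marketer | Qwen 2.5 72B | $0.03 | 5 | $4.50 |
| Ops | Ollama local | $0.00 | 5 | $0.00 |
| Total | | | | $28.50 |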

Six agents. Five deliverables per day. 150 per month. $28.50 in API costs.

At $0.19 per deliverable ($28.50 / 150) and a $350 client price, that’s a margin above 99% on production. The bottleneck is never the API bill. It’s client acquisition and quality review.
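A quick check of the margin arithmetic, using only the Tier 3 totals ($28.50 per month, 150 deliverables) and the $350 client price quoted above. It counts API cost only and ignores review time, hosting, and acquisition:

```python
monthly_api_cost = 28.50  # Tier 3 total from the table
deliverables = 150        # 5 per day x 30 days
client_price = 350.00     # per-deliverable price quoted in the article

cost_per_deliverable = monthly_api_cost / deliverables
margin = (client_price - cost_per_deliverable) / client_price

print(f"Cost per deliverable: ${cost_per_deliverable:.2f}")
print(f"Margin on API cost:   {margin:.2%}")
# Cost per deliverable: $0.19
# Margin on API cost:   99.95%
```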

How do you calculate your own agent costs?

Copy this function. Plug in your models and usage patterns.

def agent_cost_calculator(agents: list[dict]) -> dict:
    """Calculate monthly AI agent costs for any configuration.

    Each agent dict needs: name, model (a key in MODEL_PRICES),
    input_tokens, output_tokens, runs_per_day.
    """
    # USD per 1M tokens, as verified on 2026-04-10
    MODEL_PRICES = {
        "sonnet-4":   {"input": 3.00,  "output": 15.00},
        "opus-4.6":   {"input": 5.00,  "output": 25.00},
        "haiku-4.5":  {"input": 1.00,  "output": 5.00},
        "qwen-72b":   {"input": 0.12,  "output": 0.39},
        "ollama":     {"input": 0.00,  "output": 0.00},
    }

    total_monthly = 0.0

    print(f"{'Agent':<12} {'Model':<12} {'$/run':>8} {'$/month':>10}")
    print("-" * 44)

    for a in agents:
        prices = MODEL_PRICES[a["model"]]
        cost_per_run = (
            (a["input_tokens"] / 1_000_000) * prices["input"] +
            (a["output_tokens"] / 1_000_000) * prices["output"]
        )
        monthly = cost_per_run * a["runs_per_day"] * 30
        total_monthly += monthly

        print(f"{a['name']:<12} {a['model']:<12} ${cost_per_run:>7.4f} ${monthly:>9.2f}")

    # Assumption: every deliverable passes through each agent once, so the
    # pipeline ships as many deliverables per day as its busiest agent runs.
    deliverables_per_day = max((a["runs_per_day"] for a in agents), default=0)
    total_deliverables = deliverables_per_day * 30
    cost_per_deliverable = total_monthly / total_deliverables if total_deliverables else 0

    print("-" * 44)
    print(f"{'TOTAL':<12} {'':12} {'':>8} ${total_monthly:>9.2f}")
    print(f"\nDeliverables/month: {total_deliverables}")
    print(f"Cost/deliverable:   ${cost_per_deliverable:.4f}")

    return {"monthly": total_monthly, "per_deliverable": cost_per_deliverable}

# Example: full production setup
production = [
    {"name": "Writer",     "model": "sonnet-4",  "input_tokens": 20000, "output_tokens": 4000, "runs_per_day": 5},
    {"name": "Researcher", "model": "qwen-72b",  "input_tokens": 15000, "output_tokens": 3000, "runs_per_day": 5},
    {"name": "Analyst",    "model": "qwen-72b",  "input_tokens": 15000, "output_tokens": 3000, "runs_per_day": 5},
    {"name": "Editor",     "model": "qwen-72b",  "input_tokens": 8000,  "output_tokens": 1500, "runs_per_day": 5},
    {"name": "Marketer",   "model": "qwen-72b",  "input_tokens": 15000, "output_tokens": 3000, "runs_per_day": 5},
    {"name": "Ops",        "model": "ollama",     "input_tokens": 5000,  "output_tokens": 1000, "runs_per_day": 5},
]

agent_cost_calculator(production)
# Agent        Model           $/run    $/month
# --------------------------------------------
# Writer       sonnet-4     $ 0.1200 $    18.00
# Researcher   qwen-72b     $ 0.0030 $     0.45
# Analyst      qwen-72b     $ 0.0030 $     0.45
# Editor       qwen-72b     $ 0.0015 $     0.23
# Marketer     qwen-72b     $ 0.0030 $     0.45
# Ops          ollama       $ 0.0000 $     0.00
# --------------------------------------------
# TOTAL                                $    19.57
#
# Deliverables/month: 150
# Cost/deliverable:   $0.1305

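Swap the model strings and token counts for your setup. The MODEL_PRICES dict uses the prices verified on April 10, 2026. Update them when providers change their rates. Note that these example token counts come out to roughly $20/month, lower than the Tier 3 table, which budgets $0.03 per run for the Qwen agents rather than the ~$0.003 computed here.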

Should you use OpenRouter or direct Anthropic API?

Opinion: start with OpenRouter. Switch to direct API only when volume justifies it.

| Factor | OpenRouter | Direct Anthropic |
|---|---|---|
| Setup | One account, one credit pool | Separate account per provider |
| Model switching | Config change | Config change + new API key |
| Pricing | Small markup over provider rates | Lowest base rates |
| Batch API (50% off) | Not available | Available |
| Fallback routing | Automatic | You build it yourself |
| Break-even point | Under ~50K requests/day | Over ~50K requests/day |

For most people, OpenRouter wins on convenience.
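To give a sense of what "you build it yourself" means for fallback routing on a direct API, here is a minimal sketch. Everything in it is hypothetical: `call_with_fallback`, the model names, and the `fake_call` stand-in are illustrations, not part of any provider SDK. In real use you would pass in your own client function.

```python
from typing import Callable


def call_with_fallback(
    prompt: str,
    models: list[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order and return (model_used, response)."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # in real code, catch your client's specific errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stand-in client for demonstration: the first model always fails.
def fake_call(model: str, prompt: str) -> str:
    if model == "primary-model":
        raise TimeoutError("simulated outage")
    return f"[{model}] ok"


used, reply = call_with_fallback(
    "Summarize Q1 notes", ["primary-model", "backup-model"], fake_call
)
print(used, reply)
# backup-model [backup-model] ok
```

Even a loop this small needs error handling, logging, and testing on your side, which is the convenience cost the table refers to.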

The exception: if your workload is batch-friendly (not real-time), the direct Anthropic Batch API cuts costs in half. Sonnet 4 drops to $1.50/$7.50. Opus 4.6 drops to $2.50/$12.50. That discount stacks with prompt caching for big contexts.

# Batch API savings for Sonnet 4
standard = {"input": 3.00, "output": 15.00}
batch =    {"input": 1.50, "output": 7.50}

monthly_standard = (20000 / 1e6 * standard["input"] + 4000 / 1e6 * standard["output"]) * 150
monthly_batch =    (20000 / 1e6 * batch["input"]    + 4000 / 1e6 * batch["output"])    * 150

print(f"Standard: ${monthly_standard:.2f}/month")
print(f"Batch:    ${monthly_batch:.2f}/month")
print(f"Savings:  ${monthly_standard - monthly_batch:.2f}/month")
# Standard: $18.00/month
# Batch:    $9.00/month
# Savings:  $9.00/month

What does over $100/month mean?

Something is wrong. At the Tier 3 production setup, six agents running 150 deliverables per month cost $28.50. Even doubling the volume, you should stay under $60.

If you’re over $100, check these three things:

  1. Retry loops. An agent that retries failed tasks burns tokens on every attempt. Five retries on a Sonnet task costs 5x what one clean run costs. Set maxIterations to a sane limit and log when agents hit it.
  2. Context stuffing. Loading a 50,000-token skill file on every run when the agent only needs 2,000 tokens of instructions. Trim your skills to the minimum viable context.
  3. Wrong model on the wrong role. Running Opus 4.6 ($5/$25) on a formatting agent that Qwen ($0.12/$0.39) handles perfectly. Audit your model assignments against the heuristic: premium for judgment, worker for structured tasks, local for formatting.
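To see how the first two red flags compound, the sketch below prices one Sonnet 4 agent at 150 runs per month under three conditions. The 4,000 output tokens per run is an assumption borrowed from the Writer example in the calculator; `run_cost` is an illustrative helper, not part of any framework.

```python
SONNET = {"input": 3.00, "output": 15.00}  # USD per 1M tokens, from the pricing table


def run_cost(input_tokens: int, output_tokens: int, prices: dict, attempts: int = 1) -> float:
    """Cost of one task, multiplied by the number of attempts it took."""
    per_attempt = (
        (input_tokens / 1e6) * prices["input"]
        + (output_tokens / 1e6) * prices["output"]
    )
    return per_attempt * attempts


RUNS_PER_MONTH = 150
OUTPUT_TOKENS = 4_000  # assumed output size per run

lean = run_cost(2_000, OUTPUT_TOKENS, SONNET) * RUNS_PER_MONTH
stuffed = run_cost(50_000, OUTPUT_TOKENS, SONNET) * RUNS_PER_MONTH
stuffed_retried = run_cost(50_000, OUTPUT_TOKENS, SONNET, attempts=5) * RUNS_PER_MONTH

print(f"2K context, 1 attempt:   ${lean:.2f}/month")
print(f"50K context, 1 attempt:  ${stuffed:.2f}/month")
print(f"50K context, 5 attempts: ${stuffed_retried:.2f}/month")
# 2K context, 1 attempt:   $9.90/month
# 50K context, 1 attempt:  $31.50/month
# 50K context, 5 attempts: $157.50/month
```

Under these assumptions, a single agent with a bloated context and a retry loop passes $100/month on its own.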

Opinion: if you can’t explain where every dollar goes, your agents are misconfigured, not underfunded. Frontier model pricing has dropped enough that cost problems are almost always config problems.

What should you actually do?

Start with the cheapest option that works for your current stage. Don’t pre-optimize.

  1. Learning? Install Ollama, pull an 8B model, point your agent framework at it. Total cost: $0. Upgrade to Qwen via OpenRouter ($10 of credits) when local quality gets frustrating.
  2. First client? Mixed stack. Sonnet 4 on the role that touches client deliverables, Qwen on everything else. Budget $20-30/month.
  3. Production? Run the Tier 3 setup from this article. Six agents, mixed models, ~$28.50/month. Monitor with the cost calculator above and watch for the three red flags.

The tiering decision matters more than any individual price. Expensive model where judgment happens. Cheap model where procedures happen. Free model where formatting happens. Get this right once and cost stays flat while output scales.
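One way to make the tiering decision explicit is to keep it in a single lookup instead of scattering model names across agent configs. This is a sketch under assumptions: the task categories and the `pick_model` helper are hypothetical, and the model keys simply reuse the names from the calculator's MODEL_PRICES dict.

```python
# Hypothetical task categories mapped to the article's three tiers.
TIER_FOR_TASK = {
    "judgment": "sonnet-4",    # premium: writing, final calls
    "structured": "qwen-72b",  # worker: research, extraction, editing
    "formatting": "ollama",    # local: formatting and ops
}


def pick_model(task_kind: str) -> str:
    """Return the model key for a task category, failing loudly on typos."""
    try:
        return TIER_FOR_TASK[task_kind]
    except KeyError:
        raise ValueError(f"unknown task kind: {task_kind!r}") from None


roles = {"Writer": "judgment", "Researcher": "structured", "Ops": "formatting"}
assignments = {role: pick_model(kind) for role, kind in roles.items()}
print(assignments)
# {'Writer': 'sonnet-4', 'Researcher': 'qwen-72b', 'Ops': 'ollama'}
```

With the mapping in one place, auditing for a premium model on a worker task becomes a one-line check.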

Bottom line

  1. A 6-agent AI company costs $28.50/month in API fees. Not $500. Not $200. Twenty-eight fifty. The margin on agent-produced deliverables is 99% at typical client pricing. The bottleneck is never the API bill.
  2. Model tiering is the whole strategy. Sonnet 4 on judgment roles, Qwen 2.5 72B on structured tasks, Ollama on formatting. One premium agent, four cheap workers, one free. That ratio is what keeps the bill under $30.
  3. Over $100/month means you have a bug, not a cost problem. Check for retry loops, bloated context, and premium models doing worker tasks. The pricing on frontier AI has dropped enough that cost overruns are configuration errors, not market reality.

Frequently Asked Questions

How much does it cost per month to run an AI agent company?

A full production setup with 6 agents producing 150 deliverables per month costs about $28.50 in API fees. Learning setups run $0-15. The key is putting cheap models on routine work and reserving premium models for judgment calls.

What is the cheapest way to run AI agents?

Run Ollama locally for $0/token on formatting and ops tasks. For cloud work, Qwen 2.5 72B via OpenRouter costs $0.12 per million input tokens, roughly 25x cheaper than Claude Sonnet 4. Mix both for under $15/month.

How do you calculate cost per deliverable for AI agents?

Add up input and output token costs across every agent in the pipeline. With the Tier 3 per-run costs, one deliverable that passes through all five paid agents costs about $0.19 total. At a $350 client price, that is a margin above 99% on production costs.