How to Cut Your Claude Code Bill in Half

by J Cook · 8 min read

Summary:

  1. Six specific token sinks drain your Claude Code budget. This teaches you to plug each one.
  2. Includes the 3-line prompt formula that cuts token waste by 40%.
  3. Real before/after token counts: 147,000 vs 62,000 for the same feature.
  4. Copy-paste effort level function and restart-vs-continue decision tree.

My first month with Claude Code cost me $1,400. Not a typo. I ran max effort on everything, let conversations hit 100+ messages, and asked Claude to “read my whole project” three times a day at 80,000 tokens each read. The code I shipped was about 2,000 lines. The context Claude processed to get there? Over 218,000 lines of waste.

After learning what actually burns tokens, my cost per line dropped from $0.69 to $0.08. Those are my numbers on my project. Your mileage varies with codebase size and task complexity, but the fixes are universal. Here’s every one.

How do tokens actually work in Claude Code?

A token is roughly 4 characters of English text. A line of code runs 10-20 tokens. A full page of text hits 250-300.

Every interaction has two costs: input tokens (everything Claude reads: your prompt, files, command output, conversation history) and output tokens (everything Claude generates). Input tokens are cheaper per unit but there are way more of them.

The biggest cost isn’t your prompts. It’s file reading. When Claude reads a 500-line file, that’s 5,000-7,000 input tokens. Read 20 files to understand your project and you’ve burned 100,000+ tokens before a single line of code gets written.
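Those heuristics (4 characters per token, 10-20 tokens per line of code) are enough for back-of-envelope budgeting. A minimal sketch, using only those rules of thumb, not a real tokenizer:

```python
def estimate_tokens_from_text(text: str) -> int:
    """Approximate token count for English prose: ~4 characters per token."""
    return len(text) // 4

def estimate_tokens_from_loc(lines_of_code: int, tokens_per_line: int = 12) -> int:
    """Approximate token count for source files: ~10-20 tokens per line."""
    return lines_of_code * tokens_per_line

# A 500-line file lands in the 5,000-7,000 input-token range:
print(estimate_tokens_from_loc(500, tokens_per_line=10))  # 5000
print(estimate_tokens_from_loc(500, tokens_per_line=14))  # 7000

# Reading 20 such files before writing any code:
print(20 * estimate_tokens_from_loc(500))  # 120000
```

Real tokenizers vary by model, so treat these as order-of-magnitude estimates, not billing predictions.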

From the Claude Code docs:

| Metric | Value |
| --- | --- |
| Average daily cost (API users) | $6/developer/day |
| 90th percentile daily cost | Under $12/day |
| Team monthly average (Sonnet) | $100-200/developer |
| Background process overhead | Under $0.04/session |
| Agent teams multiplier | ~7x standard sessions |

What are the 6 token sinks and how do you fix them?

Sink 1: Conversations that run too long

Every message includes ALL previous messages in the context window. Message 1 costs X tokens. Message 50 costs X plus the full weight of the 49 messages before it. After 20-30 messages, you're spending more on history than on work.
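Because every turn re-sends the whole history, total input tokens grow roughly quadratically with conversation length. A simplified model, assuming equal-sized messages and no caching or eviction:

```python
def total_input_tokens(num_messages: int, tokens_per_message: int = 500) -> int:
    """
    Total input tokens billed across a conversation when every turn
    re-sends the full history. Turn k re-sends messages 1..k, so the
    total is tokens_per_message * (1 + 2 + ... + n).
    """
    return tokens_per_message * num_messages * (num_messages + 1) // 2

print(total_input_tokens(20))  # 105000
print(total_input_tokens(50))  # 637500 -- 2.5x the messages, ~6x the tokens
```

That nonlinear growth is exactly why restarting at 15-20 messages beats pushing on to 50.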

Fix: Start a new conversation every 15-20 messages. Open with a one-sentence summary:

I'm working on the Task Tracker app. Just finished the status toggle feature.
Next: add filtering by status. Files: src/app/page.tsx and
src/components/TaskList.tsx.

Thirty seconds of typing saves hundreds of thousands of tokens.

Sink 2: Wrong effort level

The difference between low and max effort can be 3-5x for the same task. A CSS class change on max effort costs the same as building an entire component on medium.

Fix: The plain-English version: if Claude could do it with copy-paste, use low. If Claude needs to think, use medium. If Claude needs to plan across multiple files, use high. In code:

def pick_effort(task_description: str) -> str:
    """
    Effort level decision tree for Claude Code.
    Copy this logic. Check before every prompt.

    Returns: 'low', 'medium', or 'high'
    """
    # Mechanical: 1 file, simple change
    if any(keyword in task_description for keyword in [
        "rename variable", "add a line", "update config",
        "fix typo", "add import", "change CSS class"
    ]):
        return "low"  # /effort low

    # Architectural: 5+ files, system-wide impact
    if any(keyword in task_description for keyword in [
        "design system", "refactor module", "debug across",
        "major restructure", "new data model"
    ]):
        return "high"  # /effort high

    # Everything else: features, bugs, tests
    return "medium"  # default, leave it

# Examples:
print(pick_effort("rename variable"))      # low
print(pick_effort("build search feature")) # medium
print(pick_effort("refactor auth module")) # high

One second before each prompt. Massive savings.

Sink 3: Claude reading files it doesn’t need

Say “add a button to the task list” and Claude reads your entire components directory, utils, and database models. It only needed one file.

Fix: Name the files. “Add a delete button to the task list in src/components/TaskList.tsx. The delete action should call the deleteTask server action in src/app/actions.ts.” Two files instead of twelve.

Add a file map to your CLAUDE.md:

## File Structure
- src/app/page.tsx: main page, renders TaskList
- src/components/TaskList.tsx: task list UI component
- src/components/TaskForm.tsx: new task form
- src/app/actions.ts: server actions for CRUD
- prisma/schema.prisma: database schema

Claude reads fewer files because it knows where things are. Pays for itself within two conversations.

Sink 4: Not using code intelligence plugins

Without code intelligence, Claude finds functions by reading file after file. With it, Claude asks “where is deleteTask defined?” and gets an instant answer.

One Reddit post titled “Enable LSP in Claude Code: code navigation goes from 30-60s to 50ms with exact results” got 862 upvotes on r/ClaudeCode. Not exaggerated. Code intelligence reduces file reads by 60-80% on medium-sized projects.

Fix: For TypeScript projects, install the TypeScript compiler (npm install -D typescript) and ensure your tsconfig.json exists. Claude Code detects it automatically. For Python, install pyright. For Go, gopls. For Rust, rust-analyzer.

Sink 5: Verbose prompts

“I was thinking that maybe we could explore the possibility of adding some kind of functionality that would allow users to potentially filter their tasks.” That’s 25 words containing three words of instruction: “add task filtering.”

Fix: The 3-line prompt formula:

Line 1 (action): Add a dropdown filter above the task list.
Line 2 (scope): Modify src/components/TaskList.tsx.
Line 3 (constraint): Options: All, Todo, In Progress, Done.

Under 30 words. Claude has everything it needs. Compare that to the rambling version above and you cut the prompt’s input tokens by more than half.
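The formula is mechanical enough to template. A hypothetical helper (the `three_line_prompt` name and signature are mine, not a Claude Code feature):

```python
def three_line_prompt(action: str, scope: str, constraint: str) -> str:
    """Assemble a prompt from the action/scope/constraint formula."""
    return f"{action}\nModify {scope}.\n{constraint}"

print(three_line_prompt(
    "Add a dropdown filter above the task list.",
    "src/components/TaskList.tsx",
    "Options: All, Todo, In Progress, Done.",
))
```

Forcing yourself through the three slots is the point: if you can't name the scope, you haven't decided which files Claude should touch yet.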

Sink 6: Asking Claude to explain instead of do

“Can you explain how the auth system works and then suggest improvements?” makes Claude write a multi-paragraph explanation AND a suggestion. You’re paying for both.

Fix: If you want changes, ask for changes. “Improve the auth system: add rate limiting on login attempts, add password complexity requirements, switch from JWT to httpOnly cookies.” Claude makes the changes. Review the code. Ask for explanations only when you genuinely need them.

What broke when I ignored this?

Week 1: everything on max effort, 80,000 tokens per orientation read, three times a day. That’s 240,000 tokens daily just on “read my project.” Cost: ~$400.

Week 2: five-paragraph prompts for every feature. Claude would read my essay, do something different, then I’d write another essay correcting it. Features that should take one conversation took four. Cost: ~$500.

Week 3: debugging in circles. Same conversation, 100+ messages deep. Claude kept suggesting fixes it had already tried. Two full days going nowhere. Cost: ~$450.

Total: $1,387.42 for 2,000 lines of code. After applying all six fixes, my next project of similar scope cost roughly $170. Your savings depend on your project, but the pattern holds: most of what you pay for is context overhead, not useful work.

How do you measure the actual difference?

Run this test on any project. Same task, two approaches:

UNOPTIMIZED (old habits):
  Input tokens:  147,000
  Output tokens:   8,200
  Files read:     14
  Time:           45 seconds

OPTIMIZED (all 6 fixes):
  Input tokens:   62,000
  Output tokens:   6,100
  Files read:      3
  Time:           18 seconds

Same feature. Same result. 58% fewer input tokens. The optimized version is also faster because Claude reads fewer files.
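You can reproduce the percentage from any before/after pair. A one-liner applied to the counts above:

```python
def pct_reduction(before: int, after: int) -> int:
    """Percent reduction, rounded to the nearest whole percent."""
    return round(100 * (before - after) / before)

print(pct_reduction(147_000, 62_000))  # 58 -- input tokens
print(pct_reduction(8_200, 6_100))     # 26 -- output tokens
```

Input tokens fall far more than output tokens, which fits the thesis: the waste was in what Claude read, not in what it wrote.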

When should you restart vs. continue a conversation?

| Situation | Action |
| --- | --- |
| Finished a feature, starting new one | Restart |
| Claude’s responses getting confused | Restart |
| Context bar past 60% | Restart |
| Switching to different area of codebase | Restart |
| Mid-debug on a specific issue | Continue |
| Current task continues the previous one | Continue |
| Iterating on design with screenshot loop | Continue |

The cost of restarting: 5,000-10,000 tokens to re-explain context. The cost of NOT restarting: 100,000+ tokens of dead weight per message. The math always favors restarting.
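The decision tree above reduces to one rule: restart if any restart condition holds, otherwise continue. A sketch encoding that table (flag names are mine, chosen to mirror the rows):

```python
def should_restart(
    finished_feature: bool,
    responses_confused: bool,
    context_pct: float,
    switching_area: bool,
) -> bool:
    """
    Restart-vs-continue rule: any single restart trigger wins.
    The continue cases (mid-debug, same task, design iteration)
    are simply the absence of all triggers.
    """
    return (
        finished_feature
        or responses_confused
        or context_pct > 60
        or switching_area
    )

print(should_restart(False, False, 72.0, False))  # True -- context bar past 60%
print(should_restart(False, False, 35.0, False))  # False -- keep going
```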

Before you restart, ask Claude: “Summarize the current state of the project in 3-4 sentences.” Copy that summary into the new conversation. Perfect context in 100 tokens instead of 5,000.

What should you actually do?

  • If you’re on the Pro plan ($20/mo): apply these fixes and you stop hitting the rate limit every afternoon. Expect 3 extra productive hours per day.
  • If you’re on the Max plan ($100-200/mo): match effort levels to tasks and restart conversations after each feature. You get a full workday of building without hitting limits.
  • If you’re on the API plan: the conversation length fix alone (Sink 1) saves $500-800/month for heavy users. Combined with effort levels and specific file references, expect to cut total spend by 40-50%.

Bottom line

  • Token waste is the default Claude Code experience. Six fixes eliminate it: fresh conversations, right effort levels, specific file references, code intelligence, short prompts, action-oriented requests.
  • The savings compound every day. Over a year, that’s hundreds or thousands of dollars for API users, and hours of recovered time for subscribers.
  • The common thread: you don’t get more by spending more. You get more by spending smarter. A $0.08/line workflow beats a $0.69/line workflow on every metric.

Frequently Asked Questions

How much does Claude Code cost per month?

Anthropic's docs report an average of $6/developer/day for API users, with 90% staying under $12/day. Pro plan is $20/month with rate limits. Max plans run $100-$200/month.

Why does Claude Code keep re-reading my files?

Context eviction. When a conversation gets long, older content drops out. Claude re-reads files because it literally forgot them. Fresh conversations and specific file references fix this.

What effort level should I use in Claude Code?

Low for mechanical edits (rename, config change). Medium for features and bugs. High/max only for architecture decisions spanning 5+ files. Wrong effort levels waste 3-5x tokens on simple tasks.