Collection

Local AI Playbook.

Ollama, Hermes, autonomous agent companies. Cut cloud bills, own your stack, run private models at 30+ tokens per second on the GPU you already own.

5 titles · 25 free guides

Free guides
analysis 8 min read

What It Costs to Run an Autonomous AI Company

How much does it cost to run AI agents? A six-agent autonomous company runs on $28.50 a month. Here is the per-agent model breakdown and where the margin is.

from: Zero-Human Companies

how-to 8 min read

Write a Hermes SKILL.md the Agent Actually Uses

How to write a Hermes skill: the SKILL.md format, the three ways skills reach the agent, and a copy-paste template the matcher actually loads with hermes -s.

from: Zero-Human Companies

analysis 9 min read

Where Hermes + Paperclip Agents Break (and the Fixes)

The real AI agent failure modes when you run Hermes + Paperclip: three cited GitHub bugs, the security traps, and the runaway-cost fixes that actually hold.

from: Zero-Human Companies

tutorial 8 min read

Install Paperclip AI and Launch Your First AI Company

How to install Paperclip AI: run one npx command, pass the doctor check, open the dashboard at localhost:3100, and create your first AI company and issue.

from: Zero-Human Companies

how-to 9 min read

Wire Hermes Into Paperclip with the hermes_local Adapter

The hermes_local adapter runs Hermes Agent as a Paperclip worker. Configure one agent, assign an issue, fire a heartbeat, and watch a $0.02 run end to end.

from: Zero-Human Companies

analysis 8 min read

Local vs Cloud AI Coding: When Local Loses

Are local coding models good enough? For most of your day, yes. The honest map of where local wins, where it loses, and how to decide before you fire a task.

from: Run Claude Code Locally

analysis 8 min read

How to Pick a Local Model for Coding

The best local LLM for coding isn't the leaderboard winner. Pick by tool-call reliability and speed, with a model table and a benchmark you run yourself.

from: Run Claude Code Locally

tutorial 8 min read

Run a Local AI Coding Agent Free in 15 Minutes

Build a local AI coding agent on Ollama that edits real code offline. Install, pull qwen2.5-coder, run Codex, ship your first edit with the Wi-Fi off.

from: Run Claude Code Locally

how-to 9 min read

Run Claude Code Locally with Ollama

Point the real Claude Code CLI at a local Ollama model through a LiteLLM proxy. The exact env vars, the config.yaml, and the version you must never install.

from: Run Claude Code Locally

how-to 9 min read

Why Your Local AI Coding Agent Is Slow

Why is Ollama slow for coding? Almost never your hardware. Fix the 52-second tool call with three knobs: model size, quantization, and context length.

from: Run Claude Code Locally

tutorial 8 min read

Install OpenCode and Ship in 10 Minutes

How to install OpenCode the right way, connect a provider, and ship a deployed public URL in ten minutes. Six steps, the three install gotchas, real token cost.

from: The opencode Stack

reference 7 min read

OpenCode Not Working? The 5 Failure Modes

OpenCode not working? The 5 documented failure modes with detection, cause, one-line fix, and prevention. Plus the weekly health-check script that catches them.

from: The opencode Stack

tutorial 8 min read

OpenCode Orchestrator + Fixer Routing, Wired

The OpenCode orchestrator + fixer pattern: two models, two slots, one opencode.json. Three task-class routing recipes and an A/B receipt that cuts the bill 10x.

from: The opencode Stack

tutorial 8 min read

Self-Host OpenCode With Zero Outbound Tokens

How to self-host OpenCode with a local model and prove zero outbound API tokens with tcpdump. Hardware tiers, the provider config, and the silence receipt.

from: The opencode Stack

how-to 8 min read

Keep OpenCode Under $1 a Day: Cost Dashboard

Build OpenCode token cost tracking that alerts at the $1/day line. The SQLite schema, the parser, the run-rate readout, and the four levers that cut spend.

from: The opencode Stack

how-to 9 min read

How to Fit a 26B LLM on a 16GB GPU

Q4_K_M is not the floor. Importance-matrix quantization, IQ3_M, and per-tensor tricks let you run models that 'cannot fit' on your GPU with usable quality.

from: Master Ollama - The Speed Playbook

how-to 8 min read

How Much VRAM Do You Need for a Local LLM?

The exact formula for predicting VRAM use of any local LLM, plus the KV cache table you need before you waste 20 minutes downloading a model that crashes.

from: Master Ollama - The Speed Playbook

tutorial 8 min read

Ollama Modelfile: 3 Templates That Beat the Defaults

Default Ollama settings produce mediocre output. These 3 ready-to-copy Modelfiles for chat, code, and analysis fix it in 2 minutes with explicit reasoning.

from: Master Ollama - The Speed Playbook

analysis 9 min read

Ollama vs llama.cpp: A Head-to-Head Speed Test

All three engines use llama.cpp. Here is the head-to-head test that debunks the 'double your speed' Reddit claim and tells you which one to actually run.

from: Master Ollama - The Speed Playbook

how-to 9 min read

Why Is Ollama So Slow? A 6-Step Diagnostic

Your Ollama is stuck at 3 tok/s? The priority-ordered diagnostic that finds the bottleneck in 5 minutes, with the specific fix and a tok/s test for each.

from: Master Ollama - The Speed Playbook

tutorial 8 min read

How to Replace GitHub Copilot with Ollama in VS Code

Set up a free AI coding assistant in VS Code using Ollama and Cline. Seven real workflows tested with honest quality scores compared to GitHub Copilot.

from: Ship Local AI with Ollama

tutorial 8 min read

How to Route Queries to Multiple Ollama Models

Build a Python router that sends coding questions to CodeLlama and general queries to Llama 3. Keyword-based and AI-powered routing strategies compared.

from: Ship Local AI with Ollama

how-to 8 min read

How to Secure Your Ollama Installation

Run this 30-second audit script to catch the exact misconfiguration that exposed thousands of Ollama instances to the public internet. Copy-paste fixes included.

from: Ship Local AI with Ollama

analysis 8 min read

Ollama vs ChatGPT: Honest Benchmarks from 100 Prompts

Benchmark scores comparing local Ollama models against GPT-4o across coding, writing, reasoning, and more. Data from 100 test prompts on a consumer laptop.

from: Ship Local AI with Ollama