What is a Claude prompt tuner and why does it ship as throwaway HTML?

It is a single self-contained HTML file with your template on one side, three live-rendering filled examples on the other, and a token counter at the bottom. Throwaway because it is purpose-built for one prompt for one decision, not parameterized for reuse. Reusable editors are slower than re-prompting; throwaway editors compound.

Why does the tuner need a token counter?

Because the failure mode is the template growing instead of getting sharper. Without the counter you make the prompt longer by accident and ship a degraded version. The counter ticks up as you add words and down as you remove them; you see the budget cost in real time and stop before you crowd the context window.

Build a Throwaway Claude Prompt Tuner in One Prompt

Summary:

Thariq’s verbatim 4-line Worked Prompt #8 that produces the prompt tuner in one Claude Code session.

The single diagnostic (token counter) that makes the editor honest and prevents the 7-day blind-edit failure.

The “end with an export” rule (D1 pattern) that turns the editor from a viewer into a workflow.

The CHECKPOINT: tune one real prompt, paste the tuned template back, watch the output improve on the first query.

The Claude prompt tuner is the loop tightener. You have been hand-editing a system prompt for two weeks, tracking the variable slots in your head, running a query, eyeballing the output, editing blind. Twenty minutes per round. With a single HTML editor that Claude Code generates from one prompt, the edit-render-eyeball cycle collapses to about thirty seconds per iteration. This article ships the editor in one prompt, names the single diagnostic that makes it honest, and walks through the 7-day failure I shipped on my first try.

Figure: annotated prompt tuner HTML showing the template with three variable slots on the left, three live-rendering filled examples on the right, a character and token counter at the bottom, and a “Copy final prompt” export button.

What is a Claude prompt tuner that ships as HTML?

A single self-contained .html file Claude Code generates from one prompt. Template on the left with variable slots highlighted in a contrasting color (the agent usually picks yellow or pink). Three filled examples on the right that re-render live as you edit the template. A character and token counter at the bottom that updates with every keystroke. A “Copy final prompt” button that exports the tuned template.

Open in browser. Edit a slot. The three rendered examples update character-by-character. The token counter ticks. You see how each edit changes the filled outputs and roughly how much context budget the change costs.
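The live re-render is, at its core, a find-and-replace over the template's slots, repeated once per sample input. Here is a minimal Python sketch of that step. The `{{name}}` slot syntax and the helper names `find_slots` and `render` are assumptions made for illustration; the generated editor would do the equivalent in browser JavaScript, with whatever slot syntax your prompt already uses.

```python
import re

# Assumption: slots are written as {{name}}. The article does not fix a syntax.
SLOT = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def find_slots(template: str) -> list[str]:
    """Return slot names in order of first appearance, without duplicates."""
    seen: list[str] = []
    for name in SLOT.findall(template):
        if name not in seen:
            seen.append(name)
    return seen


def render(template: str, values: dict[str, str]) -> str:
    """Fill every known slot; leave unknown slots as-is so gaps stay visible."""
    return SLOT.sub(lambda m: values.get(m.group(1), m.group(0)), template)


template = "You are a {{role}}. Answer in a {{tone}} tone about {{topic}}."
samples = [
    {"role": "tax advisor", "tone": "formal", "topic": "deductions"},
    {"role": "tutor", "tone": "friendly", "topic": "fractions"},
    {"role": "SRE", "tone": "terse"},  # topic deliberately missing
]

# Re-run this on every edit to get the three filled examples.
rendered = [render(template, sample) for sample in samples]
for text in rendered:
    print(text)
```

Leaving an unfilled slot visible in the output, rather than silently dropping it, is a deliberate choice in this sketch: a missing value is something you want to notice while tuning.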

The pattern is from Thariq’s article. The framing, verbatim:

Sometimes it’s hard to describe what you want purely in a text box. For this use case, I’ll often ask Claude to build me a throwaway editor for the exact thing I’m working on: not a product, or a reusable tool, but a single HTML file, purpose-built for this one piece of data.

That word throwaway is the load-bearing one. The engineer reflex when you build something useful is to parameterize it for reuse. That reflex is wrong for this pattern. The throwaway editor pays for itself the first time you use it. The reusable version has to pay for itself across ten uses to be worth the abstraction cost. Most weeks, you only tune one specific prompt. The math favors throwaway.

The 7-day failure I shipped on my first try

The first throwaway editor I built was wrong on the inside in a way I did not catch for three days.

I had been hand-editing a 240-line system prompt with nine variable slots. Every time I changed a slot’s value I had to mentally re-fill the other eight to imagine what the final prompt would look like. The friction was real. I asked Claude Code to build me a side-by-side prompt tuner: the template on the left with slots highlighted, three filled examples on the right that re-rendered as I edited. The editor worked. I started using it. I tuned the prompt for an hour, copied the result back into Claude Code, and noticed two days later that my agent was now answering questions with subtly worse outputs than the week before.

The editor had no token counter. I had not asked for one. I had been making the prompt longer without seeing the cost. The template was growing instead of getting sharper, and I had no signal to catch the drift. The fix took two minutes: ask Claude Code to add a token counter at the bottom of the editor. The lesson took three days to surface. The throwaway editor is only as useful as the diagnostics it surfaces, and the diagnostics you skip become the failure modes you ship.

Thariq’s verbatim Worked Prompt #8

The prompt that ships the editor is four lines. From the Anthropic blog post on HTML’s effectiveness, Worked Prompt #8, verbatim:

I'm tuning this system prompt. Make a side-by-side editor: editable prompt on the left with the variable slots highlighted, three sample inputs on the right that re-render the filled template live. Add a character/token counter and a copy button.

That is the entire prompt. The agent fills in the structure based on the system prompt you attach and the three sample inputs you provide.

To use it: paste the system prompt you are tuning into Claude Code, paste three sample inputs after it, and run:

Pin: ~/wiki/design-system.html as the visual reference.

I'm tuning this system prompt. Make a side-by-side editor: editable prompt on the left with the variable slots highlighted, three sample inputs on the right that re-render the filled template live. Add a character/token counter and a copy button.

System prompt:
<your system prompt here>

Sample inputs:
1. <first input>
2. <second input>
3. <third input>

Save to ~/wiki/editors/prompt-tuner.html.

The agent generates one HTML file between 600 and 1,200 lines. Open in browser.

The single diagnostic that makes it honest

The character and token counter is the diagnostic my first editor was missing for three days. Make sure the agent’s output includes it. If the first generation skips the counter, re-prompt:

Add a character and token counter at the bottom of the tuner that updates live as I edit the template. Use char/4 as the token heuristic.

Insist on the counter. The counter is what makes the tuner honest.

The counter is a budgeting signal (usually a word-count or char/4 heuristic), not a billing-grade tokenizer. Close enough to catch a template growing past its useful length. Not close enough to predict the cents on your bill.
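For reference, the char/4 heuristic fits in a few lines. This is a rough sketch in Python, not a real tokenizer: the function names are hypothetical, and rounding up is an assumption (the re-prompt above only says char/4).

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: one token per four characters, rounded up (assumption)."""
    return (len(text) + 3) // 4


def counter_label(text: str) -> str:
    """Text for the counter strip at the bottom of the tuner."""
    return f"{len(text)} chars / ~{estimate_tokens(text)} tokens"


print(counter_label("You are a helpful assistant."))
```

The leading `~` in the label is a reminder that the number is a budgeting signal, not a bill.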

Tune for ten minutes. Watch the token count. Stop before you crowd the context window. If you see the counter climb from 2,800 tokens to 4,200 over the course of a tuning session, you have made the prompt longer, not better. The counter makes the trade-off visible.
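To put a number on that example: going from 2,800 to 4,200 tokens adds 1,400 tokens, which makes the template 50% longer. A small Python sketch of the same arithmetic, with a hypothetical helper name:

```python
def growth(start_tokens: int, end_tokens: int) -> tuple[int, float]:
    """Return (absolute change, percent change) between two token counts."""
    if start_tokens <= 0:
        raise ValueError("start_tokens must be positive")
    delta = end_tokens - start_tokens
    return delta, 100.0 * delta / start_tokens


delta, percent = growth(2800, 4200)
print(f"{delta:+d} tokens ({percent:.0f}% change)")  # +1400 tokens (50% change)
```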

End with an export (the D1 rule)

The “Copy final prompt” button is not optional. From the same Thariq article, verbatim:

The trick is always to end with an export: a ‘copy as JSON’ or ‘copy as prompt’ button that turns whatever I did in the UI back into something I can paste into Claude Code or commit to a file. You stay in the loop, but the loop gets much tighter.

The mechanic is round-trip. You manipulate the UI. The export button serializes the state. You paste that state back into Claude Code as the input to your next prompt. The agent uses your manipulated state as the starting point for whatever happens next.
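One way to picture the round trip is as serialize, paste, and parse. The sketch below uses Python and JSON; the field names and the `export_state` and `import_state` helpers are assumptions for illustration, not a format the article prescribes.

```python
import json


def export_state(template: str, samples: list[dict[str, str]]) -> str:
    """Serialize what the export button would copy to the clipboard."""
    state = {
        "template": template,
        "samples": samples,
        "chars": len(template),
        "approx_tokens": (len(template) + 3) // 4,  # char/4 heuristic, rounded up
    }
    return json.dumps(state, indent=2, ensure_ascii=False)


def import_state(payload: str) -> tuple[str, list[dict[str, str]]]:
    """Parse a pasted export back into the template and its sample inputs."""
    state = json.loads(payload)
    return state["template"], state["samples"]


payload = export_state("Summarize {{doc}} briefly.", [{"doc": "the Q3 report"}])
template_back, samples_back = import_state(payload)
print(payload)
```

Including the character and token snapshot in the export means the pasted state carries its own budget reading into the next prompt.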

Markdown cannot do this. A markdown document is read-only from the agent’s perspective. You read it, you decide, you re-type. The HTML editor with an export button shrinks the read-decide-retype loop to manipulate-export-paste. One click instead of three to five minutes of retyping.

Without the export, the editor is a viewer. A pretty viewer. Still a viewer. The whole pattern’s value lives in the export.

The three-box gate before you trust any throwaway editor

Three boxes:

1. Is it purpose-built for this specific task, not parameterized for reuse? Reusable editors with renamed slots are slower than re-prompting; throwaway purpose-built ones compound.
2. Does it include the one diagnostic that catches the failure mode you are most likely to hit? The token counter is the prompt-tuner version; the triage-board version is a count of items per bucket. The diagnostic gets surfaced if and only if you ask for it.
3. Does its export emit data in the format you actually paste somewhere next? Copy as markdown, copy as JSON diff, and copy with a token snapshot at the bottom are different exports. The wrong format is the most embarrassing failure because the editor seems to work until the moment you need the exported output.

Three boxes is the gate. Two is the editor I would re-prompt before using. One is the editor I would throw away and re-generate.
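The gate can be written down as a tiny checklist. This Python sketch uses hypothetical names throughout, and it treats zero ticked boxes the same as one, which is an assumption the text above does not spell out.

```python
from dataclasses import dataclass


@dataclass
class EditorCheck:
    purpose_built: bool             # box 1: built for this one task
    has_key_diagnostic: bool        # box 2: surfaces the likeliest failure mode
    export_matches_next_step: bool  # box 3: export is in the format you paste next


def gate(check: EditorCheck) -> str:
    """Map the number of ticked boxes to a decision."""
    passed = sum(
        [check.purpose_built, check.has_key_diagnostic, check.export_matches_next_step]
    )
    if passed == 3:
        return "use"
    if passed == 2:
        return "re-prompt"
    return "regenerate"  # one box, or none (assumption)


# The tuner from the failure story: purpose-built, exportable, no token counter.
print(gate(EditorCheck(True, False, True)))
```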

The meta-pattern, distilled

After three editors, the shape of what makes them work is clear. Save this template as ~/wiki/source/editor-template.md:

# Throwaway editor template

I need a custom HTML editor for: <DESCRIBE THE TASK IN ONE SENTENCE>

The editor must:
1. Be a single self-contained .html file, no build step.
2. Be purpose-built for THIS task. Not parameterized for reuse.
3. Include the one diagnostic that prevents the failure mode I am most likely to hit: <NAME THE DIAGNOSTIC>.
4. End with an export button that emits the resulting state as <markdown | JSON | the original config format>.

Pin: ~/wiki/design-system.html as the visual reference.

Save to ~/wiki/editors/<name>.html.

The angle-bracket slots (the task, the diagnostic, the export format, and the file name) are the parts you fill in for each new editor. Everything else is the durable pattern. The next custom editor you need is a one-paragraph prompt away.
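If you prefer to fill the template from a script rather than by hand, a sketch along these lines would do it. The function and parameter names are hypothetical; the prompt text mirrors the template above, with each angle-bracket slot replaced by a named field.

```python
EDITOR_PROMPT = """I need a custom HTML editor for: {task}

The editor must:
1. Be a single self-contained .html file, no build step.
2. Be purpose-built for THIS task. Not parameterized for reuse.
3. Include the one diagnostic that prevents the failure mode I am most likely to hit: {diagnostic}.
4. End with an export button that emits the resulting state as {export_format}.

Pin: ~/wiki/design-system.html as the visual reference.

Save to ~/wiki/editors/{name}.html."""


def build_editor_prompt(task: str, diagnostic: str, export_format: str, name: str) -> str:
    """Fill the template, refusing blank fields so no slot ships empty."""
    fields = {
        "task": task,
        "diagnostic": diagnostic,
        "export_format": export_format,
        "name": name,
    }
    for label, value in fields.items():
        if not value.strip():
            raise ValueError(f"{label} must not be empty")
    return EDITOR_PROMPT.format(**fields)


prompt = build_editor_prompt(
    task="tuning one system prompt against three sample inputs",
    diagnostic="a live character and token counter",
    export_format="markdown",
    name="prompt-tuner",
)
print(prompt)
```

Rejecting a blank diagnostic is the point of the check: that is the slot whose omission the article is warning about.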

What should you actually do?

If you have a system prompt you have edited more than 5 times in the past month → ship the tuner this evening. Paste the prompt, paste 3 sample inputs, run Worked Prompt #8. About 5 minutes to generate, about 10 minutes to tune.
If your first generation has no token counter → re-prompt with explicit instruction to add it. Do not use a tuner without the counter; that is exactly the editor that wastes 7 days.
If you find yourself building the same editor three weeks in a row → only then abstract. Until then, the throwaway version is faster than the parameterized version every single time.
If your design system is not pinned → ship the design-system HTML first (one prompt, 30 minutes). The pin keeps subsequent artifacts coherent.

CHECKPOINT

Use the prompt tuner you just built to tune one real prompt you have been struggling with. Copy the tuned template back to Claude Code. Run a query that uses the tuned prompt. The output should be measurably better on the first try: more concise, more on-target, or fewer follow-up questions required. That measurable improvement is the proof the throwaway editor pattern works.

If the output is the same as before, the tuner did not surface the right thing. Two common causes: the three sample inputs you provided were too similar (the tuner could not see the variance), or you tuned the wrong slot. Re-prompt with three more diverse sample inputs and a different focus slot.
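A quick way to sanity-check the first cause is to measure word overlap between the sample inputs before you paste them. The sketch below uses Jaccard similarity on lowercase word sets. Both the metric and the 0.5 threshold are assumptions chosen for illustration; treat the result as a rough warning, not a measure of semantic diversity.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two inputs: 0.0 is disjoint, 1.0 is identical."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def too_similar(samples: list[str], threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return index pairs of samples whose overlap meets the threshold."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(samples), 2)
        if jaccard(a, b) >= threshold
    ]


sample_inputs = [
    "refund my order 123",
    "refund my order 456",
    "how do I reset a password",
]
flagged = too_similar(sample_inputs)
print(flagged)
```

Here the first two inputs share three of five distinct words, so they are flagged as a pair worth replacing with something more varied.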

Bottom line

The tuner is throwaway by design. Engineer instinct says parameterize; engineer instinct is wrong here. Purpose-built wins on speed every time.
The token counter is what makes the editor honest. Without the diagnostic, you ship a degraded prompt and do not notice for 7 days.
Always end with an export. The pattern’s value lives in the manipulate-export-paste loop. Without the export, you built a pretty viewer.


