How to Give an Agentic Loop an Exit Condition
An agentic loop exit condition has two halves: a measurable pass criterion and a hard cap. Wire both in Claude Code so a runaway loop stops in 1-4 turns.
>This wires the exit condition. Claude Code Loop Engineering goes deeper on deterministic verification, adversarial review, and the budget gate that caps the spend.

Claude Code Loop Engineering
Build Self-Correcting Agentic Loops That Ship Trusted Code, Not a $200 Overnight Bill
Summary:
- An exit condition has two independent halves: a measurable pass criterion and a hard cap.
- Phrase “done” as a runnable result the agent proves, never as the absence of complaints.
- Cap turns in the 1-4 range and put a dollar ceiling under it with
--max-budget-usd.- Copy-paste exit-condition recipes plus the test that proves your brake works.

An agentic loop exit condition is the one change that turns a $200 overnight runaway into a three-dollar run. A builder went to bed with a loop running, woke up to 91 review rounds and a $200 bill, and the loop hadn’t run away because the model was dumb. It ran away because nobody told it what “done” meant, so it never was. This is the most fixable mistake in agentic coding, and almost nobody is shown the fix. Here it is.
What is an exit condition for an agentic loop?
An exit condition is two independent halves, and you need both. The first is a measurable pass criterion: the definition of done the loop checks. The second is a hard cap: the ceiling that stops the loop even when the criterion is never met. One without the other is half a brake.
Why two halves? Because they handle two different outcomes. The pass criterion handles the case where the loop succeeds, so it stops the instant the work is genuinely done and not a round later. The hard cap handles the case where it fails, so a loop that can’t reach green quits at the cap instead of grinding all night. The $200 night had neither. A single turn cap of four, or a single dollar cap of five, would each have ended it before breakfast.
Here is what both halves look like in one line you can copy:
/goal all tests in test/auth pass (npm test exits 0) AND coverage >= 90% AND lint is clean, or stop after 4 turns
The clauses before the comma are the pass criterion. The clause after it is the cap. Every loop you ever write lives or dies on getting both into that single string.
What broke: the runaway that burned $200
The loop oscillated, because it had no measurable definition of done. One builder in that exact thread named the mechanism: without a clear exit, the agent makes a style change, then changes it back, then changes it again. Here is what that spiral actually looks like in the transcript:
WITHOUT a measurable exit (the rename-then-revert spiral):
turn 5: renames getTok -> fetchToken "clearer name"
turn 6: reverts fetchToken -> getTok "matches the rest of the file"
turn 7: renames getTok -> fetchToken "actually, clearer name"
turn 8: reverts ... <- burns turns, converges on nothing
That is a loop deciding it’s done over and over. The model isn’t malfunctioning. It’s doing exactly what an open-ended “keep improving this” asks, forever, on your dime. A builder who watched the same thing with a different coding tool landed on the fix this whole method teaches:
“yeah 3 rounds is roughly where i landed too. Past that codex starts inventing problems or arguing taste, so a hard cap plus a clear pass condition (not just ‘no findings’) saves a lot of money.”
(From the r/ClaudeCode thread where a builder’s overnight loop ran 91 review rounds and burned $200.)
That is the whole correction. “No findings” is a finish line that recedes, because a reviewer asked to find problems always finds one more. You don’t loop until the reviewer is happy. You loop until a measurable bar is met, or until the cap.
How do you write a measurable pass criterion?
Phrase “done” as a measurable end state the agent’s output can demonstrate, never as the absence of complaints. The small evaluator that checks your loop reads the transcript, not your filesystem, so each clause has to be a result the agent surfaces by running a command. Put a good condition and a bad one side by side:
# MEASURABLE end state — the checker can pin it down:
/goal all tests in test/auth pass and coverage >= 90% and lint is clean
# THE TRAP — open-ended; a reviewer can invent work forever:
/goal the reviewer has no more findings
The top one names facts. The bottom one names a mood. The top one ends. The bottom one recedes. If your goal has a quality bar beyond “it runs,” put the bar in as a number the agent’s tools produce. Coverage is the classic one: phrase it so the agent has to print the figure the checker needs to read.
# Force the agent to surface the number (e.g. 92.4%) in the transcript:
/goal `npm test -- --coverage` reports statements coverage at or above 90% and 0 failing
Now “done” includes a measurable threshold, and the loop can’t exit by writing a thin test that passes while half the module is uncovered.
How do you set the hard cap?
Set the turn cap in the 1-4 range, because a loop with a good plan and a clear target gets there in one to four rounds. Setting the cap there isn’t pessimism, it’s a tripwire. If the loop hits four rounds without meeting the criterion, that’s not a signal to raise the cap to forty. It’s a signal to stop and re-plan. A cap that’s too generous doesn’t make your loop more capable. It makes your failures more expensive.
The turn clause bounds iterations. For a loop you’re truly leaving alone, you want a second ceiling that bounds money and stops the run physically. Run it headless and hand it a hard dollar ceiling:
$ claude -p --max-budget-usd 5 --output-format json \
"/goal all tests in test/auth pass and coverage >= 90%, or stop after 4 turns"
That dollar cap is a real, built-in stop: the run halts when it has spent that much, no matter what the turn count or the model thinks. Turns bound the laps. Dollars bound the damage. The JSON that comes back carries the real spend so a script can read it without you watching:
{
"type": "result",
"subtype": "success",
"num_turns": 2,
"total_cost_usd": 0.83,
"result": "goal met: tests pass, coverage 92.4%"
}
Eighty-three cents, two turns, done. The $200 night had neither cap, which is the whole reason it became a story instead of a footnote.
The build: give your loop a real brake
-
Rewrite the goal as a measurable end state. Replace any soft success notion with a conjunction of runnable results: tests exit clean, the build succeeds, lint is clean, plus a quality number if you have one. Every clause must be something the agent proves by running a command.
-
Set the turn cap in the 1-4 range. Add
or stop after 4 turns. You are not setting a budget for success. You’re setting a tripwire for failure. -
Put a dollar ceiling underneath it. For any run you’ll walk away from, launch it headless with
--max-budget-usdset to a number that would make you shrug, not wince. -
Prove the brake works on an impossible task. Point the loop at something it genuinely cannot satisfy, a test asserting something false, and run it. The right outcome is that it tries, hits the cap, stops, and tells you plainly that it stopped without being done:
> /goal the test in test/impossible passes, or stop after 4 turns
[turn 1] edits, runs npm test -> FAIL expected 5, received 4
[turn 2] edits, runs npm test -> FAIL expected 5, received 4
[turn 3] edits, runs npm test -> FAIL expected 5, received 4
[turn 4] edits, runs npm test -> FAIL expected 5, received 4
■ stopped at the 4-turn cap — goal NOT met.
The test still fails. I stopped at the cap rather than continue.
A loop that says “stopped, NOT met” on an impossible task is one you can trust to say “goal met” on a real one. A brake you haven’t tested isn’t a brake.
That honest “stopped, not done” is the load-bearing distinction. Stopped means the loop halted: a cap hit, an error. Done means the pass criterion was actually met. A loop that hits its cap with the tests still red has stopped, but it is not done, and a loop that can’t tell the difference will hand you half-finished work with total confidence.
What should you actually do?
- If your loop oscillates → your condition isn’t measurable. Replace “clean this up” with a runnable result like
npm test exits 0. Cosmetic churn can never satisfy a measurable bar, so the rename-revert spiral dies. - If you want a reviewer in the loop → bound it the same way:
or stop after 3 review rounds, and convert real findings into tests. The opinion (“this isn’t covered”) becomes a file (a new test) becomes a runnable fact (“that test passes”). - If you’re leaving it unattended → use both caps. The turn clause for laps,
--max-budget-usdfor dollars. A serious exit condition is a measurable criterion, a turn cap in the 1-4 range, and a dollar ceiling under both. - If it hits the cap a lot → that’s the tripwire doing its job. Re-plan the task, don’t raise the cap.
The bottom line
- A runaway is missing an exit, not missing intelligence. A smarter model finds more sophisticated things to change and changes them with more confidence, for longer. The fix is structural, not a better prompt.
- Writing the exit condition costs thirty seconds. Not writing it cost one builder $200 and a morning of cleanup. There is no cheaper insurance anywhere in agentic coding, and almost nobody buys it.
- Keep a file of measurable exit conditions, one per kind of work you do. The exit condition is the most reusable part of any loop you’ll ever write, and a good one you wrote once is a loop you can trust every time after.
Frequently Asked Questions
What is an exit condition for an agentic loop?+
It's two things, not one: a measurable pass criterion that defines done as a runnable result, plus a hard cap that stops the loop even if the criterion is never met. Missing either half is how overnight runaways happen.
How many loops should a self-correcting agent need?+
Builders who run these daily land on one to four rounds for a loop with a solid plan. If it needs many more, the problem is the plan or the definition of done, not the model. Set the cap in that range as a tripwire.
Why is 'loop until the reviewer has no findings' a bad exit condition?+
A reviewer asked to find problems will always find another one. 'No findings' is a finish line that recedes as you walk toward it. Loop until a measurable bar is met instead, or until a hard round cap.