How do you put a budget cap on AI coding agents?

Every Claude Code and Codex run reports its USD cost in the stream's terminal result event. KanBots accumulates that into per-run, per-card, and per-autopilot-session totals, and a budget cap on the autopilot session stops the loop with stopReason: 'cost-budget' when total spend crosses the limit.

The arithmetic that bites

A single Claude Sonnet run at medium effort on a small ticket costs roughly $0.40 to $1.20 and finishes in 2 to 6 minutes. That is cheap. The bite comes from compounding.

Four parallel autopilot slots, each running about $0.50 per minute of wall time, looping for an hour: 4 slots × 60 min × $0.50 = $120. Run that overnight on three issues and you wake up to $360 of CLI charges that no one approved. The economic case for parallel agents needs a meter on it.
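The compounding is plain multiplication. Here is a minimal TypeScript sketch that reproduces the figures above. projectedSpendUsd is a hypothetical helper, not a KanBots API, and a flat per-minute rate is a simplifying assumption.

```typescript
// Hypothetical helper (not a KanBots API): projected spend for N parallel
// slots, each burning a flat per-minute rate. Real runs are burstier.
function projectedSpendUsd(slots: number, minutes: number, usdPerMinute: number): number {
  return slots * minutes * usdPerMinute;
}

const perIssue = projectedSpendUsd(4, 60, 0.5); // one hour on one issue
const threeIssues = perIssue * 3;               // the overnight scenario
console.log(`per issue: $${perIssue}, three issues: $${threeIssues}`);
```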

Three rules keep this honest. Track every run. Roll up by card and by session. Halt automatically when a number you picked in advance is crossed. KanBots does all three.

Where the cost numbers come from

Both supported CLIs emit a final result JSON object on stdout when a run ends. It carries total_cost_usd, plus token counts and a success flag. The dispatcher's stream-parser classifies that line as the result event and writes the cost onto the agent_runs row. Sum across runs for a card to get the card total; sum across runs in an autopilot session to get the session total.
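A minimal sketch of that pipeline in TypeScript follows. It assumes each stdout line is one JSON object and that the terminal event looks like {"type":"result","total_cost_usd":<number>}. The total_cost_usd field comes from the text above. The "type" tag, and whether Codex uses the same shape, are assumptions. RunRow, extractRunCost, and totalBy are hypothetical names standing in for an agent_runs row and the rollup queries.

```typescript
// Hypothetical shape of an agent_runs row, reduced to what the rollup needs.
interface RunRow {
  runId: string;
  cardId: string;
  sessionId: string | null; // null when the run was dispatched outside autopilot
  costUsd: number;
}

// Return total_cost_usd from the first result event in a run's stdout, or null.
// Assumption: the terminal event carries type === "result".
function extractRunCost(stdoutLines: string[]): number | null {
  for (const line of stdoutLines) {
    let event: unknown;
    try {
      event = JSON.parse(line);
    } catch {
      continue; // not JSON: skip the line
    }
    if (typeof event !== "object" || event === null) continue;
    const record = event as Record<string, unknown>;
    const cost = record.total_cost_usd;
    if (record.type === "result" && typeof cost === "number") {
      return cost;
    }
  }
  return null;
}

// Sum run costs grouped by card, or by autopilot session.
function totalBy(runs: RunRow[], key: "cardId" | "sessionId"): Map<string, number> {
  const totals = new Map<string, number>();
  for (const run of runs) {
    const id = run[key];
    if (id === null) continue; // run was not part of any autopilot session
    totals.set(id, (totals.get(id) ?? 0) + run.costUsd);
  }
  return totals;
}
```

With rows loaded for card #221, totalBy(rows, "cardId").get("221") gives the card total, and grouping by "sessionId" gives the number the session cap is compared against.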

Two budget caps live in .kanbots/config.json:

{
  "defaults": {
    "runCostBudgetUsd": 2.50,
    "sessionCostBudgetUsd": 25.00
  }
}

runCostBudgetUsd is per-dispatch: the dispatcher kills the run if the accumulated cost of a single CLI invocation exceeds the cap. sessionCostBudgetUsd is the autopilot cap: when the sum across every child run in the session crosses it, the orchestrator throws SessionBudgetExceededError and all slots return. Setting either to null (or omitting it) disables that cap. The budget input in the Autopilot — Feature Dev modal overrides sessionCostBudgetUsd for the session you are about to start.
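In code, the cap semantics above come down to a null check and a comparison. This is a sketch under stated assumptions. Only runCostBudgetUsd, sessionCostBudgetUsd, and SessionBudgetExceededError come from the text. resolveSessionCap, assertSessionBudget, and shouldKillRun are hypothetical, and reading "crosses" as strictly greater than is a guess.

```typescript
// Shape of the "defaults" object in .kanbots/config.json; null or absent disables a cap.
interface BudgetDefaults {
  runCostBudgetUsd?: number | null;
  sessionCostBudgetUsd?: number | null;
}

class SessionBudgetExceededError extends Error {
  readonly spentUsd: number;
  readonly capUsd: number;
  constructor(spentUsd: number, capUsd: number) {
    super(`session spend $${spentUsd.toFixed(2)} crossed cap $${capUsd.toFixed(2)}`);
    this.name = "SessionBudgetExceededError";
    this.spentUsd = spentUsd;
    this.capUsd = capUsd;
  }
}

// Hypothetical: a number typed into the modal wins; otherwise fall back to config.
function resolveSessionCap(defaults: BudgetDefaults, modalInputUsd?: number): number | null {
  if (modalInputUsd !== undefined) return modalInputUsd;
  return defaults.sessionCostBudgetUsd ?? null;
}

// Hypothetical: throw once cumulative session spend is strictly above the cap.
function assertSessionBudget(sessionSpentUsd: number, capUsd: number | null): void {
  if (capUsd !== null && sessionSpentUsd > capUsd) {
    throw new SessionBudgetExceededError(sessionSpentUsd, capUsd);
  }
}

// Hypothetical per-dispatch check the dispatcher could apply to a run's cost.
function shouldKillRun(runSpentUsd: number, defaults: BudgetDefaults): boolean {
  const cap = defaults.runCostBudgetUsd ?? null;
  return cap !== null && runSpentUsd > cap;
}
```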

What the UI shows you

Three places to see the number, in order of zoom:

  • The run card in the live thread shows the running dispatch's accumulated cost in the run stats row, alongside model, elapsed time, and tokens. It updates as result events land.
  • The card detail shows the rollup across every run ever dispatched on that card — useful for asking "how much have we spent on this issue total."
  • The autopilot session panel shows the cumulative spend across every child run in the session, plotted against the cap. When the bar fills, the session ends.

Why this beats fire-and-forget cron

A cron that runs claude -p on a queue has no per-run spend visibility, no rollup, no automatic halt. You discover the cost at the end of the month from the Anthropic console. By then the loop has already run for 19 days. KanBots reads the result event for every run and persists it before the next run starts; the budget cap is a constraint the orchestrator actually checks before claiming the next persona.
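The ordering is what matters: persist the run's cost, compare the total against the cap, and only then claim the next task. Below is a simplified single-slot sketch; parallel slots would share the same counter. claimNext, runPersona, and persistCost are hypothetical callbacks. The 'no-work' stop reason is illustrative, while 'cost-budget' is the one named earlier.

```typescript
type StopReason = "cost-budget" | "no-work";

interface Task {
  cardId: string;
  persona: string;
}

// Single-slot sketch; all three callbacks are hypothetical stand-ins.
async function runAutopilot(
  capUsd: number | null,
  claimNext: () => Task | null,
  runPersona: (task: Task) => Promise<number>, // resolves with the run's total_cost_usd
  persistCost: (task: Task, costUsd: number) => void,
): Promise<{ stopReason: StopReason; spentUsd: number }> {
  let spentUsd = 0;
  for (;;) {
    // Check the meter before claiming, so no new run starts past the cap.
    if (capUsd !== null && spentUsd > capUsd) {
      return { stopReason: "cost-budget", spentUsd };
    }
    const task = claimNext();
    if (task === null) {
      return { stopReason: "no-work", spentUsd };
    }
    const costUsd = await runPersona(task);
    persistCost(task, costUsd); // recorded before the next iteration decides anything
    spentUsd += costUsd;
  }
}
```

Because the check runs between claims, the final total can exceed the cap by the cost of whatever was already running. This matches the near-finish case described under the failure modes.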

KanBots inherits its economic shape from its OSS thesis: bring your own keys, pay the model provider directly, never resell inference. The budget cap protects the wallet those keys bill to, which is yours.

A worked budget

  1. Card #221, "Build invoice export." You open Autopilot — Feature Dev. Personas: product, engineer, reviewer, tester. Parallelism 2. Effort: medium. Model: Sonnet.
  2. Budget cap: $15. (A sensible starting point the first time you run autopilot on a non-trivial card; raise it as you learn the work.)
  3. Cycle 0 (product, $0.30): splits #221 into three subtasks. Cycle 1 (engineer, $1.10): writes the schema. Cycle 2 (reviewer, $0.45): approves. Cycle 3 (tester, $0.80): writes a roundtrip test, it passes.
  4. Cycles 4–7 (parallelism 2 picking up the remaining subtasks): $4.80 across four runs.
  5. Cycles 8–10: $3.20 polishing reviewer feedback. Session total now $10.65. Cap not hit.
  6. Cycle 11 spawns and finishes at $0.75, bringing the total to $11.40; the loop sees no work left and exits cleanly. You spent $11.40 of the $15 you authorized.
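The running total is easy to check. This sketch replays the cycle costs above in integer cents, which avoids floating-point drift in a money sum. Cycle 11's $0.75 is inferred from the step from $10.65 to $11.40, and runningTotalsCents is a hypothetical helper.

```typescript
interface CycleCost {
  label: string;
  cents: number;
}

// Costs from the worked example above, in integer cents.
const workedExample: CycleCost[] = [
  { label: "cycle 0 (product)", cents: 30 },
  { label: "cycle 1 (engineer)", cents: 110 },
  { label: "cycle 2 (reviewer)", cents: 45 },
  { label: "cycle 3 (tester)", cents: 80 },
  { label: "cycles 4-7", cents: 480 },
  { label: "cycles 8-10", cents: 320 },
  { label: "cycle 11", cents: 75 }, // inferred from $10.65 -> $11.40
];

// Hypothetical helper: cumulative session total after each step.
function runningTotalsCents(steps: CycleCost[]): number[] {
  const totals: number[] = [];
  let sum = 0;
  for (const step of steps) {
    sum += step.cents;
    totals.push(sum);
  }
  return totals;
}

const capCents = 1500; // the $15 cap
const totalsCents = runningTotalsCents(workedExample);
for (let i = 0; i < workedExample.length; i++) {
  const usd = (totalsCents[i] / 100).toFixed(2);
  const flag = totalsCents[i] > capCents ? " (over cap)" : "";
  console.log(`${workedExample[i].label}: $${usd}${flag}`);
}
```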

Defaults and recommended caps

First-time autopilot on a card you have never touched: cap $10 to $25. Watch what the loop does for one cycle, then decide whether to raise it.

Repeated autopilot on a card type you have done before (CRUD endpoint, schema migration, small refactor): cap $5 to $15.

Large multi-day feature with four personas and parallelism 4: start with $50, watch the autopilot panel after 30 minutes, raise if the loop is making real progress and stop if not.

For the autopilot mechanics themselves, see autopilot mode; for why a runaway backlog can spike spend, see self-evolving backlog.

Three failure modes

You forgot to set a session cap. The budget input was empty, the session ran for two hours, total cost $87. Fix: set sessionCostBudgetUsd in .kanbots/config.json so the modal pre-fills it on every future session. The right default for your team's usage is whatever you'd be comfortable paying for one card.

Effort: max with parallelism 4. Each slot runs the most expensive model with the largest context. The cost rate spikes to $2/minute per slot, or $480/hour across four slots at full burn. Fix: drop effort to medium for the parent issue's first pass and bump it only for cycles that are obviously stuck.
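The useful number here is how fast a cap runs out. Assuming a constant burn rate, this hypothetical helper shows that a $25 session cap lasts roughly three minutes at this rate, compared with 12.5 minutes at the $0.50/minute rate used earlier.

```typescript
// Hypothetical helper: minutes until a session cap is exhausted at a constant burn rate.
function minutesToCap(capUsd: number, slots: number, usdPerMinutePerSlot: number): number {
  return capUsd / (slots * usdPerMinutePerSlot);
}

const hourlyAtMax = 4 * 2 * 60; // four slots at $2/minute each, for 60 minutes
console.log(`effort max: $${hourlyAtMax}/hour, $25 cap lasts ${minutesToCap(25, 4, 2).toFixed(1)} min`);
console.log(`$0.50/min rate: $25 cap lasts ${minutesToCap(25, 4, 0.5).toFixed(1)} min`);
```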

The cap halts a near-finish. The session hit $25 with one subtask still mid-cycle. The cap was correct; you just want this one to finish. What happens: the orchestrator lets in-flight child runs complete their current iteration, but no new child runs start. Fix: open a fresh autopilot session on the same card with a smaller cap that targets just the remaining work.

When budget caps are wrong

Budget caps are wrong when the work is open-ended exploration and you genuinely want to spend whatever it takes. Set sessionCostBudgetUsd: null and watch manually. They are also wrong as a substitute for a real spec — capping spend on an ambiguous parent does not produce a good outcome, it just produces a cheap bad outcome.

For how spend is isolated per branch (so promoting one slot does not duplicate cost), see the feature-branch workflow.

Try it on your own folder

Drop a folder, get a board, dispatch parallel agents. The desktop app runs locally on macOS, Linux, and Windows.