How do you parallelize AI coding agents safely?

Parallel agent setups fail in three predictable ways: worktree clobber (two agents writing the same files), label drift (the board says “done” while the branch says “in progress”), and cost runaway (six agents looping overnight). Solve each with a structural fix, not a habit: worktree per run, status bound to branch state, hard budget caps per session.


The short answer

Parallelizing agents is not about getting more agents running. It is about preventing the three things that happen when you do. Each failure mode has a structural fix — a place in the workflow where the bad outcome becomes impossible, not just unlikely. Without those, four agents in parallel is four times the chance of any one thing going wrong; with them, it is roughly the same risk as one agent, just faster.

Failure mode 1: worktree clobber

Two agents write to the same file in the same working directory; the second write silently beats the first. You see this when people spin up two terminal tabs in the same checkout and run Claude in each. The agents do not know about each other; the filesystem does not coordinate them; the model has no concept of “another writer.” By the time you notice, one agent’s changes are gone.

The structural fix

One worktree per run, named deterministically by run ID. KanBots’s dispatcher computes the worktree path from the run’s primary key:

# packages/dispatcher/src/worktree.ts (essentials)
worktreePath = `${repoPath}/.kanbots/worktrees/issue-${issueNumber}-${runId}`
branchName   = `kanbots/issue-${issueNumber}-${runId}`

# git operation
git worktree add -b kanbots/issue-42-7 \
                  .kanbots/worktrees/issue-42-7 \
                  origin/main

The runId is the autoincrement primary key in the SQLite runs table, so two dispatches against the same issue produce kanbots/issue-42-7 and kanbots/issue-42-8. Collisions are not unlikely; they are impossible. The branch prefix kanbots/issue- is also what the pre-push hook keys off — an agent that tries to push its branch gets rejected at the git layer before the request leaves the machine. See the git worktree workflow for the full lifecycle.
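As a minimal sketch of that derivation, assuming nothing beyond the two template strings and the git command shown above: the names RunRef, WorktreePlan, worktreeFor, and gitWorktreeArgs are hypothetical and are not taken from the KanBots source.

```typescript
// Sketch only: derive the worktree path and branch name from a run's IDs.
// RunRef, WorktreePlan, worktreeFor, and gitWorktreeArgs are hypothetical names.
interface RunRef {
  issueNumber: number;
  runId: number; // autoincrement primary key of the runs table
}

interface WorktreePlan {
  worktreePath: string;
  branchName: string;
}

function worktreeFor(repoPath: string, run: RunRef): WorktreePlan {
  const suffix = `issue-${run.issueNumber}-${run.runId}`;
  return {
    worktreePath: `${repoPath}/.kanbots/worktrees/${suffix}`,
    branchName: `kanbots/${suffix}`,
  };
}

// Arguments for `git worktree add -b <branch> <path> <start-point>`.
function gitWorktreeArgs(plan: WorktreePlan, baseRef: string = "origin/main"): string[] {
  return ["worktree", "add", "-b", plan.branchName, plan.worktreePath, baseRef];
}

// Two dispatches against issue 42 differ only in runId, so they cannot collide.
const first = worktreeFor("/repo", { issueNumber: 42, runId: 7 });
const second = worktreeFor("/repo", { issueNumber: 42, runId: 8 });
```

Because both names are pure functions of the run's primary key, uniqueness comes from the database rather than from any locking on the filesystem.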

Failure mode 2: label drift

You have a kanban column “In progress” that tracks running agents. An agent finishes; nothing updates the column. You glance at the board the next morning, think the work is still in flight, dispatch another agent on the same issue, end up with two diverging branches for one task. Or: the agent fails halfway, the board still says “In progress,” you wait an hour assuming progress is happening.

This is what happens when board state and execution state are stored in two different systems. Linear has issues. Claude Code has runs. Nothing ties them.

The structural fix

Card status is derived from the latest run’s state, not set by hand. KanBots’s run lifecycle has five states: three terminal ones (completed, failed, cancelled) and two non-terminal ones, awaiting_decision (paused) and running. The card column reflects the latest run’s state directly:

running → In progress column, with the orange pulsing dot.
awaiting_decision → same column, awaiting dot, decision prompt visible on the card.
completed with diff → In review column, green dot.
failed or cancelled → back to Todo, red or grey dot, error visible on the card.

In GitHub mode the column move is mirrored as a label edit on the underlying issue (status:in-progress, status:in-review, etc.). The branch existence and the agent run state are the source of truth; the kanban cell is a projection. There is no “drag the card and forget to update the runner” failure mode because the human does not own the status.
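One way to picture “the kanban cell is a projection” is as a pure function from the latest run to a column. The sketch below is illustrative only: RunState, LatestRun, CardView, and projectCard are hypothetical names, the red-for-failed and grey-for-cancelled pairing is an assumption, and the case of a completed run without a diff is left unmapped because the mapping above does not specify it.

```typescript
// Sketch only: the board column as a pure function of the latest run.
// All type and function names here are hypothetical.
type RunState = "running" | "awaiting_decision" | "completed" | "failed" | "cancelled";

interface LatestRun {
  state: RunState;
  hasDiff: boolean; // whether the run produced a diff
}

interface CardView {
  column: "Todo" | "In progress" | "In review";
  dot: "orange" | "awaiting" | "green" | "red" | "grey";
}

function projectCard(run: LatestRun): CardView | null {
  switch (run.state) {
    case "running":
      return { column: "In progress", dot: "orange" };
    case "awaiting_decision":
      return { column: "In progress", dot: "awaiting" };
    case "completed":
      // Only the "completed with diff" case is specified; return null otherwise.
      return run.hasDiff ? { column: "In review", dot: "green" } : null;
    case "failed":
      return { column: "Todo", dot: "red" }; // assumption: red marks a failure
    case "cancelled":
      return { column: "Todo", dot: "grey" }; // assumption: grey marks a cancel
  }
}
```

Nothing in this function accepts a column as input, which is the point: there is no code path through which a human-set status could override the run state.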

Failure mode 3: cost runaway

Four agents in parallel is four times the burn. One agent in a retry loop that you did not notice is a fire. An autopilot session that runs for eight hours overnight, with four slots, is a really expensive bug.

The naive answer (“I’ll check on it”) does not work when you have a job and need to sleep. The structural answer is hard budget caps at two levels.

The structural fix

Two budgets, both enforced by the supervisor, both hard stops:

Per-run cap. Each Claude/Codex run has a configurable token or dollar ceiling. The supervisor reads the usage field on each result event (and the per-turn token counts on assistant events) and sends the process SIGTERM when the cumulative cost crosses the cap. The card surfaces this as a cost-capped stop reason.
Per-autopilot-session cap. The feature-dev or QA loop has an overall dollar budget. When the rolling total across all slots crosses it, the loop stops dispatching new runs; in-flight runs drain naturally. The autopilot session row in the UI shows “Stopped: budget reached — $X.YZ used.”

Both caps are visible in the UI before you start the session. There is no version that runs “until it finishes” without a ceiling — even “unlimited” is a number you typed. This is the same pattern good cloud platforms use for compute spend: do not rely on humans noticing; rely on the meter cutting it off.
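The two caps can be sketched as a small meter that is updated on every cost event. This is a rough illustration under stated assumptions, not the supervisor's actual code: BudgetMeter, CapDecision, and the method names are hypothetical, costs are assumed to arrive already converted to dollars, and “crossing” a cap is treated as reaching or exceeding it.

```typescript
// Sketch only: two hard caps checked on every cost event.
// BudgetMeter, CapDecision, and the method names are hypothetical.
type CapDecision = "continue" | "stop_run";

class BudgetMeter {
  private readonly perRunCapUsd: number;
  private readonly sessionCapUsd: number;
  private readonly spentByRun: Record<number, number> = {};
  private sessionUsd = 0;

  constructor(perRunCapUsd: number, sessionCapUsd: number) {
    this.perRunCapUsd = perRunCapUsd;
    this.sessionCapUsd = sessionCapUsd;
  }

  // Call once per cost event parsed from a run's output stream.
  // On "stop_run" the caller would terminate that run's process.
  recordCost(runId: number, usd: number): CapDecision {
    const runTotal = (this.spentByRun[runId] ?? 0) + usd;
    this.spentByRun[runId] = runTotal;
    this.sessionUsd += usd;
    return runTotal >= this.perRunCapUsd ? "stop_run" : "continue";
  }

  // The session cap only gates new dispatches; in-flight runs drain.
  canDispatch(): boolean {
    return this.sessionUsd < this.sessionCapUsd;
  }

  sessionTotalUsd(): number {
    return this.sessionUsd;
  }
}
```

With `new BudgetMeter(4, 25)`, a run that reports $1.50 and then $3.00 is stopped on the second event, while the session keeps dispatching until the combined total across all runs reaches $25.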

How KanBots ties the three fixes together

The three failure modes share one shape: state drift between “where work happens” and “where status lives.” KanBots collapses both into one process, one SQLite database in .kanbots/db.sqlite:

The runs table owns the lifecycle. Every state transition (started, awaiting_decision, completed, failed) is a row update with a timestamp.
Worktree paths and branch names are derived from runs.id, so the filesystem state is reconstructable from the database and vice versa.
Cost events (parsed out of every result event in the stream) are stored alongside the run, so per-card and per-session rollups are real queries, not back-of-envelope.
The UI subscribes to the runs table and re-renders on change. The column the card lives in is computed from the run state. No human-only status fields exist.
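Because directory names are derived from runs.id, a directory on disk can be mapped back to its run, and any directory without a matching row stands out. A minimal sketch of that reverse direction, where RunKey, parseWorktreeName, and findOrphans are hypothetical helpers and the only assumption about the naming scheme is the issue-N-M pattern shown earlier:

```typescript
// Sketch only: map a worktree directory name back to its run, then flag
// directories that have no matching run row. All names are hypothetical.
interface RunKey {
  issueNumber: number;
  runId: number;
}

function parseWorktreeName(dirName: string): RunKey | null {
  const match = /^issue-(\d+)-(\d+)$/.exec(dirName);
  if (match === null) return null;
  return { issueNumber: Number(match[1]), runId: Number(match[2]) };
}

// dirNames: entries under .kanbots/worktrees/; runs: rows from the runs table.
function findOrphans(dirNames: string[], runs: RunKey[]): string[] {
  return dirNames.filter((dirName) => {
    const key = parseWorktreeName(dirName);
    if (key === null) return true; // not a name the dispatcher would produce
    return !runs.some(
      (run) => run.runId === key.runId && run.issueNumber === key.issueNumber,
    );
  });
}
```

The same comparison run the other way, rows with no directory, would surface runs whose worktree was deleted by hand.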

Practical effect: if you kill KanBots mid-run, the worktree is on disk, the run row is in completed=false, and reopening the app resumes the live thread — the parser picks up where it left off. The board never lies because the board is the database.

A concrete walkthrough

Imagine you want to drain a backlog of 12 small refactor tasks overnight, with a $25 budget total, on a laptop with four cores.

Move the 12 cards into Todo. Open Autopilot — Feature Dev.
Set parallelism = 4, model = Claude Sonnet, effort = medium, session cost cap = $25, per-run cap = $4. Pick personas: engineer + reviewer.
Click Start. Four cards begin running concurrently, four worktrees appear under .kanbots/worktrees/, four kanbots/issue-N-M branches exist.
The loop round-robins: as each card finishes, the next backlog card is dispatched into the freed slot. Personas alternate between engineer (write code) and reviewer (check the engineer’s diff against the spec).
In the morning, the autopilot session row says “Stopped: budget reached — $24.78 used. 9 of 12 cards completed. 2 failed (test broke). 1 awaiting decision (architecture question).” All nine completed cards sit in In review with their worktrees intact, ready to promote.
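The slot-filling behaviour in the steps above can be approximated with a small discrete simulation. It is not the KanBots scheduler: Task, SessionResult, and simulateSession are hypothetical, durations and costs are invented inputs, and a run's cost is booked only when it finishes.

```typescript
// Sketch only: simulate a fixed number of slots draining a backlog under a
// session cap. All names, durations, and costs are made up for illustration.
interface Task {
  id: number;
  minutes: number; // how long the run takes
  usd: number; // cost booked when the run finishes
}

interface SessionResult {
  dispatched: number[]; // task ids in dispatch order
  spentUsd: number;
}

function simulateSession(backlog: Task[], slots: number, sessionCapUsd: number): SessionResult {
  const queue = backlog.slice();
  const inFlight: { task: Task; endsAt: number }[] = [];
  const dispatched: number[] = [];
  let spentUsd = 0;

  // Start queued tasks while a slot is free and the cap has not been reached.
  const fillSlots = (now: number): void => {
    while (inFlight.length < slots && spentUsd < sessionCapUsd) {
      const next = queue.shift();
      if (next === undefined) return;
      inFlight.push({ task: next, endsAt: now + next.minutes });
      dispatched.push(next.id);
    }
  };

  fillSlots(0);
  while (inFlight.length > 0) {
    // Finish the run that ends soonest, then refill the freed slot.
    inFlight.sort((a, b) => a.endsAt - b.endsAt);
    const done = inFlight.shift();
    if (done === undefined) break;
    spentUsd += done.task.usd;
    fillSlots(done.endsAt);
  }
  return { dispatched, spentUsd };
}
```

Since this sketch books cost only at completion, the total can overshoot the session cap by whatever the draining runs cost; in the simulation, a per-run ceiling is what bounds that overshoot.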

No card silently lying about its status. No agent quietly racking up $40 because nothing capped it. No second agent on a card whose first run was “invisibly finished.” That is the difference structural fixes make.

Other failure modes worth knowing about

Stale base ref

All four worktrees were created off main at 10pm; by morning, main has moved. Each worktree is now four commits behind. KanBots displays this on the card; the human decides whether to rebase or merge during promotion. Do not automate the rebase: that is where merge conflicts from agent diffs go silently wrong.

Decision storm

Four agents all hit a decision at once. You wake up to four prompts. Either answer them in dependency order (some cards depend on others) or set the autopilot session to auto-pick the safer option when no human is around — the decision prompt protocol supports a default. This is a UX choice, not a structural failure.
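Answering in dependency order amounts to a topological sort over the cards that are waiting. The sketch below assumes each pending decision carries a list of cards it depends on; PendingDecision, dependsOn, and answerOrder are hypothetical, since nothing above says how KanBots records card dependencies.

```typescript
// Sketch only: order pending decisions so that prerequisites come first.
// PendingDecision, dependsOn, and answerOrder are hypothetical.
interface PendingDecision {
  card: number;
  dependsOn: number[]; // cards whose decisions should be answered first
}

function answerOrder(pending: PendingDecision[]): number[] {
  let waiting = pending.slice();
  const ordered: number[] = [];
  while (waiting.length > 0) {
    const stillWaiting = waiting.map((p) => p.card);
    // A card is ready when none of its dependencies is still waiting.
    // Dependencies that are not pending at all are treated as resolved.
    const ready = waiting.filter(
      (p) => !p.dependsOn.some((dep) => stillWaiting.indexOf(dep) !== -1),
    );
    if (ready.length === 0) {
      throw new Error("dependency cycle among pending decisions");
    }
    for (const p of ready) ordered.push(p.card);
    waiting = waiting.filter((p) => ready.indexOf(p) === -1);
  }
  return ordered;
}
```

For four prompts where card 3 depends on card 1 and card 2 depends on card 3, this yields cards 1 and 4 first, then 3, then 2.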

Rate limit fan-out

N agents all hitting Anthropic at once, all triggering the same shared rate limit, all retrying with the same backoff: a thundering herd. Real backoff jitter on the client side fixes it; without it, you waste cost on retry pings. KanBots’s dispatcher detects rate_limit events in the stream and shows the run as paused with a clear reason, instead of a generic stall.
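For reference, one common form of client-side jitter is exponential backoff with “full jitter”, where each retry waits a random time between zero and an exponentially growing ceiling. This is a generic sketch, not KanBots's retry code: backoffDelayMs and its default values are illustrative assumptions.

```typescript
// Sketch only: exponential backoff with full jitter.
// backoffDelayMs and its defaults are illustrative, not KanBots settings.
function backoffDelayMs(
  attempt: number, // 0 for the first retry
  baseMs: number = 1000,
  maxMs: number = 60000,
  random: () => number = Math.random, // injectable for deterministic tests
): number {
  const ceiling = Math.min(maxMs, baseMs * Math.pow(2, attempt));
  // Pick uniformly in [0, ceiling) so parallel agents do not retry in lockstep.
  return Math.floor(random() * ceiling);
}
```

Four agents that hit the limit at the same instant each draw a different delay, so their retries spread across the window instead of arriving together.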

When this is the wrong shape

Running parallel agents with the three fixes above is a good fit for: a backlog you want drained, a refactor that fans out, a Sentry queue, a triage column. It is the wrong shape for:

One agent on one complex problem. The right answer there is multi-persona round-robin inside one card, not multiple cards in parallel. See running Claude Code in parallel for when this overlaps.
Production deployment work. The whole pattern is for a developer machine driving development work. Production-impacting work should go through code review on PRs, not straight from an agent worktree.
Anything irreversible (deletes, migrations) without a human in the decision loop. Set the autopilot to require a human on these decisions instead of auto-defaulting.

For the right-shape cases, the structural fixes turn parallel agents from “risky ad-hoc thing” into “ordinary workflow.” Pair this with how to spawn multiple coding agents from one machine for the OS-level side, and the worktree workflow for the git side.

Try it on your own folder

Drop a folder, get a board, dispatch parallel agents. The desktop runs locally on macOS, Linux, and Windows.

