How do you build an AI agent task queue?
What an agent task queue actually is
A request/response chatbot has no queue. You send a message, you wait for the answer, you go again. A “fire-and-forget” cron has the opposite problem: tasks start, you have no idea what they’re doing, and the results land in a log file you won’t read.
An agent task queue sits in between. It has persistent units of work (cards), a dispatcher that allocates parallelism, a place for the agent to ask the human a question without losing context, and budgets that fire before runaway spend becomes catastrophic. The whole thing is observable from a single board.
The pieces, named
- Cards. Persistent units of work. In kanbots a card is an issue (local or GitHub). It carries title, body, labels, status, and a child list of runs.
- Columns. The state of the work: Backlog, In progress, Review, Done. The drag order inside a column is the queue order.
- Runs. A specific attempt by an agent to make progress on a card. One card can have many runs over its life (failed runs, follow-up runs, autopilot child runs).
- Slots. The unit of parallelism. Autopilot has up to MAX_PARALLELISM = 4 slots. Each slot can run one agent at a time.
- Decisions. A pause point inside a run where the agent asks a human a question. Stored in the decisions table with a deadline; default 30-min timeout, configurable 60s–24h.
- Budgets. runCostBudgetUsd caps a single run. sessionCostBudgetUsd caps an autopilot session. Both are set per-workspace in .kanbots/config.json or via the Cost budgets UI.
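As a rough sketch, the pieces above could be modeled with types like the ones below. Only MAX_PARALLELISM, runCostBudgetUsd and sessionCostBudgetUsd are names taken from the text; every other field, type and helper is an assumption made for illustration, not the kanbots schema.

```typescript
// Hypothetical shapes for the pieces above. Only MAX_PARALLELISM,
// runCostBudgetUsd and sessionCostBudgetUsd are names from the article.
const MAX_PARALLELISM = 4;

type Column = "Backlog" | "In progress" | "Review" | "Done";

interface Run {
  id: string;
  state: "running" | "paused-on-decision" | "finished";
}

interface Card {
  id: number;
  title: string;
  body: string;
  labels: string[];
  status: Column;
  runs: Run[]; // one card can accumulate many runs over its life
}

interface Budgets {
  runCostBudgetUsd: number;
  sessionCostBudgetUsd: number;
}

// Example values only; these are not documented defaults.
const exampleBudgets: Budgets = { runCostBudgetUsd: 5, sessionCostBudgetUsd: 25 };

// Decision timeouts default to 30 minutes, configurable from 60 s to 24 h.
const DEFAULT_DECISION_TIMEOUT_S = 30 * 60;

function clampDecisionTimeout(seconds: number): number {
  return Math.min(24 * 60 * 60, Math.max(60, seconds));
}

// A run paused on a decision still holds its slot, so it counts as busy.
function freeSlots(cards: Card[]): number {
  let busy = 0;
  for (const card of cards) {
    for (const run of card.runs) {
      if (run.state !== "finished") busy += 1;
    }
  }
  return Math.max(0, MAX_PARALLELISM - busy);
}
```

Counting paused runs as busy is the important design choice here: it is what makes a stuck decision visible as lost parallelism.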
How the dispatcher picks the next task
For a one-off dispatch, you decide: click a card, click Dispatch. For autopilot — the queue-shaped mode — the loop is:
- A free slot polls for work. In feature-dev autopilot, the slot atomically claims the next persona from a round-robin counter (cycle_index) and dispatches the parent issue against that persona’s system prompt. In one-off mode, the dispatcher takes the top card in Backlog and moves it to In progress.
- A worktree is created at .kanbots/worktrees/issue-N-runId/. The agent CLI is spawned (claude -p or codex exec) with a stream-JSON output flag. The slot now belongs to that run.
- Stream events flow into the agent_events table. The UI subscribes and renders them live. If a decision event arrives, the run pauses on the card and the slot remains held (the agent process is alive, waiting on stdin) until the decision resolves.
- On result (or stop/cancel/timeout), the slot frees. The dispatcher picks again.
Slots are held during decision pauses by design. The agent process is still alive on the developer’s machine; killing the slot would mean re-spawning Claude later and re-priming the context. The cost is that a stuck decision blocks one of your four slots until the deadline.
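A minimal sketch of the slot bookkeeping described above, assuming a single-process dispatcher. SlotPool, claim, release and the persona names are hypothetical; the real dispatcher presumably persists cycle_index rather than keeping it in memory.

```typescript
// Sketch of slot allocation with a round-robin persona counter.
// SlotPool and its method names are hypothetical.
const MAX_PARALLELISM = 4;

class SlotPool {
  private readonly personas: string[];
  private readonly held = new Set<string>(); // run IDs currently holding a slot
  private cycleIndex = 0;

  constructor(personas: string[]) {
    this.personas = personas;
  }

  get freeSlots(): number {
    return MAX_PARALLELISM - this.held.size;
  }

  // Claim a slot and the next persona in one synchronous step, so two
  // callers in the same process cannot take the same cycle index.
  claim(runId: string): string | null {
    if (this.freeSlots <= 0) return null;
    const persona = this.personas[this.cycleIndex % this.personas.length];
    if (persona === undefined) return null; // no personas configured
    this.cycleIndex += 1;
    this.held.add(runId);
    return persona;
  }

  // Call on result, stop, cancel or timeout. Do NOT call on a decision
  // pause: the agent process is still alive and keeps its slot.
  release(runId: string): void {
    this.held.delete(runId);
  }
}
```

A failed claim leaves the counter untouched, so a persona is never skipped just because all four slots were busy when its turn came up.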
Cost caps, in order of severity
Three independent caps protect against runaway spend. Each one stops execution at its own scope.
- Per-run. runCostBudgetUsd is tracked by the dispatcher as result events come in. When cumulative cost exceeds the cap, the run stops with stopReason: 'cost-budget'. The slot frees. The card stays in In progress; you decide whether to re-run.
- Per-autopilot-session. sessionCostBudgetUsd rolls up all child runs in an autopilot session. When the session total breaches the cap, the orchestrator stops launching new child runs; in-flight children finish their current iteration but aren’t replaced. A single button on the session header stops the whole tree.
- Cloud event ceiling. On kanbots Cloud, runs that exceed 50,000 stream events trip a server-side ceiling (configurable per-org up to 500,000). The cloud sends a force_terminate message; the agent exits. This exists to catch infinite-loop pathologies that didn’t accumulate cost fast enough to trip the run budget.
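The first two caps reduce to simple accumulators. The sketch below assumes each result event carries a costUsd field, which the text does not confirm; RunBudget and canLaunchChild are hypothetical names. Only the stopReason value 'cost-budget' comes from the text.

```typescript
// Hypothetical cost tracking. The costUsd field on result events is an
// assumption; the 'cost-budget' stop reason is from the article.
interface ResultEvent {
  type: "result";
  costUsd: number;
}

class RunBudget {
  private readonly capUsd: number;
  private spentUsd = 0;

  constructor(runCostBudgetUsd: number) {
    this.capUsd = runCostBudgetUsd;
  }

  // Returns a stop reason once cumulative cost exceeds the cap, else null.
  record(event: ResultEvent): { stopReason: "cost-budget" } | null {
    this.spentUsd += event.costUsd;
    return this.spentUsd > this.capUsd ? { stopReason: "cost-budget" } : null;
  }
}

// Session level: once the roll-up of all child runs breaches the cap, no new
// children are launched. In-flight children are not interrupted here.
function canLaunchChild(childCostsUsd: number[], sessionCostBudgetUsd: number): boolean {
  const total = childCostsUsd.reduce((sum, cost) => sum + cost, 0);
  return total <= sessionCostBudgetUsd;
}
```

Landing exactly on the cap does not stop anything in this sketch; only exceeding it does, matching "exceeds the cap" in the text.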
What decisions do to the queue
A decision is a deliberate stall. The agent emits a stream event:
{
  "type": "decision_request",
  "payload": {
    "decision_id": "dec_01HX...",
    "question": "The Stripe webhook handler has no retry logic. Add one?",
    "options": [
      { "value": "a", "label": "Yes, exponential backoff" },
      { "value": "b", "label": "Yes, respect Retry-After" },
      { "value": "c", "label": "No, defer to follow-up issue" }
    ],
    "timeout_seconds": 1800
  }
}

The dispatcher writes a row into the decisions table with state: open, deadline_at = opened_at + timeout_seconds, and a nullable default_value. The card surfaces the question in the run drawer with numbered buttons. The agent process blocks waiting on stdin.
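The row the dispatcher writes can be sketched as a pure function of the event payload. The column names state, opened_at, deadline_at and default_value come from the text; the function name, the id column and the exact types are assumptions.

```typescript
// Hypothetical mapping from a decision_request payload to a decisions row.
interface DecisionRequestPayload {
  decision_id: string;
  question: string;
  options: { value: string; label: string }[];
  timeout_seconds: number;
}

interface DecisionRow {
  id: string;
  state: "open" | "answered";
  opened_at: Date;
  deadline_at: Date;
  default_value: string | null;
}

function openDecision(
  payload: DecisionRequestPayload,
  openedAt: Date,
  defaultValue: string | null = null,
): DecisionRow {
  return {
    id: payload.decision_id,
    state: "open",
    opened_at: openedAt,
    // deadline_at = opened_at + timeout_seconds
    deadline_at: new Date(openedAt.getTime() + payload.timeout_seconds * 1000),
    default_value: defaultValue,
  };
}
```

Passing openedAt in, rather than reading the clock inside the function, keeps the deadline arithmetic easy to test.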
When a permitted answerer (the assignee, an org admin, or anyone per decisions.answerers policy) clicks an option, the API runs a conditional update:
UPDATE decisions
SET state = 'answered',
    answered_at = now(),
    answered_by_user_id = $1,
    answered_value = $2
WHERE id = $3 AND state = 'open'
RETURNING *;

The conditional WHERE state = 'open' serializes simultaneous answers without an advisory lock: whichever update commits first matches the row, and the second matches nothing, so the second clicker gets a 409 with the winning answer. The chosen value is written to the agent’s stdin; the run resumes; the slot stays held throughout.
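The same first-writer-wins behavior can be shown in memory. This is an analogue of the conditional update, not the kanbots API handler: the Map stands in for the decisions table, and the result shape, the 404 case and all names are assumptions.

```typescript
// In-memory analogue of "UPDATE ... WHERE state = 'open'": first answer wins.
interface Decision {
  id: string;
  state: "open" | "answered";
  answeredBy: string | null;
  answeredValue: string | null;
}

type AnswerResult =
  | { status: 200; decision: Decision }
  | { status: 409; winningValue: string | null }
  | { status: 404 };

function answerDecision(
  table: Map<string, Decision>,
  id: string,
  userId: string,
  value: string,
): AnswerResult {
  const row = table.get(id);
  if (row === undefined) return { status: 404 };
  // Equivalent of the WHERE clause matching zero rows: someone already answered.
  if (row.state !== "open") {
    return { status: 409, winningValue: row.answeredValue };
  }
  row.state = "answered";
  row.answeredBy = userId;
  row.answeredValue = value;
  return { status: 200, decision: row };
}
```

In a real database the check and the write are a single statement, which is what makes the pattern safe across processes; the in-memory version is only safe because JavaScript runs the function without interleaving.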
Programmatic answerers work too: a CI script with a bearer token can subscribe via the agent SDK and auto-answer low-risk decisions. This is how you build “always allow option A on decisions tagged as risk_level: low” without a human in the loop.
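The policy half of such a script is small. The sketch below covers only the decision about whether to answer; how the script receives open decisions and submits the answer depends on the SDK and is not shown. The tags field and the function name are assumptions.

```typescript
// Hypothetical policy for a programmatic answerer. The shape of an open
// decision, including the tags field, is assumed for illustration.
interface OpenDecision {
  decision_id: string;
  tags: Record<string, string>; // e.g. { risk_level: "low" }
  options: { value: string; label: string }[];
}

// Returns the value to auto-answer with, or null to leave it for a human.
function autoAnswer(decision: OpenDecision): string | null {
  const isLowRisk = decision.tags["risk_level"] === "low";
  const offersA = decision.options.some((option) => option.value === "a");
  return isLowRisk && offersA ? "a" : null;
}
```

Returning null for anything the policy does not recognize keeps the human in the loop by default.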
What happens when the queue empties
Autopilot doesn’t loop forever in the absence of work. The stop conditions are:
- A persona signals completion.
- The session cost budget hits.
- The user clicks Stop session.
- Every persona has cycled and reported no new work to do (the backlog has converged).
Feature-dev autopilot is interesting because personas can add work via splitIssue: an agent spotting a missing test creates a child card on the board, and a later cycle picks it up. The queue can grow under its own steam. Your only knob to keep it bounded is the session cost budget.
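The four stop conditions can be read as one predicate over session state. This is a sketch under assumed names: SessionState, its fields and every reason string except 'cost-budget' are hypothetical, and the order of the checks is a choice made here, not something the text specifies.

```typescript
// Hypothetical session state; field names are assumptions.
interface SessionState {
  personaSignaledCompletion: boolean;
  sessionCostUsd: number;
  sessionCostBudgetUsd: number;
  userClickedStop: boolean;
  // One entry per persona for the latest full cycle: did it find new work?
  lastCycleFoundWork: boolean[];
}

type StopReason = "user-stop" | "cost-budget" | "persona-complete" | "converged";

function sessionStopReason(s: SessionState): StopReason | null {
  if (s.userClickedStop) return "user-stop";
  if (s.sessionCostUsd > s.sessionCostBudgetUsd) return "cost-budget";
  if (s.personaSignaledCompletion) return "persona-complete";
  // Converged: every persona has cycled and none reported new work.
  const cycled = s.lastCycleFoundWork.length > 0;
  if (cycled && s.lastCycleFoundWork.every((found) => !found)) return "converged";
  return null;
}
```

A child card created through splitIssue would flip one of the lastCycleFoundWork entries back to true, which is how the queue keeps itself alive until the budget check fires.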
Failure modes and fixes
“Slots are held by stuck decisions.”
If three of your four slots are paused on decisions nobody clicked, throughput drops to one. Fix the upstream cause: lower the decision timeout (so dead runs free their slots faster), or set a default_value on the decision template so the cloud auto-answers after the deadline. The Vercel cron job runs every 30s and times out decisions whose deadline has passed; if the default is set, the agent gets the default; if not, the run fails with reason: decision_timeout.
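A sketch of what one pass of that timeout sweep might decide, assuming it can read open decisions with their deadlines. The action names and the function itself are hypothetical; decision_timeout is the failure reason from the text.

```typescript
// Hypothetical sweep over open decisions whose deadline has passed.
interface PendingDecision {
  id: string;
  state: "open" | "answered";
  deadline_at: Date;
  default_value: string | null;
}

type SweepAction =
  | { id: string; action: "answer-with-default"; value: string }
  | { id: string; action: "fail-run"; reason: "decision_timeout" };

function sweepExpired(decisions: PendingDecision[], now: Date): SweepAction[] {
  const actions: SweepAction[] = [];
  for (const d of decisions) {
    // Skip anything already answered or still inside its deadline.
    if (d.state !== "open" || d.deadline_at.getTime() > now.getTime()) continue;
    if (d.default_value !== null) {
      actions.push({ id: d.id, action: "answer-with-default", value: d.default_value });
    } else {
      actions.push({ id: d.id, action: "fail-run", reason: "decision_timeout" });
    }
  }
  return actions;
}
```

Either branch ends the pause, so either one returns the slot to the pool; the default just does it without failing the run.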
“A run hit the cost cap but the card stayed open.”
Intentional. Budget breaches stop the run, free the slot, and leave the card alone — the worktree is still on disk with whatever partial work the agent did. Decide: promote what you’ve got, dispatch a fresh run with a higher cap, or discard. If you want auto-discard on budget breach, that’s a feature request, not a config.
“Rate limits kill throughput.”
Both Claude and Codex stream a rate_limit event when the provider pushes back. The dispatcher broadcasts cooldown:changed with the resume time; new dispatches queue until the cooldown clears. If you’re seeing this often, the fix is fewer parallel slots, not more retries; the provider is telling you it can’t take more right now.
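The queue-until-cooldown-clears behavior can be sketched as a small gate in front of dispatch. CooldownGate and its methods are hypothetical, and times are plain millisecond timestamps passed in by the caller.

```typescript
// Hypothetical gate that holds dispatches while a provider cooldown is active.
class CooldownGate {
  private resumeAtMs = 0;
  private queued: string[] = [];

  // Called when a rate_limit event arrives with a resume time.
  onRateLimit(resumeAtMs: number): void {
    // Keep the latest resume time if several events overlap.
    this.resumeAtMs = Math.max(this.resumeAtMs, resumeAtMs);
  }

  // Queues the card, then returns every card that may be dispatched now,
  // in arrival order. Returns an empty list while the cooldown is active.
  dispatch(cardId: string, nowMs: number): string[] {
    this.queued.push(cardId);
    if (nowMs < this.resumeAtMs) return [];
    const ready = this.queued;
    this.queued = [];
    return ready;
  }
}
```

Nothing in the gate retries; it only delays. That matches the advice above: the fix for frequent cooldowns is less parallelism, not more attempts.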
When this is the wrong fit
If your tasks are sub-minute (autocomplete, single-file edits in the editor), a queue is overhead. The chat is the better surface: ask, accept, move on. The queue earns its keep when tasks run for 5–60 minutes, want a worktree of their own, and need a place to ask the human a question without losing context.
Also: if you don’t want budgets, decisions, or observability — if you just want a script that loops Claude on a backlog and you trust it — a bash script with xargs -P 4 will do. The queue exists to make the loop safe and team-shaped, not to replace simple things.
Related reading
For the board itself, see what an AI agent kanban board is. For the budget mechanics in more depth, see how to put a budget cap on AI coding agents. For the parallelism story, see how to run Claude Code in parallel.
Try it on your own folder
Drop a folder, get a board, dispatch parallel agents. The desktop runs locally on macOS, Linux, and Windows.