How do you build an AI agent task queue?

Build it as four columns (Backlog, In progress, Review, Done) with a dispatcher that pulls the top card off Backlog whenever a slot frees. The hard caps that matter are parallel slots (max 4 in kanbots autopilot), a per-run cost budget, and a per-session cost budget. Decision prompts pause a run, and hold its slot, until a human (or a programmatic answerer) picks an option.

What an agent task queue actually is

A request/response chatbot has no queue. You send a message, you wait for the answer, you go again. A “fire-and-forget” cron has the opposite problem: tasks start, you have no idea what they’re doing, and the results land in a log file you won’t read.

An agent task queue sits in between. It has persistent units of work (cards), a dispatcher that allocates parallelism, a place for the agent to ask the human a question without losing context, and budgets that fire before runaway spend becomes catastrophic. The whole thing is observable from a single board.

The pieces, named

Cards. Persistent units of work. In kanbots a card is an issue (local or GitHub). It carries title, body, labels, status, and a child list of runs.
Columns. The state of the work: Backlog, In progress, Review, Done. The drag order inside a column is the queue order.
Runs. A specific attempt by an agent to make progress on a card. One card can have many runs over its life (failed runs, follow-up runs, autopilot child runs).
Slots. The unit of parallelism. Autopilot has up to MAX_PARALLELISM = 4 slots. Each slot can run one agent at a time.
Decisions. A pause point inside a run where the agent asks a human a question. Stored in the decisions table with a deadline; the default timeout is 30 minutes, configurable from 60 seconds to 24 hours.
Budgets. runCostBudgetUsd caps a single run. sessionCostBudgetUsd caps an autopilot session. Both are set per-workspace in .kanbots/config.json or via the Cost budgets UI.
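
As a rough illustration, the pieces above could be modeled with a handful of types. This is a sketch, not the kanbots schema: apart from runCostBudgetUsd, sessionCostBudgetUsd, and the column names, every field and function name here is an assumption, and clamping an out-of-range timeout (rather than rejecting it) is also an assumption.

```typescript
// Hypothetical shapes for the pieces named above (not the real kanbots schema).
type Column = "Backlog" | "In progress" | "Review" | "Done";

interface Run {
  id: string;
  costUsd: number; // cumulative cost reported for this attempt
}

interface Card {
  id: number;
  title: string;
  body: string;
  labels: string[];
  status: Column;
  runs: Run[]; // one card can accumulate many runs over its life
}

interface WorkspaceBudgets {
  runCostBudgetUsd: number; // caps a single run
  sessionCostBudgetUsd: number; // caps an autopilot session
}

const DEFAULT_DECISION_TIMEOUT_S = 30 * 60; // 30 minutes
const MIN_DECISION_TIMEOUT_S = 60; // 60 seconds
const MAX_DECISION_TIMEOUT_S = 24 * 60 * 60; // 24 hours

// Resolve a requested decision timeout into the configurable range,
// falling back to the default when none is given.
function resolveDecisionTimeout(requestedSeconds?: number): number {
  if (requestedSeconds === undefined) return DEFAULT_DECISION_TIMEOUT_S;
  return Math.min(
    MAX_DECISION_TIMEOUT_S,
    Math.max(MIN_DECISION_TIMEOUT_S, requestedSeconds)
  );
}

// Total spend across every run a card has accumulated.
function cardCostUsd(card: Card): number {
  return card.runs.reduce((sum, run) => sum + run.costUsd, 0);
}

// True while a run is still within the per-run cap.
function runWithinBudget(run: Run, budgets: WorkspaceBudgets): boolean {
  return run.costUsd <= budgets.runCostBudgetUsd;
}
```

Keeping runs as children of the card, rather than as a status on it, is what lets a failed attempt and its follow-up live side by side on the same card.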

How the dispatcher picks the next task

For a one-off dispatch, you decide: click a card, click Dispatch. For autopilot — the queue-shaped mode — the loop is:

A free slot polls for work. In feature-dev autopilot, the slot atomically claims the next persona from a round-robin counter (cycle_index) and dispatches the parent issue against that persona’s system prompt. In one-off mode, the dispatcher takes the top card in Backlog and moves it to In progress.
A worktree is created at .kanbots/worktrees/issue-N-runId/. The agent CLI is spawned (claude -p or codex exec) with a stream-JSON output flag. The slot now belongs to that run.
Stream events flow into the agent_events table. The UI subscribes and renders them live. If a decision event arrives, the run pauses on the card and the slot remains held (the agent process is alive, waiting on stdin) until the decision resolves.
On result (or stop/cancel/timeout), the slot frees. The dispatcher picks again.
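
The loop above can be sketched in memory for the simple top-of-Backlog case. This is a minimal illustration under stated assumptions: the Dispatcher class and its method names are hypothetical, and worktree creation, CLI spawning, and event streaming are left out entirely.

```typescript
// In-memory sketch of the slot loop; only MAX_PARALLELISM comes from the text.
const MAX_PARALLELISM = 4;

type Column = "Backlog" | "In progress" | "Review" | "Done";

interface QueueCard {
  id: number;
  status: Column;
}

class Dispatcher {
  // One entry per slot: the card id it is running, or null when free.
  private slots: (number | null)[] = [];

  constructor(private board: QueueCard[]) {
    for (let i = 0; i < MAX_PARALLELISM; i++) this.slots.push(null);
  }

  // Board order is queue order, so the first Backlog card is the top card.
  private topOfBacklog(): QueueCard | null {
    for (const card of this.board) {
      if (card.status === "Backlog") return card;
    }
    return null;
  }

  // Fill every free slot; returns the ids of cards dispatched in this pass.
  tick(): number[] {
    const started: number[] = [];
    for (let i = 0; i < this.slots.length; i++) {
      if (this.slots[i] !== null) continue; // slot is busy or paused
      const next = this.topOfBacklog();
      if (next === null) break; // nothing left to claim
      next.status = "In progress";
      this.slots[i] = next.id;
      started.push(next.id);
    }
    return started;
  }

  // Called on result, stop, cancel, or timeout: the slot frees.
  release(cardId: number): void {
    const i = this.slots.indexOf(cardId);
    if (i !== -1) this.slots[i] = null;
  }

  freeSlots(): number {
    return this.slots.filter((s) => s === null).length;
  }
}
```

With six Backlog cards, the first tick() starts four of them, a second tick() starts nothing, and calling release() on any one of them lets the next tick() pick up the fifth.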

Slots are held during decision pauses by design. The agent process is still alive on the developer’s machine; killing the slot would mean re-spawning Claude later and re-priming the context. The cost is that a stuck decision blocks one of your four slots until the deadline.
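
One way to picture the trade-off is as a tiny state model in which only a finished run gives its slot back. The state names and helper functions below are hypothetical, chosen for illustration.

```typescript
// Hypothetical run states; a decision pause is not a terminal state.
type RunState = "running" | "awaiting_decision" | "finished";

// A slot stays held until the run ends, because the agent process is
// still alive (blocked on stdin) while it waits for an answer.
function holdsSlot(state: RunState): boolean {
  return state !== "finished";
}

// Slots that can accept new work right now.
function availableSlots(runs: RunState[], maxParallelism: number = 4): number {
  const held = runs.filter(holdsSlot).length;
  return Math.max(0, maxParallelism - held);
}

// Slots that are actually making progress (held and not paused).
function activeSlots(runs: RunState[]): number {
  return runs.filter((s) => s === "running").length;
}
```

With three runs in awaiting_decision and one in running, availableSlots is 0 and activeSlots is 1: the board is full, but only one agent is working.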

Cost caps, in order of severity

Three independent caps protect against runaway spend. Each one stops execution at its own scope.

Per-run. runCostBudgetUsd is tracked by the dispatcher as result events come in. When cumulative cost exceeds the cap, the run stops with stopReason: 'cost-budget'. The slot frees. The card stays in In progress; you decide whether to re-run.
Per-autopilot-session. sessionCostBudgetUsd rolls up all child runs in an autopilot session. When the session total breaches the cap, the orchestrator stops launching new child runs; in-flight children finish their current iteration but aren’t replaced. A single button on the session header stops the whole tree.
Cloud event ceiling. On kanbots Cloud, runs that exceed 50,000 stream events trip a server-side ceiling (configurable per-org up to 500,000). The cloud sends a force_terminate message; the agent exits. This exists to catch infinite-loop pathologies that didn’t accumulate cost fast enough to trip the run budget.
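
The three caps can be sketched as independent checks. The code below is an illustration only: RunMeter, canLaunchChild, and the "event-ceiling" label are hypothetical names, and only the 'cost-budget' stop reason and the two budget field names come from the text.

```typescript
interface CapConfig {
  runCostBudgetUsd: number;
  sessionCostBudgetUsd: number;
  eventCeiling: number; // e.g. 50,000 on kanbots Cloud, per the text
}

// Tracks one run. "event-ceiling" is an assumed label for the cloud cap.
class RunMeter {
  costUsd = 0;
  events = 0;

  constructor(private caps: CapConfig) {}

  // Record one stream event; costDeltaUsd is non-zero for result events.
  // Returns a stop reason when a per-run cap trips, else null.
  record(costDeltaUsd: number = 0): "cost-budget" | "event-ceiling" | null {
    this.events += 1;
    this.costUsd += costDeltaUsd;
    if (this.costUsd > this.caps.runCostBudgetUsd) return "cost-budget";
    if (this.events > this.caps.eventCeiling) return "event-ceiling";
    return null;
  }
}

// Session-level check: once the rolled-up total breaches the cap, no new
// child runs are launched. In-flight children are not touched here.
function canLaunchChild(childCostsUsd: number[], caps: CapConfig): boolean {
  const total = childCostsUsd.reduce((sum, c) => sum + c, 0);
  return total <= caps.sessionCostBudgetUsd;
}
```

The event ceiling is deliberately checked on every event, including those with no cost attached, which is how it can catch a loop that the run budget misses.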

What decisions do to the queue

A decision is a deliberate stall. The agent emits a stream event:

{
  "type": "decision_request",
  "payload": {
    "decision_id": "dec_01HX...",
    "question": "The Stripe webhook handler has no retry logic. Add one?",
    "options": [
      { "value": "a", "label": "Yes, exponential backoff" },
      { "value": "b", "label": "Yes, respect Retry-After" },
      { "value": "c", "label": "No, defer to follow-up issue" }
    ],
    "timeout_seconds": 1800
  }
}

The dispatcher writes a row into the decisions table with state: open, deadline_at = opened_at + timeout_seconds, and a nullable default_value. The card surfaces the question in the run drawer with numbered buttons. The agent process blocks waiting on stdin.
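
To make the deadline arithmetic concrete, here is a small sketch of turning the event into a row. The interfaces mirror the payload and column names mentioned above; the openDecision function itself is hypothetical.

```typescript
interface DecisionOption {
  value: string;
  label: string;
}

interface DecisionRequest {
  type: "decision_request";
  payload: {
    decision_id: string;
    question: string;
    options: DecisionOption[];
    timeout_seconds: number;
  };
}

interface DecisionRow {
  id: string;
  state: "open" | "answered";
  opened_at: Date;
  deadline_at: Date;
  default_value: string | null; // nullable: no default means no auto-answer
}

// Build the row for a freshly opened decision:
// deadline_at = opened_at + timeout_seconds.
function openDecision(
  req: DecisionRequest,
  openedAt: Date,
  defaultValue: string | null = null
): DecisionRow {
  const deadlineMs = openedAt.getTime() + req.payload.timeout_seconds * 1000;
  return {
    id: req.payload.decision_id,
    state: "open",
    opened_at: openedAt,
    deadline_at: new Date(deadlineMs),
    default_value: defaultValue,
  };
}
```

A request opened at 00:00:00 UTC with timeout_seconds of 1800 gets a deadline of 00:30:00 UTC on the same day.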

When a permitted answerer clicks an option, the API runs a conditional update. Who counts as permitted (the assignee, an org admin, or anyone) is set by the decisions.answerers policy.

UPDATE decisions
SET state = 'answered',
    answered_at = now(),
    answered_by_user_id = $1,
    answered_value = $2
WHERE id = $3 AND state = 'open'
RETURNING *;

The conditional WHERE state = 'open' serializes simultaneous answers without an advisory lock: the second clicker gets a 409 with the winning answer. The chosen value is written to the agent’s stdin; the run resumes; the slot stays held throughout.
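
The first-writer-wins behavior can be imitated in memory to show what each caller sees. This is an analogue, not the API handler: in the real system the database applies the conditional update atomically, whereas here single-threaded execution stands in for that guarantee. The answerDecision function, the result shape, and the 200 status for the winner are assumptions.

```typescript
interface DecisionRecord {
  id: string;
  state: "open" | "answered";
  answered_by_user_id: string | null;
  answered_value: string | null;
}

type AnswerResult =
  | { httpStatus: 200; decision: DecisionRecord }
  | { httpStatus: 409; winningValue: string | null };

// In-memory analogue of `UPDATE ... WHERE id = $3 AND state = 'open'`:
// the write only applies while the row is still open.
function answerDecision(
  table: Record<string, DecisionRecord>,
  id: string,
  userId: string,
  value: string
): AnswerResult {
  const row = table[id];
  if (row === undefined) throw new Error("unknown decision " + id);
  if (row.state !== "open") {
    // Zero rows matched: someone else answered first.
    return { httpStatus: 409, winningValue: row.answered_value };
  }
  row.state = "answered";
  row.answered_by_user_id = userId;
  row.answered_value = value;
  return { httpStatus: 200, decision: row };
}
```

If two users answer the same decision, the first call returns 200 and the second returns 409 carrying the first user's value, so the losing client can show what was actually chosen.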

Programmatic answerers work too: a CI script with a bearer token can subscribe via the agent SDK and auto-answer low-risk decisions. This is how you build “always allow option A on decisions tagged as risk_level: low” without a human in the loop.
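
The policy half of such an answerer might look like the sketch below. It deliberately leaves out the SDK subscription and the submit call, since their shapes are not described here. The OpenDecision interface, the tags field carrying risk_level, and the autoAnswer function are all assumptions for illustration.

```typescript
// Assumed shape of a decision as seen by an answerer script.
interface OpenDecision {
  decision_id: string;
  options: { value: string; label: string }[];
  tags: Record<string, string>; // e.g. { risk_level: "low" } (assumed field)
}

// "Always allow option A on decisions tagged as risk_level: low".
// Returns the value to submit, or null to leave the decision for a human.
function autoAnswer(decision: OpenDecision): string | null {
  if (decision.tags["risk_level"] !== "low") return null;
  const hasOptionA = decision.options.some((o) => o.value === "a");
  return hasOptionA ? "a" : null;
}
```

Returning null rather than guessing keeps anything outside the rule, including decisions with no option "a", in front of a person.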

What happens when the queue empties

Autopilot doesn’t loop forever in the absence of work. The stop conditions are:

A persona signals completion.
The session cost budget is reached.
The user clicks Stop session.
Every persona has cycled and reported no new work to do (the backlog has converged).

Feature-dev autopilot is interesting because personas can add work via splitIssue: an agent spotting a missing test creates a child card on the board, and a later cycle picks it up. The queue can grow under its own steam. Your only knob to keep it bounded is the session cost budget.
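
The stop conditions can be written as a single check that runs before each new cycle. This is a sketch under assumptions: the SessionSnapshot fields, the reason labels, and the order in which the conditions are tested are all hypothetical.

```typescript
interface SessionSnapshot {
  personaSignaledCompletion: boolean;
  totalCostUsd: number;
  sessionCostBudgetUsd: number;
  userClickedStop: boolean;
  // One entry per persona for the latest full cycle: did it report new work?
  lastCycleFoundWork: boolean[];
}

type SessionStop =
  | "user-stop"
  | "session-budget"
  | "persona-complete"
  | "converged"
  | null;

// Returns why the session should stop, or null to keep cycling.
function sessionStopReason(s: SessionSnapshot): SessionStop {
  if (s.userClickedStop) return "user-stop";
  if (s.totalCostUsd > s.sessionCostBudgetUsd) return "session-budget";
  if (s.personaSignaledCompletion) return "persona-complete";
  // Converged: every persona has cycled and none reported new work.
  const converged =
    s.lastCycleFoundWork.length > 0 &&
    s.lastCycleFoundWork.every((foundWork) => !foundWork);
  return converged ? "converged" : null;
}
```

Because splitIssue can keep lastCycleFoundWork from ever going all-false, the budget check is the one condition that is guaranteed to trigger eventually.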

Failure modes and fixes

“Slots are held by stuck decisions.”

If three of your four slots are paused on decisions nobody clicked, throughput drops to one. Fix the upstream cause: lower the decision timeout (so dead runs free their slots faster), or set a default_value on the decision template so the cloud auto-answers after the deadline. A Vercel cron job runs every 30s and times out decisions whose deadline has passed. If a default is set, the agent gets the default; if not, the run fails with reason: decision_timeout.
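
The sweep that the cron job performs can be sketched as a pure function over the open decisions. This is illustrative only: sweepExpired, the outcome shape, and the "timed_out" state name are assumptions. Only default_value, the deadline, and reason: decision_timeout come from the text.

```typescript
interface PendingDecision {
  id: string;
  state: "open" | "answered" | "timed_out"; // "timed_out" is an assumed name
  deadline_at: number; // epoch milliseconds
  default_value: string | null;
  answered_value: string | null;
}

type SweepOutcome =
  | { id: string; action: "answered_with_default"; value: string }
  | { id: string; action: "run_failed"; reason: "decision_timeout" };

// Time out every open decision whose deadline has passed.
// Decisions with a default are auto-answered; the rest fail their run.
function sweepExpired(decisions: PendingDecision[], nowMs: number): SweepOutcome[] {
  const outcomes: SweepOutcome[] = [];
  for (const d of decisions) {
    if (d.state !== "open" || d.deadline_at > nowMs) continue;
    if (d.default_value !== null) {
      d.state = "answered";
      d.answered_value = d.default_value;
      outcomes.push({ id: d.id, action: "answered_with_default", value: d.default_value });
    } else {
      d.state = "timed_out";
      outcomes.push({ id: d.id, action: "run_failed", reason: "decision_timeout" });
    }
  }
  return outcomes;
}
```

Because the sweep only touches rows still in the open state, running it twice in a row is harmless: the second pass finds nothing to do.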

“A run hit the cost cap but the card stayed open.”

Intentional. Budget breaches stop the run, free the slot, and leave the card alone — the worktree is still on disk with whatever partial work the agent did. Decide: promote what you’ve got, dispatch a fresh run with a higher cap, or discard. If you want auto-discard on budget breach, that’s a feature request, not a config.

“Rate limits kill throughput.”

Both Claude and Codex stream a rate_limit event when the provider pushes back. The dispatcher broadcasts cooldown:changed with the resume time; new dispatches queue until the cooldown clears. If you’re seeing this often, the fix is fewer parallel slots, not more retries: the provider is telling you it can’t take more right now.
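
A cooldown gate of this kind can be sketched in a few lines. The CooldownGate class, its method names, and the payload shape returned for the cooldown:changed broadcast are assumptions. The sketch shows queue-until-resume behavior and nothing else.

```typescript
class CooldownGate {
  private resumeAtMs = 0;
  private waiting: number[] = []; // card ids queued during the cooldown

  // Called when a rate_limit event arrives. Returns the payload that would
  // be broadcast as cooldown:changed (the payload shape is an assumption).
  onRateLimit(resumeAtMs: number): { resumeAtMs: number } {
    // Never shorten a cooldown that is already in effect.
    this.resumeAtMs = Math.max(this.resumeAtMs, resumeAtMs);
    return { resumeAtMs: this.resumeAtMs };
  }

  // Returns true if the dispatch may start now; otherwise queues it.
  requestDispatch(cardId: number, nowMs: number): boolean {
    if (nowMs >= this.resumeAtMs) return true;
    this.waiting.push(cardId);
    return false;
  }

  // Once the cooldown clears, hand back queued dispatches in arrival order.
  drain(nowMs: number): number[] {
    if (nowMs < this.resumeAtMs) return [];
    const ready = this.waiting;
    this.waiting = [];
    return ready;
  }
}
```

Note that the gate never retries on its own. It only delays new work, which matches the advice above: reduce pressure on the provider instead of adding retries.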

When this is the wrong fit

If your tasks are sub-minute (autocomplete, single-file edits in the editor), a queue is overhead. The chat is the better surface: ask, accept, move on. The queue earns its keep when tasks run for 5–60 minutes, want a worktree of their own, and need a place to ask the human a question without losing context.

Also: if you don’t want budgets, decisions, or observability — if you just want a script that loops Claude on a backlog and you trust it — a bash script with xargs -P 4 will do. The queue exists to make the loop safe and team-shaped, not to replace simple things.

Try it on your own folder

Drop a folder, get a board, dispatch parallel agents. The desktop runs locally on macOS, Linux, and Windows.
