Can AI agents handle tech debt cleanup?
Tech debt is the textbook fit for parallel coding agents because each unit of work is independent. Renaming a deprecated API call across 60 files is 60 separate jobs with no inter-job dependencies. KanBots was built to dispatch those 60 jobs from one board with the QA autopilot guarding every merge.
The trick is single-file scope and a green-tests gate. The worktree never lands on your real branch without the configured check commands passing, so the failure mode of "agent made a confident change that breaks the build" turns into "worktree sits in the discard pile while you look at it" — not a broken main.
The pattern: many small agents, not one big one
It's tempting to dispatch one agent on "migrate everything from moment to date-fns." Don't. Single-large-task runs blow context budget, lose track of which files they've already done, and produce inconsistent transformations across the codebase. Instead:
- Generate a card per file (a script or a 30-second product-persona run will do this).
- Each card's title is Migrate moment → date-fns in src/billing/format.ts.
- Each agent's prompt is "do this for exactly this file; if you need to touch another file, stop and ask."
- Run with QA autopilot enabled so typecheck and tests run after the change. The worktree is gated.
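A minimal sketch of the card-generation step, in Python. The card shape (a dict with `title` and `prompt`) and the function name `build_cards` are assumptions for illustration, not KanBots' actual card schema; only the title format and the stop-and-ask instruction come from the list above.

```python
def build_cards(files, old="moment", new="date-fns"):
    """Return one card per file, each scoped to exactly that file."""
    cards = []
    for path in sorted(set(files)):  # dedupe so no file gets two agents
        cards.append({
            # Title format follows the example in the list above.
            "title": f"Migrate {old} → {new} in {path}",
            # Hypothetical prompt wording; tighten it for your own persona.
            "prompt": (
                f"Migrate {old} to {new} in exactly this file: {path}. "
                "If you need to touch another file, stop and ask."
            ),
        })
    return cards


cards = build_cards([
    "src/billing/format.ts",
    "src/billing/format.ts",  # duplicate on purpose
    "src/app/date.ts",
])
```

Deduplicating up front matters more than it looks: two cards for the same file would put two agents on the same path in parallel slots.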
With 60 cards and parallelism 4, you finish in roughly 15 cycles (60 ÷ 4). The slots are independent, so four agents run at once rather than one after another. A typical small refactor runs 4–7 minutes per agent including the test run, so the whole sweep takes 60–90 minutes wall time.
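The arithmetic can be checked with a small estimate. This is a simplified model, assuming every slot works through its share of cards back to back and ignoring fix cycles and CPU contention; `estimate_wall_minutes` is a name made up for this sketch.

```python
import math


def estimate_wall_minutes(cards, parallelism, minutes_per_agent):
    """Wall time if each slot processes its share of cards one after another."""
    cards_per_slot = math.ceil(cards / parallelism)
    return cards_per_slot * minutes_per_agent


fast = estimate_wall_minutes(60, 4, 4)    # every run takes 4 minutes
slow = estimate_wall_minutes(60, 4, 7)    # every run takes 7 minutes
serial = estimate_wall_minutes(60, 1, 4)  # one slot, fast runs
```

Under this model the sweep takes 60 minutes if every run is at the fast end and 105 minutes if every run is at the slow end, so the 60–90 minute figure assumes most runs land below the top of the range. The same 60 cards on a single slot would take 240 minutes even at the fast rate.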
How KanBots wires this up
Open the autopilot modal, pick QA flavor (not Feature-Dev — you already have the cards), and set parallelism. The QA autopilot loop is "dispatch on a card, wait for completion, run check commands in the worktree, if green mark the card review, if red dispatch a fix run on the same card." So a single misfire doesn't tank the run; the autopilot iterates until the check commands pass or the per-card cycle budget hits the cap.
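The loop described above can be sketched in a few lines of Python. This is not KanBots' implementation: `run_qa_loop`, `dispatch`, and `run_checks` are hypothetical names, and treating the per-card cycle budget as a cap on total attempts is an assumption.

```python
def run_qa_loop(card, dispatch, run_checks, max_attempts=3):
    """Drive one card to 'review' or 'failed'.

    dispatch(card, failure_output) runs an agent; failure_output is None
    on the first run. run_checks(card) returns (passed, output).
    """
    failure_output = None
    for _ in range(max_attempts):
        dispatch(card, failure_output)
        passed, output = run_checks(card)
        if passed:
            return "review"
        failure_output = output  # fed into the next fix run
    return "failed"


# Demo with stand-ins: the first check run fails, the second passes.
attempts = []
results = iter([(False, "TS2322"), (True, "")])
status = run_qa_loop(
    "card-1",
    dispatch=lambda card, failure: attempts.append(failure),
    run_checks=lambda card: next(results),
)
```

In the demo the second dispatch receives the failure output from the first check run, which is the mechanism that lets a fix run succeed where the first attempt did not.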
Configure your check commands in .kanbots/config.json. For a typical TypeScript repo:
{
  "checks": {
    "typecheck": "pnpm typecheck",
    "tests": "pnpm test",
    "lint": "pnpm lint"
  },
  "autopilot": {
    "qa": {
      "max_fix_cycles_per_card": 3,
      "session_cost_cap_usd": 50
    }
  }
}
max_fix_cycles_per_card: 3 is the safety net. If an agent's first attempt fails the typecheck, autopilot dispatches a second agent on the same card with the failure output in context. Third strike and the card moves to failed for human eyes, usually because the refactor crossed a boundary the single-file scope wasn't enough for.
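To make the gate concrete, here is a rough Python model of running the configured check commands inside a worktree. How KanBots actually executes them is not specified here; `run_gate`, the stop-at-first-failure behavior, and the returned tuple are all assumptions of this sketch.

```python
import subprocess


def run_gate(checks, cwd="."):
    """Run each check command in order and stop at the first failure.

    checks maps a name to a shell command, like the "checks" object in
    the config above. Returns (passed, failed_check_name, output).
    """
    for name, command in checks.items():
        result = subprocess.run(
            command, shell=True, cwd=cwd,
            capture_output=True, text=True,
        )
        if result.returncode != 0:
            # The output is what a fix run would receive as context.
            return False, name, result.stdout + result.stderr
    return True, None, ""


# Stand-in commands so the sketch runs without a real repo.
demo_checks = {"typecheck": "exit 0", "tests": "exit 3", "lint": "exit 0"}
passed, failed_name, _ = run_gate(demo_checks)
```

With the real config, the `checks` argument would be the parsed "checks" object and `cwd` would be the card's worktree directory.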
Walkthrough — a real tech-debt sweep
- Pick the debt. Say you're removing all uses of a deprecated logger. Run rg -l "from '@old/logger'" src/ to get the file list.
- Generate cards. Either script it (kanbots cards create-from-list files.txt via the OSS CLI) or dispatch a single product-persona agent with the file list and "create one card per file" prompt.
- Open autopilot. Flavor: QA. Parallelism: 4 (or 8 if you're on a big box). Cost cap: $50 for the session.
- Point it at the column with your debt cards. Click Start.
- Watch the board. Cards flip running → review as their typecheck passes. failed cards are the ones to look at first; they're where the refactor leaked across files.
- When the session halts, you have a column of review cards. Each has a worktree and a one-line diff summary. Promote them as a batch: either land them all as individual commits, or open one mega-PR.
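If ripgrep is not installed, the file list from the first step can be produced with a short Python script instead. This is a sketch under assumptions: it does a plain substring match, only looks at .ts and .tsx files, and does not honor .gitignore the way rg does. `files_importing` is a made-up name.

```python
import tempfile
from pathlib import Path


def files_importing(root, needle="from '@old/logger'", suffixes=(".ts", ".tsx")):
    """Return sorted paths under root whose text contains needle."""
    matches = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            if needle in text:
                matches.append(path.as_posix())
    return sorted(matches)


# Demo on a throwaway tree: one file uses the old logger, one does not.
with tempfile.TemporaryDirectory() as tmp:
    billing = Path(tmp) / "src" / "billing"
    billing.mkdir(parents=True)
    (billing / "format.ts").write_text("import { log } from '@old/logger'\n")
    (billing / "types.ts").write_text("export type Money = number\n")
    found = [Path(p).name for p in files_importing(Path(tmp) / "src")]
```

Writing the returned paths one per line gives a files.txt in the same shape as the rg output.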
Failure modes and fixes
The agent edits files outside its scope
Symptom: an agent dispatched on src/billing/format.ts also edits src/billing/types.ts because a type changed. Now two agents touching the same module are racing each other in parallel slots. Fix: the engineer persona prompt for tech debt should be tight — "edit only the file named in the card title; if a type or interface elsewhere needs to change, stop and post a decision request." When the decision lands, you batch the type changes into a parent card the autopilot picks up first.
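A tight prompt is one guard; a mechanical check on the diff is a second. The sketch below assumes you can get the worktree's changed paths (for example from `git diff --name-only`) and compares them with the one file named on the card. `out_of_scope` is a hypothetical helper, and wiring it into the gate as a check command is left as an assumption.

```python
def _normalize(path):
    """Trim whitespace, use forward slashes, drop a leading './'."""
    path = path.strip().replace("\\", "/")
    return path[2:] if path.startswith("./") else path


def out_of_scope(changed_files, allowed_file):
    """Return changed paths other than the single file the card names."""
    allowed = _normalize(allowed_file)
    changed = {_normalize(p) for p in changed_files if p.strip()}
    return sorted(changed - {allowed})


# Paths as they would appear in `git diff --name-only` output.
changed = "src/billing/format.ts\nsrc/billing/types.ts\n".splitlines()
leaks = out_of_scope(changed, "src/billing/format.ts")
```

A non-empty result means the agent crossed its boundary, which is the signal to stop the card and raise a decision request rather than let two slots race on the same module.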
The test suite is too slow for parallelism
Symptom: 4 agents running typecheck simultaneously thrash CPU and each typecheck takes 3x longer than baseline. Fix: in .kanbots/config.json, set the typecheck entry under checks to "pnpm typecheck --skipLibCheck --incremental", or scope it to changed files only. If your suite simply doesn't scale to parallel slots, drop parallelism to 2 and accept the longer wall time. The math still works out compared to single-threaded.
The check commands pass but the code is wrong
Symptom: typecheck and tests are green but the refactor changed observable behavior. Fix: this is why Promote → draft PR exists. Don't auto-land; review the diff. For high-stakes refactors, add an e2e check command that runs Playwright/Cypress; the QA autopilot will run it as part of the gate.
When this is the wrong tool
The pattern requires single-file scope. If your refactor crosses module boundaries — moving a function from one package to another, renaming a public API used in three apps, changing a database schema — single-file scope can't express the change. The agent will either refuse, or worse, half-do the change in a way that the typecheck doesn't catch.
Use a different shape for those: one careful agent on Heavy effort with the entire affected scope in its prompt, not a swarm of small ones. Or split the work manually into a sequence of single-file changes that can be parallelized.
Performance refactors are also dicey. The agent can verify "tests still pass" but not "this code is now faster." Benchmarks need a human or a custom check command wired into the QA gate.
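A custom benchmark check could look roughly like this in Python: time the code path a few times and fail when the median exceeds a budget. The function names, the repeat count, and the budget value are all assumptions to adapt; timings also vary by machine, so the budget needs headroom.

```python
import statistics
import time


def measure(fn, repeats=5):
    """Return wall-clock durations in seconds for repeated calls to fn."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def within_budget(samples, budget_seconds):
    """Use the median so a single slow outlier does not fail the check."""
    return statistics.median(samples) <= budget_seconds


def main(fn, budget_seconds):
    """Return a process exit code: 0 when within budget, 1 otherwise."""
    samples = measure(fn)
    print(f"median={statistics.median(samples):.6f}s budget={budget_seconds}s")
    return 0 if within_budget(samples, budget_seconds) else 1


samples = measure(lambda: sum(range(1000)), repeats=3)
```

Run as a script, the value returned by `main` would be passed to `sys.exit` so that a non-zero exit code fails the gate like any other check command.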
See also: the safety patterns for running parallel agents without clobber and machine-level guidance on how many slots your laptop can really run.