Concurrent same-type dispatch on one issue: 30 s dedup TTL too short #597
## Summary
Same-type pool members can run concurrently on a single issue when the second dispatch arrives more than 30 s after the first. Observed on issue #596 (M26-4 lifecycle config): four `boss` task rows for the same issue, including two windows of ~1m30s where `boss-default` and `boss-2` were both running.

Filesystem dedup masked the visible damage (the worktree path is keyed by issue → last writer wins → only one branch, `boss/591`, plus PR #596 survived), so all four task rows are marked `success` and no error trail exists. Token cost was paid ~4× for one logical task: 38 + 40 + 23 + 23 = 124 boss turns.

## Reproduction (observed, 2026-04-30)
`tasks.db::task_history` for issue #596 holds the four rows. A board screenshot at ~22:06 showed both `boss-default` (53 s) and `boss-2` (4 m) cards on issue #596 in the `running` column.

## Root cause
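A minimal sketch of a fixed-TTL dedup map (illustrative names and shapes, not the actual `assign-dedup.ts` code) shows why a dispatch arriving after the TTL slips through even while the first task is still running:

```typescript
// Sketch of a fixed-TTL assign-dedup map. Names are illustrative;
// the real implementation lives in assign-dedup.ts.
const TTL_MS = 30_000;
const assignDedup = new Map<string, number>(); // dedup key -> last dispatch time (ms)

function isDupAssign(repo: string, issue: number, type: string, now: number): boolean {
  const key = `${repo}#${issue}@${type}`;
  const last = assignDedup.get(key);
  if (last !== undefined && now - last < TTL_MS) {
    return true; // within the TTL: suppressed as a duplicate
  }
  assignDedup.set(key, now); // key absent or expired: record and allow
  return false;
}

// t=0: first dispatch is allowed.
console.log(isDupAssign("charles/claude-hooks", 596, "boss", 0)); // false
// t=5s: a retry inside the TTL is suppressed.
console.log(isDupAssign("charles/claude-hooks", 596, "boss", 5_000)); // true
// t=60s: the key has expired, so a second dispatch is allowed even
// though the first task may still be running (the bug).
console.log(isDupAssign("charles/claude-hooks", 596, "boss", 60_000)); // false
```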
`apps/server/src/domain/dispatch/assign-dedup.ts`: `_assignDedup` is keyed by `${repo}#${issue}@${type}` with a fixed 30 s TTL (line 23). `isDupAssign(...)` checks the timestamp; once 30 s elapse the key is gone and a fresh dispatch slips through, regardless of whether the prior task is still running.

`cancelStaleIssueTask(...)` only cancels queued tasks, and only when `prev.type !== newType` (line 68: `if (prev.type === newType) return;`). Same-type concurrent runs are unhandled.

The pool selector (`domain/dispatch/pool.ts`) then picks the idle peer (`boss-default` when `boss-2` is busy), so the second concurrent task lands cleanly.

## Acceptance criteria
### Source of truth = registry, not a clock
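One possible shape for a registry-backed check, assuming worker records carry `type`, `currentTask`, and `queue` fields as the References section suggests (the exact field shapes here are assumptions):

```typescript
// Hypothetical registry-backed isDupAssign: a dispatch is a duplicate
// iff any worker of the same type is running or queueing a task for
// the same issue. No clock involved, so the answer cannot go stale.
interface Task {
  repo: string;
  issueNumber: number;
}
interface Worker {
  type: string;
  currentTask: Task | null;
  queue: Task[];
}

function isDupAssign(
  workers: Worker[],
  repo: string,
  issueNumber: number,
  type: string,
): boolean {
  const sameIssue = (t: Task | null): boolean =>
    t !== null && t.repo === repo && t.issueNumber === issueNumber;
  // O(workers) walk over live state.
  return workers.some(
    (w) => w.type === type && (sameIssue(w.currentTask) || w.queue.some(sameIssue)),
  );
}
```

Because the answer is derived from live worker state, a second dispatch 60 s later is still suppressed while the first task runs, and allowed as soon as it completes.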
- `isDupAssign(repo, issueNumber, type)` returns true whenever any worker of `type` has a `currentTask` or queued task tagged with `${repo}#${issueNumber}` — independent of how long ago the prior dispatch landed.
- Store (`repo`, `issueNumber`) on the worker's task record so the check is O(workers) without a separate index.
- Drop the `BoundedMap` TTL bookkeeping (or keep it purely as a fast path; the registry walk is authoritative).

### Cross-type stale cancellation
`cancelStaleIssueTask` keeps its current behaviour (a different-type re-dispatch cancels a queued prior task), but the log line on the "is already running" path also surfaces a `dispatch.duplicate_running` event, so the operator sees the warning on the dashboard, not just on stdout.

### Defense-in-depth: board warning
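The merged-card rendering could group running rows per issue along these lines (row and card shapes are hypothetical, not the actual dashboard types):

```typescript
// Group running task rows by (repo, issue_number); more than one
// running row for the same issue yields one card with a warning chip.
interface TaskRow {
  repo: string;
  issue_number: number;
  agent: string;
  status: string;
}
interface Card {
  repo: string;
  issue_number: number;
  agents: string[];
  runningCount: number; // rendered as "running × N" when N > 1
  warn: boolean;
}

function mergeRunningCards(rows: TaskRow[]): Card[] {
  const byIssue = new Map<string, Card>();
  for (const row of rows) {
    if (row.status !== "running") continue;
    const key = `${row.repo}#${row.issue_number}`;
    const card = byIssue.get(key) ?? {
      repo: row.repo,
      issue_number: row.issue_number,
      agents: [],
      runningCount: 0,
      warn: false,
    };
    card.agents.push(row.agent);
    card.runningCount += 1;
    card.warn = card.runningCount > 1;
    byIssue.set(key, card);
  }
  return [...byIssue.values()];
}
```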
If the board sees multiple tasks for the same `(repo, issue_number)` with `status === "running"`, render a single merged card with a `running × N` indicator and a warning chip. (This lands behind the dedup fix, so it should never trigger in normal operation.)

## Tests
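The one-shot `task_history` overlap scan could be sketched like this (column names follow the issue text; the actual schema may differ):

```typescript
// Count task_history rows that started before an earlier run of the
// same (repo, issue_number, agent_type) finished. Timestamps are
// assumed to be epoch milliseconds.
interface HistoryRow {
  repo: string;
  issue_number: number;
  agent_type: string;
  started_at: number;
  finished_at: number;
}

function countConcurrentRuns(rows: HistoryRow[]): number {
  const byKey = new Map<string, HistoryRow[]>();
  for (const row of rows) {
    const key = `${row.repo}#${row.issue_number}@${row.agent_type}`;
    const group = byKey.get(key);
    if (group) group.push(row);
    else byKey.set(key, [row]);
  }
  let overlaps = 0;
  for (const group of byKey.values()) {
    group.sort((a, b) => a.started_at - b.started_at);
    let maxFinished = -Infinity;
    for (const run of group) {
      // Overlap: this run started before some prior run ended.
      if (run.started_at < maxFinished) overlaps += 1;
      maxFinished = Math.max(maxFinished, run.finished_at);
    }
  }
  return overlaps;
}
```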
- `isDupAssign` returns true while a prior task is running, even after 60 s of simulated wall-clock time.
- `isDupAssign` returns false once the prior task completes (the registry no longer holds it).
- Two `assigned` webhooks for the same issue 5 s apart → the first dispatches, the second is suppressed (covered today). Two webhooks 60 s apart → the second is also suppressed (regression; fails today).
- Scan `task_history` for any historical concurrent runs (overlapping `started_at`/`finished_at` for the same `(repo, issue_number, agent_type)`) and emit a one-shot count to verify scope.

## Out of scope
- `docker start` adds latency.

## References
- `apps/server/src/domain/dispatch/assign-dedup.ts` — current TTL implementation.
- `apps/server/src/domain/dispatch/pool.ts` — selector that grants the duplicate.
- `apps/server/src/domain/dispatch/registry.ts` — worker `currentTask` + `queue` fields.