feat(dispatch): suspect-completion watchdog (B10) #437
No reviewers
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks!437
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "boss/426"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #426
Summary
silent_failure_countin SQLite so it survives orchestrator restarts; resets on any successful SHA change.escalate_after: <new_count>(consumed by B11) until the counter hits 3, then skips re-dispatch and emits aflow:dead-letterSSE event for the dashboard.pr_head_sha_at_start,pr_mergeable_at_start) captured bywebhook-handlers(post-merge rebase) andwebhook-ci(fix-ci); decision logic split into a pure evaluator + side-effecting runner for testability.Test plan
just qa(typecheck, biome lint, biome format, 2047 tests)watchdog.test.tscovers AC1–AC4 plus boundary casesRound 1 — APPROVED ✅
CI: run #2272 — success on head
2a65af8(7m50s, all jobs green).AC verification (issue #426)
escalate_afterevaluateSuspectCompletiongates onduration < SUSPECT_DURATION_MS+pr_mergeable_at_start === false+ SHA unchanged → returnsredispatchwithnew_count;runWatchdogrebuilds request with refreshed snapshot andescalate_after: decision.new_countnew_count >= DEAD_LETTER_THRESHOLD (3)→dead_letter; SSE broadcast with full payload (repo, pr_number, task_type, agent_type, count, task_id, summary)resetbranch fires first (before duration/mergeable checks), callsdeps.resetCountand returns earlysilent_failure_countstable in db.ts;getSilentFailureCount/incrementSilentFailureCount/resetSilentFailureCountwired intobuildWatchdogDepsArchitecture
Pure evaluator / side-effecting runner split is correctly applied:
evaluateSuspectCompletionis a pure function over aSuspectCompletionInputvalue, returning aWatchdogDecisiondiscriminated union with no I/O.runWatchdogowns all side effects (forge fetch, counter ops, SSE, redispatch) and delegates the decision. Consistent with the pattern used elsewhere inapps/server/src/domain/dispatch/.WatchdogDepsinjectable interface allows full test isolation without mocking modules. The runner tests use real SQLite (viaisolateDb+resetDb) combined with fake forge/SSE/redispatch deps — appropriate scope.Snapshot refresh on redispatch: the re-dispatched
TaskRequestcarries the freshly-fetchedpr_head_sha_at_start(current SHA after the suspect run), not the original stale one. This is correct — it resets the baseline for the next suspect-completion window rather than perpetually comparing against the pre-first-dispatch SHA.Fire-and-forget in
onFinish:void runWatchdog(...).catch(warn)is the right pattern; forge latency must not block queue processing of the next task.pr_mergeable_at_start: falsehardcoded inwebhook-handlers.ts: legitimate — the rebase gate already confirmsmergeable !== falsebefore dispatching, so the snapshot value is always false at that point. The comment in the diff makes this clear.nullmergeable propagation inwebhook-ci.ts: forge errors yieldnullwhich propagates topr_mergeable_at_start: null; watchdog's ignore branch covers!== false, so a null value correctly produces no false positive.Test coverage
makeForgeGetPullRequestprojection correctness + null-on-missing-PR.afterEach(() => resetDb())cleans up between runner tests.No blocking issues found. Implementation is complete, correct, and well-tested.