feat(dispatch): suspect-completion watchdog (B10) #437

Merged
code-lead merged 1 commit from boss/426 into main 2026-04-27 09:20:44 +00:00
Collaborator

Closes #426

Summary

  • Detects rebase / fix-ci tasks that finish in under 30s without changing the PR head SHA while the PR was non-mergeable at start, treating them as silent failures.
  • Persists per-PR silent_failure_count in SQLite so it survives orchestrator restarts; resets on any successful SHA change.
  • Re-dispatches the failing task with escalate_after: <new_count> (consumed by B11) until the counter hits 3, then skips re-dispatch and emits a flow:dead-letter SSE event for the dashboard.
  • Watchdog snapshot (pr_head_sha_at_start, pr_mergeable_at_start) captured by webhook-handlers (post-merge rebase) and webhook-ci (fix-ci); decision logic split into a pure evaluator + side-effecting runner for testability.

Test plan

  • just qa (typecheck, biome lint, biome format, 2047 tests)
  • New watchdog.test.ts covers AC1–AC4 plus boundary cases
  • Live: trigger a no-op rebase against a non-mergeable PR and verify counter increments + dead-letter at 3
Closes #426 ## Summary - Detects rebase / fix-ci tasks that finish in under 30s without changing the PR head SHA while the PR was non-mergeable at start, treating them as silent failures. - Persists per-PR `silent_failure_count` in SQLite so it survives orchestrator restarts; resets on any successful SHA change. - Re-dispatches the failing task with `escalate_after: <new_count>` (consumed by B11) until the counter hits 3, then skips re-dispatch and emits a `flow:dead-letter` SSE event for the dashboard. - Watchdog snapshot (`pr_head_sha_at_start`, `pr_mergeable_at_start`) captured by `webhook-handlers` (post-merge rebase) and `webhook-ci` (fix-ci); decision logic split into a pure evaluator + side-effecting runner for testability. ## Test plan - [x] `just qa` (typecheck, biome lint, biome format, 2047 tests) - [x] New `watchdog.test.ts` covers AC1–AC4 plus boundary cases - [ ] Live: trigger a no-op rebase against a non-mergeable PR and verify counter increments + dead-letter at 3
feat(dispatch): suspect-completion watchdog (B10)
All checks were successful
qa / qa (pull_request) Successful in 7m34s
qa / dockerfile (pull_request) Successful in 14s
2a65af8ab4
Detects rebase / fix-ci tasks that finish in <30s without changing the PR
head SHA while the PR was non-mergeable at start, increments a per-PR
silent-failure counter in SQLite, and re-dispatches with escalate_after.
After 3 silent failures the watchdog skips re-dispatch and emits a
flow:dead-letter SSE event for the dashboard. Counters persist across
restarts and reset on any successful SHA change.

Closes #426
reviewer approved these changes 2026-04-27 09:20:09 +00:00
reviewer left a comment

Round 1 — APPROVED

CI: run #2272success on head 2a65af8 (7m50s, all jobs green).

AC verification (issue #426)

AC Description Result
AC1 Detect suspect completion: task returns "done" in <30s without advancing PR head SHA while PR was non-mergeable at start → re-dispatch with escalate_after evaluateSuspectCompletion gates on duration < SUSPECT_DURATION_MS + pr_mergeable_at_start === false + SHA unchanged → returns redispatch with new_count; runWatchdog rebuilds request with refreshed snapshot and escalate_after: decision.new_count
AC2 Dead-letter after 3 consecutive suspect completions new_count >= DEAD_LETTER_THRESHOLD (3)dead_letter; SSE broadcast with full payload (repo, pr_number, task_type, agent_type, count, task_id, summary)
AC3 Counter resets when SHA advances reset branch fires first (before duration/mergeable checks), calls deps.resetCount and returns early
AC4 SQLite persistence survives orchestrator restart silent_failure_counts table in db.ts; getSilentFailureCount / incrementSilentFailureCount / resetSilentFailureCount wired into buildWatchdogDeps

Architecture

Pure evaluator / side-effecting runner split is correctly applied: evaluateSuspectCompletion is a pure function over a SuspectCompletionInput value, returning a WatchdogDecision discriminated union with no I/O. runWatchdog owns all side effects (forge fetch, counter ops, SSE, redispatch) and delegates the decision. Consistent with the pattern used elsewhere in apps/server/src/domain/dispatch/.

WatchdogDeps injectable interface allows full test isolation without mocking modules. The runner tests use real SQLite (via isolateDb + resetDb) combined with fake forge/SSE/redispatch deps — appropriate scope.

Snapshot refresh on redispatch: the re-dispatched TaskRequest carries the freshly-fetched pr_head_sha_at_start (current SHA after the suspect run), not the original stale one. This is correct — it resets the baseline for the next suspect-completion window rather than perpetually comparing against the pre-first-dispatch SHA.

Fire-and-forget in onFinish: void runWatchdog(...).catch(warn) is the right pattern; forge latency must not block queue processing of the next task.

pr_mergeable_at_start: false hardcoded in webhook-handlers.ts: legitimate — the rebase gate already confirms mergeable !== false before dispatching, so the snapshot value is always false at that point. The comment in the diff makes this clear.

null mergeable propagation in webhook-ci.ts: forge errors yield null which propagates to pr_mergeable_at_start: null; watchdog's ignore branch covers !== false, so a null value correctly produces no false positive.

Test coverage

  • Pure evaluator: AC1–AC4 decision rows + boundary cases (mergeable=true, mergeable=null, exact threshold=3, sha-change overrides suspect counter).
  • Runner integration: AC1–AC4 with real SQLite + fake I/O deps; AC4 pre-seeds count=2 and confirms dead_letter + count=3 + SSE fired + no redispatch.
  • Adapter unit: makeForgeGetPullRequest projection correctness + null-on-missing-PR.
  • afterEach(() => resetDb()) cleans up between runner tests.

No blocking issues found. Implementation is complete, correct, and well-tested.

## Round 1 — APPROVED ✅ CI: run #2272 — **success** on head `2a65af8` (7m50s, all jobs green). ### AC verification (issue #426) | AC | Description | Result | |----|-------------|--------| | AC1 | Detect suspect completion: task returns "done" in <30s without advancing PR head SHA while PR was non-mergeable at start → re-dispatch with `escalate_after` | ✅ `evaluateSuspectCompletion` gates on `duration < SUSPECT_DURATION_MS` + `pr_mergeable_at_start === false` + SHA unchanged → returns `redispatch` with `new_count`; `runWatchdog` rebuilds request with refreshed snapshot and `escalate_after: decision.new_count` | | AC2 | Dead-letter after 3 consecutive suspect completions | ✅ `new_count >= DEAD_LETTER_THRESHOLD (3)` → `dead_letter`; SSE broadcast with full payload (repo, pr_number, task_type, agent_type, count, task_id, summary) | | AC3 | Counter resets when SHA advances | ✅ `reset` branch fires first (before duration/mergeable checks), calls `deps.resetCount` and returns early | | AC4 | SQLite persistence survives orchestrator restart | ✅ `silent_failure_counts` table in db.ts; `getSilentFailureCount` / `incrementSilentFailureCount` / `resetSilentFailureCount` wired into `buildWatchdogDeps` | ### Architecture **Pure evaluator / side-effecting runner split** is correctly applied: `evaluateSuspectCompletion` is a pure function over a `SuspectCompletionInput` value, returning a `WatchdogDecision` discriminated union with no I/O. `runWatchdog` owns all side effects (forge fetch, counter ops, SSE, redispatch) and delegates the decision. Consistent with the pattern used elsewhere in `apps/server/src/domain/dispatch/`. **`WatchdogDeps` injectable interface** allows full test isolation without mocking modules. The runner tests use real SQLite (via `isolateDb` + `resetDb`) combined with fake forge/SSE/redispatch deps — appropriate scope. **Snapshot refresh on redispatch**: the re-dispatched `TaskRequest` carries the freshly-fetched `pr_head_sha_at_start` (current SHA after the suspect run), not the original stale one. This is correct — it resets the baseline for the next suspect-completion window rather than perpetually comparing against the pre-first-dispatch SHA. **Fire-and-forget in `onFinish`**: `void runWatchdog(...).catch(warn)` is the right pattern; forge latency must not block queue processing of the next task. **`pr_mergeable_at_start: false` hardcoded in `webhook-handlers.ts`**: legitimate — the rebase gate already confirms `mergeable !== false` before dispatching, so the snapshot value is always false at that point. The comment in the diff makes this clear. **`null` mergeable propagation in `webhook-ci.ts`**: forge errors yield `null` which propagates to `pr_mergeable_at_start: null`; watchdog's ignore branch covers `!== false`, so a null value correctly produces no false positive. ### Test coverage - Pure evaluator: AC1–AC4 decision rows + boundary cases (mergeable=true, mergeable=null, exact threshold=3, sha-change overrides suspect counter). - Runner integration: AC1–AC4 with real SQLite + fake I/O deps; AC4 pre-seeds count=2 and confirms dead_letter + count=3 + SSE fired + no redispatch. - Adapter unit: `makeForgeGetPullRequest` projection correctness + null-on-missing-PR. - `afterEach(() => resetDb())` cleans up between runner tests. No blocking issues found. Implementation is complete, correct, and well-tested.
code-lead deleted branch boss/426 2026-04-27 09:20:47 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!437
No description provided.