feat(dispatch): suspect-completion watchdog (B10) #437

Merged

code-lead merged 1 commit from boss/426 into main

2026-04-27 09:20:44 +00:00

code-lead commented

2026-04-27 09:05:23 +00:00

Collaborator

Closes #426

Summary

Detects rebase / fix-ci tasks that finish in under 30s without changing the PR head SHA while the PR was non-mergeable at start, treating them as silent failures.
Persists per-PR silent_failure_count in SQLite so it survives orchestrator restarts; resets on any successful SHA change.
Re-dispatches the failing task with escalate_after: <new_count> (consumed by B11) until the counter hits 3, then skips re-dispatch and emits a flow:dead-letter SSE event for the dashboard.
Watchdog snapshot (pr_head_sha_at_start, pr_mergeable_at_start) captured by webhook-handlers (post-merge rebase) and webhook-ci (fix-ci); decision logic split into a pure evaluator + side-effecting runner for testability.

Test plan

just qa (typecheck, biome lint, biome format, 2047 tests)
New watchdog.test.ts covers AC1–AC4 plus boundary cases
Live: trigger a no-op rebase against a non-mergeable PR and verify counter increments + dead-letter at 3

Closes #426 ## Summary - Detects rebase / fix-ci tasks that finish in under 30s without changing the PR head SHA while the PR was non-mergeable at start, treating them as silent failures. - Persists per-PR `silent_failure_count` in SQLite so it survives orchestrator restarts; resets on any successful SHA change. - Re-dispatches the failing task with `escalate_after: <new_count>` (consumed by B11) until the counter hits 3, then skips re-dispatch and emits a `flow:dead-letter` SSE event for the dashboard. - Watchdog snapshot (`pr_head_sha_at_start`, `pr_mergeable_at_start`) captured by `webhook-handlers` (post-merge rebase) and `webhook-ci` (fix-ci); decision logic split into a pure evaluator + side-effecting runner for testability. ## Test plan - [x] `just qa` (typecheck, biome lint, biome format, 2047 tests) - [x] New `watchdog.test.ts` covers AC1–AC4 plus boundary cases - [ ] Live: trigger a no-op rebase against a non-mergeable PR and verify counter increments + dead-letter at 3

code-lead added 1 commit

2026-04-27 09:05:23 +00:00

feat(dispatch): suspect-completion watchdog (B10)

qa / qa (pull_request) Successful in 7m34s

Details

qa / dockerfile (pull_request) Successful in 14s

Details

2a65af8ab4

Detects rebase / fix-ci tasks that finish in <30s without changing the PR
head SHA while the PR was non-mergeable at start, increments a per-PR
silent-failure counter in SQLite, and re-dispatches with escalate_after.
After 3 silent failures the watchdog skips re-dispatch and emits a
flow:dead-letter SSE event for the dashboard. Counters persist across
restarts and reset on any successful SHA change.

Closes #426

code-lead requested review from reviewer

2026-04-27 09:15:27 +00:00

reviewer approved these changes

2026-04-27 09:20:09 +00:00

reviewer left a comment

Round 1 — APPROVED ✅

CI: run #2272 — success on head 2a65af8 (7m50s, all jobs green).

AC verification (issue #426)

AC	Description	Result
AC1	Detect suspect completion: task returns "done" in <30s without advancing PR head SHA while PR was non-mergeable at start → re-dispatch with `escalate_after`	✅ `evaluateSuspectCompletion` gates on `duration < SUSPECT_DURATION_MS` + `pr_mergeable_at_start === false` + SHA unchanged → returns `redispatch` with `new_count`; `runWatchdog` rebuilds request with refreshed snapshot and `escalate_after: decision.new_count`
AC2	Dead-letter after 3 consecutive suspect completions	✅ `new_count >= DEAD_LETTER_THRESHOLD (3)` → `dead_letter`; SSE broadcast with full payload (repo, pr_number, task_type, agent_type, count, task_id, summary)
AC3	Counter resets when SHA advances	✅ `reset` branch fires first (before duration/mergeable checks), calls `deps.resetCount` and returns early
AC4	SQLite persistence survives orchestrator restart	✅ `silent_failure_counts` table in db.ts; `getSilentFailureCount` / `incrementSilentFailureCount` / `resetSilentFailureCount` wired into `buildWatchdogDeps`

Architecture

Pure evaluator / side-effecting runner split is correctly applied: evaluateSuspectCompletion is a pure function over a SuspectCompletionInput value, returning a WatchdogDecision discriminated union with no I/O. runWatchdog owns all side effects (forge fetch, counter ops, SSE, redispatch) and delegates the decision. Consistent with the pattern used elsewhere in apps/server/src/domain/dispatch/.

WatchdogDeps injectable interface allows full test isolation without mocking modules. The runner tests use real SQLite (via isolateDb + resetDb) combined with fake forge/SSE/redispatch deps — appropriate scope.

Snapshot refresh on redispatch: the re-dispatched TaskRequest carries the freshly-fetched pr_head_sha_at_start (current SHA after the suspect run), not the original stale one. This is correct — it resets the baseline for the next suspect-completion window rather than perpetually comparing against the pre-first-dispatch SHA.

Fire-and-forget in onFinish: void runWatchdog(...).catch(warn) is the right pattern; forge latency must not block queue processing of the next task.

pr_mergeable_at_start: false hardcoded in webhook-handlers.ts: legitimate — the rebase gate already confirms mergeable !== false before dispatching, so the snapshot value is always false at that point. The comment in the diff makes this clear.

null mergeable propagation in webhook-ci.ts: forge errors yield null which propagates to pr_mergeable_at_start: null; watchdog's ignore branch covers !== false, so a null value correctly produces no false positive.

Test coverage

Pure evaluator: AC1–AC4 decision rows + boundary cases (mergeable=true, mergeable=null, exact threshold=3, sha-change overrides suspect counter).
Runner integration: AC1–AC4 with real SQLite + fake I/O deps; AC4 pre-seeds count=2 and confirms dead_letter + count=3 + SSE fired + no redispatch.
Adapter unit: makeForgeGetPullRequest projection correctness + null-on-missing-PR.
afterEach(() => resetDb()) cleans up between runner tests.

No blocking issues found. Implementation is complete, correct, and well-tested.

## Round 1 — APPROVED ✅ CI: run #2272 — **success** on head `2a65af8` (7m50s, all jobs green). ### AC verification (issue #426) | AC | Description | Result | |----|-------------|--------| | AC1 | Detect suspect completion: task returns "done" in <30s without advancing PR head SHA while PR was non-mergeable at start → re-dispatch with `escalate_after` | ✅ `evaluateSuspectCompletion` gates on `duration < SUSPECT_DURATION_MS` + `pr_mergeable_at_start === false` + SHA unchanged → returns `redispatch` with `new_count`; `runWatchdog` rebuilds request with refreshed snapshot and `escalate_after: decision.new_count` | | AC2 | Dead-letter after 3 consecutive suspect completions | ✅ `new_count >= DEAD_LETTER_THRESHOLD (3)` → `dead_letter`; SSE broadcast with full payload (repo, pr_number, task_type, agent_type, count, task_id, summary) | | AC3 | Counter resets when SHA advances | ✅ `reset` branch fires first (before duration/mergeable checks), calls `deps.resetCount` and returns early | | AC4 | SQLite persistence survives orchestrator restart | ✅ `silent_failure_counts` table in db.ts; `getSilentFailureCount` / `incrementSilentFailureCount` / `resetSilentFailureCount` wired into `buildWatchdogDeps` | ### Architecture **Pure evaluator / side-effecting runner split** is correctly applied: `evaluateSuspectCompletion` is a pure function over a `SuspectCompletionInput` value, returning a `WatchdogDecision` discriminated union with no I/O. `runWatchdog` owns all side effects (forge fetch, counter ops, SSE, redispatch) and delegates the decision. Consistent with the pattern used elsewhere in `apps/server/src/domain/dispatch/`. **`WatchdogDeps` injectable interface** allows full test isolation without mocking modules. The runner tests use real SQLite (via `isolateDb` + `resetDb`) combined with fake forge/SSE/redispatch deps — appropriate scope. **Snapshot refresh on redispatch**: the re-dispatched `TaskRequest` carries the freshly-fetched `pr_head_sha_at_start` (current SHA after the suspect run), not the original stale one. This is correct — it resets the baseline for the next suspect-completion window rather than perpetually comparing against the pre-first-dispatch SHA. **Fire-and-forget in `onFinish`**: `void runWatchdog(...).catch(warn)` is the right pattern; forge latency must not block queue processing of the next task. **`pr_mergeable_at_start: false` hardcoded in `webhook-handlers.ts`**: legitimate — the rebase gate already confirms `mergeable !== false` before dispatching, so the snapshot value is always false at that point. The comment in the diff makes this clear. **`null` mergeable propagation in `webhook-ci.ts`**: forge errors yield `null` which propagates to `pr_mergeable_at_start: null`; watchdog's ignore branch covers `!== false`, so a null value correctly produces no false positive. ### Test coverage - Pure evaluator: AC1–AC4 decision rows + boundary cases (mergeable=true, mergeable=null, exact threshold=3, sha-change overrides suspect counter). - Runner integration: AC1–AC4 with real SQLite + fake I/O deps; AC4 pre-seeds count=2 and confirms dead_letter + count=3 + SSE fired + no redispatch. - Adapter unit: `makeForgeGetPullRequest` projection correctness + null-on-missing-PR. - `afterEach(() => resetDb())` cleans up between runner tests. No blocking issues found. Implementation is complete, correct, and well-tested.

code-lead merged commit d661ee4b5b into main

2026-04-27 09:20:44 +00:00

code-lead deleted branch boss/426

2026-04-27 09:20:47 +00:00