B10 — Suspect-completion watchdog (close F2 — silent failures) #426

Closed
opened 2026-04-27 07:25:11 +00:00 by claude-desktop · 0 comments
Collaborator

As an orchestrator,
I want to re-dispatch when a task reports success without changing the underlying PR state,
so that silent failures are caught at iteration 1 instead of after the operator notices an hour later.

Last night, Sonnet on dev returned "done — task completed" after only 5–30 s on 6+ rebase attempts for PR #423 / #420 without actually rebasing. The agent gave up silently and the orchestrator trusted the success signal.

Acceptance criteria

Detection

  • Capture the PR head sha at task start (when branch_override = dev/N and the task type is rebase or fix-ci).
  • At task end, compare the current head sha against the captured one.
  • If unchanged AND task duration < 30 s AND PR was mergeable = false at start: log [suspect-completion] task <id> completed without sha change — flagging, increment a per-PR silent_failure_count, and re-dispatch the same task type with escalate_after: N (consumed by B11).
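The detection rule above can be sketched as a pure predicate. This is a minimal illustration, not the runner's actual code: the `TaskStartSnapshot` and `TaskEndSnapshot` shapes, the function names, and the millisecond timestamps are all assumptions, and the `branch_override = dev/N` gate is assumed to be applied by the caller when it decides whether to capture a snapshot at all.

```typescript
// Hypothetical shapes for illustration; none of these names come from the codebase.
interface TaskStartSnapshot {
  taskType: string;   // e.g. "rebase" or "fix-ci"
  headSha: string;    // PR head sha captured at task start
  mergeable: boolean; // PR mergeable flag at task start
  startedAtMs: number;
}

interface TaskEndSnapshot {
  headSha: string;    // PR head sha read at task end
  endedAtMs: number;
}

// Threshold from the acceptance criteria: task duration < 30 s.
const SUSPECT_MAX_DURATION_MS = 30_000;

function isSuspectCompletion(start: TaskStartSnapshot, end: TaskEndSnapshot): boolean {
  const watchedType = start.taskType === "rebase" || start.taskType === "fix-ci";
  const shaUnchanged = start.headSha === end.headSha;
  const tooFast = end.endedAtMs - start.startedAtMs < SUSPECT_MAX_DURATION_MS;
  // All conditions must hold: watched task type, no sha change,
  // short duration, and the PR was not mergeable when the task started.
  return watchedType && shaUnchanged && tooFast && !start.mergeable;
}

// Log line in the format given in the acceptance criteria.
function suspectLogLine(taskId: string): string {
  return `[suspect-completion] task ${taskId} completed without sha change — flagging`;
}
```

Keeping the check free of I/O means the unit tests listed under Tests can exercise it with plain objects, while the sha lookup and re-dispatch stay in the task-end hook.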

Dead-letter path

  • If silent_failure_count >= 3 for the same PR: skip re-dispatch and emit a flow:dead-letter event so the operator dashboard (B15) shows the PR as needing manual intervention.
  • Counter resets on any successful sha change.
  • Counter persists across orchestrator restarts (SQLite, not in-memory).
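One way to read the counter and dead-letter rules is sketched below. The `CounterStore` interface, `onTaskEnd`, and the `WatchdogAction` type are hypothetical names. The in-memory store only keeps the example self-contained; per the criterion above, the real store would be SQLite-backed. The sketch also assumes the count is incremented before the threshold check, so the third silent failure is the one that dead-letters. The issue text does not settle that ordering.

```typescript
// Hypothetical persistence interface; a real implementation would be backed by SQLite.
interface CounterStore {
  get(prNumber: number): number;
  set(prNumber: number, count: number): void;
}

// In-memory stand-in, for illustration only (it does NOT survive restarts).
class InMemoryCounterStore implements CounterStore {
  private counts = new Map<number, number>();
  get(prNumber: number): number {
    return this.counts.get(prNumber) ?? 0;
  }
  set(prNumber: number, count: number): void {
    this.counts.set(prNumber, count);
  }
}

type WatchdogAction =
  | { kind: "none" }
  | { kind: "redispatch"; silentFailureCount: number }
  | { kind: "dead-letter"; event: "flow:dead-letter"; silentFailureCount: number };

// Threshold from the acceptance criteria: silent_failure_count >= 3.
const DEAD_LETTER_THRESHOLD = 3;

function onTaskEnd(
  store: CounterStore,
  prNumber: number,
  suspect: boolean,
  shaChanged: boolean,
): WatchdogAction {
  if (shaChanged) {
    // Any successful sha change resets the per-PR counter.
    store.set(prNumber, 0);
    return { kind: "none" };
  }
  if (!suspect) {
    return { kind: "none" };
  }
  const count = store.get(prNumber) + 1;
  store.set(prNumber, count);
  if (count >= DEAD_LETTER_THRESHOLD) {
    // Skip re-dispatch and surface the PR for manual intervention.
    return { kind: "dead-letter", event: "flow:dead-letter", silentFailureCount: count };
  }
  return { kind: "redispatch", silentFailureCount: count };
}
```

Returning an action value instead of emitting events directly keeps the decision testable in isolation. The caller would translate `redispatch` into a new task carrying `escalate_after` and `dead-letter` into the `flow:dead-letter` event.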

Tests

  • Unit test: task completes <30 s, sha unchanged, mergeable=false → counter increments + re-dispatch fires.
  • Unit test: task completes <30 s, sha CHANGED → no flag, counter reset.
  • Unit test: task completes ≥30 s → no flag (legitimate work).
  • Unit test: counter at 3 → no re-dispatch, dead-letter event emitted.

Out of scope

  • The actual escalation logic — covered by B11.
  • Operator UI for clearing dead-letters — covered by B15.

References

  • Spec: docs/specs/automation-hardening.md §4 B10.
  • Existing task-end hook: search apps/server/src/domain/agents/runner.ts for the string "done — task completed".
  • Night-1 incident: 6+ silent completions on #423 / #420 before manual subagent rebase.
Reference
charles/claude-hooks#426