B15 — Watchdog dashboard tile (operator visibility for B10/B12/B14) #431

Closed
opened 2026-04-27 07:26:17 +00:00 by claude-desktop · 0 comments
Collaborator

As an operator,
I want a single panel listing PRs the orchestrator considers "needs human attention" (B10's dead-letter, B12's stash branches, B14's metric spikes),
so that I can see at a glance what to fix without grepping logs.

The hardening features (B10/B12/B14) are silent by themselves — they auto-recover but the operator has no way to see what's been recovered or what's stuck. This story surfaces it.

Acceptance criteria

Backend

  • New GET /watchdog/status endpoint returning:
    json { "dead_letter_prs": [{ "number": 420, "repo": "charles/claude-hooks", "reason": "B10:silent_failure_count>=3", "since_ts": "..." }], "worktree_recoveries": [{ "path": "...", "stash_branch": "worktree-recovery/abc1234", "ts": "..." }], "session_resume_failures_24h": 12, "escalations_today": 3 }
  • Auth-gated like the rest of the API.

Frontend

  • Dashboard panel on /agents (or /planner/board toolbar — operator's choice during impl).
  • Stacked card with three rows: dead-letter PRs, worktree recoveries (last 24 h), resume failures + escalation count.
  • Each row clickable: jump to PR (Forgejo), view stash branch (git command), or open metrics view.
  • Empty state: 🟢 Orchestrator self-healed everything with a link to last 24 h history.

Operator actions

  • Per dead-letter PR: button Retry (re-dispatch with fresh counter) and Snooze 24h (suppress watchdog for that PR for 24 h).
  • Per stash branch: button Inspect (shows the diff in a drawer) and Drop (force-deletes the branch).

Tests

  • Backend test: endpoint returns expected shape with seeded data.
  • Frontend test: empty state renders; populated state renders all 3 rows; Retry button POSTs to /watchdog/retry/<pr>.

Out of scope

  • Real-time streaming via SSE (poll on 30 s is fine).
  • Historical analytics beyond 24 h.

References

  • Spec: docs/specs/automation-hardening.md §4 B15.
  • B10 dead-letter event: flow:dead-letter.
  • B12 stash branches: worktree-recovery/<sha>.
  • B14 metric: claude_session_resume_failures_total.

Dependencies

  • Land after B10, B12, B14 — visualises their state.
**As an** operator, **I want** a single panel listing PRs the orchestrator considers "needs human attention" (B10's dead-letter, B12's stash branches, B14's metric spikes), **so that** I can see at a glance what to fix without grepping logs. The hardening features (B10/B12/B14) are silent by themselves — they auto-recover but the operator has no way to see what's been recovered or what's stuck. This story surfaces it. ## Acceptance criteria ### Backend - [ ] New `GET /watchdog/status` endpoint returning: ```json { "dead_letter_prs": [{ "number": 420, "repo": "charles/claude-hooks", "reason": "B10:silent_failure_count>=3", "since_ts": "..." }], "worktree_recoveries": [{ "path": "...", "stash_branch": "worktree-recovery/abc1234", "ts": "..." }], "session_resume_failures_24h": 12, "escalations_today": 3 } ``` - [ ] Auth-gated like the rest of the API. ### Frontend - [ ] Dashboard panel on `/agents` (or `/planner/board` toolbar — operator's choice during impl). - [ ] Stacked card with three rows: dead-letter PRs, worktree recoveries (last 24 h), resume failures + escalation count. - [ ] Each row clickable: jump to PR (Forgejo), view stash branch (git command), or open metrics view. - [ ] Empty state: `🟢 Orchestrator self-healed everything` with a link to last 24 h history. ### Operator actions - [ ] Per dead-letter PR: button `Retry` (re-dispatch with fresh counter) and `Snooze 24h` (suppress watchdog for that PR for 24 h). - [ ] Per stash branch: button `Inspect` (shows the diff in a drawer) and `Drop` (force-deletes the branch). ### Tests - [ ] Backend test: endpoint returns expected shape with seeded data. - [ ] Frontend test: empty state renders; populated state renders all 3 rows; Retry button POSTs to `/watchdog/retry/<pr>`. ## Out of scope - Real-time streaming via SSE (poll on 30 s is fine). - Historical analytics beyond 24 h. ## References - Spec: `docs/specs/automation-hardening.md` §4 B15. - B10 dead-letter event: `flow:dead-letter`. - B12 stash branches: `worktree-recovery/<sha>`. - B14 metric: `claude_session_resume_failures_total`. ## Dependencies - **Land after B10, B12, B14** — visualises their state.
Sign in to join this conversation.
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks#431
No description provided.