M19-6: Stall detection, round counter, force-merge badge polish #179

Closed
opened 2026-04-20 18:44:16 +00:00 by code-lead · 1 comment
Collaborator

As an operator, I want the pipeline to surface risk at a glance — stuck stages, review-loop rounds approaching MAX_ROUNDS, force-merge triggers — so that I catch issues before the escape-hatch fires.

Acceptance criteria

Stall detection (server-side, in #M19-1 derivation)

  • A stage is stalled when it has been running or pending longer than its threshold without SSE activity
  • Per-stage thresholds, tunable via config/agents.json::pipeline:
    • ci_threshold_ms (default 900_000 = 15 min)
    • review_threshold_ms (default 600_000 = 10 min — PR #160 stalled at ~3 min; 10 min is the comfort margin)
    • implement_threshold_ms (default 1_800_000 = 30 min)
    • default_stall_ms (600_000)

UI signals

  • Stalled stage pills render amber ⚠. Tooltip shows stalled_since and a "Bounce" quick-action
  • Review stage pill shows ↺ N when round counter > 0. Pill red ↺ N ⚠ when round counter >= MAX_ROUNDS - 1 (configurable via existing max_review_rounds)
  • Force-merge badge (gold ★) on Merge stage card when TaskRecord.force_merge is true
  • Filter chip "Only stalled / at-risk" in the filter bar

Quick actions

  • From a stalled Review stage tooltip: "Bounce review request" → fires DELETE /requested_reviewers + POST /requested_reviewers on the PR. Auth-gated by #M18-8 (operator-only)
  • From a stalled CI stage: "Re-trigger workflow" deep-link to the Forgejo Actions UI (no auto-retrigger server-side — safety)

Tests

  • pipeline-stall.test.ts — task that hasn't emitted in >threshold marked stalled; under threshold not
  • pipeline-stall.test.tsx — stalled pill renders amber, tooltip surfaces Bounce action, click fires the right mutations
  • pipeline-round-counter.test.tsx — round counter rendering at 0, 1, N-1 (amber), N (terminator fired)

Docs

  • CLAUDE.md "Label routing" gains "Pipeline stall thresholds" subsection
  • README: "Spotting stuck PRs in the monitor" paragraph

Out of scope

  • Push notifications / email on stall. SSE + dashboard is enough for now
  • Automatic bounce — operator-pull, not auto (safety)

Dependencies

  • Blocks on #M19-2
  • Can parallel with #M19-3 + #M19-4 + #M19-5

References

  • Spec: specs/m19-pipeline-monitor.md §Story M19-6
  • Real-world stall repros this fix surfaces visually: #160 (stale REQUEST_REVIEW), #171 (CI-fallback misfire)
As an operator, I want the pipeline to surface **risk** at a glance — stuck stages, review-loop rounds approaching MAX_ROUNDS, force-merge triggers — so that I catch issues before the escape-hatch fires. ## Acceptance criteria ### Stall detection (server-side, in #M19-1 derivation) - [ ] A stage is `stalled` when it has been `running` or `pending` longer than its threshold without SSE activity - [ ] Per-stage thresholds, tunable via `config/agents.json::pipeline`: - `ci_threshold_ms` (default 900_000 = 15 min) - `review_threshold_ms` (default 600_000 = 10 min — PR #160 stalled at ~3 min; 10 min is the comfort margin) - `implement_threshold_ms` (default 1_800_000 = 30 min) - `default_stall_ms` (600_000) ### UI signals - [ ] Stalled stage pills render amber ⚠. Tooltip shows `stalled_since` and a "Bounce" quick-action - [ ] Review stage pill shows `↺ N` when round counter > 0. Pill red `↺ N ⚠` when round counter >= `MAX_ROUNDS - 1` (configurable via existing `max_review_rounds`) - [ ] Force-merge badge (gold ★) on Merge stage card when `TaskRecord.force_merge` is true - [ ] Filter chip "Only stalled / at-risk" in the filter bar ### Quick actions - [ ] From a stalled Review stage tooltip: "Bounce review request" → fires `DELETE /requested_reviewers` + `POST /requested_reviewers` on the PR. Auth-gated by #M18-8 (operator-only) - [ ] From a stalled CI stage: "Re-trigger workflow" deep-link to the Forgejo Actions UI (no auto-retrigger server-side — safety) ### Tests - [ ] `pipeline-stall.test.ts` — task that hasn't emitted in >threshold marked stalled; under threshold not - [ ] `pipeline-stall.test.tsx` — stalled pill renders amber, tooltip surfaces Bounce action, click fires the right mutations - [ ] `pipeline-round-counter.test.tsx` — round counter rendering at 0, 1, N-1 (amber), N (terminator fired) ### Docs - [ ] CLAUDE.md "Label routing" gains "Pipeline stall thresholds" subsection - [ ] README: "Spotting stuck PRs in the monitor" paragraph ## Out of scope - Push notifications / email on stall. SSE + dashboard is enough for now - Automatic bounce — operator-pull, not auto (safety) ## Dependencies - **Blocks on #M19-2** - **Can parallel with #M19-3 + #M19-4 + #M19-5** ## References - Spec: `specs/m19-pipeline-monitor.md` §Story M19-6 - Real-world stall repros this fix surfaces visually: #160 (stale REQUEST_REVIEW), #171 (CI-fallback misfire)
Collaborator

Mockup reference: this story blocks on the Penpot frames produced by #181 (M19-0) — specifically the Stall tooltip and Filter bar frames (round counter, force-merge badge, "Only stalled / at-risk" toggle). Do not start CSS/layout work until the designer hands off and the design-reviewer verdict is APPROVED.

**Mockup reference**: this story blocks on the Penpot frames produced by **#181 (M19-0)** — specifically the **Stall tooltip** and **Filter bar** frames (round counter, force-merge badge, "Only stalled / at-risk" toggle). Do not start CSS/layout work until the designer hands off and the `design-reviewer` verdict is APPROVED.
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks#179
No description provided.