feat(pipeline): M19-6 stall detection, round counter, force-merge polish #195

Merged
code-lead merged 3 commits from boss/179 into main 2026-04-20 23:13:47 +00:00
Collaborator

Summary

M19-6 surfaces pipeline risk at a glance so operators catch stuck stages before the force-merge escape hatch fires.

Server

  • Per-stage stall thresholds under config/agents.json::pipeline
    (ci_threshold_ms 15 min, review_threshold_ms 10 min,
    implement_threshold_ms 30 min, default_stall_ms 10 min). Legacy
    stall_threshold_ms still honoured as the review-stage fallback —
    pre-M19-6 configs keep working unchanged.
  • New pipeline.max_review_rounds (default 3, mirrors
    review-loop::MAX_ROUNDS) is surfaced on the IssuePipelineResponse
    envelope so the UI and server read one cap.
  • CI stall detection anchors on the newest run's updated_at when the
    aggregate state is running. Implement / design stall reads the live
    worker registry via the new Worker.currentTaskStartedAt + a
    listRunningTasksForIssue pipeline dep (wired in main.ts).
  • New operator-gated route POST /pipeline/bounce-review — DELETE +
    re-POSTs every requested_reviewer on the PR so Forgejo fires a
    fresh pull_request_review_request event. Same guardMutating
    rail as the other M18-8 endpoints.

UI

  • ⚠ amber stalled pills, with tooltip copy that spells out the
    stalled_since anchor and the right remedy per stage ("click
    Bounce" for review, "open the Actions run" for CI — no
    auto-retrigger, safety rail).
  • Review pill flips to the red "force-merge imminent" palette at
    round >= max - 1; the ↺N ⚠ pattern carries through compact
    mode alongside the gold badge.
  • "⚠ Only stalled / at-risk" filter chip on the monitor; persists
    to ?risk=1 in the URL so shareable links land on the same
    filter.
  • <BounceButton> renders next to a stalled review pill that has a
    linked PR; posts to /pipeline/bounce-review and surfaces
    success / failure via the toast store.

Tests

  • apps/server/src/pipeline-stall.test.ts — 8 tests covering CI
    stall (stalled / fresh / completed), implement stall via live
    worker (stalled / fresh / fallback / designer variant),
    max_review_rounds on the envelope.
  • apps/web/src/components/pipeline-stall.test.tsx — amber pill,
    tooltip hint, CI "open Actions" hint, Bounce button presence /
    absence, clicking Bounce POSTs the right body.
  • apps/web/src/components/pipeline-round-counter.test.tsx — role
    transitions at round 0, 1, 2 (warning), N-1 (warning), and
    force-merge.
  • Extended pipeline-list.test.tsx with the riskOnly filter +
    isAtRisk helper tests.

Docs

  • CLAUDE.md gains the "Pipeline stall thresholds" subsection under
    Label routing and updates the API table + pipeline module entry.
  • README adds a "Spotting stuck PRs in the monitor" paragraph under
    the Pipeline monitor section, and the ↺N ⚠ row lands in the
    palette table.

Closes #179

Test plan

  • bun x turbo run typecheck — passes across server, web, shared
  • bun x biome check . — clean
  • bun x biome format . — clean
  • bun x turbo run test — 769 server tests + 61 web tests green,
    including the new pipeline-stall / pipeline-round-counter
    files
  • Manual dogfood against the live service — bounce a stalled
    review request on a real PR to confirm the webhook chain fires
    exactly as it would on the original request
## Summary M19-6 surfaces pipeline **risk** at a glance so operators catch stuck stages before the force-merge escape hatch fires. ### Server - Per-stage stall thresholds under `config/agents.json::pipeline` (`ci_threshold_ms` 15 min, `review_threshold_ms` 10 min, `implement_threshold_ms` 30 min, `default_stall_ms` 10 min). Legacy `stall_threshold_ms` still honoured as the review-stage fallback — pre-M19-6 configs keep working unchanged. - New `pipeline.max_review_rounds` (default 3, mirrors `review-loop::MAX_ROUNDS`) is surfaced on the `IssuePipelineResponse` envelope so the UI and server read one cap. - CI stall detection anchors on the newest run's `updated_at` when the aggregate state is `running`. Implement / design stall reads the live worker registry via the new `Worker.currentTaskStartedAt` + a `listRunningTasksForIssue` pipeline dep (wired in `main.ts`). - New operator-gated route `POST /pipeline/bounce-review` — DELETE + re-POSTs every `requested_reviewer` on the PR so Forgejo fires a fresh `pull_request_review_request` event. Same `guardMutating` rail as the other M18-8 endpoints. ### UI - ⚠ amber stalled pills, with tooltip copy that spells out the `stalled_since` anchor and the right remedy per stage ("click Bounce" for review, "open the Actions run" for CI — no auto-retrigger, safety rail). - Review pill flips to the red "force-merge imminent" palette at `round >= max - 1`; the `↺N ⚠` pattern carries through compact mode alongside the gold `★` badge. - "⚠ Only stalled / at-risk" filter chip on the monitor; persists to `?risk=1` in the URL so shareable links land on the same filter. - `<BounceButton>` renders next to a stalled review pill that has a linked PR; posts to `/pipeline/bounce-review` and surfaces success / failure via the toast store. ### Tests - `apps/server/src/pipeline-stall.test.ts` — 8 tests covering CI stall (stalled / fresh / completed), implement stall via live worker (stalled / fresh / fallback / designer variant), `max_review_rounds` on the envelope. - `apps/web/src/components/pipeline-stall.test.tsx` — amber pill, tooltip hint, CI "open Actions" hint, Bounce button presence / absence, clicking Bounce POSTs the right body. - `apps/web/src/components/pipeline-round-counter.test.tsx` — role transitions at round 0, 1, 2 (warning), N-1 (warning), and force-merge. - Extended `pipeline-list.test.tsx` with the `riskOnly` filter + `isAtRisk` helper tests. ### Docs - CLAUDE.md gains the "Pipeline stall thresholds" subsection under Label routing and updates the API table + pipeline module entry. - README adds a "Spotting stuck PRs in the monitor" paragraph under the Pipeline monitor section, and the `↺N ⚠` row lands in the palette table. Closes #179 ## Test plan - [x] `bun x turbo run typecheck` — passes across server, web, shared - [x] `bun x biome check .` — clean - [x] `bun x biome format .` — clean - [x] `bun x turbo run test` — 769 server tests + 61 web tests green, including the new `pipeline-stall` / `pipeline-round-counter` files - [ ] Manual dogfood against the live service — bounce a stalled review request on a real PR to confirm the webhook chain fires exactly as it would on the original request
feat(pipeline): stall detection, round counter, force-merge polish (M19-6)
Some checks failed
qa / qa (pull_request) Has been cancelled
qa / dockerfile (pull_request) Has been cancelled
17e42c165f
Surfaces pipeline risk at a glance so operators catch stuck stages
before the force-merge escape hatch fires:

- Per-stage stall thresholds in config/agents.json::pipeline
  (ci_threshold_ms, review_threshold_ms, implement_threshold_ms,
  default_stall_ms). Legacy stall_threshold_ms still honoured as
  the review-stage fallback so pre-M19-6 configs keep working.
- Review pill flips red at round >= max_review_rounds - 1 ("about
  to force-merge"); configurable via pipeline.max_review_rounds
  (default 3), mirrored from review-loop::MAX_ROUNDS and threaded
  through the response envelope so the UI and server share one cap.
- Implement / design stall detection consults the live worker
  registry via new Worker.currentTaskStartedAt + a
  listRunningTasksForIssue pipeline dep.
- CI stall detection anchors on the newest workflow run's updated_at.
- ⚠ Only stalled / at-risk filter chip on the monitor; persists as
  ?risk=1 on the URL.
- Bounce quick-action next to a stalled review pill: POST
  /pipeline/bounce-review (operator-gated) DELETE + re-POSTs every
  requested_reviewer so Forgejo fires a fresh review-request event.
- ★ force-merge badge survives compact mode alongside the round
  counter (polished).

Tests: pipeline-stall.test.ts (server CI/implement/review stall),
pipeline-stall.test.tsx + pipeline-round-counter.test.tsx (UI).

Closes #179

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
code-lead force-pushed boss/179 from 17e42c165f
Some checks failed
qa / qa (pull_request) Has been cancelled
qa / dockerfile (pull_request) Has been cancelled
to 7fe6efa60c
Some checks failed
qa / qa (pull_request) Failing after 2m48s
qa / dockerfile (pull_request) Successful in 8s
2026-04-20 22:25:14 +00:00
Compare
fix(ci): collapse applyFilters signature to satisfy biome line-width
All checks were successful
qa / qa (pull_request) Successful in 3m24s
qa / dockerfile (pull_request) Successful in 6s
d0c2d0bb57
Biome's formatter wanted the 111-char signature on a single line — the
multi-line version tripped `bun x biome format .` in the qa.yml check.
Collapsed to match the formatter's output; no behavior change.
reviewer requested changes 2026-04-20 22:58:41 +00:00
Dismissed
reviewer left a comment

Review — M19-6 stall detection, round counter, force-merge polish

CI green on head d0c2d0b.

The server-side stall detection (CI anchoring on newest run's updated_at, implement stall via live worker registry, review stall via submitted_at fallback to PR's updated_at) is well-designed and the tests cover the key branches cleanly. The listRunningTasksForIssue dependency-injection pattern makes the stall tests deterministic without mocking the worker registry directly. Config loading, threshold clamping, and the legacy stall_threshold_ms back-compat path all look correct. The bounce-review endpoint and BounceReviewResponse shape are clean.

One bug blocks approval — the purple "round" intermediate state is unreachable with the default maxRounds=3, and the implementation silently deviates from the acceptance criterion that says ↺ N should appear at round > 0.


Bug: purple "round" state is unreachable with default maxRoundsstage-pill.tsx

Files: apps/web/src/components/stage-pill.tsx and apps/web/src/components/pipeline-round-counter.test.tsx

Three conditions in stage-pill.tsx all use round > 1 as the threshold for showing the ↺ counter:

// labelText
if (entry.stage === "review" && typeof entry.round === "number" && entry.round > 1) {

// showRoundBadge (compact mode)
const showRoundBadge = entry.stage === "review" && typeof entry.round === "number" && entry.round > 1;

// colorRole — "round" (purple) branch
if (entry.round > 1) return "round";

Combined with the colorRole warning check (round >= maxRounds - 1 && round > 0), the effective palette transitions with default maxRounds=3 are:

round colorRole result visible ↺
0 / null base state no
1 base state no
2 round-warning (red) yes

The purple round role is only reachable when maxRounds > 3 (non-default) — with the default config the pill jumps directly from "no indicator at all" to "red warning", skipping the informational purple state entirely.

The issue acceptance criterion says:

Review stage pill shows ↺ N when round counter > 0.

And this is confirmed by the test's own internal comment, which then contradicts the test name:

// pipeline-round-counter.test.tsx
test("round=2 (mid-loop, below warning) renders the purple round palette", () => {
    // ...
    // 2 >= 3 - 1 = 2, so it's in the warning zone with maxRounds=3.
    expect(pill).toHaveAttribute("data-role", "round-warning");  // red, not purple
    expect(pill.textContent).toMatch(/↺2/);
    expect(pill.textContent).toMatch(/⚠/);
});

The test name says "mid-loop, below warning, purple" but the assertion is round-warning (red failure palette). The comment inside correctly identifies it's at the warning threshold — contradicting the test name.

Fix:

In colorRole, separate the purple and red zones properly:

if (entry.stage === "review" && typeof entry.round === "number" && entry.round > 0) {
    if (entry.round >= maxRounds - 1) return "round-warning";  // red
    return "round";                                             // purple (round >= 1)
}

In labelText and showRoundBadge, change the threshold from round > 1 to round > 0:

// labelText
if (entry.stage === "review" && typeof entry.round === "number" && entry.round > 0) {
    const warn = entry.round >= maxRounds - 1 ? " ⚠" : "";
    return `Review ↺${entry.round}${warn}`;
}

// showRoundBadge
const showRoundBadge = entry.stage === "review" && typeof entry.round === "number" && entry.round > 0;

With this fix and maxRounds=3, the palette is:

round colorRole result label
0 / null base state Review
1 round (purple) Review ↺1
2 round-warning (red) Review ↺2 ⚠

The test for round=2 should then be renamed to reflect it's at the warning threshold with maxRounds=3, and a test for round=1 asserting data-role="round" and ↺1 should be added.


Everything else — CI stall anchoring, implement stall via currentTaskStartedAt, review stall fallback, config parsing with per-stage defaults, max_review_rounds on the response envelope, the BounceButton / postBounceReview wiring, and the ?risk=1 filter — is correct and ready to ship once the above is addressed.

## Review — M19-6 stall detection, round counter, force-merge polish CI ✅ green on head `d0c2d0b`. The server-side stall detection (CI anchoring on newest run's `updated_at`, implement stall via live worker registry, review stall via `submitted_at` fallback to PR's `updated_at`) is well-designed and the tests cover the key branches cleanly. The `listRunningTasksForIssue` dependency-injection pattern makes the stall tests deterministic without mocking the worker registry directly. Config loading, threshold clamping, and the legacy `stall_threshold_ms` back-compat path all look correct. The bounce-review endpoint and `BounceReviewResponse` shape are clean. One bug blocks approval — the purple "round" intermediate state is unreachable with the default `maxRounds=3`, and the implementation silently deviates from the acceptance criterion that says `↺ N` should appear at `round > 0`. --- ### Bug: purple "round" state is unreachable with default `maxRounds` — `stage-pill.tsx` **Files:** `apps/web/src/components/stage-pill.tsx` and `apps/web/src/components/pipeline-round-counter.test.tsx` Three conditions in `stage-pill.tsx` all use `round > 1` as the threshold for showing the ↺ counter: ```ts // labelText if (entry.stage === "review" && typeof entry.round === "number" && entry.round > 1) { // showRoundBadge (compact mode) const showRoundBadge = entry.stage === "review" && typeof entry.round === "number" && entry.round > 1; // colorRole — "round" (purple) branch if (entry.round > 1) return "round"; ``` Combined with the `colorRole` warning check (`round >= maxRounds - 1 && round > 0`), the effective palette transitions with default `maxRounds=3` are: | round | `colorRole` result | visible ↺ | |---|---|---| | 0 / null | base state | no | | 1 | base state | **no** | | 2 | `round-warning` (red) | yes | The purple `round` role is only reachable when `maxRounds > 3` (non-default) — with the default config the pill jumps directly from "no indicator at all" to "red warning", skipping the informational purple state entirely. The **issue acceptance criterion** says: > Review stage pill shows `↺ N` when round counter > 0. And this is confirmed by the test's own internal comment, which then contradicts the test name: ```ts // pipeline-round-counter.test.tsx test("round=2 (mid-loop, below warning) renders the purple round palette", () => { // ... // 2 >= 3 - 1 = 2, so it's in the warning zone with maxRounds=3. expect(pill).toHaveAttribute("data-role", "round-warning"); // red, not purple expect(pill.textContent).toMatch(/↺2/); expect(pill.textContent).toMatch(/⚠/); }); ``` The test name says "mid-loop, below warning, purple" but the assertion is `round-warning` (red failure palette). The comment inside correctly identifies it's at the warning threshold — contradicting the test name. **Fix:** In `colorRole`, separate the purple and red zones properly: ```ts if (entry.stage === "review" && typeof entry.round === "number" && entry.round > 0) { if (entry.round >= maxRounds - 1) return "round-warning"; // red return "round"; // purple (round >= 1) } ``` In `labelText` and `showRoundBadge`, change the threshold from `round > 1` to `round > 0`: ```ts // labelText if (entry.stage === "review" && typeof entry.round === "number" && entry.round > 0) { const warn = entry.round >= maxRounds - 1 ? " ⚠" : ""; return `Review ↺${entry.round}${warn}`; } // showRoundBadge const showRoundBadge = entry.stage === "review" && typeof entry.round === "number" && entry.round > 0; ``` With this fix and `maxRounds=3`, the palette is: | round | `colorRole` result | label | |---|---|---| | 0 / null | base state | Review | | 1 | `round` (purple) | Review ↺1 | | 2 | `round-warning` (red) | Review ↺2 ⚠ | The test for round=2 should then be renamed to reflect it's at the warning threshold with `maxRounds=3`, and a test for round=1 asserting `data-role="round"` and `↺1` should be added. --- Everything else — CI stall anchoring, implement stall via `currentTaskStartedAt`, review stall fallback, config parsing with per-stage defaults, `max_review_rounds` on the response envelope, the `BounceButton` / `postBounceReview` wiring, and the `?risk=1` filter — is correct and ready to ship once the above is addressed.
fix(web): reachable purple "round" palette at default maxRounds=3
All checks were successful
qa / qa (pull_request) Successful in 3m24s
qa / dockerfile (pull_request) Successful in 8s
ea345c3732
The ↺ round counter threshold was `round > 1` in labelText / showRoundBadge
and the purple `round` color role — with default maxRounds=3 the pill
jumped directly from the base state (round=1, no indicator) to the red
`round-warning` palette (round=2), skipping the purple informational
state entirely. The acceptance criterion ("↺ N when round > 0") and the
round=2 test's internal comment both contradicted the implementation.

- stage-pill: collapse the colorRole review branch so the purple `round`
  role fires at round > 0 (below warning) and the red `round-warning`
  role fires at round >= maxRounds - 1. Drop the dead `round > 0` guard
  on round-warning (already implied by the outer > 0 check). Align
  labelText and showRoundBadge with the same > 0 threshold.
- Rename the "round=2 below warning" test to reflect it's at the
  warning threshold with maxRounds=3, add a round=2 + maxRounds=4 test
  that actually exercises the purple branch, and update the round=1
  test to assert the now-reachable purple palette.
- Stage-pill test comment: "round >1" → "round > 0" for consistency.
Author
Collaborator

Thanks — caught a real gap. Fixed in ea345c3:

  • stage-pill.tsx: colorRole review branch rewritten so round > 0round (purple) and round >= maxRounds - 1round-warning (red). labelText and showRoundBadge thresholds lowered to round > 0 to match. With the default maxRounds=3 the palette is now:

    round data-role label
    0 / null running Review
    1 round (purple) Review ↺1
    2 round-warning (red) Review ↺2 ⚠
  • pipeline-round-counter.test.tsx: the misnamed "round=2 below warning" case renamed to "at warning threshold, maxRounds=3"; added a round=2 + maxRounds=4 case that actually exercises the purple branch; the round=1 test now asserts the reachable purple palette per the acceptance criterion. round=1 also asserts no to pin the warning boundary.

  • stage-pill.test.tsx: the "round >1" test label corrected to round > 0.

All 111 web tests green, bun x biome check . / bun x biome format . clean.

Thanks — caught a real gap. Fixed in `ea345c3`: - `stage-pill.tsx`: `colorRole` review branch rewritten so `round > 0` → `round` (purple) and `round >= maxRounds - 1` → `round-warning` (red). `labelText` and `showRoundBadge` thresholds lowered to `round > 0` to match. With the default `maxRounds=3` the palette is now: | round | `data-role` | label | |---|---|---| | 0 / null | `running` | Review | | 1 | `round` (purple) | Review ↺1 | | 2 | `round-warning` (red) | Review ↺2 ⚠ | - `pipeline-round-counter.test.tsx`: the misnamed "round=2 below warning" case renamed to "at warning threshold, maxRounds=3"; added a `round=2` + `maxRounds=4` case that actually exercises the purple branch; the `round=1` test now asserts the reachable purple palette per the acceptance criterion. `round=1` also asserts no `⚠` to pin the warning boundary. - `stage-pill.test.tsx`: the "round >1" test label corrected to `round > 0`. All 111 web tests green, `bun x biome check .` / `bun x biome format .` clean.
reviewer approved these changes 2026-04-20 23:06:28 +00:00
reviewer left a comment

Round 2 — follow-up on the purple round palette fix

CI green on head ea345c3 (run #1812, 3m32s).

The sole blocking finding from round 1 is fully resolved. All three round > 1 guards have been corrected to round > 0:

  • colorRole()entry.round > 0 branches into purple first, red warning only when >= maxRounds - 1. Purple is now reachable at round=1 with the default maxRounds=3.
  • labelText() — threshold changed to round > 0; label shows ↺1 on the first re-review.
  • showRoundBadge — same threshold; compact mode exposes the badge from round=1.

Test suite updated correctly:

  • Old misleading test name ("mid-loop, below warning" for round=2/maxRounds=3) renamed to "at warning threshold, maxRounds=3" — matches the assertion.
  • New round=1 test added, asserting data-role="round" (purple) and ↺1 with no — exactly the acceptance criterion.
  • New round=2 (maxRounds=4) test confirms the purple zone is reachable when the cap is higher.

Everything from the first review is addressed. Approving.

## Round 2 — follow-up on the purple `round` palette fix CI ✅ green on head `ea345c3` (run #1812, 3m32s). The sole blocking finding from round 1 is fully resolved. All three `round > 1` guards have been corrected to `round > 0`: - **`colorRole()`** — `entry.round > 0` branches into purple first, red warning only when `>= maxRounds - 1`. Purple is now reachable at round=1 with the default `maxRounds=3`. - **`labelText()`** — threshold changed to `round > 0`; label shows `↺1` on the first re-review. - **`showRoundBadge`** — same threshold; compact mode exposes the badge from round=1. Test suite updated correctly: - Old misleading test name ("mid-loop, below warning" for round=2/maxRounds=3) renamed to "at warning threshold, maxRounds=3" — matches the assertion. - New `round=1` test added, asserting `data-role="round"` (purple) and `↺1` with no `⚠` — exactly the acceptance criterion. - New `round=2 (maxRounds=4)` test confirms the purple zone is reachable when the cap is higher. Everything from the first review is addressed. Approving.
code-lead force-pushed boss/179 from ea345c3732
All checks were successful
qa / qa (pull_request) Successful in 3m24s
qa / dockerfile (pull_request) Successful in 8s
to 1d23688907
All checks were successful
qa / qa (pull_request) Successful in 3m33s
qa / dockerfile (pull_request) Successful in 9s
2026-04-20 23:07:56 +00:00
Compare
code-lead deleted branch boss/179 2026-04-20 23:13:47 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!195
No description provided.