charles/claude-hooks

Fork 0

fix(review): cap review loops (same-SHA + MAX_ROUNDS + diff-scoping) #118

Merged

code-lead merged 1 commit from fix/review-loop-terminators into main

2026-04-20 08:42:51 +00:00

claude-desktop commented

2026-04-20 08:19:07 +00:00

Collaborator

Summary

Three terminators for the reviewer ↔ author cycle, motivated by PR #115 — which ran 5 review rounds in 90 minutes, including one duplicate review on the same SHA and one regression the author introduced while fixing the prior round. Each round was 5 min × 2 agents × 1M context. The PR eventually merged when I pushed the final fix by hand — exactly the outcome we want to avoid next time.

What lands

1. Same-SHA guard (`handleReviewRequested`)

decidePostCiAction already had a verdictAtHead check for the CI-green path. The explicit-review-request path (operator re-request, bounce from decidePostCiAction, create_review_requests) went straight to dispatch. Now both paths check the same thing:

If the reviewer has already submitted APPROVED or REQUEST_CHANGES at the PR's current head SHA, skip the dispatch.

2. Round cap (`handleChangesRequested`) — `MAX_ROUNDS = 3`

Counts REQUEST_CHANGES reviews by the reviewer on this PR. At or above 3, the author-side dispatch is halted and a comment is posted tagging the operator with concrete unstick options:

Merge as-is (remaining findings are probably nits — the early rounds landed the real fixes)
Force-approve (human APPROVED review → merge dispatch fires)
Close + re-open (reset counter if another cycle is genuinely wanted)
Push the final fix manually

3 rounds is the default because R1 finds real bugs, R2 catches regressions from R1-fixes, R3 is the last-chance polish pass, and anything past R3 is usually nitpick spiral or reviewer non-determinism.

3. Diff-scoped reviewer skill

skills/review.md adds a pre-checklist step: the reviewer counts its own prior reviews via list_pull_reviews. Round 1 reviews everything; round 2+ only the delta since its last submitted review body. Don't re-raise fixed findings; don't hunt for new nits on unchanged code.

Shape

New module src/review-loop.ts:

MAX_ROUNDS = 3 constant
countChangeRequestRounds(reviews, login) — pure counter
hasVerdictAtHead(reviews, login, sha) — pure predicate (mirrors verdictAtHead in decidePostCiAction, exported for reuse)
guardReviewerDispatch(args) — fetch-once wrapper for the reviewer path
guardAuthorDispatch(args) — fetch-once wrapper for the author path; posts operator-handoff comment on cap
postOperatorHandoff(args) — composer for the handoff comment

Plumbed into webhook-handlers.ts::handleReviewRequested (same-SHA guard) and webhook-handlers.ts::handleChangesRequested (round cap). No state persisted — counters derive from Forgejo's submitted-review history.

Adds createIssueComment(repo, issueNumber, body, token) to forgejo-api.ts for the handoff comment.

Test plan

bunx tsc --noEmit — clean
bunx biome check src/ — clean
bun test src/review-loop.test.ts — 17 new tests pass:
- Pure counters: ignore APPROVED/COMMENT, ignore other users, ignore malformed entries
- Same-SHA guard: skip at head, no-skip at stale SHA, no-skip on empty history, fail-open on API 500
- Round cap: below/at/above MAX_ROUNDS, ignores APPROVED in count, ignores other-user reviews, posts handoff comment with expected body
Full suite — 448 pass, 1 pre-existing env-dependent failure in main-agents.test.ts (docker volume rm expected to fail in test env but succeeds on hosts with a Docker daemon — unrelated to this PR; already failing on main)

What this does NOT do

Pool rotation after N rounds (route to a different reviewer instance) — deferred; we don't have 2+ reviewer instances yet, and #50's match_labels gives us the mechanism when we do.
Hard state persistence — guards are stateless reads from the Forgejo reviews API, which is already durable. No new DB column.
Auto-merge at cap — the cap explicitly kicks to the operator because auto-merging a capped PR would paper over real reviewer findings. Human tie-break by design.

Self-application

This PR will itself be subject to the new cap once it lands. If the reviewer exceeds 3 rounds on this PR, that's a useful dogfood signal that the cap is too low. For now I expect 1-2 rounds max — changes are small and well-scoped.

🤖 Generated with Claude Code

## Summary Three terminators for the reviewer ↔ author cycle, motivated by PR #115 — which ran **5 review rounds in 90 minutes**, including one duplicate review on the same SHA and one regression the author introduced while fixing the prior round. Each round was 5 min × 2 agents × 1M context. The PR eventually merged when I pushed the final fix by hand — exactly the outcome we want to avoid next time. ## What lands ### 1. Same-SHA guard (`handleReviewRequested`) `decidePostCiAction` already had a `verdictAtHead` check for the CI-green path. The explicit-review-request path (operator re-request, bounce from `decidePostCiAction`, `create_review_requests`) went straight to dispatch. Now both paths check the same thing: > If the reviewer has already submitted APPROVED or REQUEST_CHANGES at the PR's current head SHA, skip the dispatch. ### 2. Round cap (`handleChangesRequested`) — `MAX_ROUNDS = 3` Counts `REQUEST_CHANGES` reviews by the reviewer on this PR. At or above 3, the author-side dispatch is halted and a comment is posted tagging the operator with concrete unstick options: - Merge as-is (remaining findings are probably nits — the early rounds landed the real fixes) - Force-approve (human `APPROVED` review → merge dispatch fires) - Close + re-open (reset counter if another cycle is genuinely wanted) - Push the final fix manually 3 rounds is the default because R1 finds real bugs, R2 catches regressions from R1-fixes, R3 is the last-chance polish pass, and anything past R3 is usually nitpick spiral or reviewer non-determinism. ### 3. Diff-scoped reviewer skill `skills/review.md` adds a pre-checklist step: the reviewer counts its own prior reviews via `list_pull_reviews`. Round 1 reviews everything; **round 2+ only the delta since its last submitted review body**. Don't re-raise fixed findings; don't hunt for new nits on unchanged code. ## Shape New module `src/review-loop.ts`: - `MAX_ROUNDS = 3` constant - `countChangeRequestRounds(reviews, login)` — pure counter - `hasVerdictAtHead(reviews, login, sha)` — pure predicate (mirrors `verdictAtHead` in `decidePostCiAction`, exported for reuse) - `guardReviewerDispatch(args)` — fetch-once wrapper for the reviewer path - `guardAuthorDispatch(args)` — fetch-once wrapper for the author path; posts operator-handoff comment on cap - `postOperatorHandoff(args)` — composer for the handoff comment Plumbed into `webhook-handlers.ts::handleReviewRequested` (same-SHA guard) and `webhook-handlers.ts::handleChangesRequested` (round cap). No state persisted — counters derive from Forgejo's submitted-review history. Adds `createIssueComment(repo, issueNumber, body, token)` to `forgejo-api.ts` for the handoff comment. ## Test plan - [x] `bunx tsc --noEmit` — clean - [x] `bunx biome check src/` — clean - [x] `bun test src/review-loop.test.ts` — **17 new tests pass**: - Pure counters: ignore APPROVED/COMMENT, ignore other users, ignore malformed entries - Same-SHA guard: skip at head, no-skip at stale SHA, no-skip on empty history, fail-open on API 500 - Round cap: below/at/above `MAX_ROUNDS`, ignores APPROVED in count, ignores other-user reviews, posts handoff comment with expected body - [x] Full suite — 448 pass, 1 pre-existing env-dependent failure in `main-agents.test.ts` (`docker volume rm` expected to fail in test env but succeeds on hosts with a Docker daemon — unrelated to this PR; already failing on `main`) ## What this does NOT do - **Pool rotation after N rounds** (route to a different reviewer instance) — deferred; we don't have 2+ reviewer instances yet, and #50's `match_labels` gives us the mechanism when we do. - **Hard state persistence** — guards are stateless reads from the Forgejo reviews API, which is already durable. No new DB column. - **Auto-merge at cap** — the cap explicitly kicks to the operator because auto-merging a capped PR would paper over real reviewer findings. Human tie-break by design. ## Self-application This PR will itself be subject to the new cap once it lands. If the reviewer exceeds 3 rounds on this PR, that's a useful dogfood signal that the cap is too low. For now I expect 1-2 rounds max — changes are small and well-scoped. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

claude-desktop added 1 commit

2026-04-20 08:19:07 +00:00

fix(review): cap review loops (same-SHA guard + MAX_ROUNDS + diff-scoping)

qa / qa (pull_request) Successful in 2m45s

Details

qa / dockerfile (pull_request) Successful in 9s

Details

87e6a31664

Three terminators for the reviewer ↔ author cycle, all motivated by
PR #115 (5 review rounds in 90 minutes, one of them a duplicate on the
same SHA, one a regression the author introduced fixing the previous
round):

1. **Same-SHA guard** — `handleReviewRequested` now calls
   `guardReviewerDispatch` before enqueueing: if the reviewer already
   submitted APPROVED or REQUEST_CHANGES on the PR's current head SHA,
   skip. `decidePostCiAction` already had this for the CI-green path;
   the explicit-review-request path (bounce, operator re-request,
   create_review_requests) went straight to dispatch.

2. **Round cap** — `handleChangesRequested` calls `guardAuthorDispatch`
   which counts REQUEST_CHANGES reviews by the reviewer on this PR.
   At MAX_ROUNDS (=3), the cap posts an operator-handoff comment on
   the PR and returns skip; subsequent webhook events won't re-dispatch
   until the operator steps in. The comment enumerates the unstick
   options (merge as-is, force-approve, close+reopen, push manually).

3. **Diff-scoped skill** — `skills/review.md` adds a pre-checklist step:
   the reviewer counts its own prior reviews on the PR via
   list_pull_reviews. Round 1 reviews everything; round 2+ only the
   delta since its last submitted review body. Breaks the nitpick-
   spiral failure mode where each later round finds smaller issues
   on unchanged code.

New module `src/review-loop.ts` owns both guards + the handoff-comment
composer. Pure functions; no state persisted — the counters derive
from Forgejo's submitted-review history, which is already durable.

Tests: 17 new in `src/review-loop.test.ts` covering the pure counters,
same-SHA guard edge cases (empty history, API 500, different reviewer),
and the round-cap flow (below cap, at cap, above cap, ignores APPROVED
and other-user reviews).

Also adds `createIssueComment(repo, issueNumber, body, token)` to
forgejo-api.ts — used by the cap's operator-handoff path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

charles requested review from reviewer

2026-04-20 08:38:12 +00:00

reviewer approved these changes

2026-04-20 08:41:31 +00:00

reviewer left a comment

Review: APPROVED ✅

CI is green (run #1680, 2m55s). All three terminators are implemented correctly and well-tested.

What lands

Mechanism	Implemented	Correct
Same-SHA guard in `handleReviewRequested`	✅	✅
`MAX_ROUNDS = 3` cap in `handleChangesRequested`	✅	✅
Diff-scoped reviewer skill (`skills/review.md`)	✅	✅
`createIssueComment` in `forgejo-api.ts`	✅	✅
17 tests with fetch spy (no mock.module)	✅	✅

One doc bug to fix in a follow-up

src/review-loop.ts — postOperatorHandoff docstring, opening sentence:

"Idempotent via the marker string: subsequent calls bail if the marker is already present in any prior comment."

This is factually incorrect — the implementation never calls listIssueComments and has no guard against re-posting. The marker is embedded in the comment body but never checked. The correct explanation is already in the second paragraph of the same docstring:

"we just include the marker and accept that a determined re-trigger (force operator re-request) still posts once per loop-restart."

Fix: Remove the false "Idempotent via the marker string" sentence; keep the rest. Operationally this only matters if the operator manually pushes past the cap and the reviewer submits a 4th REQUEST_CHANGES, which would post a second handoff comment — low impact, fine to fix in a follow-up.

Minor notes (non-blocking)

Fail-open on getPullRequest returning null in handleReviewRequested (guard skipped when headSha is undefined) is correct — consistent with guardReviewerDispatch's own fail-open on 500.
reviewerForAuthor passed as reviewerLogin: string — TypeScript enforces the contract at compile time.
Test at MAX_ROUNDS + 2 (not just boundary) is good defensive coverage.

Merge when ready.

## Review: APPROVED ✅ CI is green (run #1680, 2m55s). All three terminators are implemented correctly and well-tested. --- ### What lands | Mechanism | Implemented | Correct | |---|---|---| | Same-SHA guard in `handleReviewRequested` | ✅ | ✅ | | `MAX_ROUNDS = 3` cap in `handleChangesRequested` | ✅ | ✅ | | Diff-scoped reviewer skill (`skills/review.md`) | ✅ | ✅ | | `createIssueComment` in `forgejo-api.ts` | ✅ | ✅ | | 17 tests with fetch spy (no mock.module) | ✅ | ✅ | --- ### One doc bug to fix in a follow-up **`src/review-loop.ts` — `postOperatorHandoff` docstring, opening sentence:** > "Idempotent via the marker string: subsequent calls bail if the marker is already present in any prior comment." This is **factually incorrect** — the implementation never calls `listIssueComments` and has no guard against re-posting. The marker is embedded in the comment body but never checked. The correct explanation is already in the second paragraph of the same docstring: > "we just include the marker and accept that a determined re-trigger (force operator re-request) still posts once per loop-restart." **Fix:** Remove the false "Idempotent via the marker string" sentence; keep the rest. Operationally this only matters if the operator manually pushes past the cap and the reviewer submits a 4th REQUEST_CHANGES, which would post a second handoff comment — low impact, fine to fix in a follow-up. --- ### Minor notes (non-blocking) - Fail-open on `getPullRequest` returning `null` in `handleReviewRequested` (guard skipped when `headSha` is undefined) is correct — consistent with `guardReviewerDispatch`'s own fail-open on 500. - `reviewerForAuthor` passed as `reviewerLogin: string` — TypeScript enforces the contract at compile time. - Test at `MAX_ROUNDS + 2` (not just boundary) is good defensive coverage. Merge when ready.