Expand webhook test depth — handler logic, CI dedup, session resume #37

Closed
opened 2026-04-18 10:34:23 +00:00 by claude-desktop · 0 comments
Collaborator

User story

As a maintainer, I want the webhook tests to actually assert what the handlers do — not just "returns 200 without crashing" — so that a future regression in CI dedup, no-workflows routing, or session resume fails a test instead of silently shipping.

Context

Audit 2026-04-18 flagged shallow tests: webhook.test.ts checks that a handler returns HTTP 200 and bails when no token is configured, but no spy on fetch confirms whether any API call was (or was not) made. The pass conditions are "didn't crash", not "did the right thing".

Complexity without corresponding tests:

  • prMergeable, dispatchRebaseIfNotMergeable — the synchronize rebase dispatch.
  • requestReviewIfFresh dedup — already-requested vs. verdict-at-head skip paths.
  • handleStatusEvent vs. handleActionRunEvent — aggregate green → request review routing.
  • Fallback timer arm/cancel — new in this session.
  • Worker.runAgent session resume — zero tests on the three-state transition (fresh, resume-success, resume-fail-and-retry).

Acceptance criteria

Fetch spy harness

  • webhook.test.ts (or the split test files if #A landed) gains a thin fetch mock harness: tests can register URL-pattern → response handlers, and assert which URLs were called in what order. Prefer a tiny in-repo helper over a library.
  • Existing "returns 200" tests gain explicit expect(fetchSpy.calls).toHaveLength(0) (or .toEqual([...])) assertions so a future regression that adds a stray API call fails.

CI gate tests

  • requestReviewIfFresh tests: (a) no request made if reviewer already in requested_reviewers; (b) no request if an APPROVED/REQUEST_CHANGES review exists at the head SHA; (c) POST /requested_reviewers called exactly once with {reviewers: ["reviewer"]} when neither condition holds; (d) PR author == reviewer → no-op.
  • handleStatusEvent: aggregate success routes to requestReviewIfFresh; aggregate failure routes to dispatchFixCi; pending is a no-op.
  • handleActionRunEvent: action_run_failure dispatches fix-ci with the failing workflow_id in the prompt; action_run_success checks aggregate before requesting review.
  • repoHasWorkflows returns true when .forgejo/workflows/ has entries, true when .github/workflows/ has entries, false when neither does, false on 404/network error.

Fallback timer tests

  • armReviewFallback schedules a call to requestReviewIfFresh after REVIEW_FALLBACK_MS; cancelReviewFallback prevents it. Use fake timers (Bun's vi.useFakeTimers / setSystemTime equivalent) or manually mock setTimeout.
  • Any action_run_* or status event for the head SHA cancels a previously-armed fallback.

Session resume tests

  • runAgent (or its extracted runAgentTask if #C landed) is tested across: (a) fresh dispatch (no stored session) → SDK called without resume; (b) stored session present → SDK called with resume: <id>; (c) SDK rejects resumed session → session dropped, SDK called a second time without resume, result captured.

Coverage bookkeeping

  • just qa passes including new tests.
  • No flaky tests introduced (each test run deterministic; fake timers / fetch mocks fully controlled).
  • Coverage on the touched files is visibly higher (spot-check via bun test --coverage if available, otherwise count assertions).

Out of scope

  • Full E2E rework (the existing live-service E2E stays manual).
  • Rewriting untouched tests for unrelated modules.

References

  • Codebase audit 2026-04-18.

Dependencies

  • Blocked by: #A (cleaner seams make mocking easier).
  • Optionally helped by: #C (extracts runAgent — simpler to test in isolation).
  • Branch off: whichever branch #A merged to (likely main).
## User story As a **maintainer**, I want the webhook tests to **actually assert** what the handlers do — not just "returns 200 without crashing" — so that a future regression in CI dedup, no-workflows routing, or session resume fails a test instead of silently shipping. ## Context Audit 2026-04-18 flagged shallow tests: `webhook.test.ts` checks that a handler returns HTTP 200 and bails when no token is configured, but **no spy on `fetch`** confirms whether any API call was (or was not) made. The pass conditions are "didn't crash", not "did the right thing". Complexity without corresponding tests: - `prMergeable`, `dispatchRebaseIfNotMergeable` — the synchronize rebase dispatch. - `requestReviewIfFresh` dedup — already-requested vs. verdict-at-head skip paths. - `handleStatusEvent` vs. `handleActionRunEvent` — aggregate green → request review routing. - Fallback timer arm/cancel — new in this session. - `Worker.runAgent` session resume — zero tests on the three-state transition (fresh, resume-success, resume-fail-and-retry). ## Acceptance criteria ### Fetch spy harness - [ ] `webhook.test.ts` (or the split test files if **#A** landed) gains a thin `fetch` mock harness: tests can register URL-pattern → response handlers, and assert which URLs were called in what order. Prefer a tiny in-repo helper over a library. - [ ] Existing "returns 200" tests gain explicit `expect(fetchSpy.calls).toHaveLength(0)` (or `.toEqual([...])`) assertions so a future regression that adds a stray API call fails. ### CI gate tests - [ ] `requestReviewIfFresh` tests: (a) no request made if reviewer already in `requested_reviewers`; (b) no request if an APPROVED/REQUEST_CHANGES review exists at the head SHA; (c) POST `/requested_reviewers` called exactly once with `{reviewers: ["reviewer"]}` when neither condition holds; (d) PR author == reviewer → no-op. - [ ] `handleStatusEvent`: aggregate success routes to `requestReviewIfFresh`; aggregate failure routes to `dispatchFixCi`; pending is a no-op. - [ ] `handleActionRunEvent`: `action_run_failure` dispatches fix-ci with the failing workflow_id in the prompt; `action_run_success` checks aggregate before requesting review. - [ ] `repoHasWorkflows` returns true when `.forgejo/workflows/` has entries, true when `.github/workflows/` has entries, false when neither does, false on 404/network error. ### Fallback timer tests - [ ] `armReviewFallback` schedules a call to `requestReviewIfFresh` after `REVIEW_FALLBACK_MS`; `cancelReviewFallback` prevents it. Use fake timers (Bun's `vi.useFakeTimers` / `setSystemTime` equivalent) or manually mock `setTimeout`. - [ ] Any `action_run_*` or `status` event for the head SHA cancels a previously-armed fallback. ### Session resume tests - [ ] `runAgent` (or its extracted `runAgentTask` if **#C** landed) is tested across: (a) fresh dispatch (no stored session) → SDK called without `resume`; (b) stored session present → SDK called with `resume: <id>`; (c) SDK rejects resumed session → session dropped, SDK called a second time without resume, result captured. ### Coverage bookkeeping - [ ] `just qa` passes including new tests. - [ ] No flaky tests introduced (each test run deterministic; fake timers / fetch mocks fully controlled). - [ ] Coverage on the touched files is visibly higher (spot-check via `bun test --coverage` if available, otherwise count assertions). ## Out of scope - Full E2E rework (the existing live-service E2E stays manual). - Rewriting untouched tests for unrelated modules. ## References - Codebase audit 2026-04-18. ## Dependencies - **Blocked by:** `#A` (cleaner seams make mocking easier). - **Optionally helped by:** `#C` (extracts `runAgent` — simpler to test in isolation). - **Branch off:** whichever branch `#A` merged to (likely `main`).
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks#37
No description provided.