Expand webhook test depth — handler logic, CI dedup, session resume #37
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks#37
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
User story
As a maintainer, I want the webhook tests to actually assert what the handlers do — not just "returns 200 without crashing" — so that a future regression in CI dedup, no-workflows routing, or session resume fails a test instead of silently shipping.
Context
Audit 2026-04-18 flagged shallow tests:
webhook.test.tschecks that a handler returns HTTP 200 and bails when no token is configured, but no spy onfetchconfirms whether any API call was (or was not) made. The pass conditions are "didn't crash", not "did the right thing".Complexity without corresponding tests:
prMergeable,dispatchRebaseIfNotMergeable— the synchronize rebase dispatch.requestReviewIfFreshdedup — already-requested vs. verdict-at-head skip paths.handleStatusEventvs.handleActionRunEvent— aggregate green → request review routing.Worker.runAgentsession resume — zero tests on the three-state transition (fresh, resume-success, resume-fail-and-retry).Acceptance criteria
Fetch spy harness
webhook.test.ts(or the split test files if #A landed) gains a thinfetchmock harness: tests can register URL-pattern → response handlers, and assert which URLs were called in what order. Prefer a tiny in-repo helper over a library.expect(fetchSpy.calls).toHaveLength(0)(or.toEqual([...])) assertions so a future regression that adds a stray API call fails.CI gate tests
requestReviewIfFreshtests: (a) no request made if reviewer already inrequested_reviewers; (b) no request if an APPROVED/REQUEST_CHANGES review exists at the head SHA; (c) POST/requested_reviewerscalled exactly once with{reviewers: ["reviewer"]}when neither condition holds; (d) PR author == reviewer → no-op.handleStatusEvent: aggregate success routes torequestReviewIfFresh; aggregate failure routes todispatchFixCi; pending is a no-op.handleActionRunEvent:action_run_failuredispatches fix-ci with the failing workflow_id in the prompt;action_run_successchecks aggregate before requesting review.repoHasWorkflowsreturns true when.forgejo/workflows/has entries, true when.github/workflows/has entries, false when neither does, false on 404/network error.Fallback timer tests
armReviewFallbackschedules a call torequestReviewIfFreshafterREVIEW_FALLBACK_MS;cancelReviewFallbackprevents it. Use fake timers (Bun'svi.useFakeTimers/setSystemTimeequivalent) or manually mocksetTimeout.action_run_*orstatusevent for the head SHA cancels a previously-armed fallback.Session resume tests
runAgent(or its extractedrunAgentTaskif #C landed) is tested across: (a) fresh dispatch (no stored session) → SDK called withoutresume; (b) stored session present → SDK called withresume: <id>; (c) SDK rejects resumed session → session dropped, SDK called a second time without resume, result captured.Coverage bookkeeping
just qapasses including new tests.bun test --coverageif available, otherwise count assertions).Out of scope
References
Dependencies
#A(cleaner seams make mocking easier).#C(extractsrunAgent— simpler to test in isolation).#Amerged to (likelymain).