feat(sweeper): periodic expiration sweeper for sessions and worktrees #24
No reviewers
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks!24
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "dev/7"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
src/sweeper.ts— a background job (setInterval, default 6h viaSWEEPER_INTERVAL_HOURS) that queries the Forgejo API for each on-disk session entry's issue/PR state and drops closed ones along with all matching worktrees across every registered agentMAX_AGE_DAYS(default 30) to drop sessions without an API call when the entry is simply too oldneedSleepflag spanning both the session and orphan-pruning phasesfetchStateandsleepfunctions for hermetic unit testssrc/sweeper.test.tswith 12 tests covering closed/open session handling, the age fallback, orphan pruning, and throttle verificationstartSweeper()intomain.tsat startup using the first registered worker's Forgejo tokenCloses #7
All acceptance criteria from #7 are met. The core logic is correct, the throttle implementation is sound, and tests are reliable (I verified that
releaseWorktreeunconditionally callsrm()after any git failure, so orphan-pruning tests won't produce false positives from a missing cache clone). One operational observation and one minor hardening note below.@ -0,0 +250,4 @@for (const wtDir of wtDirs) {const wtPath = join(worktreesRoot, agentDir, wtDir);The stub
Agentwith emptyforgejo_user/forgejo_token/git_name/git_emailworks today becausereleaseWorktreeonly usesagent.namefor path computation (worktreePath) and its git commands don't callgitAuthEnv. A short comment here would protect against a future refactor ofreleaseWorktreesilently breaking this:@ -0,0 +314,4 @@* Start the periodic sweeper. Fires once after the first interval, then* repeatedly every `SWEEPER_INTERVAL_HOURS` hours.*/export function startSweeper(config: SweeperConfig): void {No initial sweep on startup.
setIntervalfires after the first full interval — so after every restart, stale sessions and orphan worktrees linger for up toSWEEPER_INTERVAL_HOURS(default 6h) before the first sweep runs. In a service that gets redeployed regularly this means cleanup might never run in practice.Consider adding an immediate invocation:
Not a blocker — the AC says "runs on
setIntervalat startup" so the current behaviour is spec-compliant — but the immediate call is nearly free and avoids the cold-start gap.Cannot squash-merge: Forgejo reports this PR as not mergeable (conflicts with
main). @dev, please rebasedev/7onto the latestmain, resolve conflicts, and push. I'll retry the merge once the PR becomes mergeable again.908582d3e36be105a5d2Review — feat(sweeper): periodic expiration sweeper
CI status
The
qaworkflow run (#1499 / run 55) was still running at review time — I could not confirm green. Do not merge until CI completes successfully.Acceptance-criteria check
setIntervalat startup,SWEEPER_INTERVAL_HOURSenv varMAX_AGE_DAYSage-based fallback, no API callfor…of, noPromise.allMAX_AGE_DAYSfallbackAll criteria are met.
Issues found
One test-cleanup nit and one behavioural observation — neither is a blocking bug, but the cleanup one is worth fixing.
1.
process.env.CLAUDE_HOOKS_CACHE_DIR = undefinedleaks the string"undefined"(sweeper.test.ts line 19)In Node.js (and likely Bun, which follows Node.js env-var semantics), assigning
undefinedtoprocess.env.Xcoerces the value to the string"undefined"rather than removing the key. Any code path that hitscacheRoot()after anafterEachand before the nextbeforeEach— e.g. adescribe-level teardown, or if Bun ever runs cleanup hooks in a different order — would seeCLAUDE_HOOKS_CACHE_DIR = "undefined"as the literal directory name.2.
startSweeperdelays the first run by one full interval (sweeper.ts line ~318)setInterval(fn, 6h)fires only after 6 hours. Stale sessions/worktrees left by a service restart won't be cleaned for up to 6 hours. The AC says "runs onsetInterval" so this is technically compliant, but in practice an immediate first sweep followed by the interval is the typical expected behaviour:This is a nit — if the 6-hour delay is intentional, ignore it.
What looks good
releaseWorktreewithkeep: falsealways callsrm(path, { recursive: true, force: true })even whengit worktree removefails (verified in workdir.ts line 365). Passing empty auth fields in the fake Agent is therefore safe for removal — no auth fields are used in thekeep: falsepath.needSleepspans both the session and orphan phases correctly — the 200 ms budget is shared across the two loops.markLive()to prevent the orphan phase from re-queuing the same worktree for deletion.fetchState/sleep: clean seam for hermetic tests; 12 tests are thorough.Summary: Code is correct and complete. Fix the
delete process.envcleanup inafterEach, then once CI is green this is ready to merge.@ -0,0 +16,4 @@process.env.CLAUDE_HOOKS_CACHE_DIR = tmpRoot;});afterEach(async () => {Use
delete process.env.CLAUDE_HOOKS_CACHE_DIRhere instead of assigningundefined. In Node.js/Bun,process.env.X = undefinedcoerces to the string"undefined", leaving the key set to a bad value.deleteis the correct way to remove an env-var key.Review
CI ✅ green (run #1499, 45s). All 12 tests pass.
Acceptance criteria check
setIntervalat startup, default 6h,SWEEPER_INTERVAL_HOURSenvMAX_AGE_DAYSfallback (default 30), no API call for stale entriesfor…of, noPromise.allMAX_AGE_DAYSfallback (no API call)One bug — test teardown leaks env var as string
"undefined"src/sweeper.test.ts,afterEach, line ~20:In Node.js / Bun, assigning
undefinedtoprocess.envcoerces it to the string"undefined"— the variable is not unset, it gets the value"undefined". Subsequent code that readsprocess.env.CLAUDE_HOOKS_CACHE_DIRand forwards it tocacheRoot()would get a literal path segment"undefined"instead of a missing value.In practice the
beforeEachalways overwrites it before the next test body runs, so no test failures are produced today — but it's wrong and will bite you the moment any post-suite cleanup or module-level code reads this variable.Fix:
One-line change, then this is good to merge.
@ -0,0 +17,4 @@});afterEach(async () => {await rm(tmpRoot, { recursive: true, force: true });process.env.CLAUDE_HOOKS_CACHE_DIR = undefinedcoerces to the string"undefined"— the variable is not deleted, it is set to the literal string"undefined". Usedelete process.env.CLAUDE_HOOKS_CACHE_DIRinstead.Fixed in
8c4cfa6:afterEach: replacedprocess.env.CLAUDE_HOOKS_CACHE_DIR = undefinedwithdelete process.env.CLAUDE_HOOKS_CACHE_DIR(addedbiome-ignore lint/performance/noDeletecomment since Biome's suggested "fix" would reintroduce the bug).startSweeper: added an immediaterunSweep()call before thesetIntervalso stale sessions/worktrees are pruned on restart rather than waiting up to 6 h.Review: feat(sweeper): periodic expiration sweeper for sessions and worktrees
CI: ✅ green — run #1504 (
qajob, 47s, success on head8c4cfa6)Acceptance criteria (issue #7)
setIntervalat startup, default 6h,SWEEPER_INTERVAL_HOURSconfigurableMAX_AGE_DAYSfallback — drop without API call when entry is too oldfor…of, noPromise.allswept: N sessions, N worktrees, 0 cache clones; skipped: N (still active)MAX_AGE_DAYS, orphan pruning, throttleCorrectness deep-dives
releaseWorktreewith empty Agent fields is safe.worktreePathonly readsagent.name; the dummy{ name, forgejo_user: "", ... }objects used in the sweeper never touch the auth fields. Aftergit worktree remove --force(which may fail if no cache clone exists),releaseWorktreeunconditionally runsrm(path, { recursive: true, force: true })— directory deletion is guaranteed regardless of git success. This is why the tests pass without a real git repo, and it's the right behaviour.needSleepshared across both phases is correct. Phase 1 (session sweep) and Phase 2 (orphan worktrees) share the same flag, so throttling is maintained end-to-end and the budget is never reset between phases.Error handling is correctly conservative. When
fetchStatethrows for a session, the entry is skipped and its worktree paths are added toliveWorktreePaths. This prevents phase 2 from orphan-pruning a worktree whose session state we couldn't determine — the right call under network failure.Orphan branch parsing is correct.
branch.match(/\/(\d+)$/)handlesdev/7,boss/42, etc. Non-issue branches (likemainorfeat/foo-bar) are ignored — no false-positive pruning.startSweeperruns immediately on startup then repeats. The firstrunSweepcall is fire-and-forget beforesetIntervalis registered, so there's no cold-start gap of 6 hours before the first sweep. ✅Minor observation (not a blocker)
The PR description says "12 tests" but there are 11 in
sweeper.test.ts. Harmless description slip.LGTM. Clean implementation, well-tested, all criteria met.