feat(janitor): periodic reconciler — detect stuck issues/PRs/tasks and self-heal (#269) #273

Merged
charles merged 4 commits from feat/269-janitor into main 2026-04-23 20:33:53 +00:00
Owner

Closes #269

What

New apps/server/src/janitor.ts module with 7 rules that run every 10 min (configurable). Default mode is dry-run — it logs what it would do without touching anything. Flip to auto-heal once you've reviewed a week of dry-run logs.

Rules

Rule Detects Action
closes_keyword_drift PR merged with Closes #N but issue N still open Close + comment
design_approved_not_closed design-reviewer posted **Verdict**: APPROVED but issue still open Close + comment
label_dispatch_miss area:design / area:design-review / area:dashboard label with no dispatch in 2h Bounce label
dependent_unblocked All blockers closed but propagator never fired (service restart mid-webhook) Bounce assignee
zero_output_success Task status=success with 0 turns or null cost Flag only
stale_idle_assigned Assigned, no PR, no blockers, idle > 30 min Comment + bounce assignee
ready_to_merge PR mergeable + CI green + APPROVED + no REQUEST_CHANGES Flag only

Safety rails

  • Idempotency: same (rule, repo, target) triple acted on at most once per 6h
  • Kill-switch: janitor.enabled: false in config/agents.json
  • Dry-run default: logs actions without performing them until mode: "auto-heal" is set
  • Rule allowlist: janitor.rules: ["only_this_rule"] disables all others

Config (add to config/agents.json)

{
  "janitor": {
    "enabled": true,
    "interval_ms": 600000,
    "mode": "dry-run",
    "rules": []
  }
}

New endpoint

GET /janitor/history — last 100 actions from the ring buffer.

SSE

Each action emits { type: "janitor_action", rule, action, repo, target, details, dry_run, ts } on the /events stream.

Supporting changes

  • forgejo-api: patchIssue, listRepoLabels, addIssueLabels, removeIssueLabel, listClosedPullRequests; extended PullRequestDetail with merged/merged_at; made IssueListEntry.labels[].id optional
  • task-store: hasRecentDispatch, listZeroOutputSuccesses
  • webhook-config: parse janitor: block into WebhookConfig.janitor
  • @claude-hooks/shared: JanitorAction, JanitorReport, JanitorHistoryResponse types

Tests

42 unit tests in janitor.test.ts — one "fires" + one "no-op" + dry-run guard per rule, plus idempotency, ring buffer, and broadcast cross-cutting tests. All green, zero Forgejo HTTP calls.

Closes #269 ## What New `apps/server/src/janitor.ts` module with 7 rules that run every 10 min (configurable). Default mode is `dry-run` — it logs what it _would_ do without touching anything. Flip to `auto-heal` once you've reviewed a week of dry-run logs. ### Rules | Rule | Detects | Action | |---|---|---| | `closes_keyword_drift` | PR merged with `Closes #N` but issue N still open | Close + comment | | `design_approved_not_closed` | design-reviewer posted `**Verdict**: APPROVED` but issue still open | Close + comment | | `label_dispatch_miss` | `area:design` / `area:design-review` / `area:dashboard` label with no dispatch in 2h | Bounce label | | `dependent_unblocked` | All blockers closed but propagator never fired (service restart mid-webhook) | Bounce assignee | | `zero_output_success` | Task `status=success` with 0 turns or null cost | Flag only | | `stale_idle_assigned` | Assigned, no PR, no blockers, idle > 30 min | Comment + bounce assignee | | `ready_to_merge` | PR mergeable + CI green + APPROVED + no REQUEST_CHANGES | Flag only | ### Safety rails - **Idempotency**: same `(rule, repo, target)` triple acted on at most once per 6h - **Kill-switch**: `janitor.enabled: false` in `config/agents.json` - **Dry-run default**: logs actions without performing them until `mode: "auto-heal"` is set - **Rule allowlist**: `janitor.rules: ["only_this_rule"]` disables all others ### Config (add to `config/agents.json`) ```jsonc { "janitor": { "enabled": true, "interval_ms": 600000, "mode": "dry-run", "rules": [] } } ``` ### New endpoint `GET /janitor/history` — last 100 actions from the ring buffer. ### SSE Each action emits `{ type: "janitor_action", rule, action, repo, target, details, dry_run, ts }` on the `/events` stream. ## Supporting changes - **`forgejo-api`**: `patchIssue`, `listRepoLabels`, `addIssueLabels`, `removeIssueLabel`, `listClosedPullRequests`; extended `PullRequestDetail` with `merged`/`merged_at`; made `IssueListEntry.labels[].id` optional - **`task-store`**: `hasRecentDispatch`, `listZeroOutputSuccesses` - **`webhook-config`**: parse `janitor:` block into `WebhookConfig.janitor` - **`@claude-hooks/shared`**: `JanitorAction`, `JanitorReport`, `JanitorHistoryResponse` types ## Tests 42 unit tests in `janitor.test.ts` — one "fires" + one "no-op" + dry-run guard per rule, plus idempotency, ring buffer, and broadcast cross-cutting tests. All green, zero Forgejo HTTP calls.
feat(janitor): periodic reconciler — detect stuck issues/PRs/tasks and self-heal (#269)
Some checks failed
qa / qa (pull_request) Failing after 5m29s
qa / dockerfile (pull_request) Successful in 11s
5de6ba05f5
New module apps/server/src/janitor.ts with 7 rules:

- closes_keyword_drift: close issues referenced by Closes/Fixes/Resolves
  in a recently merged PR that Forgejo didn't auto-close
- design_approved_not_closed: close issues where design-reviewer posted
  '**Verdict**: APPROVED' but the skill missed the close step
- label_dispatch_miss: bounce area:design / area:design-review /
  area:dashboard routing labels that have no dispatch in the last 2h
- dependent_unblocked: re-fire assignee bounce for issues whose
  blockers are all closed but the propagator never ran
- zero_output_success: flag success tasks with 0 turns / null cost
- stale_idle_assigned: comment + bounce assignee on issues idle >30 min
- ready_to_merge: flag PRs that are mergeable + CI green + APPROVED

Config block janitor: { enabled, interval_ms, mode, rules } in
config/agents.json. Default mode is dry-run. GET /janitor/history
returns last 100 actions. Each action emits a janitor_action SSE
envelope. Per-rule per-target 6h idempotency guard.

Supporting changes:
- forgejo-api: patchIssue, listRepoLabels, addIssueLabels,
  removeIssueLabel, listClosedPullRequests, extend PullRequestDetail
  with merged/merged_at, make IssueListEntry labels id optional
- task-store: hasRecentDispatch, listZeroOutputSuccesses
- webhook-config: parse janitor: block
- shared: JanitorAction, JanitorReport, JanitorHistoryResponse types

42 unit tests, all green.
refactor: s11
Some checks failed
qa / qa (pull_request) Failing after 3m52s
qa / dockerfile (pull_request) Successful in 8s
cca21080f8
CLAUDE.md was 55.2k chars, past the 40k perf threshold. Keep the
navigation essentials at the root (overview, roles, modules table,
API table, tech stack, conventions, commands) and move detail
sections to per-topic files under docs/ linked from the root index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fix(janitor): replace mock.module with impl seam; fix test-isolation leaks
All checks were successful
qa / qa (pull_request) Successful in 4m11s
qa / dockerfile (pull_request) Successful in 6s
97e9db3e8a
The s11 refactor moved dispatchByType / getWorker / broadcastSSE out of
main.ts into registry.ts / sse.ts. Several test files updated the module
specifier in their mock.module() calls, but three related leaks made
20 sibling-file tests fail in the full bun test run:

1. janitor.test.ts used mock.module("./forgejo-api") and ("./task-analytics")
   to stub its deps. Bun's mock.module is process-global — those stubs
   persisted through every later test file and broke deps.test.ts,
   webhook-handlers.test.ts pivot tests, propagateDependencyClosure, etc.
   Fix: add `impl` test seam to janitor.ts (same pattern as cleanup.ts)
   so the test mutates impl.* instead of touching module registry.

2. webhook-post-merge.test.ts was still mocking "./main" for dispatchByType
   (refactor missed this one) and also mocked ./main's broadcastSSE, both
   of which now live in ./registry / ./sse. Fix: point mock.module at the
   correct modules; drop the unused getWorker override (not exercised by
   handlePostMergeRebase).

3. webhook-assign-dedup / webhook-ci / force-merge.integration all mocked
   ./registry with both dispatchByType AND getWorker. getWorker is only
   actually exercised by webhook-assign-dedup's assign-dedup tests; the
   other two carried stale overrides that corrupted getWorker for every
   later test file (main.test.ts POST /cancel + POST /reset). Fix: drop
   the getWorker override where unused; in webhook-assign-dedup, keep the
   stub but fall back to the real function for names the test doesn't
   track, so main.test.ts's registered workers still resolve.

Plus: fix the stale mock.module("./main") → ("./registry") in
webhook-handlers.test.ts and refresh an outdated comment in main.test.ts.

Server test suite now: 990 pass / 4 fail — and the 4 remaining failures
(session JSONL pruning + foreman session CRUD) are pre-existing on main,
out of scope for this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Group 42 flat modules into 6 responsibility-scoped folders to prepare for
hexagonal adapter extraction. No behavior change — pure file moves plus
import-path updates, mock.module specifier fixes, and 5 import.meta.dir
adjustments for files that moved deeper in the tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sketch the hexagonal port interfaces that domain code will consume instead
of forgejo-api.ts and @anthropic-ai/claude-agent-sdk. Interface-only
scaffolding — no adapters, no imports updated yet.

ForgePort covers 22 methods across issue/PR/comment/review/dep/CI/label/file
buckets with domain-pure shapes (ForgeIssue, ForgePullRequest, …).
ClaudeAgentPort exposes a single runTask() returning an AsyncIterable of
TaskEvents; SDKMessage never leaks across the boundary (escape hatch via
raw: unknown on each turn).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Concrete adapter that wraps forgejo-api.ts and maps Forgejo's raw REST
shapes into the domain-pure types defined in forgejo-port.ts.

Token is bound at construction — build one adapter per agent-type from
the token resolved by webhook-config.ts.

No consumers migrated in this commit; forgejo-api.ts remains the path of
record for the 10 existing importers. Consumer migrations will land as
separate per-module commits so each one is reviewable in isolation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch three low-surface consumers from forgejo-api.ts direct calls to
ForgejoAdapter. Each call site now goes through the port, which means the
domain logic reads ForgeReview / ForgeIssue shapes instead of raw Forgejo
JSON.

review-loop.ts: guards now receive ForgeReview[] with lowercase state
enums (approved, changes_requested, comment, …). Tests updated to use
the domain shape when calling the pure counters directly; fetch-spy
integration tests untouched since the adapter still hits the same HTTP
paths.

slash-commands.ts: /raise-cap ack and miss comments go through the port.

main.ts: /redispatch probe issue fetch uses the port. Drops five unused
re-exports (createIssueComment, deleteReviewRequest, getPullRequest,
requestReviewers, updateIssueAssignees) that were imported but never
called.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port additions:
- ForgeDirEntry shape for listDir (name, path, type, size, sha)
- htmlUrl field on ForgeIssue (needed by createIssue → breakdown summary link)
- readFile returns { content, sha } so callers can chain writeFile
- writeFile takes optional sha for update-vs-create discrimination

foreman.ts migration:
- /foreman/repos, /foreman/files, /foreman/specs/:name save,
  /foreman/breakdown-preview remote fetch, /foreman/create-issues all
  go through ForgejoAdapter
- spec save now surfaces a 502 on adapter failure instead of 500ing
  out of createOrUpdateRepoFile's throw (adapter swallows to bool)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
refactor(server): add SdkClaudeAgent implementing ClaudeAgentPort
All checks were successful
qa / qa (pull_request) Successful in 4m13s
qa / dockerfile (pull_request) Successful in 8s
e311000726
Concrete adapter that wraps @anthropic-ai/claude-agent-sdk's query() and
re-emits each SDKMessage as a domain-pure TaskEvent. SDKMessage is
preserved on event.raw as an escape hatch for callers that still need
SDK internals (e.g. tool_use_summary parsing for PR-URL extraction).

Port additions based on actual SDK surface:
- ResultEvent now carries resultText (needed for PR-URL matching against
  the final result payload that agent-runner consumes today)
- Added ToolUseEvent / ToolProgressEvent / ToolSummaryEvent to cover the
  SDK's tool-related message types — previously TaskEvent only covered
  user / assistant / result / system
- permissionMode values corrected: "bypassPermissions" replaces "bypass"
  to match SDKOptions.permissionMode

No consumers migrated — agent-runner.ts and foreman.ts still call query()
directly. Migration is Phase 5, which requires splitting agent-runner's
multi-responsibility body (container pre-flight + session resume + SDK
call + tool-policy + session persistence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
charles force-pushed feat/269-janitor from e311000726
All checks were successful
qa / qa (pull_request) Successful in 4m13s
qa / dockerfile (pull_request) Successful in 8s
to 97e9db3e8a
All checks were successful
qa / qa (pull_request) Successful in 4m11s
qa / dockerfile (pull_request) Successful in 6s
2026-04-23 20:11:35 +00:00
Compare
charles deleted branch feat/269-janitor 2026-04-23 20:33:54 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!273
No description provided.