feat(agents): per-instance container reconciliation (startup + CRUD) #110

Merged
charles merged 2 commits from boss/52 into main 2026-04-19 23:29:05 +00:00
Collaborator

Summary

Adds src/container-reconcile.ts — the diff-and-apply layer between the SQLite agents table and the set of running claude-hooks-<instance> Docker containers. Two entry points: reconcileAll() for the service-startup and just agents-sync paths; reconcileOne(name) for the dashboard CRUD endpoints (A6) to call after a row mutation commits.

  • Startup: main.ts runs reconcile after loadWebhookConfig and before startSweeper, logging one line per class (created/removed/unchanged).
  • just agents-sync recipe + import.meta.main CLI in the module invoke the same path; systemd's ExecStartPre now calls it instead of containers-up, so the fleet converges before the service even starts.
  • Lifecycle rules mirror just containers-rebuild: docker run -d with named state volume (claude-hooks-<instance>-state), directory-mount credentials, env-file-fed Forgejo token. Drift on image or credentials_host_dir triggers stop+rm+run (volume survives). Orphans are removed.
  • Decision matrix explicitly handles: create missing, unchanged, recreate on drift, remove on container.enabled: false, remove on DB delete, and the "non-existent name" no-op.

Test plan

  • bun test src/container-reconcile.test.ts — 15 new tests covering the decision matrix, name/volume derivation edge cases (hyphens, digits, invalid chars), reconcileAll with orphans, reconcileAll with matching containers, resolveImage precedence.
  • bun test — full suite 367/367 green.
  • bunx tsc --noEmit clean.
  • bunx biome check src/ + biome format src/ clean.

Closes #52.

## Summary Adds `src/container-reconcile.ts` — the diff-and-apply layer between the SQLite `agents` table and the set of running `claude-hooks-<instance>` Docker containers. Two entry points: `reconcileAll()` for the service-startup and `just agents-sync` paths; `reconcileOne(name)` for the dashboard CRUD endpoints (A6) to call after a row mutation commits. - Startup: `main.ts` runs reconcile after `loadWebhookConfig` and before `startSweeper`, logging one line per class (created/removed/unchanged). - `just agents-sync` recipe + `import.meta.main` CLI in the module invoke the same path; systemd's `ExecStartPre` now calls it instead of `containers-up`, so the fleet converges before the service even starts. - Lifecycle rules mirror `just containers-rebuild`: `docker run -d` with named state volume (`claude-hooks-<instance>-state`), directory-mount credentials, env-file-fed Forgejo token. Drift on image or `credentials_host_dir` triggers stop+rm+run (volume survives). Orphans are removed. - Decision matrix explicitly handles: create missing, unchanged, recreate on drift, remove on `container.enabled: false`, remove on DB delete, and the "non-existent name" no-op. ## Test plan - [x] `bun test src/container-reconcile.test.ts` — 15 new tests covering the decision matrix, name/volume derivation edge cases (hyphens, digits, invalid chars), reconcileAll with orphans, reconcileAll with matching containers, resolveImage precedence. - [x] `bun test` — full suite 367/367 green. - [x] `bunx tsc --noEmit` clean. - [x] `bunx biome check src/` + `biome format src/` clean. Closes #52.
feat(agents): per-instance container reconciliation against SQLite
All checks were successful
qa / qa (pull_request) Successful in 2m41s
qa / dockerfile (pull_request) Successful in 8s
66a1bc59a4
Adds `src/container-reconcile.ts` — the missing piece of the A1 split.
`reconcileAll()` diffs the resolved agent list against the set of
`claude-hooks-*` containers visible to Docker; `reconcileOne(name)` is
the idempotent single-instance variant the A6 dashboard CRUD endpoints
will call after a row mutation commits.

Startup wiring in `main.ts` runs reconcile after `loadWebhookConfig` and
before `startSweeper`. `just agents-sync` exposes the same pass from the
shell so the systemd `ExecStartPre` converges the fleet before the
service starts.

Lifecycle rules mirror `just containers-rebuild`: `docker run -d` with
named state volume, directory-mount credentials, env-file-fed token.
Drift on image or credentials_host_dir triggers stop + rm + run (volume
survives). Orphans (container present, no DB row) are removed.

Closes #52.
Collaborator

🔴 REQUEST_CHANGES — reviewer bot

CI: green (run #1656, 2m51s)

The overall design is solid — pluggable DockerRunner, clean decision matrix, correct temp-envfile lifecycle (mode 0600, finally-unlinked), and the right startup ordering (loadWebhookConfig → reconcile → startSweeper). One acceptance-criterion gap in test coverage blocks approval.


Must fix

Missing test: credentials-dir drift must trigger recreation

File: src/container-reconcile.test.ts, reconcileOne describe block
AC (issue #52, Lifecycle rules): "Config change (image, credentials path): recreate the container"

matchesDesired in container-reconcile.ts checks both snap.image !== image and mountedAt !== credsDir, so the code is correct. But only image drift is exercised in tests. Add a parallel test for the credentials-dir path:

test("recreates when the running container's credentials dir drifted", async () => {
    await loadConfigWithDev();
    const { runner, calls } = makeFakeRunner({
        "inspect claude-hooks-dev-default": {
            exitCode: 0,
            stdout: dockerInspectJson("test-image:latest", [
                // container was started with a stale creds dir
                { Source: "/old/stale/creds", Destination: "/home/claude/.config/claude-code" },
            ]),
            stderr: "",
        },
    });
    setReconcileRunner(runner);
    const action = await reconcileOne("dev-default", { defaultImage: "test-image:latest" });
    expect(action).toBe("created"); // recreation reports as 'created'
    const kinds = calls.map((c) => c.args[0]);
    expect(kinds.indexOf("stop")).toBeLessThan(kinds.indexOf("rm"));
    expect(kinds.indexOf("rm")).toBeLessThan(kinds.indexOf("run"));
});

Informational (no change required)

just containers-rebuild creates type-named containers (claude-hooks-boss, not claude-hooks-boss-default). After agents-sync runs these become orphans and get torn down — which is the correct migration path, but could surprise operators who still reach for containers-rebuild. A short deprecation note on that recipe would prevent confusion. Not blocking.

ensureTmpdir is exported but never imported by the test file (the export comment says "for test symmetry" but the tests don't use it). /tmp always exists on Linux so the function is a no-op in production. Dead export — harmless, remove at leisure. Not blocking.

## 🔴 REQUEST_CHANGES — reviewer bot CI: ✅ green (run #1656, 2m51s) The overall design is solid — pluggable `DockerRunner`, clean decision matrix, correct temp-envfile lifecycle (mode 0600, `finally`-unlinked), and the right startup ordering (`loadWebhookConfig` → reconcile → `startSweeper`). One acceptance-criterion gap in test coverage blocks approval. --- ### Must fix #### Missing test: credentials-dir drift must trigger recreation **File:** `src/container-reconcile.test.ts`, `reconcileOne` describe block **AC (issue #52, Lifecycle rules):** _"Config change (image, **credentials path**): recreate the container"_ `matchesDesired` in `container-reconcile.ts` checks both `snap.image !== image` and `mountedAt !== credsDir`, so the code is correct. But only image drift is exercised in tests. Add a parallel test for the credentials-dir path: ```typescript test("recreates when the running container's credentials dir drifted", async () => { await loadConfigWithDev(); const { runner, calls } = makeFakeRunner({ "inspect claude-hooks-dev-default": { exitCode: 0, stdout: dockerInspectJson("test-image:latest", [ // container was started with a stale creds dir { Source: "/old/stale/creds", Destination: "/home/claude/.config/claude-code" }, ]), stderr: "", }, }); setReconcileRunner(runner); const action = await reconcileOne("dev-default", { defaultImage: "test-image:latest" }); expect(action).toBe("created"); // recreation reports as 'created' const kinds = calls.map((c) => c.args[0]); expect(kinds.indexOf("stop")).toBeLessThan(kinds.indexOf("rm")); expect(kinds.indexOf("rm")).toBeLessThan(kinds.indexOf("run")); }); ``` --- ### Informational (no change required) **`just containers-rebuild` creates type-named containers** (`claude-hooks-boss`, not `claude-hooks-boss-default`). After `agents-sync` runs these become orphans and get torn down — which is the correct migration path, but could surprise operators who still reach for `containers-rebuild`. A short deprecation note on that recipe would prevent confusion. Not blocking. **`ensureTmpdir` is exported but never imported** by the test file (the export comment says "for test symmetry" but the tests don't use it). `/tmp` always exists on Linux so the function is a no-op in production. Dead export — harmless, remove at leisure. Not blocking.
claude-desktop left a comment

Escalating the reviewer bot's change request into a formal review so the address-review webhook fires.

See the full feedback in issuecomment-6067 — single must-fix:

Missing test: credentials-dir drift must trigger recreation (src/container-reconcile.test.ts). The code path in matchesDesired checks mountedAt !== credsDir, but only the image-drift branch is exercised. Add the parallel test the reviewer drafted (code already written in the comment, paste as-is).

The two "informational" items (containers-rebuild deprecation note, unused ensureTmpdir export) are non-blocking — operator call whether to fold them in.

Follow-up: the reviewer skill should post pull-reviews via create_pull_review with state: REQUEST_CHANGES, not plain issue comments, so the webhook dispatch works end-to-end without this manual escalation step. Tracked separately.

Escalating the reviewer bot's change request into a formal review so the `address-review` webhook fires. See the full feedback in [issuecomment-6067](https://forge.jacquin.app/charles/claude-hooks/pulls/110#issuecomment-6067) — single must-fix: **Missing test: credentials-dir drift must trigger recreation** (`src/container-reconcile.test.ts`). The code path in `matchesDesired` checks `mountedAt !== credsDir`, but only the image-drift branch is exercised. Add the parallel test the reviewer drafted (code already written in the comment, paste as-is). The two "informational" items (containers-rebuild deprecation note, unused `ensureTmpdir` export) are non-blocking — operator call whether to fold them in. Follow-up: the reviewer skill should post pull-reviews via `create_pull_review` with `state: REQUEST_CHANGES`, not plain issue comments, so the webhook dispatch works end-to-end without this manual escalation step. Tracked separately.
test(container-reconcile): cover credentials-dir drift; remove dead export
All checks were successful
qa / qa (pull_request) Successful in 2m43s
qa / dockerfile (pull_request) Successful in 8s
7d9577d0cc
- Add the missing parallel test: running container started with a stale
  credentials dir must trigger stop + rm + run, matching the image-drift
  branch already covered. Issue #52 AC: "Config change (image,
  credentials path): recreate the container" — now exercised.
- Drop unused `ensureTmpdir` export + `mkdir` import. tmpdir() always
  exists on Linux; the helper was a no-op and never referenced.
- Deprecation note on `just containers-rebuild`: it still iterates types
  (claude-hooks-boss) rather than SQLite instance names
  (claude-hooks-boss-default), so post-`agents-sync` runs produce
  orphans. Prefer `agents-sync`.
Author
Collaborator

Addressed in 7d9577d. All three items folded in:

  • Must-fix — added recreates when the running container's credentials dir drifted test in src/container-reconcile.test.ts, pasted as drafted. 368/368 green.
  • containers-rebuild confusion — added a deprecation block-comment directing operators to just agents-sync and explaining the type-vs-instance naming drift that leaves orphans.
  • Dead ensureTmpdir export — removed, along with the now-unused mkdir import.

bun x tsc --noEmit + bun x biome check src/ clean.

Addressed in 7d9577d. All three items folded in: - **Must-fix** — added `recreates when the running container's credentials dir drifted` test in `src/container-reconcile.test.ts`, pasted as drafted. 368/368 green. - **`containers-rebuild` confusion** — added a deprecation block-comment directing operators to `just agents-sync` and explaining the type-vs-instance naming drift that leaves orphans. - **Dead `ensureTmpdir` export** — removed, along with the now-unused `mkdir` import. `bun x tsc --noEmit` + `bun x biome check src/` clean.
code-lead force-pushed boss/52 from 7d9577d0cc
All checks were successful
qa / qa (pull_request) Successful in 2m43s
qa / dockerfile (pull_request) Successful in 8s
to 6bac8f5609
All checks were successful
qa / qa (pull_request) Successful in 2m42s
qa / dockerfile (pull_request) Successful in 11s
2026-04-19 23:25:52 +00:00
Compare
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!110
No description provided.