Agents: per-instance container reconciliation (create/remove on CRUD) #52

Closed
opened 2026-04-18 15:00:54 +00:00 by claude-desktop · 0 comments

User story

As an operator, I want each agent instance to have its own long-lived Docker container and state volume, reconciled against the SQLite agents table on startup and on CRUD mutations, so that creating dev-frontend from the dashboard (A6) spins up claude-hooks-dev-frontend with its own /state volume — and deleting it tears everything down without leaving orphans.

Context

Today three containers are hardcoded in the systemd wrapper: claude-hooks-boss, claude-hooks-dev, claude-hooks-reviewer. With A1 landed, the set of containers is dynamic: one per SQLite agents row, named claude-hooks-<instance-name>.

Reconciliation happens at:

  1. Startup — read SQLite, compare with docker ps, create missing, stop+remove extras.
  2. CRUD mutation (from A6) — the HTTP handler calls the same reconcile routine for the affected instance only.
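Both triggers reduce to the same pure diff between desired instances (SQLite rows) and existing containers. Below is a minimal sketch of that diff. `planReconcile`, `ReconcilePlan` and `PREFIX` are hypothetical names. The sketch assumes every container whose name starts with `claude-hooks-` is an agent container, so unrelated containers are ignored rather than removed.

```typescript
// Pure diff between desired instances (SQLite agents rows) and existing
// containers (names as printed by `docker ps`). Hypothetical helper; it
// assumes every container whose name starts with PREFIX is an agent.

interface ReconcilePlan {
  create: string[];    // row exists, no container yet
  remove: string[];    // container exists, row is gone
  unchanged: string[]; // present on both sides
}

const PREFIX = "claude-hooks-";

function planReconcile(agentNames: string[], containerNames: string[]): ReconcilePlan {
  const desired = new Set(agentNames);
  // Keep only this service's containers and key them by instance name.
  const existing = new Set(
    containerNames
      .filter((c) => c.startsWith(PREFIX))
      .map((c) => c.slice(PREFIX.length)),
  );
  return {
    create: [...desired].filter((n) => !existing.has(n)).sort(),
    remove: [...existing].filter((n) => !desired.has(n)).sort(),
    unchanged: [...desired].filter((n) => existing.has(n)).sort(),
  };
}

// Example: one new row, one orphaned container, one unrelated container.
// plan.create is ["dev-frontend"], plan.remove is ["reviewer"],
// plan.unchanged is ["boss"]; "postgres" is ignored.
const plan = planReconcile(
  ["boss", "dev-frontend"],
  ["claude-hooks-boss", "claude-hooks-reviewer", "postgres"],
);
```

Keeping the diff free of I/O means the startup and CRUD paths can share it, and it can be tested without Docker.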

V1 scope: reconcile by shelling out to docker run / docker rm. If the surface grows, we can later evaluate a Docker SDK (e.g. Docker's Go SDK) instead.
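The shell-out approach is easiest to test behind a small seam. The sketch below assumes Node's `child_process.execFile`; `DockerRunner`, `FakeRunner` and `listAgentContainers` are hypothetical names. The issue text says `docker ps`, but this sketch adds `-a`. A stopped `claude-hooks-<name>` container still owns its name, so a later `docker run --name` for that name would fail with a name conflict. The `name=` filter matches substrings, so the code re-checks the prefix.

```typescript
import { execFile } from "node:child_process";

// Seam between reconcile logic and the docker CLI (hypothetical interface).
interface DockerRunner {
  run(args: string[]): Promise<string>;
}

// Real runner: execFile passes args as an array with no shell, so an
// instance name can never be interpreted as shell syntax.
const cliRunner: DockerRunner = {
  run(args) {
    return new Promise((resolve, reject) => {
      execFile("docker", args, (err, stdout, stderr) => {
        if (err) {
          reject(new Error(`docker ${args.join(" ")} failed: ${stderr || err.message}`));
        } else {
          resolve(stdout);
        }
      });
    });
  },
};

// Test double: records each argv and returns canned stdout keyed by subcommand.
class FakeRunner implements DockerRunner {
  calls: string[][] = [];
  private responses: Record<string, string>;
  constructor(responses: Record<string, string> = {}) {
    this.responses = responses;
  }
  async run(args: string[]): Promise<string> {
    this.calls.push(args);
    return this.responses[args[0]] ?? "";
  }
}

// `-a` includes stopped containers; the name filter is a substring match,
// so the prefix is checked again on the parsed output.
async function listAgentContainers(runner: DockerRunner): Promise<string[]> {
  const out = await runner.run(["ps", "-a", "--filter", "name=claude-hooks-", "--format", "{{.Names}}"]);
  return out
    .split("\n")
    .map((line) => line.trim())
    .filter((name) => name.startsWith("claude-hooks-"));
}
```

With this seam, the "fake docker runner" tests only need to construct a `FakeRunner` and inspect `calls`.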

Acceptance criteria

Container naming & volume

  • Container: claude-hooks-<instance-name>.
  • Volume: claude-hooks-<instance-name>-state (mounted at /state in-container).
  • Token env-file: ~/.config/claude-hooks/tokens/<type>, shared across instances of the same type, per the milestone's non-goal on per-agent tokens.
  • Credentials bind-mount: same path for all instances of a type (reads from the type default in agents.json).
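The naming rules above, together with the edge cases called out under Tests (hyphens, digits, max length), could be centralized in a few derivation helpers. This is a sketch under stated assumptions. The issue does not define the allowed instance-name alphabet or a length cap, so the lowercase `[a-z0-9-]` rule and the 40-character limit are hypothetical. `containerName`, `volumeName` and `tokenEnvFile` are also illustrative names.

```typescript
// Name derivation for per-instance Docker resources.
// Assumptions (not specified in the issue): instance names are lowercase
// letters, digits and inner hyphens, capped at a hypothetical 40 chars.

const MAX_INSTANCE_NAME = 40; // hypothetical cap
const INSTANCE_NAME_RE = /^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?$/;

function validateInstanceName(name: string): void {
  if (name.length === 0 || name.length > MAX_INSTANCE_NAME) {
    throw new Error(`instance name must be 1-${MAX_INSTANCE_NAME} chars: "${name}"`);
  }
  // Rejects leading/trailing hyphens and uppercase letters.
  if (!INSTANCE_NAME_RE.test(name)) {
    throw new Error(`instance name must be lowercase letters, digits and inner hyphens: "${name}"`);
  }
}

function containerName(instance: string): string {
  validateInstanceName(instance);
  return `claude-hooks-${instance}`;
}

function volumeName(instance: string): string {
  validateInstanceName(instance);
  return `claude-hooks-${instance}-state`;
}

// Keyed by agent type, not instance: all instances of a type share it.
function tokenEnvFile(home: string, type: string): string {
  return `${home}/.config/claude-hooks/tokens/${type}`;
}
```

Validating at derivation time means an invalid name fails in the CRUD handler, before any docker command is built.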

Reconciliation

  • New module src/container-reconcile.ts (or extend src/container.ts):
    • reconcileAll(): Promise<{created: string[], removed: string[], unchanged: string[]}> — compares SQLite agents ↔ running containers.
    • reconcileOne(name): Promise<"created" | "removed" | "unchanged"> — idempotent for a single instance.
  • Startup wiring: call reconcileAll() after loadWebhookConfig(), before startSweeper(). Log the outcome as one line per class (created / removed / unchanged).
  • CRUD hook: A6's create/delete endpoints call reconcileOne(name) after the SQLite mutation commits. A6 does not call docker directly; it always goes through this module.
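One possible shape for the two entry points, with dependencies injected so tests can pass fakes. Only the `reconcileOne` / `reconcileAll` signatures come from the issue. `AgentStore`, `ContainerOps`, `makeReconciler` and `formatOutcome` are hypothetical. The sketch omits the config-drift recreate rule and any locking between concurrent CRUD calls. Note that after a delete commits, the row is gone, so `reconcileOne` must still look up the container to return `"removed"`.

```typescript
type Outcome = "created" | "removed" | "unchanged";
type Summary = { created: string[]; removed: string[]; unchanged: string[] };

interface AgentStore {
  listNames(): string[];          // rows in the SQLite agents table
  exists(name: string): boolean;
}

interface ContainerOps {
  list(): Promise<string[]>;      // instance names that currently have a container
  create(name: string): Promise<void>;
  remove(name: string): Promise<void>;
}

function makeReconciler(store: AgentStore, ops: ContainerOps) {
  // Idempotent: calling it twice in a row yields "unchanged" the second time.
  async function reconcileOne(name: string): Promise<Outcome> {
    const wanted = store.exists(name);
    const present = (await ops.list()).includes(name);
    if (wanted && !present) {
      await ops.create(name);
      return "created";
    }
    if (!wanted && present) {
      await ops.remove(name);
      return "removed";
    }
    return "unchanged";
  }

  async function reconcileAll(): Promise<Summary> {
    const wanted = new Set(store.listNames());
    const present = new Set(await ops.list());
    const result: Summary = { created: [], removed: [], unchanged: [] };
    for (const name of wanted) {
      if (present.has(name)) {
        result.unchanged.push(name);
      } else {
        await ops.create(name);
        result.created.push(name);
      }
    }
    for (const name of present) {
      if (!wanted.has(name)) {
        await ops.remove(name);
        result.removed.push(name);
      }
    }
    return result;
  }

  return { reconcileOne, reconcileAll };
}

// One log line per outcome class, for the startup wiring.
function formatOutcome(r: Summary): string[] {
  return [
    `reconcile created: ${r.created.join(", ") || "(none)"}`,
    `reconcile removed: ${r.removed.join(", ") || "(none)"}`,
    `reconcile unchanged: ${r.unchanged.join(", ") || "(none)"}`,
  ];
}
```

A6's handlers would call `reconcileOne(name)` only after the SQLite transaction commits, so the store already reflects the mutation.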

Lifecycle rules

  • Create: docker run -d --name <container> --restart unless-stopped -v <volume>:/state -v <creds>:/home/claude/.config/claude-code/.credentials.json:ro --env-file <tmpenv> <image>. Same as today's just containers-rebuild path.
  • Remove: docker stop (15s grace) then docker rm. The state volume persists unless the operator passes --wipe via a separate CLI command (not in scope).
  • Config change (image, credentials path): recreate the container; volume survives.
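The lifecycle rules can be expressed as plain argv builders, which keeps the docker invocations assertable in tests. The sketch below is illustrative. `AgentSpec`, `runArgs`, `removeArgs` and `needsRecreate` are hypothetical, and `envFile` stands in for the `<tmpenv>` placeholder. In a real implementation the `current` values for the drift check would come from `docker inspect`, which is not shown. `docker stop -t 15` sets the 15 s grace period. `docker rm` never deletes named volumes, so the state volume survives removal.

```typescript
interface AgentSpec {
  container: string; // claude-hooks-<instance-name>
  volume: string;    // claude-hooks-<instance-name>-state
  image: string;
  credsPath: string; // host path from the type default in agents.json
  envFile: string;   // temporary env-file holding the shared type token
}

const CREDS_TARGET = "/home/claude/.config/claude-code/.credentials.json";

// Mirrors the documented `docker run` invocation, argument for argument.
function runArgs(s: AgentSpec): string[] {
  return [
    "run", "-d",
    "--name", s.container,
    "--restart", "unless-stopped",
    "-v", `${s.volume}:/state`,
    "-v", `${s.credsPath}:${CREDS_TARGET}:ro`,
    "--env-file", s.envFile,
    s.image,
  ];
}

// Removal is two commands: stop with a 15 s grace period, then rm.
function removeArgs(container: string): string[][] {
  return [
    ["stop", "-t", "15", container],
    ["rm", container],
  ];
}

// Recreate (remove + run, same volume) when the image or credentials
// mount differs from what the existing container was started with.
function needsRecreate(desired: AgentSpec, current: { image: string; credsPath: string }): boolean {
  return desired.image !== current.image || desired.credsPath !== current.credsPath;
}
```

Because the volume name is derived from the instance name rather than stored on the container, a recreate reattaches the same `/state` data.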

Systemd + justfile

  • Update the systemd unit's ExecStartPre from just containers-up to just agents-sync, a new recipe that invokes the same reconcileAll path the service runs on startup, so the pre-start step and the service see the same set of containers.
  • Document in README.md: creating a new instance from the dashboard triggers reconcileOne automatically; just agents-sync is the manual fallback.

Tests

  • reconcileAll with fake docker runner: added row → create called; missing row for running container → remove called; matching → no-op.
  • reconcileOne with a name that has neither an SQLite row nor a container → returns "unchanged", with no create/remove docker calls.
  • Volume name derivation + container name derivation from instance name (edge cases: hyphens, digits, max length).

Out of scope

  • Container image pull on reconcile (the just containers-rebuild recipe handles that separately; reconcile assumes the image is local or pullable).
  • Resource limits (cpu / memory) per instance — v1 is unconstrained, matches today.
  • Container health monitoring / restart policy beyond unless-stopped.
  • Wiping the state volume when deleting an instance — explicit operator action via a just wipe-agent <name> recipe, separate story if we decide we need it.

References

  • Tracking issue: #47.
  • Current container lifecycle: justfile recipes containers-up, containers-down, containers-rebuild.
  • Container path conventions: src/container.ts.

Dependencies

  • Blocked by: A1.
  • Blocks: A6 (CRUD triggers reconcile).
  • Branch off: main (after A1 lands).
Reference
charles/claude-hooks#52