Containers: per-agent runtime + state volumes #19

Closed
opened 2026-04-17 12:43:33 +00:00 by claude-desktop · 0 comments
Collaborator

User story

As an operator, I want each worker (boss, dev, reviewer) to run inside its own long-lived Docker container with an empty $HOME and a dedicated named volume for its cache / worktrees / sessions, so that agents cannot scrape host credentials and their persistent state survives container restarts.

Split out of the original containerisation story (#17).

Acceptance criteria

Runtime model

  • One long-lived container per agent name: claude-hooks-boss, claude-hooks-dev, claude-hooks-reviewer
  • Container $HOME is an empty volume owned by the image's non-root claude user — no .claude.json, no .credentials.json, no .config/claude-hooks inside
  • Per-worker Forgejo token injected as the FORGEJO_ACCESS_TOKEN env var at container start, read on the host from ~/.config/claude-hooks/tokens/<name> and passed via docker run -e. Token files are never bind-mounted into the container.
  • Claude Code OAuth credentials bind-mounted read-only from a dedicated host path (not ~/.claude/.credentials.json) into $CLAUDE_CONFIG_DIR/.credentials.json. Rotating the credentials on the host takes effect without a container restart.
  • Stdio control bridge: claude-hooks (still on the host) talks to each container via docker exec (decide between docker exec and a Unix-socket bridge, and document the choice)
  • Container has no access to the host Docker socket
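
A minimal sketch of how the launch command for the runtime model above might be assembled. The function name, image tag, in-container config directory and the /state mount point are all assumptions made for illustration; only the container name, the volume name, the env var name and the Docker flags come from this issue or from the Docker CLI. The token is handed to the docker CLI through its own environment and forwarded with a bare `-e FORGEJO_ACCESS_TOKEN`, so the value never appears in argv.

```typescript
// Sketch only: image tag and in-container paths are hypothetical.
type AgentName = "boss" | "dev" | "reviewer";

interface LaunchPlan {
  args: string[];              // argv to pass to the `docker` binary
  env: Record<string, string>; // extra environment for the docker CLI process itself
}

function planContainerStart(
  agent: AgentName,
  forgejoToken: string,        // read by the caller from ~/.config/claude-hooks/tokens/<name>
  hostCredentialsPath: string, // the dedicated host path, not ~/.claude/.credentials.json
  image = "claude-hooks-agent:latest", // hypothetical tag (image is built in #18)
  configDir = "/claude-config",        // hypothetical value of $CLAUDE_CONFIG_DIR
): LaunchPlan {
  const args = [
    "run", "--detach",
    "--name", `claude-hooks-${agent}`,
    "--user", "claude",
    // Bare `-e NAME` forwards the value from the docker CLI's own environment.
    "-e", "FORGEJO_ACCESS_TOKEN",
    "-e", `CLAUDE_CONFIG_DIR=${configDir}`,
    // OAuth credentials: read-only bind mount of one file.
    "--mount",
    `type=bind,source=${hostCredentialsPath},target=${configDir}/.credentials.json,readonly`,
    // Per-agent state volume; /state is an assumed mount point.
    "--mount", `type=volume,source=claude-hooks-${agent}-state,target=/state`,
    image,
  ];
  // No token file and no Docker socket is mounted anywhere above.
  return { args, env: { FORGEJO_ACCESS_TOKEN: forgejoToken.trim() } };
}
```

One thing worth checking while implementing: Docker binds a single-file mount to the file that exists at start, so a rotation that replaces the file by rename may not be visible inside the container. Rewriting the file in place, or mounting the containing directory read-only, are two possible ways to keep the "no restart" criterion.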

State persistence

  • Named Docker volume per agent: claude-hooks-<name>-state
  • Volume contains cache/, worktrees/, sessions.json — same layout as today, rooted in the volume
  • Operator can mount the volume read-only elsewhere for inspection / backup without entering the container
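
The read-only inspection path could look roughly like the sketch below: a throwaway container mounts the agent's volume with `readonly` and runs a single command. The helper names, the /inspect mount point and the `alpine:3` image are assumptions; any small image with the needed tools would do.

```typescript
// Sketch only: helper names, /inspect and the default image are hypothetical.
function stateVolumeName(agent: string): string {
  // Reject anything that could smuggle extra options into the --mount string.
  if (!/^[a-z][a-z0-9-]*$/.test(agent)) {
    throw new Error(`invalid agent name: ${agent}`);
  }
  return `claude-hooks-${agent}-state`;
}

function inspectVolumeArgs(
  agent: string,
  command: string[] = ["ls", "-la", "/inspect"],
  image = "alpine:3",
): string[] {
  return [
    "run", "--rm",
    // `readonly` means the inspection container cannot modify agent state.
    "--mount", `type=volume,source=${stateVolumeName(agent)},target=/inspect,readonly`,
    image,
    ...command,
  ];
}
```

For a backup, the same helper could be called with `["tar", "-C", "/inspect", "-cf", "-", "."]` and the docker process's stdout redirected to a file on the host.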

claude-hooks integration

  • runAgent spawns the Claude Agent SDK via docker exec targeting the agent's container
  • End-to-end still works: cache clone, worktree acquire, session resume (once #6 has landed)
  • If a container is down when a task arrives, claude-hooks fails the task with a clear error and does not auto-start the container
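
The "fail clearly, never auto-start" rule might be sketched as below. This is not the actual `runAgent`; the error class, the probe type and the function name are hypothetical. The probe is assumed to return the trimmed stdout of `docker inspect --format '{{.State.Running}}' <container>`, or `null` when that command fails (for example because the container does not exist).

```typescript
// Sketch only: names are hypothetical; the probe is injected so this stays testable
// without a Docker daemon.
class ContainerDownError extends Error {
  container: string;
  constructor(container: string, detail: string) {
    super(`agent container ${container} is not running (${detail}); start it and retry the task`);
    this.name = "ContainerDownError";
    this.container = container;
  }
}

// Returns "true" / "false" as printed by docker inspect, or null if the inspect failed.
type RunningProbe = (container: string) => string | null;

function execArgsFor(agent: string, command: string[], probe: RunningProbe): string[] {
  const container = `claude-hooks-${agent}`;
  const state = probe(container);
  if (state === null) {
    throw new ContainerDownError(container, "no such container");
  }
  if (state !== "true") {
    // Deliberately no `docker start` here: the task fails instead.
    throw new ContainerDownError(container, `State.Running=${state}`);
  }
  // -i keeps stdin open so the host process can drive the agent over stdio.
  return ["exec", "-i", container, ...command];
}
```

The returned argv would then be handed to whatever process spawner claude-hooks already uses; the command run inside the container is left to the caller because this issue does not specify it.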

Security invariants (testable)

  • Inside a running container: ls -la ~ shows only image-provided files
  • Inside a running container: grep -r "<stand-in-token>" / returns nothing
  • docker inspect confirms no bind-mount references the interactive user's ~/.claude/ or ~/.config/claude-hooks/tokens/
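
The `docker inspect` invariant could be automated along these lines. The sketch parses the JSON array that `docker inspect <container>` prints and reports every bind mount whose source overlaps a forbidden host path. The function and interface names are hypothetical, and the forbidden paths are expected to be absolute (the caller expands `~`).

```typescript
// Sketch only: covers just the fields of docker inspect output that this check needs.
interface DockerMount { Type: string; Source: string; Destination: string; RW: boolean; }
interface DockerInspectEntry { Name: string; Mounts?: DockerMount[]; }

function forbiddenBindMounts(inspectJson: string, forbiddenPaths: string[]): string[] {
  const entries = JSON.parse(inspectJson) as DockerInspectEntry[];
  const norm = (p: string): string => p.replace(/\/+$/, ""); // drop trailing slashes
  const forbidden = forbiddenPaths.map(norm);
  const violations: string[] = [];
  for (const entry of entries) {
    for (const m of entry.Mounts ?? []) {
      if (m.Type !== "bind") continue; // named volumes are not host-path leaks
      const src = norm(m.Source);
      // Overlap in either direction counts: mounting a parent directory
      // (e.g. the whole home dir) exposes the forbidden path as well.
      const hit = forbidden.some(
        (f) => src === f || src.startsWith(f + "/") || f.startsWith(src + "/"),
      );
      if (hit) violations.push(`${entry.Name}: ${m.Source} -> ${m.Destination}`);
    }
  }
  return violations;
}
```

An integration test could feed this the real `docker inspect` output for each of the three containers and assert that the result is empty.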

Tests

  • Integration test: spawn container, run a trivial task, assert forbidden files are inaccessible
  • Integration test: kill and restart a container mid-task — worktree + session persist, the in-flight task fails fast and re-enqueues cleanly
  • just qa still passes without the Docker daemon running (host-only unit tests)

Out of scope

  • Image build / publish — #18
  • just recipes, systemd, rolling updates, docs — #20
  • Migrating existing on-host cache/worktree state into volumes — new install starts clean

References

  • Parent tracking issue: #17
  • Security incident: 2026-04-17 dev-agent identity leak (commit bab386a)

Dependencies

  • Blocked by: #18 (image), #6 (sessions store), #7 (sweeper) — stabilise persistent-state design before moving into volumes
  • Blocks: #20 (orchestration)
  • Branch off: main
  • Full graph: #17
Reference
charles/claude-hooks#19