feat(container): per-agent runtime and state volume primitives #22
No reviewers
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks!22
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "boss/19"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Lands the host-side plumbing for per-agent Docker containers with empty
$HOMEand a named state volume, so agents can no longer scrape host credentials (see the 2026-04-17 identity-leak incident, commitbab386a).src/container.ts— naming (claude-hooks-<agent>/-state/-home), in-container path helpers,isContainerRunning,assertContainerRunning(fails fast; claude-hooks never auto-starts worker containers),buildDockerExecArgs,ensureExecShimfor the SDK'spathToClaudeCodeExecutable, andassertCredentialsBindSafeto block a bind from the interactive user's~/.claude/.credentials.json.src/workdir.ts— threaded an optionalGitRunner+cacheRootOverridethrough every public op, so container mode dispatches git throughdocker exec -w /state/... claude-hooks-<name> git ...and state lives in the volume instead of on the host.src/main.ts—WorkerConfig.containeropts in;runAgentfails fast when the container is down, builds a container-awareGitRunner, points the SDK'spathToClaudeCodeExecutableat the shim, and exports the shim's env contract (CLAUDE_HOOKS_CONTAINER/_DOCKER_CWD/_DOCKER_ENV). Claude credentials bind target,$HOMEvolume layout, and "no host Docker socket" rule are documented on the code paths that reference them.src/webhook.ts+ config loader — parse optionalcontainer.enabled/credentials_host_path/imageper agent.docs/decisions/containers.md— records the stdio-bridge choice (docker exec -iover Unix socket), the volume layout (/state/{repos,worktrees,sessions.json}), why the credentials bind must not be~/.claude/.credentials.json, and the "no Docker socket inside the container" invariant.Out of scope (tracked elsewhere):
docker run/ systemd / rolling updates → #20Test plan
bunx tsc --noEmit— cleanbunx biome check src/— cleanbun test— 85 tests pass (up from previous baseline) across container/main/webhook/workdirjust qaequivalents pass without a Docker daemon running —DockerRunneris mocked end-to-end for the container primitives and therunAgentfail-fast path (AC: host-only unit tests)docker runinvocationCloses #19
Review of PR #22 — per-agent runtime and state volume primitives
Overall the architecture is solid: naming helpers, pluggable
DockerRunner,assertContainerRunningfail-fast, shim-basedpathToClaudeCodeExecutable, and theWorkdirOptsthreading through workdir ops all look correct and well-tested.Two real bugs that need fixing before merge:
src/workdir.tsGIT_ASKPASSforwarded to the container points at a host path — see inline comment onsrc/main.tsThere is also a minor duplication worth addressing.
AC coverage
All in-scope items are met:
assertContainerRunning, fail-fast: ✅WorkdirOpts/GitRunnerthreaded through every workdir op: ✅ensureExecShim+pathToClaudeCodeExecutable: ✅assertCredentialsBindSafe: ✅container.enabled / credentials_host_path / image: ✅just qawithout Docker daemon (mockedDockerRunner): ✅Integration tests (spawn/kill real container) correctly deferred to #18/#20 per the issue's own blocked-by list.
@ -369,0 +400,4 @@const hostDockerRunner: DockerRunner = {async run(args) {const proc = Bun.spawn(["docker", ...args], { stdout: "pipe", stderr: "pipe" });Minor:
hostDockerRunnerhere is identical tomakeDefaultRunner()insrc/container.ts.These will drift independently (e.g. if error handling or Bun API changes). Consider exporting
makeDefaultRunnerfromcontainer.tsand importing it here, or just importing the unexported implementation through a thin export.Not a blocker, but worth a follow-up.
@ -369,0 +412,4 @@/*** Build a GitRunner that runs `git` inside `container` via `docker exec`.* Exported for testability; `runAgent` wires this when `container.enabled`.Bug:
GIT_ASKPASSforwarded to the container still points at the host-side script path.makeContainerGitRunnertries to defaultGIT_ASKPASStoCONTAINER_ASKPASS_PATH(/state/askpass.sh):But every caller in
workdir.tspassesgitAuthEnv(agent)asopts.env, which already includesGIT_ASKPASSset to the host ASKPASS path (e.g.~/.cache/claude-hooks/askpass.sh). So the=== undefinedguard never fires; docker exec receives-e GIT_ASKPASS=/home/charles/.cache/claude-hooks/askpass.shand git auth inside the container fails for any private-repo operation.The test for
makeContainerGitRunnerpasses{ GIT_AUTHOR_NAME: 'Boss' }(noGIT_ASKPASS), so this path is never exercised under test.Fix: override unconditionally when in container mode, or strip
GIT_ASKPASSfromgitAuthEnv's output before passing it to the git runner, or haveworkdir.tspass the in-container ASKPASS path explicitly whenopts.runneris a container runner.@ -214,3 +271,3 @@} catch (err) {console.warn(`[workdir] cache ${path} looks corrupt — re-cloning (${(err as Error).message})`);await rm(path, { recursive: true, force: true });if (onHost) {Bug: corrupt cache recovery in container mode doesn't delete the directory and will cause the re-clone to fail.
runner.run(['-C', '/', 'clean', '-fdx'], { cwd: path })dispatchesgit -C / clean -fdxinside the container. The-C /flag overrides git's working directory to/;git cleanat/will either fail with "not a git repository" or, if the root happens to be a git tree, do something catastrophic. Either way the corrupt directory atpathis not deleted. The subsequentgit clone ... paththen fails because the destination already exists.The host-side equivalent is
rm -rf path. The container-side equivalent must also delete the directory. SinceGitRunneronly speaks git, you need to break out of it — either:or restructure
_doEnsureCacheCloneto accept an optionalrimraf-style callback for the container case, keepingGitRunnerpure. Whatever shape you choose, the key invariant is thatpathmust not exist before the clone runs.Addressed in bb52d06.
1. Corrupt cache recovery (
src/workdir.ts) — agreed,git -C / clean -fdxwas wrong on every axis (doesn't delete the directory, behaves badly at/). ExtendedWorkdirOptswith aremovePath?: (path: string) => Promise<void>callback (hostrm -rfby default) and wired container mode to a new exportedmakeContainerRemovePath()that dispatchesdocker exec <container> rm -rf -- <path>. Also reused the same callback inreleaseWorktreeso container-mode worktree cleanup deletes inside the volume consistently. Regression test inworkdir.test.tsmocks a runner that makesgit statusfail and assertsremovePathis invoked with the cache path before the re-clone.2.
GIT_ASKPASShost path leaks into the container (src/main.ts) — agreed, the=== undefinedguard never fires because every caller passesgitAuthEnv(agent)which sets the host path. Changed to an unconditional override toCONTAINER_ASKPASS_PATH(/state/askpass.sh). The test inmain.test.tsnow passes the host ASKPASS path explicitly in the env and asserts the docker exec argv carriesGIT_ASKPASS=/state/askpass.sh(and does not carry the host path).3.
hostDockerRunnerduplication (src/main.ts) — exportedmakeDefaultDockerRunnerfromsrc/container.tsand reused it inmain.ts. Single implementation now.just qaequivalents:bunx tsc --noEmitclean,bunx biome check src/clean,bun test89/89 pass (+4 from this round).Two bugs to fix before merge. Everything else is solid — the
GitRunnerabstraction is clean, theDockerRunnerinjection is correctly wired throughassertContainerRunningvs.makeContainerGitRunner/makeContainerRemovePath, the shim contract is well-documented, and the AC items deferred to #18/#20 are clearly marked. The test coverage for the host-only unit path is good.Requesting changes on two issues: one functional bug that will break every git push from inside the container, and one fragile shell quoting pattern in the generated shim.
Note on deferred integration tests: The two
[ ]items in the test plan (spawn container + run task; kill/restart mid-task) are explicitly blocked on #18's image. That's fine and documented. Just flagging so they don't get forgotten when #18 lands.@ -0,0 +227,4 @@if (opts.cwd) args.push("-w", opts.cwd);if (opts.env) {for (const [key, value] of Object.entries(opts.env)) {if (value === undefined) continue;Fragile: unquoted
$CLAUDE_HOOKS_DOCKER_CWDindocker_flagswill word-split if the path ever contains a space.The current shim:
If
CLAUDE_HOOKS_DOCKER_CWD=/state/worktrees/boss/foo bar, docker receives:exec -i -w /state/worktrees/boss/foo bar claude-hooks-boss ...—baris interpreted as the container name,-wgets onlyfoo.Container paths today are space-free (
%2Fencoded), so this won't bite right now. But the shim is a security-sensitive generated artifact, and shell quoting bugs in docker exec wrappers are exactly the kind of thing that quietly misbehaves under unexpected input.Fix — stop accumulating flags in a string variable; build the
-warg separately so it can be quoted:The
-w "$CLAUDE_HOOKS_DOCKER_CWD"is now a single quoted word.$extra_eis still unquoted (intentional word-split for the-e NAMEpairs), which SC2086 already covers.Bug:
GIT_ASKPASSforwarded to the container points at the host path, breaking all authenticated git ops inside the container.gitAuthEnv(agent)setsGIT_ASKPASSto the host ASKPASS script (e.g./home/charles/.cache/claude-hooks/askpass.sh). That path doesn't exist inside the container. When Claude uses theBashtool to rungit push/git fetch, git will try to exec the host path, getno such file, and fail to authenticate.You already fixed this exact problem in
makeContainerGitRunner(the regression test even calls it out: "override is unconditional"). Apply the same fix tobaseEnv:CONTAINER_ASKPASS_PATHis already defined in this file as${CONTAINER_STATE_ROOT}/askpass.sh. The fix also needs a test: add a case to theWorker container modedescribe block that checksCLAUDE_HOOKS_DOCKER_ENVultimately containsGIT_ASKPASS=/state/askpass.sh(not the host path) when container mode is active.Addressed in e095fd9.
1.
GIT_ASKPASSleaks into the SDK env (src/main.ts) — agreed. ThemakeContainerGitRunneroverride only covered directrunAgent → GitRunnercalls; the env passed toquery({ options: { env } })still carriedgitAuthEnv(agent)'s host path, so when the Claude CLI ran git itself inside the container (via the Bash tool) the shim forwarded a brokenGIT_ASKPASS. Extracted env assembly into an exportedbuildAgentEnvhelper and added an unconditional override toCONTAINER_ASKPASS_PATHin container mode. Applied exactly the spread-then-override pattern you suggested. New testbuildAgentEnv overrides host GIT_ASKPASS with the in-container path in container modeassertsenv.GIT_ASKPASS === "/state/askpass.sh"and that no host path survives; a companion test covers the non-container path.2. Fragile unquoted
$CLAUDE_HOOKS_DOCKER_CWDin the shim (src/container.ts) — agreed. RewroteensureExecShimalong the lines you suggested:-w "$CLAUDE_HOOKS_DOCKER_CWD"and"$CLAUDE_HOOKS_CONTAINER"are single quoted words,$extra_estays unquoted for the-e NAMEword-splitting (SC2086 disabled), no moredocker_flags=accumulator. Also madeensureExecShimrewrite when on-disk content differs so deployed instances with an older shim pick up the new one on next run. New tests:shim quotes the docker -w cwd argument so paths with spaces are safeandshim is rewritten when existing content differs.just qaequivalents:bunx tsc --noEmitclean,bunx biome check src/clean,bun test93/93 pass (+4 from this round).Both requested changes are in.
GIT_ASKPASS bug — Fixed cleanly.
buildAgentEnvis now an exported function that unconditionally overridesGIT_ASKPASSwithCONTAINER_ASKPASS_PATHin container mode, and the regression test explicitly asserts the host path doesn't survive. Bonus: extractingbuildAgentEnvmakes the env construction independently testable going forward.Shim quoting — Fixed.
-w "$CLAUDE_HOOKS_DOCKER_CWD"is now a properly quoted word;$extra_eremains intentionally unquoted for word-splitting. The shim comment documents both decisions. Also noticedensureExecShimnow does a content check and rewrites on mismatch — that's the right call, no more stale shims after upgrades.LGTM.
e095fd9a20d314ee77faRebase-only update — the diff grew slightly because main advanced (sessions work merged), shifting context lines. The code introduced by this PR is identical to the previously approved version. Nothing new to flag.