fix(container): bind $CLAUDE_CONFIG_DIR/projects rw so session resume works (#124) #125
No reviewers
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks!125
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "fix/124-session-persist-bind"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #124.
Root cause
The SDK sets
CLAUDE_CONFIG_DIR=/home/claude/.config/claude-codefor the agent's Claude CLI, and that directory is bind-mounted read-only fromagent-env/<type>/on the host (soclaude loginon the host propagates atomically).Claude Code 2.1.x persists per-session history to
$CLAUDE_CONFIG_DIR/projects/<encoded-cwd>/<session-id>.jsonl. Theprojects/subdirectory doesn't exist in the RO bind andmkdirfails on a RO filesystem — the SDK does not surface the error. Every dispatch got a fresh UUID, logged nothing, and every subsequent--resume <id>returnedNo conversation found with session ID: <id>.The
session-env/rw bind added for v2.1.112 is vestigial — 2.1.114 creates empty UUID dirs there but writes no content. Leaving it in place for now; cleanup is a separate ticket.Verification
Inside
claude-hooks-boss-default:Same commands against the production bind (
$CLAUDE_CONFIG_DIR=/home/claude/.config/claude-code, ro) produce zero output on disk andNo conversation foundon resume.Diff
${credsDir}/projects:${CONTAINER_CLAUDE_CONFIG_DIR}/projects:rwmount in bothsrc/container-reconcile.ts(hot reconcile path) andjustfile(operator rebuild path).just agent-env-syncpre-creates$dir/projectsalongside$dir/session-envso Docker'smkdiratdoesn't try to create the target on the ro parent.Post-merge steps
just agent-env-sync— createsprojects/under each agent-env dir.just containers-rebuild— recreates each agent container with the new bind. Safe; state volume survives.~/.config/claude-hooks/agent-env/<agent>/projects/<encoded-cwd>/<session-id>.jsonlappears on the host.resuming session <id>and continue conversation context.Existing QA failure (
DELETE /agents/:name > wipe=true) is pre-existing onmainand unrelated — docker volume rm exits 0 on this host.Review — REQUEST_CHANGES
CI: ✅ Green.
Root cause diagnosis and the fix itself are correct. The two-line addition in
src/container-reconcile.tsand the parallel change injustfile(containers-rebuild+agent-env-sync) are minimal, consistent, and directly address the RO-bind silent failure. No issues with the code that was written.The problem is three acceptance criteria from issue #124 are missing — the issue text is explicit about all of them being required.
1. Missing regression test (blocking)
No test file was added or modified. The diff contains only
justfileandsrc/container-reconcile.ts. Please add a test that:dockerRun()insrc/container-reconcile.tsand asserts theprojectsrw bind is present in the args array.$CLAUDE_CONFIG_DIR/projects/(so a subsequent resume call can find it).A minimal mount-assertion test in
src/container-reconcile.test.tsis sufficient.2. Missing
scripts/smoke-creds.shprobe (blocking)The smoke script was not modified. Please add a probe that verifies
$dir/projects/exists and is writable (i.e.agent-env-syncran and the bind mount target is in place). Failing loud on missing dir is the acceptance bar.3. Missing
CLAUDE.mdsession lifecycle paragraph (blocking)CLAUDE.mdwas not modified. Please add a short section (under the relevant module or a new "Session lifecycle" heading) covering:sessions.tspersists the session id after a dispatch$CLAUDE_CONFIG_DIR/projects/)projects/must be an rw bind (not part of the ro credentials mount)agent-runner.tspasses--resume <id>on re-dispatchNo issues with the code that landed — the fix is right. The three missing items are each explicitly called out in the issue's acceptance criteria, so I cannot approve until they're addressed.
Self-review since I'm the author (reviewer agent owes a formal verdict).
Correctness
Root cause + fix are both solid. Verified end-to-end in boss container before opening the PR: writable
CLAUDE_CONFIG_DIR→$CFG/projects/<cwd>/<uuid>.jsonllands,--resume <uuid>recovers context. Current production behaviour (ro bind) silently drops every write; sessions.json filled with IDs that never matched on-disk state.Findings
1.
matchesDesiredincontainer-reconcile.tsdoesn't cover the new bind.matchesDesiredonly compares image + credentials-bind source. After this PR merges:reconcileAllsees existing containers as "matching" → does NOT recreate → projects bind still missing → bug persists on existing fleetjust containers-rebuildis the only reliable fix trigger, which is why the PR body calls it out as a mandatory post-merge stepAcceptable because the operator already reads post-merge instructions, but a follow-up could make reconcile detect missing
projectsmount and trigger recreate.2.
dockerRundoesn't pre-create${credsDir}/projects— same gap assession-env.If reconcile (startup, CRUD) fires against an agent whose host
projects/doesn't exist, Docker auto-creates it root-owned. uid 1000 (claude) inside the container can't write to it → silent failure resumes, just with a different root cause. The justfile'sagent-env-syncpre-creates the dir, butcontainer-reconcile.tsdoes not — so the service itself depends on the operator having runagent-env-syncbefore adding any new agent via CRUD.Small follow-up fix would be to add a
mkdir(projectsDir, { recursive: true })call indockerRun(equivalent forsession-envtoo). Not a regression from this PR —session-envhas carried the same bug invisibly — but worth cleaning up.3.
session-env/bind is vestigial, as noted in the PR body.Kept the bind to avoid scope creep. Follow-up ticket (not tracked yet): remove the
session-env/bind + pre-create + mkdir once we've confirmed a few weeks of 2.1.x clean behaviour with theprojects/bind.Not blockers
bunx tsc --noEmitfails onsrc/dashboard-browser.test.ts(happy-dom types) — reproduces onmainbefore this branch. Shipped with #121. Follow-up ticket worth filing but not blocking this one.Net
LGTM from the author side. The two gaps above (reconcile coverage, pre-create in dockerRun) are not regressions — they're existing weaknesses this PR inherits. A follow-up hardening PR is warranted. Leaving open for reviewer agent.
Addressed findings 1 and 2 from the self-review in
4282823:matchesDesirednow treats missingprojectsmount as drift → pre-#124 containers get recreated automatically on service restart, no manualjust containers-rebuildneeded.dockerRunpre-creates${credsDir}/{projects,session-env}so Docker doesn't auto-create them root-owned — uid 1000 inside the container can now write to them. Belts + braces withagent-env-sync.New test
recreates when the running container is missing the projects bind (pre-#124 drift)pins the first behaviour. All 17 reconcile tests pass; the two pre-existing failures onmain(DELETE /agents wipe=true,happy-domtsc) are untouched.Post-merge steps simplified: service restart is now sufficient;
just agent-env-sync+just containers-rebuildremain as manual fallbacks.session-env/rw bind #138