fix(setup): new agent types don't get credentials seeded until operator runs just agent-env-sync manually #257

Closed
opened 2026-04-21 19:55:31 +00:00 by claude-desktop · 0 comments
Collaborator

Bug

When a new agent type lands in config/agents.json (e.g., foreman from #217), the container reconcile (just agents-sync) creates the container/registry, but does not seed .credentials.json into the agent's bind dir. The first dispatch fails with:

Claude Code returned an error result: Not logged in · Please run /login

Only just agent-env-sync or just agent-env-sync <name> fans out the shared creds from ~/.config/claude-hooks/claude-credentials/.credentials.json into each type's container.credentials_host_dir. Operators don't know to run it.

Repro (2026-04-21)

  1. #217 renamed architect → foreman. The type's bind dir is ~/.config/claude-hooks/agent-env/foreman/.
  2. Operator created the bind dir (mkdir -p) per the Forgejo-identity runbook I gave.
  3. just agents-sync registered the foreman-default container + worker.
  4. Service started, foreman worker registered.
  5. Operator sent a message in /app/planner.
  6. Task 66f21f3d failed in 1 turn with Not logged in (SDK API error: authentication_failed).
  7. ls ~/.config/claude-hooks/agent-env/foreman/ → no .credentials.json.
  8. Manual just agent-env-sync foreman → seeded the file; next dispatch worked.

Acceptance criteria

Option A (preferred) — agents-sync seeds creds

  • just agents-sync (or the apps/server/src/container-reconcile.ts path it invokes) detects any container.credentials_host_dir that's missing .credentials.json and copies from the shared dir BEFORE the container starts.
  • Skip the copy when the bind dir's .credentials.json is newer than the shared source (mtime-guard parity with agent-env-sync per the 2026-04 token-rotation fix).
  • Log each seeded agent at boot: [agent-env] seeded credentials for <name>.
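
A minimal sketch of the Option A seeding step, not the actual container-reconcile.ts code. The `AgentType` shape, the `seedCredentials` name, and the result strings are hypothetical. The guard here skips when the bind-dir copy is non-empty and at least as new as the shared source, which is an assumption about how agent-env-sync's mtime guard behaves.

```typescript
import { copyFileSync, existsSync, mkdirSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const CREDS_FILE = ".credentials.json";

// Hypothetical shape: only the fields this sketch needs.
interface AgentType {
  name: string;
  credentialsHostDir?: string; // stands in for container.credentials_host_dir
}

type SeedResult = "seeded" | "skipped" | "no-source" | "not-applicable";

function seedCredentials(
  agent: AgentType,
  sharedDir: string,
  log: (line: string) => void = console.log,
): SeedResult {
  // Types without a credentials bind dir are left alone.
  if (!agent.credentialsHostDir) return "not-applicable";

  const source = join(sharedDir, CREDS_FILE);
  if (!existsSync(source)) return "no-source";

  const target = join(agent.credentialsHostDir, CREDS_FILE);
  if (existsSync(target)) {
    const info = statSync(target);
    // mtime guard (assumed semantics): keep a non-empty bind-dir copy
    // that is at least as new as the shared source.
    if (info.size > 0 && info.mtimeMs >= statSync(source).mtimeMs) return "skipped";
  }

  mkdirSync(agent.credentialsHostDir, { recursive: true });
  copyFileSync(source, target);
  log(`[agent-env] seeded credentials for ${agent.name}`);
  return "seeded";
}

// Demo against a throwaway directory tree; the paths and token are placeholders.
const root = mkdtempSync(join(tmpdir(), "agent-env-"));
const sharedDir = join(root, "claude-credentials");
mkdirSync(sharedDir);
writeFileSync(join(sharedDir, CREDS_FILE), '{"token":"placeholder"}');

const foreman: AgentType = {
  name: "foreman",
  credentialsHostDir: join(root, "agent-env", "foreman"),
};
const logLines: string[] = [];
const firstRun = seedCredentials(foreman, sharedDir, (line) => logLines.push(line));
const secondRun = seedCredentials(foreman, sharedDir, (line) => logLines.push(line));
```

In the demo the first call copies the file and logs once; the second call hits the guard and leaves the bind dir untouched. Running this before the container starts is what would keep the first dispatch from failing.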

Option B (fallback, lighter) — loud boot-time warning

  • On service startup, after loadWebhookConfig, walk every type with container.credentials_host_dir and stat its .credentials.json.
  • For any missing / zero-byte file, log a [agent-env] WARNING: <name> has no credentials — run 'just agent-env-sync <name>' AND emit a startup_warning SSE envelope so the dashboard can paint a banner.
  • The agent's first dispatch still fails, but the operator sees why loudly at boot instead of hunting through a task's failure log.
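
The Option B check could be sketched roughly as below. The `StartupWarning` envelope fields, the `emit` callback, and the `checkCredentialsAtBoot` name are hypothetical; the real SSE envelope format and the hook point after loadWebhookConfig are not shown in this issue. Agent names other than foreman are made up for the demo.

```typescript
import { mkdirSync, mkdtempSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const CREDS_FILE = ".credentials.json";

// Hypothetical shapes: the real config and SSE envelope may differ.
interface AgentType {
  name: string;
  credentialsHostDir?: string; // stands in for container.credentials_host_dir
}
interface StartupWarning {
  type: "startup_warning";
  agent: string;
  message: string;
}

// True only when the credentials file exists and is non-empty.
function hasUsableCredentials(bindDir: string): boolean {
  try {
    const info = statSync(join(bindDir, CREDS_FILE));
    return info.isFile() && info.size > 0;
  } catch {
    return false; // missing file or missing bind dir
  }
}

function checkCredentialsAtBoot(
  agents: AgentType[],
  emit: (envelope: StartupWarning) => void,
  log: (line: string) => void = console.warn,
): StartupWarning[] {
  const warnings: StartupWarning[] = [];
  for (const agent of agents) {
    if (!agent.credentialsHostDir) continue; // nothing to check for this type
    if (hasUsableCredentials(agent.credentialsHostDir)) continue;
    // \u2014 is the dash used in the message text quoted in this issue.
    const message = `[agent-env] WARNING: ${agent.name} has no credentials \u2014 run 'just agent-env-sync ${agent.name}'`;
    const envelope: StartupWarning = { type: "startup_warning", agent: agent.name, message };
    log(message);
    emit(envelope);
    warnings.push(envelope);
  }
  return warnings;
}

// Demo against a throwaway directory tree.
const root = mkdtempSync(join(tmpdir(), "agent-env-check-"));
const dirFor = (name: string): string => {
  const dir = join(root, name);
  mkdirSync(dir, { recursive: true });
  return dir;
};
const agents: AgentType[] = [
  { name: "foreman", credentialsHostDir: dirFor("foreman") },         // empty bind dir
  { name: "seeded-type", credentialsHostDir: dirFor("seeded-type") }, // valid file
  { name: "empty-type", credentialsHostDir: dirFor("empty-type") },   // zero-byte file
  { name: "host-type" },                                              // no bind dir declared
];
writeFileSync(join(root, "seeded-type", CREDS_FILE), '{"token":"placeholder"}');
writeFileSync(join(root, "empty-type", CREDS_FILE), "");

const emitted: StartupWarning[] = [];
const warnings = checkCredentialsAtBoot(agents, (e) => emitted.push(e), () => {});
```

Returning the warnings as well as emitting them keeps the check easy to assert on in a unit test without standing up an SSE stream.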

Operator docs

  • Update CLAUDE.md §"Container credentials" with: "Adding a new agent type that declares container.credentials_host_dir: after creating the bind dir, run just agent-env-sync <name> to seed the shared credentials. just agents-sync alone doesn't do this (today)."
  • Optionally fold the call into just setup so fresh clones never hit this.

Verification

  • Unit / integration test: reconcile with a fresh bind dir; assert post-reconcile the file exists (Option A) or a startup warning was emitted (Option B).
  • Manual: remove ~/.config/claude-hooks/agent-env/foreman/.credentials.json, restart the service, send a foreman chat turn — should either work (Option A) or surface the missing-creds warning in the dashboard (Option B).

Out of scope

  • Rotating the shared creds — covered by just agent-env-sync --force per the 2026-04 fix.
  • Adding host-mode types to the reconcile path in general — foreman is already an exception today.

References

  • justfile::agent-env-sync — the recipe that does the fan-out correctly when asked.
  • justfile::agents-sync — the recipe that creates containers but skips cred seeding.
  • apps/server/src/container-reconcile.ts — the code path behind agents-sync.
  • CLAUDE.md §"Container credentials" — current doc that misses this step for new types.
  • 2026-04-21 incident: foreman task 66f21f3d failed with SDK authentication_failed because the bind dir was empty post-#217.
Reference
charles/claude-hooks#257