fix(agents): re-sync worker config.type after rename — closes #711 #723

Merged
charles merged 1 commit from boss/711 into main 2026-05-02 07:43:25 +00:00

Summary

Closes #711.

POST /agents/types/{old}/rename updates the DB agents.type column + rewrites agents.json + reloads config.types, but each in-memory Worker was constructed at boot with config.type = oldName. Without a re-sync, pool.getWorkersByType(workers, newName) returns an empty list (every worker still reports the old type), so dispatchByType(newName, …) silently drops every webhook that targets the freshly-renamed type — the agent goes idle on the dashboard until the next service restart.

Two-line fix in the existing post-commit side-effects loop: walk listAgentsByType(newName) and update each registered Worker's config.type in place, alongside the existing enqueueRender call.
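The stale-type failure and the in-place re-sync can be sketched as follows. This is a minimal illustration, not the code from the PR: `AgentWorker`, `AgentRow`, `workerRegistry`, and `resyncWorkerTypes` are hypothetical names, the DB is replaced by a plain array, and the signatures of `getWorker`, `getWorkersByType`, and `listAgentsByType` are assumed from how the description uses them.

```typescript
// Simplified stand-ins: the real Worker and registry carry far more state.
interface AgentRow { name: string; type: string }
interface AgentWorker { config: { name: string; type: string } }

// The registry is keyed by agent name, which a type rename never changes.
const workerRegistry = new Map<string, AgentWorker>();

function getWorker(agentName: string): AgentWorker | undefined {
  return workerRegistry.get(agentName);
}

// Filters on the cached config.type, which is why a stale value hides workers.
function getWorkersByType(workers: AgentWorker[], type: string): AgentWorker[] {
  return workers.filter((w) => w.config.type === type);
}

// Stand-in for the DB query (assumption); rows already carry the new type post-commit.
function listAgentsByType(rows: AgentRow[], type: string): AgentRow[] {
  return rows.filter((r) => r.type === type);
}

// The re-sync step: update each registered worker's cached type in place.
function resyncWorkerTypes(rows: AgentRow[], newName: string): number {
  let updated = 0;
  for (const row of listAgentsByType(rows, newName)) {
    const w = getWorker(row.name);
    if (w) {
      w.config.type = newName;
      updated += 1;
    }
  }
  return updated;
}

// Workers were built at boot with the old type; the DB rows were just renamed.
for (const agentName of ["senior-default", "senior-a"]) {
  workerRegistry.set(agentName, { config: { name: agentName, type: "senior" } });
}
const rowsAfterCommit: AgentRow[] = [
  { name: "senior-default", type: "tech-lead" },
  { name: "senior-a", type: "tech-lead" },
];

const liveWorkers = Array.from(workerRegistry.values());
const beforeResync = getWorkersByType(liveWorkers, "tech-lead").length; // stale cache: no match
const updatedCount = resyncWorkerTypes(rowsAfterCommit, "tech-lead");
const afterResync = getWorkersByType(liveWorkers, "tech-lead").length;
console.log(beforeResync, updatedCount, afterResync); // prints: 0 2 2
```

Before the re-sync the type lookup finds nothing, which is the condition under which dispatch by the new type has no worker to deliver to. After it, both workers are visible under the new type and none remain under the old one.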

Why not unregister + re-register

  • agents.name is NOT touched by this rename — only the type column moves. The worker registry is keyed by agents.name, so the lookup key stays valid; only the cached config.type field on the existing Worker is stale.
  • Mutating in place preserves currentTask, the queue, the abort signal, and the lifecycle hooks. An in-flight task continues to drain through the same Worker instance under the new type.
  • Container name (claude-hooks-<name>) and CLAUDE_CONFIG_DIR (~/.config/claude-hooks/agent-env/<name>) are both keyed by name, so no reconcile is needed for the type-only rename.
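The trade-off in the list above can be illustrated with a small comparison. This is a sketch under stated assumptions: `LiveWorker`, `QueuedTask`, `renameInPlace`, and `renameByReRegister` are hypothetical names, and the re-register variant assumes the naive behaviour of constructing a fresh worker with empty state, which may not match how the real registry would do it.

```typescript
// Hypothetical worker shape; only fields named in the description are modelled.
interface QueuedTask { id: string }
interface LiveWorker {
  config: { name: string; type: string };
  currentTask: QueuedTask | null;
  queue: QueuedTask[];
}

// In-place update: same instance, so in-flight state is untouched.
function renameInPlace(worker: LiveWorker, newType: string): LiveWorker {
  worker.config.type = newType;
  return worker;
}

// Naive alternative (assumed): drop the worker and build a fresh one.
function renameByReRegister(
  reg: Map<string, LiveWorker>,
  agentName: string,
  newType: string,
): LiveWorker {
  reg.delete(agentName);
  const fresh: LiveWorker = {
    config: { name: agentName, type: newType },
    currentTask: null,
    queue: [],
  };
  reg.set(agentName, fresh);
  return fresh;
}

// A worker with one running task and one queued task.
function makeBusyWorker(): LiveWorker {
  return {
    config: { name: "senior-default", type: "senior" },
    currentTask: { id: "task-1" },
    queue: [{ id: "task-2" }],
  };
}

const mutated = renameInPlace(makeBusyWorker(), "tech-lead");
const reg = new Map<string, LiveWorker>([["senior-default", makeBusyWorker()]]);
const replaced = renameByReRegister(reg, "senior-default", "tech-lead");

console.log(mutated.currentTask?.id, mutated.queue.length); // prints: task-1 1
console.log(replaced.currentTask, replaced.queue.length);   // prints: null 0
```

Both variants end with `config.type` set to the new name, but only the in-place one still holds the running task and the queued entry. The agent name is identical in both cases, so anything keyed by name is unaffected either way.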

The bug we observed in the wild (boss-default workers running while the DB carries code-lead-default rows) comes from a separate path: the boot-time migrateForLegacyTypeRenames rewriting DB names + types in one pass. That path's worker-registry parity relies on the next service restart re-reading the DB, which is the documented operator workflow per docs/shutdown.md. This PR is scoped to the runtime rename endpoint only.

Tests

Two new cases in agent-type-rename.test.ts:

  • re-syncs every live worker's config.type to the new name (#711) — registers three senior-* workers, asserts getWorkersByType("senior") returns them pre-rename, calls the rename endpoint, asserts getWorkersByType("tech-lead") returns them post-rename and getWorkersByType("senior") is empty.
  • preserves currentTask + queue depth across the rename (#711) — stashes a fake currentTask + a queued entry on senior-default, runs the rename, asserts both survive and config.type flipped.

Plus the existing 13 tests on the rename endpoint continue to pass — 15/15 in bun test src/http/handlers/agent-type-rename.test.ts. Server suite: 3067 / 3070 (the 3 remaining failures are the pre-existing bun 1.3.11 fs/promises.utimes int32 overflow in sweeper.test.ts, unrelated and tracked separately).

Typecheck + biome clean.

Out of scope

  • Container reconcile post-rename — names don't change, mount keys stay valid.
  • Re-registering workers when agents.name changes — that path lives in the boot-time migrateForLegacyTypeRenames migration and is already handled by the next-restart re-read.
  • The 3 pre-existing sweeper.test.ts failures (bun upstream).

🤖 Generated with Claude Code

fix(agents): re-sync worker config.type after rename — closes #711
All checks were successful
qa / dockerfile (pull_request) Successful in 7s
qa / qa (pull_request) Successful in 3m44s
8b9fc67718
`POST /agents/types/{old}/rename` updates the DB `agents.type` column
and reloads `config.types`, but each in-memory `Worker` was constructed
at boot with `config.type = oldName`. Without a re-sync,
`pool.getWorkersByType(workers, newName)` returns an empty list (every
worker still reports the old type), so `dispatchByType(newName, …)`
silently drops every webhook that targets the freshly-renamed type —
the agent goes idle on the dashboard until the next service restart.

Walk `listAgentsByType(newName)` post-commit and update each registered
worker's `config.type` in place, alongside the existing
`enqueueRender` loop. Worker names are not touched (the rename only
changes the `type` column on `agents`); `currentTask` + queue depth
are preserved verbatim, so an in-flight task continues draining
through the freshly re-synced worker.

Two new tests pin the behaviour:
- post-rename `getWorker(name).config.type` flips to the new name and
  `getWorkersByType(newName)` finds every renamed instance,
- a worker carrying a fake `currentTask` + queue entry through the
  rename retains both intact.

Closes #711

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reviewer approved these changes 2026-05-02 01:38:52 +00:00
reviewer left a comment

Minimal, correct fix. getWorker(row.name) + w.config.type = newName piggybacks on the existing listAgentsByType(newName) loop at the right point (post-DB-commit), so no inconsistency window. Two tests cover the re-sync invariant and non-destructive behaviour (currentTask + queue depth preserved). CI green.

charles deleted branch boss/711 2026-05-02 07:43:26 +00:00
Reference
charles/claude-hooks!723