BUG agent-type rename leaves worker registry stale — board column disappears #711
Reference
charles/claude-hooks#711
Summary
`POST /agents/types/{old}/rename` (WIZ-prereq-B / #671) updates the DB, rewrites `config/agents.json`, and reloads the in-memory webhook config, but it does not re-register the in-memory worker FIFO queues under the new type names. Result: until the operator manually restarts the service, every column for a renamed type disappears from the board.

Reproduction
1. `boss-default`, `boss-2` registered (workers booted from DB at startup).
2. `POST /agents/types/boss/rename` with `{ "new_name": "code-lead" }`.
3. […] `affected_rows`. DB now carries `code-lead-default` / `code-lead-2`.
4. `config/agents.json` is rewritten with the new key.
5. `getWebhookConfig()` reflects the rename.
6. `/health` still reports the old worker names (`boss-default`, `boss-2`) because the worker registry was populated at boot and the rename never re-registers.
7. `GET /board` joins `listResolvedAgents()` (DB → `code-lead-*`) with `probeWorker(a.name)` (registry → only `boss-*`). Every probe returns `null`.
8. `instancesByType.has("code-lead")` is false. `typeOrder` filters the type out at `apps/server/src/domain/views/board.ts:593`. Column gone.

A `just restart` repopulates the registry from the renamed DB rows and the column reappears.

Root cause
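To make the mismatch between the renamed DB rows and the boot-time registry concrete, here is a minimal sketch. The `ResolvedAgent` and `WorkerProbe` shapes, the `groupByType` helper, and the registry contents are assumptions for illustration only; the real join lives in `board.ts` and may differ in detail.

```typescript
// Hypothetical, simplified shapes; the real types live in the server code.
interface ResolvedAgent { name: string; type: string }
interface WorkerProbe { busy: boolean; queued: number }

// Registry as it was populated at boot, before the rename.
const registry = new Map<string, WorkerProbe>([
  ["boss-default", { busy: false, queued: 0 }],
  ["boss-2", { busy: true, queued: 1 }],
]);

// Stand-in for probeWorker: null when the name is not registered.
function probeWorker(name: string): WorkerProbe | null {
  return registry.get(name) ?? null;
}

// Stand-in for listResolvedAgents() after the rename: DB rows carry the new names.
const agents: ResolvedAgent[] = [
  { name: "code-lead-default", type: "code-lead" },
  { name: "code-lead-2", type: "code-lead" },
];

// Group live probes by type, skipping agents whose probe is null.
function groupByType(rows: ResolvedAgent[]): Map<string, WorkerProbe[]> {
  const byType = new Map<string, WorkerProbe[]>();
  for (const a of rows) {
    const probe = probeWorker(a.name);
    if (probe === null) continue; // stale registry: every renamed agent lands here
    const list = byType.get(a.type) ?? [];
    list.push(probe);
    byType.set(a.type, list);
  }
  return byType;
}

const instancesByType = groupByType(agents);
// A type with no live instances is filtered out, so its column is never rendered.
const typeOrder = ["code-lead"].filter((t) => instancesByType.has(t));
console.log(instancesByType.has("code-lead"), typeOrder); // prints: false []
```

Nothing in this path raises an error, which is why the column vanishes silently instead of surfacing a failure.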
`apps/server/src/http/handlers/agent-type-rename.ts` runs (in order) post-commit:
1. `rewriteAgentsJson(configPath, oldName, newName)`: disk file rewrite.
2. `loadWebhookConfig(configPath)`: refresh the config cache.
3. `enqueueRender(row.name)` for every renamed agent: refresh the per-agent env directory (`agent-env-sync.renderForInstance`).
4. `broadcastSSE({ type: "agent_type_renamed", … })`: dashboard cache invalidation.

Missing: nothing touches the worker FIFO registry. The registry is keyed by `<type>-<instance>` and was populated at boot. Renaming the type changes the DB row's `type` column but the in-memory registry entries never move.

Acceptance criteria
Worker registry refresh
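A minimal sketch of the re-keying this section calls for, assuming the registry is a plain `Map` keyed by worker name. The `Worker`, `Task`, and `RenamedRow` shapes and the `rekeyWorkers` helper are hypothetical; only the `currentTask` / `queue` field names come from this issue.

```typescript
// Hypothetical shapes; only currentTask / queue are named in the issue.
interface Task { id: string }
interface Worker { name: string; currentTask: Task | null; queue: Task[] }
interface RenamedRow { oldName: string; newName: string }

type Registry = Map<string, Worker>;

// Move each worker from its old key to its new key.
// Returns the new names that were actually re-registered.
function rekeyWorkers(registry: Registry, rows: RenamedRow[]): string[] {
  // Check for collisions first so a failure leaves the registry untouched.
  for (const { oldName, newName } of rows) {
    if (oldName !== newName && registry.has(newName)) {
      throw new Error(`worker ${newName} is already registered`);
    }
  }
  const moved: string[] = [];
  for (const { oldName, newName } of rows) {
    const worker = registry.get(oldName);
    if (!worker) continue; // never registered at boot; nothing to move
    registry.delete(oldName);
    worker.name = newName; // same object, so currentTask and queue carry over
    registry.set(newName, worker);
    moved.push(newName);
  }
  return moved;
}

// Usage: one worker mid-task with two queued items, one idle worker.
const registry: Registry = new Map<string, Worker>([
  ["boss-default", { name: "boss-default", currentTask: { id: "t1" }, queue: [{ id: "t2" }, { id: "t3" }] }],
  ["boss-2", { name: "boss-2", currentTask: null, queue: [] }],
]);
const moved = rekeyWorkers(registry, [
  { oldName: "boss-default", newName: "code-lead-default" },
  { oldName: "boss-2", newName: "code-lead-2" },
]);
console.log(moved, registry.has("boss-default"));
```

Driving the move from the renamed rows rather than from a `boss-` prefix scan avoids accidentally matching a different type whose name merely starts with the old one.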
- […] `${old}-${suffix}` keys) and re-registers it under the new `${new}-${suffix}` key, preserving `currentTask` / `queue` so an in-flight task is not orphaned.
- […] `current` slot: the task continues to drain through the freshly keyed worker; `/cancel` + `/history` continue to work via the new name (and the old name 404s).

Container reconcile
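A rough sketch of the per-name reconcile pass described in this section. The `ReconcileFn` signature is an assumption: the real `reconcileAgentOne` takes further arguments that this issue elides, so the function is injected here instead of called directly. `reconcileRenamed` and `ReconcileOutcome` are hypothetical names.

```typescript
// Assumed signature; the real reconcile function takes more arguments.
type ReconcileFn = (name: string) => Promise<void>;

interface ReconcileOutcome { name: string; ok: boolean; error?: string }

// Reconcile each renamed agent in turn, recording failures instead of
// aborting, so one bad container does not block the remaining names.
async function reconcileRenamed(
  newNames: string[],
  reconcile: ReconcileFn,
): Promise<ReconcileOutcome[]> {
  const outcomes: ReconcileOutcome[] = [];
  for (const name of newNames) {
    try {
      await reconcile(name);
      outcomes.push({ name, ok: true });
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      outcomes.push({ name, ok: false, error });
    }
  }
  return outcomes;
}

// Usage with a fake reconcile that fails for one name.
const fakeReconcile: ReconcileFn = async (name) => {
  if (name === "code-lead-2") throw new Error("container missing");
};
reconcileRenamed(["code-lead-default", "code-lead-2"], fakeReconcile)
  .then((outcomes) => console.log(outcomes));
```

Whether a reconcile failure should fail the rename request or only be logged is a design decision this sketch leaves open.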
- […] `reconcileAgentOne(name, …)` for each new name so the container is reconciled under the new name (the per-agent `CLAUDE_CONFIG_DIR` mount key changes; the container itself is named `claude-hooks-<type>-<instance>`).

Tests
- […] `boss` → `code-lead`. Assert that the worker registry has `code-lead-default` / `code-lead-2` (and not `boss-*`) after the call returns.
- […] `currentTask` carries over to the new key, the old key is unregistered, and `/health` reflects the new name.
- […] `queue.length` matches before / after; the queued items are still in FIFO order under the new name.
- `GET /board` immediately after rename renders a column for the new type with the right `capacity` / `in_flight` numbers (no missing column).

Backfill (operator path)
- […] `specs/first-login-wizard.md`: that operators on a service older than this fix can `just restart` to recover. Already standard, but worth flagging where the symptom shows up.

Out of scope
- `host` / `architect` / other reserved-name guards: those are already enforced upstream.
- […] `pool_sizes`).
- […] `<type>` so a rename necessarily generates a fresh dir. The reconcile call above ensures it's rendered.

References
- `apps/server/src/http/handlers/agent-type-rename.ts`: handler that needs the new step.
- `apps/server/src/domain/views/board.ts:546-553, 593`: where the join silently drops the renamed type.
- `apps/server/src/infrastructure/container/container-reconcile.ts`: `reconcileOne` to call post-rename.
- `apps/server/src/background/worker.ts` (or wherever the FIFO registry lives): for the unregister + register surface.
- […] `boss-*` / `code-lead-*` mismatch on `forge.jacquin.app`.