M26-1 Container lifecycle module + state machine #588
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks#588
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
User story
As a service operator, I want a per-instance lifecycle module that tracks
Stopped → Starting → Running → Stoppingfor every container, so that lazy roles can transition safely between states without races between dispatch enqueue and idle-stop.Acceptance criteria
Module
apps/server/src/infrastructure/container/lifecycle.tsexports agetLifecycle(name)factory returning a per-instance state holder withstate,acquireForEnqueue(),acquireForIdleStop(), andmarkStopped()/markRunning()transitions.Stopped | Starting | Running | Stopping. Hot instances live exclusively inRunning(orMissing, owned by watchdog).Mutexguards every transition. Held duringdocker start,docker stop, and the readiness probe — not during the actual agent task body.Enqueue path
enqueuefor a lazy instance acquires the lock, observes state, and: ifStopped→ firesdocker start, runs readiness probe, transitionsRunning; ifStarting→ awaits the in-flight transition; ifRunning→ no-op.Idle-stop path
idle_stop_seconds. On fire: acquire lock, re-checkqueue.length === 0 && currentTask === null && lifecycle === lazy, transitionStopping,docker stop, transitionStopped. Re-arm cancelled if a new task lands.Readiness probe
docker start, rundocker exec <name> sh -c 'true'with 5 s timeout. Retry up to 3× with 200 ms backoff. Failure surfaces acontainer.lazy_start_failedevent (event wiring lands in M26-5) and the dispatch fails with a clear error.Tests
Stopped(starts container), enqueue whileStarting(awaits), idle-tick while queue non-empty (no-op), idle-tick while in-flight task (no-op), concurrent enqueue + idle-tick (no double start, noexec into Exited).DockerRunner(same harness ascontainer.test.ts/container-reconcile.test.ts).Out of scope
References
specs/container-lazy-lifecycle.md— full spec.apps/server/src/infrastructure/container/container.ts— existing docker runner.apps/server/src/domain/dispatch/registry.ts:61— currentonEnqueuehook to wrap.