M26-2 Reconcile + watchdog: lazy at boot, no flap event #589

Closed
opened 2026-04-30 19:31:16 +00:00 by claude-desktop · 0 comments

User story

As a service operator, I want lazy containers to be created at boot but stay stopped, and I want the watchdog to recognise that stopped-lazy is normal, so that the fleet's idle state doesn't flood the event log with false container_stopped flap reports.

Acceptance criteria

Reconcile

  • container-reconcile.ts reads lifecycle: "hot" | "lazy" from the SQLite agents row.
  • During the reconcileAll() boot pass, hot rows behave as today (start if stopped). For lazy rows, reconcile ensures the container exists with the right config but does not start it; its initial state is seeded as Stopped.
  • reconcileOne() on CRUD applies the same split for create + update.
  • A lifecycle change counts as drift and triggers a recreate, so that the --restart flag is updated.
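
The reconcile split above can be sketched as a pure planning function. This is only an illustration under stated assumptions: `planReconcile`, `ReconcilePlan`, and `ObservedContainer` are hypothetical names, not the actual API of container-reconcile.ts, and the sketch assumes the lifecycle a container was created with can be read back (for example from a label).

```typescript
type Lifecycle = "hot" | "lazy";

// Hypothetical view of what reconcile observed for one agent's container.
interface ObservedContainer {
  lifecycle: Lifecycle; // lifecycle the container was created with (assumed recoverable)
  running: boolean;
}

// Hypothetical plan shape: which steps reconcile should take for one row.
interface ReconcilePlan {
  remove: boolean; // remove the existing container first (drift)
  create: boolean; // create the container with the current config
  start: boolean;  // start the container at the end of the pass
}

function planReconcile(
  desired: Lifecycle,
  observed: ObservedContainer | null, // null = container does not exist
): ReconcilePlan {
  const wantRunning = desired === "hot";

  // No container yet: create it; only hot rows are started.
  if (observed === null) {
    return { remove: false, create: true, start: wantRunning };
  }

  // Lifecycle changed: treat as drift and recreate so --restart is updated.
  if (observed.lifecycle !== desired) {
    return { remove: true, create: true, start: wantRunning };
  }

  // No drift: hot rows are started if stopped; lazy rows are left untouched.
  return { remove: false, create: false, start: wantRunning && !observed.running };
}
```

In this sketch a lazy container that is already running is left alone, since the criteria only say that reconcile must not start it.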

Restart policy

  • Hot containers continue using --restart unless-stopped.
  • Lazy containers use --restart no so the daemon does not undo intentional stops.
  • container.ts docker run arg builder branches on lifecycle.
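
A minimal sketch of the lifecycle branch in the arg builder. The function names and the `name`/`image` parameters are assumptions for illustration and do not reflect the real signature in container.ts; only the two `--restart` values come from the criteria above.

```typescript
type Lifecycle = "hot" | "lazy";

// Restart policy per lifecycle: hot keeps unless-stopped, lazy uses no,
// so the Docker daemon does not restart a lazy container that was stopped on purpose.
function restartPolicyArgs(lifecycle: Lifecycle): string[] {
  return ["--restart", lifecycle === "hot" ? "unless-stopped" : "no"];
}

// Hypothetical builder; the real one presumably adds env, mounts, networks, etc.
function buildRunArgs(name: string, image: string, lifecycle: Lifecycle): string[] {
  return ["run", "-d", "--name", name, ...restartPolicyArgs(lifecycle), image];
}
```

Keeping the policy in its own small function makes the branch easy to unit test and keeps the drift check and the builder in agreement about which flag each lifecycle implies.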

Watchdog

  • container-watchdog.ts reads lifecycle. For a lazy container that is currently stopped (as reported by the lifecycle module from M26-1), the container_stopped event is suppressed.
  • container_missing (#132 failure mode — not in docker ps -a) still fires for both hot and lazy. Recovery path unchanged.
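
The suppression rule can be expressed as a small decision function. This is a sketch, not the watchdog's actual code: `watchdogEvent` and the `ContainerState` union are hypothetical, and the `state` argument stands in for whatever the M26-1 lifecycle module and `docker ps -a` report.

```typescript
type Lifecycle = "hot" | "lazy";
type ContainerState = "Running" | "Stopped" | "Missing";
type WatchdogEvent = "container_stopped" | "container_missing";

// Returns the event the watchdog should emit, or null when nothing should fire.
function watchdogEvent(lifecycle: Lifecycle, state: ContainerState): WatchdogEvent | null {
  // Missing is always a failure, regardless of lifecycle.
  if (state === "Missing") return "container_missing";

  // Stopped is normal for lazy containers, so only hot containers report it.
  if (state === "Stopped") return lifecycle === "lazy" ? null : "container_stopped";

  return null; // Running: nothing to report
}
```

Checking Missing before Stopped keeps the recovery path for absent containers identical for both lifecycles.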

Tests

  • Reconcile unit coverage: hot row → start, lazy row → exists+stopped, lifecycle change → recreate.
  • Watchdog unit coverage: lazy + Stopped → no event, lazy + Missing → event + recreate, hot + Stopped → event as today.

Out of scope

  • Lifecycle module itself (M26-1).
  • Pool selection (M26-3).
  • Config schema / CRUD surface (M26-4).
  • SSE / metrics events (M26-5).

References

  • specs/container-lazy-lifecycle.md §Reconcile changes / §Watchdog changes / §Restart policy.
  • apps/server/src/infrastructure/container/container-reconcile.ts.
  • apps/server/src/infrastructure/container/container-watchdog.ts.
  • Issue #188 — restart-drops-half-the-fleet (existing started action variant).
  • Issue #132 — watchdog container_missing rationale.
Reference
charles/claude-hooks#589