flows-yaml: runtime global-mode flip leaves dispatcher unconstructed until restart #1087
Reference
charles/claude-hooks#1087
Summary
When the service boots with global `mode: "off"` (and no per-flow override), `main.ts:3860` short-circuits dispatcher construction: the `if (hasAnyLivingFlow)` block is skipped.

If the operator later flips global mode via `POST /api/flows/global-mode {"mode":"live"}` (the route documented at `flows-routes.ts:970` as the deliberate operator override for exactly this kind of recovery), the settings row is updated and the audit event is broadcast, but no dispatcher is constructed post-hoc and no trigger bus is subscribed.

Result: silent half-state. Webhooks arrive, `[sse-broadcast] kind=issue.assigned …` fires, board cards stay idle. `mode-status` reports `live` on every flow. No error surface.

Only `just restart` (or any other service restart) recovers, because the boot-time `hasAnyLivingFlow` check then sees `mode=live` and wires the dispatcher.

Reproduction
1. Start with `flows_yaml_settings.mode = "off"` and all `per_flow` entries off (or omitted). Boot log shows `[flows-yaml] mode=off loaded=N errors=0`.
2. `curl -X POST http://127.0.0.1:4500/api/flows/global-mode -H 'Content-Type: application/json' -d '{"mode":"live"}'` → `{"from":"off","to":"live","changed":true}`.
3. `just flows-yaml-mode-status` shows global=live + every flow live.
4. Fire a webhook event (`issues.assigned`).
5. Observed: no `[<agent>] enqueued …`, no `[webhook] dispatch …`, no flow_run row. Board stays idle.
6. `just restart`. Boot log now shows `[flows-yaml] mode=live loaded=N errors=0`. Repeat step 4. Observed: dispatch fires immediately.

This was hit live on 2026-05-11: after #1083 (legacy engine deletion) merged with global mode left at `off`, the per-flow cutover step was skipped, so every event went undispatched until the symptom was diagnosed and the service restarted.

Acceptance criteria
Pick one of the two approaches below. Both are acceptable; the spec preference is (A), since the route's docstring already advertises it as a recovery mechanism that should "just work".
(A) Lazy dispatcher construction (preferred)
- Refactor the `if (hasAnyLivingFlow)` block in `main.ts:3860` so the trigger bus, live/shadow capability bags, per-flow router, and dispatcher can be constructed (or torn down) at any time, not only at boot.
- `POST /api/flows/global-mode` and `POST /api/flows/:name/mode` ensure the dispatcher is alive after any flip from `off → shadow|live` and idle (or torn down) after a full flip back to `off` for every flow.
- `setEventHandlersTriggerBus()` is called with the live bus whenever the dispatcher is constructed; it must be safe to call repeatedly.
- `assertCapabilitiesSatisfyOps()` still runs and refuses to construct the dispatcher when a live-capability dep is missing, but the failure surfaces a `409 capabilities_missing` from the mode-flip endpoint instead of being deferred to restart.

(B) Refuse the flip + explicit restart
If (A) is rejected as too invasive:
- `POST /api/flows/global-mode` returns `409 restart_required` (with a clear `detail` field) when the dispatcher was not constructed at boot AND the requested mode is non-`off`.
- `just flows-yaml-cutover NAME MODE` surfaces the same 409 with a one-line operator instruction ("flip refused — dispatcher was not constructed at boot; run `just restart` and retry").
- The same refusal applies to `POST /api/flows/:name/mode` when global is `off` and the dispatcher was not constructed.
- A `dispatcher_state: "boot_inert"` field is exposed so operators can spot the half-state.

Tests
- Boot with `mode=off`, flip via API, fire a synthetic `issues.assigned` event, assert a `flow_runs` row appears (A) or 409 was returned (B).
- A `mode=live` boot still wires the dispatcher exactly as it does today.

Telemetry / observability
- Log `[flows-yaml] dispatcher constructed at runtime via mode flip` when post-boot construction happens.
- Log `[flows-yaml] mode flip refused — dispatcher inert since boot, restart required` when the 409 is returned.

Out of scope
- `evaluateCutover`: orthogonal, untouched.
- `POST /api/flows/:name/mode` per-flow gate behaviour beyond ensuring the dispatcher is alive when needed.
- `createFlowWatcher`.

References
- `apps/server/src/main.ts:3853-3923`: `hasAnyLivingFlow` boot gate + dispatcher construction
- `apps/server/src/http/flows-routes.ts:970-1000`: `handleSetGlobalMode` (current handler; only writes settings + broadcasts)
- `apps/server/src/domain/flows-yaml/dispatcher.ts`: `createDispatcher`
- `apps/server/src/domain/flows-yaml/trigger-bus.ts`: `createTriggerBus`
- #1083 (`28d1f2f6`): legacy engine deletion; the cutover checklist that operators are expected to follow post-merge
- `just restart`
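
Sketch (illustrative only)

To make the two acceptance options easier to compare, here is a minimal in-memory sketch of both. Everything in it is an assumption made for illustration: `DispatcherLifecycle`, `flipLazy`, `flipOrRefuse`, the `FlipResult` shape, and the wording of the `detail` string are hypothetical names, not code from `main.ts` or `flows-routes.ts`. The `build` callback stands in for whatever the boot gate constructs today (trigger bus, capability bags, router, dispatcher) and is assumed to throw when a capability check fails. The sketch also does not model persisting the settings row or broadcasting the audit event.

```typescript
type Mode = "off" | "shadow" | "live";

interface Dispatcher {
  stop(): void;
}

interface FlipResult {
  status: 200 | 409;
  body: Record<string, unknown>;
}

// Hypothetical holder for the objects the boot gate builds today.
class DispatcherLifecycle {
  private dispatcher: Dispatcher | null = null;
  private readonly build: () => Dispatcher;
  private readonly log: (line: string) => void;

  constructor(build: () => Dispatcher, log: (line: string) => void = console.log) {
    this.build = build; // assumed to throw when a capability is missing
    this.log = log;
  }

  get alive(): boolean {
    return this.dispatcher !== null;
  }

  // Idempotent: calling it when the dispatcher already exists is a no-op.
  ensure(): void {
    if (this.dispatcher) return;
    this.dispatcher = this.build();
    this.log("[flows-yaml] dispatcher constructed at runtime via mode flip");
  }

  // Also idempotent: tearing down an absent dispatcher does nothing.
  teardown(): void {
    this.dispatcher?.stop();
    this.dispatcher = null;
  }
}

// Approach (A): construct or tear down on every flip.
// `anyLivingFlow` is the caller's answer to "is any flow non-off after this flip?".
function flipLazy(lc: DispatcherLifecycle, from: Mode, to: Mode, anyLivingFlow: boolean): FlipResult {
  if (anyLivingFlow) {
    try {
      lc.ensure();
    } catch (err) {
      return { status: 409, body: { error: "capabilities_missing", detail: String(err) } };
    }
  } else {
    lc.teardown();
  }
  return { status: 200, body: { from, to, changed: from !== to } };
}

// Approach (B): refuse any non-off flip when nothing was built at boot.
function flipOrRefuse(dispatcherBuiltAtBoot: boolean, from: Mode, to: Mode): FlipResult {
  if (!dispatcherBuiltAtBoot && to !== "off") {
    return {
      status: 409,
      body: {
        error: "restart_required",
        detail: "dispatcher was not constructed at boot; restart the service and retry",
      },
    };
  }
  return { status: 200, body: { from, to, changed: from !== to } };
}

// Usage: boot with mode=off (nothing built), then flip to live twice.
let buildCount = 0;
const lifecycle = new DispatcherLifecycle(() => {
  buildCount++;
  return { stop() {} };
}, () => {});
const firstFlip = flipLazy(lifecycle, "off", "live", true);
const repeatFlip = flipLazy(lifecycle, "live", "live", true);
```

In this sketch the second flip does not rebuild anything, which is the property the "safe to call repeatedly" criterion asks for. Whether a failed capability check should leave the settings row untouched or roll it back is a design decision the sketch leaves open.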