fix(flows-yaml): lazy dispatcher construction on runtime mode flip (#1087) #1088
No reviewers
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
3 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks!1088
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "dev/1087"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
bootstrapFlowsYamlEngineinto a sharedensureYamlDispatcherAlive()helper that is idempotent and callable at runtime.ensureDispatcherAliveintoFlowsRoutesDepsand calls it fromhandleSetGlobalModeandhandleSetFlowModeafter anyoff → shadow|liveflip — recovering the half-state without a restart.409 capabilities_missing(with settings rollback) is returned ifassertCapabilitiesSatisfyOpsfails, so the operator learns immediately rather than at the next webhook.[flows-yaml] dispatcher constructed at runtime via mode flipon post-boot construction.Closes #1087
When the service booted with global mode=off, the dispatcher was never constructed. A subsequent POST /api/flows/global-mode {mode:live} updated the DB row but left the service in a silent half-state — webhooks arrived, SSE fired, but no flow_run rows were written and no agents were enqueued. Extract the dispatcher construction block from bootstrapFlowsYamlEngine into ensureYamlDispatcherAlive(), a shared, idempotent helper that: - Returns {ok:true} immediately if the dispatcher is already live. - Re-reads settings so it sees the post-write mode row. - Constructs trigger bus, capability bags, per-flow router, and dispatcher. - Returns {ok:false, error} when assertCapabilitiesSatisfyOps fails so the mode-flip endpoint can surface 409 capabilities_missing and roll back the settings write instead of silently leaving the system wired incorrectly. - Logs "[flows-yaml] dispatcher constructed at runtime via mode flip" on post-boot construction. Wire ensureDispatcherAlive into FlowsRoutesDeps and call it from both handleSetGlobalMode (after any off→shadow|live flip) and handleSetFlowMode (after any per-flow off→shadow|live flip). Add unit tests covering: successful lazy construction, 409 rollback on capability failure, and the off→off no-op path. Co-authored-by: Cursor <cursoragent@cursor.com>Review — fix(flows-yaml): lazy dispatcher construction on runtime mode flip (#1087)
APPROVED — the fix is correct, the idempotency guarantee holds, rollback is safe, and the unit tests cover the meaningful paths. Notes below; none are blocking.
What I verified
ensureYamlDispatcherAlive— idempotencyThe
yamlFlowsDispatcher !== nullearly return is the right sentinel. BecauseyamlFlowsDispatcheris only ever assigned inside this function and the function is the sole construction path, repeated calls from mode-flip endpoints after the first successful construction are no-ops (O(1), no re-allocation). ✅Settings read ordering
Both endpoints write the new mode row before calling
ensureDispatcherAlive, so the re-read insideensureYamlDispatcherAlive(const settings = getFlowsYamlSettings()) will see the post-write value. ThehasAnyLivingFlowcheck therefore correctly reflects the intent of the flip. ✅Rollback correctness —
handleSetFlowModeTraced
entry.fromtoservice-config-store.ts:setFlowsYamlPerFlowMode:fromis already typed asFlowsYamlMode, so theentry.from as FlowsYamlModecast in the rollback call is redundant (minor: can be removed). The rollback value itself is correct — it restores the previous effective mode, not just the per-flow key. ✅Rollback correctness —
handleSetGlobalModebeforeis snapshotted viagetFlowsYamlSettings()before the write;setFlowsYamlSettings({ mode: before.mode })in the 409 path correctly undoes the write. ✅off→offfast path in the testhandleSetGlobalModehas a real early return:The function returns before ever reaching the
ensureDispatcherAliveblock, soensureCalledis false for a different reason than themode !== "off"guard. The test is valid either way (both mechanisms agree), but the comment says "no-change fast path" which is accurate. ✅_yamlDiffReporterCleanupguardThe
if (_yamlDiffReporterCleanup === null)check prevents double-starting the diff reporter. Technically unreachable given the idempotency guard above it, but it's good defensive hygiene andstartFlowsYamlDiffReportercorrectly returns a cleanup handle now. ✅makeDeps/ backward compatibilityensureDispatcherAliveis declared optional (?) on the interface. The handlers guard withdeps.ensureDispatcherAlive &&before calling. Existing tests that don't pass the field are unaffected. ✅Minor observations (non-blocking)
1. Teardown path not implemented
AC item: "idle (or torn down) after a full flip back to
offfor every flow".Flipping everything back to
offleaves the dispatcher allocated. In practice the per-flow router re-reads settings on each dispatch and will route no flows (alloff), so the system is functionally idle._yamlDiffReporterCleanupis stored but never called in theoffpath. Leaving the dispatcher alive is acceptable given the router behaviour, but the stored cleanup handle suggests the author anticipated a future teardown path — worth a follow-up if memory / fd pressure ever matters.2.
entry.from as FlowsYamlModeis a redundant castFlowsYamlCutoverHistoryEntry.fromis alreadyFlowsYamlMode. The cast is noise — can be dropped.3. Integration test missing from AC
The issue listed: "Integration test: boot with
mode=off, flip via API, fire a syntheticissues.assignedevent, assert aflow_runsrow appears." The PR adds unit tests for the handler behaviour but not the end-to-end dispatch path. The unit tests are sufficient to validate this PR's changes; the integration test would test the whole stack includingsetYamlDispatcher. Fine to track separately.4.
shadow→offnot covered by the "off skips ensure" unit testThe
handleSetFlowModeoff-test establishes shadow first (necessary to defeat the cutover gate), then flips to off and assertsensureCalled === false. This is correct. A complementaryglobal-mode shadow→offtest would provide symmetry but isn't needed — themode !== "off"guard is the only branch.CI
No workflow runs are recorded for
f5d9c24e. Theqa.ymlworkflow triggers onpull_request → main. Merge only after CI is green — the QA pipeline (typecheck + lint + tests + db-schema drift) must pass to confirm the new test suite compiles and the unit tests actually run green.f5d9c24e810c752be2d00c752be2d09dd80f200e