fix(flows): seeder must repair default-flow rows that fail to compile, regardless of version #545
Reference
charles/claude-hooks#545
User story
As an operator, I want the default-flow seeder to detect a broken DB body and repair it from source unconditionally, so that a stale flow row left over from a port rename can't silently swallow every dispatch for that trigger.
Why
On 2026-04-28 the `review-requested` flow stopped dispatching reviewer agents. Every PR review-request webhook logged the same compile-failure line (see the log site under References).
Root cause:
- `forgejo_user` was renamed to `git_author` on `agent.resolve_by_login` (rename day 2026-04-16, repo memory `agents.md`).
- The `review-requested-graph.json` source was updated to use `git_author` correctly.
- The `flows` SQLite row had been seeded at some intermediate version (`version=2`) with a body that still referenced `forgejo_user`, and the source JSON was at `version=1`.
- Every `pull_request.review_requested` event failed to compile silently, no reviewer was dispatched, and PRs sat unreviewed for hours before the operator noticed.
Fix shipped immediately: rewrote the DB body from source via `sqlite3 UPDATE`, bumped both DB and source JSON to `version=3` so they resync. PR #541 carries the source bump.
The deeper bug is the seeder design: a flow row that fails to compile is always broken, regardless of its version number. The seeder should treat a compile failure as authoritative grounds for repair (or refuse to boot). The version field exists to gate operator overrides, not to gate fixing dead defaults.
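A minimal sketch of the repair rule argued for above. Everything named here is hypothetical: the `FlowRow` and `SeedAction` shapes, the `decideSeedAction` function, and the `compiles` callback (which stands in for compiling a body against the live `NodeRegistry`) are illustration only and are not taken from the codebase. The version-gated `"reseed"` branch is likewise an assumption about how the existing gate works.

```typescript
// Hypothetical shapes for illustration; the real row and seeder types may differ.
type FlowRow = {
  id: string;
  version: number;
  body: string;
  source: "default" | "operator";
};

type SeedAction =
  | "insert"
  | "unchanged"
  | "reseed"
  | "repair"
  | "skip-operator"
  | "abort";

// `compiles` is an assumed stand-in for compiling a body against the live NodeRegistry.
function decideSeedAction(
  existing: FlowRow | undefined,
  sourceBody: string,
  sourceVersion: number,
  compiles: (body: string) => boolean,
): SeedAction {
  // A source body that cannot compile leaves nothing valid to repair from.
  if (!compiles(sourceBody)) return "abort";
  if (existing === undefined) return "insert";
  // Operator overrides are never rewritten, even when they fail to compile.
  if (existing.source === "operator") return "skip-operator";
  // Compile failure wins over any version comparison.
  if (!compiles(existing.body)) return "repair";
  // Assumed version gate: only an older default row is reseeded.
  return existing.version < sourceVersion ? "reseed" : "unchanged";
}

// Toy compile check: any body still naming the old port fails.
const compilesWithoutOldPort = (body: string): boolean => !body.includes("forgejo_user");

const staleRow: FlowRow = {
  id: "review-requested",
  version: 2,
  body: '{"port":"forgejo_user"}',
  source: "default",
};

// DB version (2) is ahead of source version (1), yet the row is still repaired.
console.log(decideSeedAction(staleRow, '{"port":"git_author"}', 1, compilesWithoutOldPort)); // prints: repair
```

Checking the source body first means a broken default never overwrites a row, and checking `source = 'operator'` before the compile test keeps the operator-override contract intact.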
Acceptance criteria
Seeder behaviour
- Each default-flow row in `flows` is compiled against the live `NodeRegistry`. Compile-fail counts as "must repair" regardless of version.
- Repair applies to default rows only (a row with `source = 'operator'` is never touched, even on compile fail: log and continue).
- If the source body itself fails to compile, log `[startup] flow seed <id>: source body fails compile` at ERROR and abort the boot (or set `process.exitCode = 1` after current init finishes). Catastrophic: needs a real fix, not a silent skip.
- … `source = 'operator'` rows).
Telemetry
- Each repair logs `[startup] flow seed <id>: repaired (DB v<old> → v<new>, source v<src>): <reason>`. Reason is the compile error.
- `GET /flows/divergence/summary` (or a new `/flows/health` if the divergence endpoint is read-only) surfaces last seed-time repair counts so the operator can spot a silent fix happening.
Tests
- … `process.exitCode = 1`
- DB body has `forgejo_user`, source has `git_author`, DB version > source version → seeder repairs, next dispatch succeeds
Out of scope
References
- `apps/server/src/domain/flows/review-requested-graph.json` v3 in PR #541
- `UPDATE flows SET body=?, version=3 WHERE id='review-requested'`
- `apps/server/src/http/flows-routes.ts::seedDefaultFlow` (per the `default-graph.ts:35–37` comment)
- `[flow-dispatch] flow <id> failed to compile: …` in `apps/server/src/domain/flows/flow-dispatch.ts`
- `feedback_dispatch_triggering_labels` (`forgejo_user` → `git_author` rename day)
Picked up + landing on PR #541 (commit `d3f5679`).
Root cause refined
After digging in, the bug had two layers, not one:
1. Source-level inconsistency: `review-requested-graph.json` carries `version: 1`, which the parser at `review-requested-graph.ts:42` strict-checks (`if (g.version !== 1) throw …`). That field is the graph DSL version and must stay 1. The flow content version lives in the .ts constant `REVIEW_REQUESTED_GRAPH_VERSION = 2`. The 2026-04-16 rename edited the JSON body but did NOT bump the .ts constant, so the seeder's spec-version stayed at 2.
2. Seeder bug: `seedDefaultFlow` compared `existing.version === version` and returned "unchanged". With both at 2, the body-divergent row was declared fine and never repaired.
What landed
- Bumped `REVIEW_REQUESTED_GRAPH_VERSION` from 2 → 3 (forces reseed of the deployed v=2 row)
- `seedDefaultFlow` now compares both version and body. Returns a new `"repaired"` status when version matches but body diverges. Operator rows (`source = 'operator'`) remain untouched at any version; that contract is preserved
- Test `default-source row with matching version but stale body is repaired (#545)` reproduces the production scenario
Acceptance-criteria coverage
- `/flows/health` telemetry endpoint surfacing repair counts (TODO, out of scope for this PR)
- … `skipped` test path)
The boot-abort + telemetry stories I'd file as follow-ups against this issue if you want to keep #545 open until they land, or close once #541 merges and reopen as a fresh narrower issue. Your call.
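For reference, the version-plus-body comparison described under "What landed" might look roughly like the sketch below. Only the `"unchanged"`, `"repaired"`, and skipped outcomes are named in this thread. The other status values, the `ExistingRow` shape, the function name `seedStatusFor`, and the use of plain string equality for the body comparison are assumptions for illustration. The actual `seedDefaultFlow` in PR #541 may be structured differently.

```typescript
// Hypothetical status and row shapes; only "unchanged", "repaired" and "skipped" appear in the thread.
type SeedStatus = "inserted" | "updated" | "unchanged" | "repaired" | "skipped";

type ExistingRow = {
  version: number;
  body: string;
  source: "default" | "operator";
};

function seedStatusFor(
  existing: ExistingRow | undefined,
  version: number,
  body: string,
): SeedStatus {
  if (existing === undefined) return "inserted";
  // Operator rows stay untouched at any version.
  if (existing.source === "operator") return "skipped";
  // A version mismatch triggers the ordinary reseed path (assumed behaviour).
  if (existing.version !== version) return "updated";
  // Same version: the old check stopped here and always reported "unchanged".
  return existing.body === body ? "unchanged" : "repaired";
}
```

Comparing serialized bodies byte for byte is the simplest choice. A real implementation might normalize the JSON first so that whitespace or key-order differences do not register as divergence.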