feat(flows): dry-run mode + divergence detection (NF-5) #358

Merged

code-lead merged 1 commit from feat/326-nf5-dry-run-divergence into main

2026-04-24 13:54:26 +00:00

claude-desktop commented

2026-04-24 13:49:29 +00:00

Collaborator

Summary

Parallel-dispatch feature flag gates whether the Node-Flows executor fires alongside the legacy handlers. In dry-run every mutating forge.* / agent.* node short-circuits to a stub output and records its intended call on the new flow_node_runs.intent column; a heuristic correlates those intents against task_history so the operator can eyeball divergences before NF-6 flips the live cutover.

node_flows: { mode, summary_interval_ms, divergence_window_ms } block in config/agents.json. mode defaults to "off"; typos fail the loader ("dryrun", "LIVE", "Off" — all rejected).
Idempotent migration adds intent TEXT to flow_node_runs (same PRAGMA-table_info pattern used for agents.plugins).
Executor plumbs mutationMode + ctx.recordIntent(method, args) through the NodeHandlerCtx; NodeRunRecord.intent surfaces the recorded payload so the persister can write the column unchanged.
Every mutating forge spec gets a declarative dryRunStub; the handler wrapper reads the mode from the context and either short-circuits (recording intent, returning the stub) or falls through to the real adapter. Read-only nodes always query the real forge.
GET /flows/divergences?since=&window_ms= serves a matched / flow-only / legacy-only bucket envelope with heuristic_limits inline. Route registered before /flows/:id so Hono's radix matching doesn't let the wildcard swallow the literal.
background/divergence-summary.ts emits a daily flow.divergence_summary SSE envelope (interval configurable, default 24 h). Timer is only started when mode !== "off".

Design notes

Mutation wiring. A wrapMutationNode-style refactor would touch every forge spec individually; instead the spec table gained one new field (dryRunStub) and buildDescriptor does the short-circuit declaratively. Adding a new mutating forge node is one spec + one stub line, same shape as today.

Intent column in live mode. Live rows carry NULL in intent — the divergence endpoint's heuristic only correlates dry-run intents against task_history. Recording intent in live mode was considered but dropped (would invert the "NULL means live" invariant the endpoint relies on; easier to trust the existing input / output columns in live mode).

Mutating nodes wired for dry-run. Every NF-3 mutating spec gets a stub:

forge.add_labels, forge.remove_label, forge.create_issue, forge.patch_issue, forge.update_assignees, forge.create_comment, forge.request_review, forge.merge_pull_request, forge.add_blocker, forge.create_label, forge.write_file
agent.dispatch (returns dry-run-<timestamp>-<counter> taskId), agent.cancel, agent.raise_cap

Read-only nodes (forge.get_issue, forge.list_issues, forge.get_pull_request, forge.list_pull_requests, forge.list_comments, forge.list_reviews, forge.list_workflow_runs, forge.get_aggregate_status, forge.get_blockers, forge.read_file) are untouched — they hit the real forge in both modes so downstream router.switch keeps walking.

Heuristic caveats (documented in the endpoint response)

The divergence endpoint is intentionally imperfect — documented inline in heuristic_limits:

Single-mutation-type coverage. Only agent.dispatch intents correlate against task_history (the hottest legacy mutation). Non-dispatch mutations (addLabels, createComment, requestReview, …) always land in flow_only_intents without a legacy counterpart — that's expected, not a divergence signal.
Time-window correlation. Defaults ±60 s around the intent's fired_at; configurable via node_flows.divergence_window_ms or ?window_ms=. A legacy dispatch that lands outside the window reads as "legacy-only" on one side and "flow-only" on the other.
Greedy matching. Each task_history row is matched at most once. When multiple intents target the same (repo, issue, agent_type), the closest-in-time pair wins and the rest bucket as flow_only_intents.
No handler instrumentation. Legacy webhook handlers (webhook-handlers.ts, webhook-ci.ts, deps.ts, slash-commands.ts) were NOT touched — the heuristic reads task_history instead. Non-dispatch legacy side effects (labels / comments / reviews) are invisible to the correlator; that's a known gap that NF-6's live cutover validates empirically.

Scope respected

Untouched: webhook-handlers.ts, webhook-ci.ts, deps.ts, slash-commands.ts, agent-runner.ts, adapter-factory, forgejo-port, every UI file. The feature is purely additive — legacy dispatch stays the authoritative source of truth.

Rollout. The feature flag is not flipped on charles/claude-hooks in this PR — that's an operator action after review.

Test plan

just qa (typecheck clean, biome clean) — web-workspace test failures are preexisting (verified by stash-compare against base).
New tests pass — 90 total across 4 files:
- shared/config/flow-mode.test.ts (26)
- domain/flows/dry-run-executor.test.ts (27)
- http/flows-divergence.test.ts (26)
- background/divergence-summary.test.ts (11)
Server regression suite — no new failures (5 preexisting fails on main, confirmed by stash-compare).
Default config (block absent) yields mode: "off", summary_interval_ms: 86_400_000, divergence_window_ms: 60_000.
Operator smoke test: flip node_flows.mode = "dry-run" in config/agents.json, drive a triggering event, confirm flow_node_runs.intent populates and GET /flows/divergences returns the expected buckets.

Closes #326

🤖 Generated with Claude Code

## Summary Parallel-dispatch feature flag gates whether the Node-Flows executor fires alongside the legacy handlers. In dry-run every mutating `forge.*` / `agent.*` node short-circuits to a stub output and records its intended call on the new `flow_node_runs.intent` column; a heuristic correlates those intents against `task_history` so the operator can eyeball divergences before NF-6 flips the live cutover. - `node_flows: { mode, summary_interval_ms, divergence_window_ms }` block in `config/agents.json`. `mode` defaults to `"off"`; typos fail the loader (`"dryrun"`, `"LIVE"`, `"Off"` — all rejected). - Idempotent migration adds `intent TEXT` to `flow_node_runs` (same PRAGMA-table_info pattern used for `agents.plugins`). - Executor plumbs `mutationMode` + `ctx.recordIntent(method, args)` through the `NodeHandlerCtx`; `NodeRunRecord.intent` surfaces the recorded payload so the persister can write the column unchanged. - Every mutating forge spec gets a declarative `dryRunStub`; the handler wrapper reads the mode from the context and either short-circuits (recording intent, returning the stub) or falls through to the real adapter. Read-only nodes always query the real forge. - `GET /flows/divergences?since=&window_ms=` serves a matched / flow-only / legacy-only bucket envelope with `heuristic_limits` inline. Route registered before `/flows/:id` so Hono's radix matching doesn't let the wildcard swallow the literal. - `background/divergence-summary.ts` emits a daily `flow.divergence_summary` SSE envelope (interval configurable, default 24 h). Timer is only started when `mode !== "off"`. ## Design notes **Mutation wiring.** A `wrapMutationNode`-style refactor would touch every forge spec individually; instead the spec table gained one new field (`dryRunStub`) and `buildDescriptor` does the short-circuit declaratively. Adding a new mutating forge node is one spec + one stub line, same shape as today. **Intent column in live mode.** Live rows carry NULL in `intent` — the divergence endpoint's heuristic only correlates dry-run intents against `task_history`. Recording intent in live mode was considered but dropped (would invert the "NULL means live" invariant the endpoint relies on; easier to trust the existing `input` / `output` columns in live mode). **Mutating nodes wired for dry-run.** Every NF-3 mutating spec gets a stub: - `forge.add_labels`, `forge.remove_label`, `forge.create_issue`, `forge.patch_issue`, `forge.update_assignees`, `forge.create_comment`, `forge.request_review`, `forge.merge_pull_request`, `forge.add_blocker`, `forge.create_label`, `forge.write_file` - `agent.dispatch` (returns `dry-run-<timestamp>-<counter>` taskId), `agent.cancel`, `agent.raise_cap` Read-only nodes (`forge.get_issue`, `forge.list_issues`, `forge.get_pull_request`, `forge.list_pull_requests`, `forge.list_comments`, `forge.list_reviews`, `forge.list_workflow_runs`, `forge.get_aggregate_status`, `forge.get_blockers`, `forge.read_file`) are untouched — they hit the real forge in both modes so downstream `router.switch` keeps walking. ## Heuristic caveats (documented in the endpoint response) The divergence endpoint is intentionally imperfect — documented inline in `heuristic_limits`: - **Single-mutation-type coverage.** Only `agent.dispatch` intents correlate against `task_history` (the hottest legacy mutation). Non-dispatch mutations (`addLabels`, `createComment`, `requestReview`, …) always land in `flow_only_intents` without a legacy counterpart — that's expected, not a divergence signal. - **Time-window correlation.** Defaults ±60 s around the intent's `fired_at`; configurable via `node_flows.divergence_window_ms` or `?window_ms=`. A legacy dispatch that lands outside the window reads as "legacy-only" on one side and "flow-only" on the other. - **Greedy matching.** Each `task_history` row is matched at most once. When multiple intents target the same `(repo, issue, agent_type)`, the closest-in-time pair wins and the rest bucket as `flow_only_intents`. - **No handler instrumentation.** Legacy webhook handlers (`webhook-handlers.ts`, `webhook-ci.ts`, `deps.ts`, `slash-commands.ts`) were NOT touched — the heuristic reads `task_history` instead. Non-dispatch legacy side effects (labels / comments / reviews) are invisible to the correlator; that's a known gap that NF-6's live cutover validates empirically. ## Scope respected Untouched: `webhook-handlers.ts`, `webhook-ci.ts`, `deps.ts`, `slash-commands.ts`, `agent-runner.ts`, `adapter-factory`, `forgejo-port`, every UI file. The feature is purely additive — legacy dispatch stays the authoritative source of truth. **Rollout.** The feature flag is not flipped on `charles/claude-hooks` in this PR — that's an operator action after review. ## Test plan - [x] `just qa` (typecheck clean, biome clean) — web-workspace test failures are preexisting (verified by stash-compare against base). - [x] New tests pass — 90 total across 4 files: - `shared/config/flow-mode.test.ts` (26) - `domain/flows/dry-run-executor.test.ts` (27) - `http/flows-divergence.test.ts` (26) - `background/divergence-summary.test.ts` (11) - [x] Server regression suite — no new failures (5 preexisting fails on `main`, confirmed by stash-compare). - [x] Default config (block absent) yields `mode: "off"`, `summary_interval_ms: 86_400_000`, `divergence_window_ms: 60_000`. - [ ] Operator smoke test: flip `node_flows.mode = "dry-run"` in `config/agents.json`, drive a triggering event, confirm `flow_node_runs.intent` populates and `GET /flows/divergences` returns the expected buckets. Closes #326 🤖 Generated with [Claude Code](https://claude.com/claude-code)

claude-desktop added 1 commit

2026-04-24 13:49:29 +00:00

feat(flows): dry-run mode + divergence detection (NF-5)

qa / qa (pull_request) Successful in 4m25s

Details

qa / dockerfile (pull_request) Successful in 7s

Details

4f4c087bd6

Parallel-dispatch feature flag gates whether the Node-Flows executor fires
alongside the legacy handlers. In dry-run every mutating `forge.*` /
`agent.*` node short-circuits to a stub output and records its intended
call on the new `flow_node_runs.intent` column; a background heuristic
correlates those intents against `task_history` to expose divergences
before NF-6 flips the live cutover.

- `node_flows: { mode, summary_interval_ms, divergence_window_ms }`
  block in `config/agents.json`. `mode` defaults to `"off"`; typos fail
  the loader.
- `flow_node_runs.intent` column (idempotent `ALTER TABLE` migration).
- Executor plumbs `mutationMode` + `ctx.recordIntent(method, args)`
  through the NodeHandlerCtx; NodeRunRecord surfaces the intent.
- Every mutating forge / agent node gets a declarative `dryRunStub`
  and an up-front short-circuit in the handler wrapper; read-only
  nodes keep hitting the real adapter.
- `GET /flows/divergences?since=&window_ms=` serves a
  matched / flow-only / legacy-only bucket envelope. Heuristic is the
  cheap `(repo, issue_number, agent_type)` ±window correlation —
  limits documented in the response's `heuristic_limits` field.
- Daily `flow.divergence_summary` SSE broadcast via a background
  timer (only started when `mode !== "off"`).

30-40 tests target: 90 new tests across flow-mode, dry-run-executor,
flows-divergence, and divergence-summary.

Closes #326

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>