fix(flows): allocate rate-limit budget per flow run, not per process #405

Merged
charles merged 1 commit from fix/flow-rate-limit-per-run into main 2026-04-26 22:34:22 +00:00
Collaborator

Summary

flow-dispatch.ts hoisted defaultArgInjections() out of the per-run loop on the (correct) reasoning that every key it returned was either a stable function or the singleton dedup surface. The rate-limit budget broke that invariant — the limiter is stateful and counts mutating calls. With the bundle reused across N triggers × M flows, every flow run shared one limiter; once cumulative consumption hit the DEFAULT_RATE_LIMIT_BUDGET = 20 ceiling, every subsequent flow run threw RateLimitExceeded until the next service restart.

Production symptom

PR #402's bounced review-request flow at 2026-04-26 20:43:25 UTC fired with the limiter already at 20/20, so the dispatch node errored:

"error": "rate-limit budget exhausted: used 20 of 20 mutating calls for this flow run"

The reviewer never picked up the re-review, and the operator dashboard showed IN REVIEW cards parked indefinitely on the dev column with the reviewer column empty.

The NF-3 docstring in rate-limit.ts:97-101 already flagged this exact leak as "safe for the isolated unit tests shipped here" but warns it "DOES leak across runs". This PR is the production wiring catching up.

Fix

Keep the static injection bundle hoisted (still correct for the stateless functions) but allocate createRateLimiter() inside the loop and spread it onto argDefaults per (trigger, flow) iteration. The allocation is one object per flow run — negligible against the forge-API + executor work that follows.

- const argDefaults = defaultArgInjections();
+ const staticArgDefaults = defaultArgInjections();
  for (const t of triggers) {
    for (const row of flowRows) {
      ...
-     argDefaults,
+     argDefaults: { ...staticArgDefaults, rateLimit: createRateLimiter() },

Tests

  • each flow run sees a distinct RateLimiter instance — pins the per-run-allocation contract via a source-handler wrap that records the limiter object identity across a 3-assignee fan-out. Verifies the three captured limiters are pairwise distinct.
  • a budget-exhausting first run does not block later runs — burns the full 20-call budget in run 1 and verifies runs 2 and 3 still finish status=ok. Pre-fix, both would have died on the first consume() of run 2.
  • Both share an afterEach snapshot/restore of the source handler because defaultRegistry() returns fresh MapRegistry instances but reuses the same BUILT_INS descriptor object references, so the wrap mutation otherwise persists across tests.

Test plan

  • bun test apps/server/src/domain/flows/flow-dispatch.test.ts → 28 pass / 0 fail
  • bun test apps/server/src/domain/flows/ → 376 pass / 0 fail
  • bunx biome check apps/server/src/domain/flows/flow-dispatch{.ts,.test.ts} → clean
  • Operatorsystemctl --user restart claude-hooks to reset the leaked singleton, then POST /pipeline/bounce-review for #401 + #402 to clear the parked review state.

🤖 Generated with Claude Code

## Summary `flow-dispatch.ts` hoisted `defaultArgInjections()` out of the per-run loop on the (correct) reasoning that every key it returned was either a stable function or the singleton dedup surface. The rate-limit budget broke that invariant — the limiter is *stateful* and counts mutating calls. With the bundle reused across N triggers × M flows, every flow run shared one limiter; once cumulative consumption hit the `DEFAULT_RATE_LIMIT_BUDGET = 20` ceiling, **every subsequent flow run threw `RateLimitExceeded` until the next service restart**. ## Production symptom PR #402's bounced review-request flow at `2026-04-26 20:43:25 UTC` fired with the limiter already at 20/20, so the `dispatch` node errored: ``` "error": "rate-limit budget exhausted: used 20 of 20 mutating calls for this flow run" ``` The reviewer never picked up the re-review, and the operator dashboard showed `IN REVIEW` cards parked indefinitely on the dev column with the reviewer column empty. The NF-3 docstring in `rate-limit.ts:97-101` already flagged this exact leak as "safe for the isolated unit tests shipped here" but warns it "DOES leak across runs". This PR is the production wiring catching up. ## Fix Keep the static injection bundle hoisted (still correct for the stateless functions) but allocate `createRateLimiter()` *inside* the loop and spread it onto `argDefaults` per `(trigger, flow)` iteration. The allocation is one object per flow run — negligible against the forge-API + executor work that follows. ```diff - const argDefaults = defaultArgInjections(); + const staticArgDefaults = defaultArgInjections(); for (const t of triggers) { for (const row of flowRows) { ... - argDefaults, + argDefaults: { ...staticArgDefaults, rateLimit: createRateLimiter() }, ``` ## Tests - **`each flow run sees a distinct RateLimiter instance`** — pins the per-run-allocation contract via a `source`-handler wrap that records the limiter object identity across a 3-assignee fan-out. Verifies the three captured limiters are pairwise distinct. - **`a budget-exhausting first run does not block later runs`** — burns the full 20-call budget in run 1 and verifies runs 2 and 3 still finish `status=ok`. Pre-fix, both would have died on the first `consume()` of run 2. - Both share an `afterEach` snapshot/restore of the source handler because `defaultRegistry()` returns fresh `MapRegistry` instances but reuses the same `BUILT_INS` descriptor object references, so the wrap mutation otherwise persists across tests. ## Test plan - [x] `bun test apps/server/src/domain/flows/flow-dispatch.test.ts` → 28 pass / 0 fail - [x] `bun test apps/server/src/domain/flows/` → 376 pass / 0 fail - [x] `bunx biome check apps/server/src/domain/flows/flow-dispatch{.ts,.test.ts}` → clean - [ ] **Operator** — `systemctl --user restart claude-hooks` to reset the leaked singleton, then `POST /pipeline/bounce-review` for #401 + #402 to clear the parked review state. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
fix(flows): allocate rate-limit budget per flow run, not per process
All checks were successful
qa / qa (pull_request) Successful in 6m34s
qa / dockerfile (pull_request) Successful in 13s
5391a20b5a
`flow-dispatch.ts` hoisted `defaultArgInjections()` out of the per-run
loop on the (correct) reasoning that every key it returned was either a
stable function or the singleton dedup surface. The rate-limit budget
broke that invariant: the limiter is *stateful* and counts mutating
calls. With the bundle reused across N triggers × M flows, every flow
run shared one limiter; once cumulative consumption hit the
`DEFAULT_RATE_LIMIT_BUDGET = 20` ceiling, every subsequent flow run
threw `RateLimitExceeded` until the next service restart.

Production symptom: PR #402's bounced review-request flow fired with
the budget already at 20/20, so the `dispatch` node errored, the
reviewer never picked up the re-review, and the operator dashboard
showed `IN REVIEW` cards parked indefinitely on the dev column. The
NF-3 docstring in `rate-limit.ts:97-101` already flagged the leak as
"safe for the isolated unit tests shipped here" but warns it "DOES
leak across runs" — this is the production wiring catching up.

Fix: keep the static injection bundle hoisted (still correct for the
stateless functions) but allocate `createRateLimiter()` inside the
loop and spread it onto `argDefaults` per (trigger, flow) iteration.
The allocation is one object per flow run — negligible vs. the
forge-API + executor work that follows.

Tests:
- new "each flow run sees a distinct RateLimiter instance" pins the
  per-run-allocation contract via a `source`-handler wrap that
  records the limiter object identity across a 3-assignee fan-out.
- new "a budget-exhausting first run does not block later runs"
  burns the full 20-call budget in run 1 and verifies runs 2 and 3
  still finish status=ok — pre-fix, both would have died on the
  first `consume()` of run 2 with `RateLimitExceeded`.
- both share an `afterEach` snapshot/restore of the source handler
  because `defaultRegistry()` returns fresh `MapRegistry` instances
  but reuses the same `BUILT_INS` descriptor object references, so
  the wrap mutation otherwise persists across tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reviewer approved these changes 2026-04-26 22:33:30 +00:00
reviewer left a comment

Correct fix. The { ...staticArgDefaults, rateLimit: createRateLimiter() } spread at each (trigger, flow) iteration is exactly right — stateless bundle stays hoisted, stateful limiter is fresh per run. The tests pin both invariants cleanly (object identity check + budget-burn regression). CI green.

Correct fix. The `{ ...staticArgDefaults, rateLimit: createRateLimiter() }` spread at each `(trigger, flow)` iteration is exactly right — stateless bundle stays hoisted, stateful limiter is fresh per run. The tests pin both invariants cleanly (object identity check + budget-burn regression). CI green.
charles deleted branch fix/flow-rate-limit-per-run 2026-04-26 22:34:23 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!405
No description provided.