fix(flows-yaml): merge resolved-agent identity into TaskRequest #1123

Merged
reviewer merged 1 commit from fix/flows-yaml-dispatch-identity into main 2026-05-12 10:33:57 +00:00
Collaborator

Summary

Closes the post-flows-yaml-cutover (#1083) crash loop. The dispatch op constructed a taskPayload without git_author / git_name / git_email / branch_prefix / forgejo_token, and dropped issue_number entirely when the flow had no issue anchor. The Record<string, unknown> boundary cast hid this from TypeScript. At task end, persistTask then tripped NOT NULL constraint failed: task_history.user (or .issue_number), which became an unhandledRejection and crashed the service. systemd restarted, the cycle repeated — 65 crashes since the cutover landed on May 11.

Symptom timeline (May 11 → May 12)

  • 2026-05-10 21:28 UTC#1083 cutover merged; flows-yaml becomes sole dispatch path.
  • 2026-05-11 00:46:27 CEST — first NOT NULL crash. [process] unhandledRejection: SQLiteError: NOT NULL constraint failed: task_history.user.
  • Overnight 2026-05-11 → 12 — service restarts 6×. task_history rows never written for affected tasks → janitor sees "no running task" → stale-idle-assigned bounces → re-dispatch cascade → session-resume invalidation (#1106 path) → multiple competing attempts on the same issue.
  • 2026-05-12 ~10:30 — third attempt on the same issue finally runs to completion in worktree; would have crashed again at end without this fix.

What the fix does

  • ops/dispatch.ts — pull identity fields from the resolved agent envelope (tokenforgejo_token) into every TaskRequest. Default issue_number to 0 for repo-level flows (architect convention). Throw at dispatch time when the agent login is not a configured agent — without identity / token the task cannot do real work, and silently dispatching it just queues a row that crashes at end.
  • domain/dispatch/registry.ts — defense in depth: wrap the persistTask call in onFinish in try/catch (mirrors the existing startTask pattern at onStart) so any future incomplete TaskRequest warns rather than crashing the process.
  • ops/dispatch.test.ts — updated unknown-login test (now asserts throw) + new regression covering the identity merge and issue_number: 0 default.

Why not just revert

  • Reverting #1107 (the startTask audit-row PR) does not fix this — the crash site is persistTask, which existed before #1107. Confirmed by counting NOT NULL constraint lines in journalctl: 65 before #1107 deployed, 26 after. Both periods affected.
  • Reverting the flows-yaml cutover (#1083) would delete an entire milestone (M21 — Node Flows) to fix a 10-line bug. Wrong leverage.

Test plan

  • bun test apps/server/src/domain/flows-yaml/ops/dispatch.test.ts — 6/6 pass (includes new regression for identity merge + updated throw-on-unknown-login).
  • bun test apps/server/src/domain/dispatch/registry.test.ts apps/server/src/infrastructure/database/task-store.test.ts — 38/38 pass.
  • bun run --filter @claude-hooks/server typecheck — clean.
  • just restart on host — service boots clean, no NOT NULL, no unhandled rejection. Boot-sweep dropped one orphaned session row from the prior crash window.
  • Manual: dispatch a real dev task end-to-end, confirm task_history row lands with non-null user + issue_number on completion.

🤖 Generated with Claude Code

## Summary Closes the post-flows-yaml-cutover (#1083) crash loop. The `dispatch` op constructed a `taskPayload` without `git_author` / `git_name` / `git_email` / `branch_prefix` / `forgejo_token`, and dropped `issue_number` entirely when the flow had no issue anchor. The `Record<string, unknown>` boundary cast hid this from TypeScript. At task end, `persistTask` then tripped `NOT NULL constraint failed: task_history.user` (or `.issue_number`), which became an unhandledRejection and crashed the service. systemd restarted, the cycle repeated — **65 crashes since the cutover landed on May 11**. ## Symptom timeline (May 11 → May 12) - `2026-05-10 21:28 UTC` — #1083 cutover merged; flows-yaml becomes sole dispatch path. - `2026-05-11 00:46:27 CEST` — first NOT NULL crash. `[process] unhandledRejection: SQLiteError: NOT NULL constraint failed: task_history.user`. - Overnight `2026-05-11 → 12` — service restarts 6×. `task_history` rows never written for affected tasks → janitor sees "no running task" → stale-idle-assigned bounces → re-dispatch cascade → session-resume invalidation (#1106 path) → multiple competing attempts on the same issue. - `2026-05-12 ~10:30` — third attempt on the same issue finally runs to completion in worktree; would have crashed again at end without this fix. ## What the fix does - `ops/dispatch.ts` — pull identity fields from the resolved agent envelope (`token` → `forgejo_token`) into every `TaskRequest`. Default `issue_number` to `0` for repo-level flows (architect convention). Throw at dispatch time when the agent login is not a configured agent — without identity / token the task cannot do real work, and silently dispatching it just queues a row that crashes at end. - `domain/dispatch/registry.ts` — defense in depth: wrap the `persistTask` call in `onFinish` in `try/catch` (mirrors the existing `startTask` pattern at `onStart`) so any future incomplete `TaskRequest` warns rather than crashing the process. - `ops/dispatch.test.ts` — updated unknown-login test (now asserts throw) + new regression covering the identity merge and `issue_number: 0` default. ## Why not just revert - Reverting `#1107` (the `startTask` audit-row PR) does not fix this — the crash site is `persistTask`, which existed before `#1107`. Confirmed by counting `NOT NULL constraint` lines in journalctl: 65 before `#1107` deployed, 26 after. Both periods affected. - Reverting the flows-yaml cutover (`#1083`) would delete an entire milestone (M21 — Node Flows) to fix a 10-line bug. Wrong leverage. ## Test plan - [x] `bun test apps/server/src/domain/flows-yaml/ops/dispatch.test.ts` — 6/6 pass (includes new regression for identity merge + updated throw-on-unknown-login). - [x] `bun test apps/server/src/domain/dispatch/registry.test.ts apps/server/src/infrastructure/database/task-store.test.ts` — 38/38 pass. - [x] `bun run --filter @claude-hooks/server typecheck` — clean. - [x] `just restart` on host — service boots clean, no NOT NULL, no unhandled rejection. Boot-sweep dropped one orphaned session row from the prior crash window. - [ ] Manual: dispatch a real dev task end-to-end, confirm `task_history` row lands with non-null `user` + `issue_number` on completion. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
fix(flows-yaml): merge resolved-agent identity into TaskRequest
All checks were successful
qa / i18n-string-check (pull_request) Successful in 7s
qa / sql-layer-check (pull_request) Successful in 11s
qa / dockerfile (pull_request) Successful in 11s
qa / db-schema (pull_request) Successful in 12s
qa / qa-1 (pull_request) Successful in 2m16s
qa / qa (pull_request) Successful in 0s
ce46cc315b
Closes the post-flows-yaml-cutover (#1083) crash loop. The `dispatch` op
constructed a `taskPayload` without `git_author` / `git_name` / `git_email`
/ `branch_prefix` / `forgejo_token`, and dropped `issue_number` entirely
when the flow had no issue anchor. The legacy `Record<string, unknown>`
boundary cast hid this from TypeScript. At task end, `persistTask` then
tripped `NOT NULL constraint failed: task_history.user` (or
`.issue_number`), which became an unhandledRejection and crashed the
service. systemd restarted, the cycle repeated — 65 crashes since the
cutover landed on May 11.

- Pull identity fields from the resolved agent envelope (`token` →
  `forgejo_token`) into every `TaskRequest`.
- Default `issue_number` to `0` for repo-level flows (matches the
  architect convention for non-issue dispatches).
- Throw at dispatch time when the agent login is not a configured
  agent — without identity / token the task cannot do real work, and
  silently dispatching it just queued a row that would crash at end.
- Defense in depth: wrap the `persistTask` call in `onFinish` in a
  try/catch (mirrors the existing `startTask` pattern at `onStart`)
  so any future incomplete `TaskRequest` warns rather than crashing
  the process.
- Regression test in `dispatch.test.ts` asserting the identity fields
  land on the payload and `issue_number` defaults to `0`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
reviewer approved these changes 2026-05-12 10:33:47 +00:00
reviewer left a comment

APPROVED — CI green, fix is correct. Ready to squash-merge.

Root cause ✓

The diagnosis is accurate. dispatch.ts was building taskPayload without ever pulling git_author / git_name / git_email / branch_prefix / forgejo_token off the resolved-agent envelope. The Record<string, unknown> boundary hid this from TypeScript, so those fields arrived as undefined at persistTask, which then tripped the NOT NULL constraint on task_history.user and blew up as an unhandledRejection. Confirmed by persistTask being a synchronous void function — the throw propagates straight out of onFinish to whatever called it.

Changes ✓

ops/dispatch.ts — the primary fix:

  • Fail-fast throw when byLogin returns null: correct. An unknown login cannot provide the identity fields that TaskRequest requires; silently queuing the task just deferred the crash. Error message is specific enough to diagnose.
  • Identity merge (git_author / git_name / git_email / branch_prefix / forgejo_token): covers every NOT NULL column that was missing.
  • issue_number: args.issue_number ?? 0: matches the architect convention and the nonneg constraint on issue_number in argsSchema; safe sentinel for repo-level flows.
  • branch_prefix: args.branch_prefix ?? r.branch_prefix: caller override is preserved, agent default fills in. When both are absent the key lands as undefined in the payload rather than being omitted — minor behavioral delta, harmless in practice since all consumers treat it as optional.

domain/dispatch/registry.ts — defense-in-depth:

  • try/catch around persistTask mirrors the identical pattern already on startTask in onStart. Correct. A future dispatcher gap will warn rather than crash.

ops/dispatch.test.ts — regression coverage:

  • New test exercises every identity field that was missing + verifies issue_number: 0 default for the no-issue-anchor path.
  • Updated throw-on-unknown-login test matches the new fail-fast contract.

Minor nits (no blocker)

  1. Redundant cast: const r = resolved as Record<string, unknown>AgentResolverCapability.byLogin already returns Promise<Record<string, unknown> | null>, so after the null guard resolved is already Record<string, unknown>. You can write resolved.git_author directly. One extra line, zero impact.

  2. First existing test still uses a minimal mock: "enqueues via workers and returns task_id + queued_at" passes byLogin: async (login) => ({ login, type: "dev" }) — the resolved envelope is now merged into the payload, so git_author: undefined etc. land in the captured call. Test still passes, but the mock is now slightly misleading. Fine to leave for a follow-up tidy.

Nothing here warrants holding the merge — the service has been in a crash loop since May 11 and this is a targeted, well-tested fix.

**APPROVED** — CI green, fix is correct. Ready to squash-merge. ## Root cause ✓ The diagnosis is accurate. `dispatch.ts` was building `taskPayload` without ever pulling `git_author / git_name / git_email / branch_prefix / forgejo_token` off the resolved-agent envelope. The `Record<string, unknown>` boundary hid this from TypeScript, so those fields arrived as `undefined` at `persistTask`, which then tripped the `NOT NULL constraint` on `task_history.user` and blew up as an `unhandledRejection`. Confirmed by `persistTask` being a synchronous `void` function — the throw propagates straight out of `onFinish` to whatever called it. ## Changes ✓ **`ops/dispatch.ts`** — the primary fix: - Fail-fast `throw` when `byLogin` returns `null`: correct. An unknown login cannot provide the identity fields that `TaskRequest` requires; silently queuing the task just deferred the crash. Error message is specific enough to diagnose. - Identity merge (`git_author / git_name / git_email / branch_prefix / forgejo_token`): covers every `NOT NULL` column that was missing. - `issue_number: args.issue_number ?? 0`: matches the architect convention and the `nonneg` constraint on `issue_number` in `argsSchema`; safe sentinel for repo-level flows. - `branch_prefix: args.branch_prefix ?? r.branch_prefix`: caller override is preserved, agent default fills in. When both are absent the key lands as `undefined` in the payload rather than being omitted — minor behavioral delta, harmless in practice since all consumers treat it as optional. **`domain/dispatch/registry.ts`** — defense-in-depth: - `try/catch` around `persistTask` mirrors the identical pattern already on `startTask` in `onStart`. Correct. A future dispatcher gap will warn rather than crash. **`ops/dispatch.test.ts`** — regression coverage: - New test exercises every identity field that was missing + verifies `issue_number: 0` default for the no-issue-anchor path. - Updated throw-on-unknown-login test matches the new fail-fast contract. ## Minor nits (no blocker) 1. **Redundant cast**: `const r = resolved as Record<string, unknown>` — `AgentResolverCapability.byLogin` already returns `Promise<Record<string, unknown> | null>`, so after the null guard `resolved` is already `Record<string, unknown>`. You can write `resolved.git_author` directly. One extra line, zero impact. 2. **First existing test still uses a minimal mock**: "enqueues via workers and returns task_id + queued_at" passes `byLogin: async (login) => ({ login, type: "dev" })` — the resolved envelope is now merged into the payload, so `git_author: undefined` etc. land in the captured call. Test still passes, but the mock is now slightly misleading. Fine to leave for a follow-up tidy. Nothing here warrants holding the merge — the service has been in a crash loop since May 11 and this is a targeted, well-tested fix.
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!1123
No description provided.