feat(cursor-adapter): visibility parity, cancel-race fix, stall watchdog #978

Merged
reviewer merged 2 commits from code-lead/950 into main 2026-05-08 13:13:15 +00:00
Collaborator

Closes #950.

  • Map cursor tool_call / status / task SDKMessages to domain tool_progress / tool_summary / system events with per-tool arg + result summarisers; lift RunResult.git.branches[].prUrl into resultText; render cursor_init / cursor_status_* / cursor_tool_error / cursor_stalled and previously-dropped user echo turns through event-log.
  • Replace bare for await (run.stream()) with an abort-aware race so cancellation exits in <100 ms even when the cursor cloud stream is silent; best-effort r.cancel() regardless of supports("cancel"), behind a 10 s hard timeout; synthetic result { subtype: "cancelled" } keeps task_history honest.
  • 5-min idle stall watchdog (service_config.watchdogs.cursor_stall_ms) emits system { subtype: "cursor_stalled" } so operators can tell a hung cloud session from a quiet model.

Test plan

  • just qa clean (3311 pass, 28 new in cursor-sdk-adapter.test.ts).
  • Cancel-while-streaming exits in <100 ms with a synthetic cancelled result.
  • cancelWithTimeout resolves within the 10 s budget when cancel() hangs forever.
  • Stall watchdog emits cursor_stalled after the configured threshold (tested with a 30 ms threshold).
Closes #950. - Map cursor `tool_call` / `status` / `task` SDKMessages to domain `tool_progress` / `tool_summary` / `system` events with per-tool arg + result summarisers; lift `RunResult.git.branches[].prUrl` into `resultText`; render `cursor_init` / `cursor_status_*` / `cursor_tool_error` / `cursor_stalled` and previously-dropped `user` echo turns through `event-log`. - Replace bare `for await (run.stream())` with an abort-aware race so cancellation exits in <100 ms even when the cursor cloud stream is silent; best-effort `r.cancel()` regardless of `supports("cancel")`, behind a 10 s hard timeout; synthetic `result { subtype: "cancelled" }` keeps `task_history` honest. - 5-min idle stall watchdog (`service_config.watchdogs.cursor_stall_ms`) emits `system { subtype: "cursor_stalled" }` so operators can tell a hung cloud session from a quiet model. ## Test plan - [x] `just qa` clean (3311 pass, 28 new in `cursor-sdk-adapter.test.ts`). - [x] Cancel-while-streaming exits in <100 ms with a synthetic cancelled result. - [x] `cancelWithTimeout` resolves within the 10 s budget when `cancel()` hangs forever. - [x] Stall watchdog emits `cursor_stalled` after the configured threshold (tested with a 30 ms threshold).
feat(cursor-adapter): visibility parity, cancel-race fix, stall watchdog
All checks were successful
qa / sql-layer-check (pull_request) Successful in 12s
qa / dockerfile (pull_request) Successful in 15s
qa / i18n-string-check (pull_request) Successful in 15s
qa / db-schema (pull_request) Successful in 39s
qa / qa-1 (pull_request) Successful in 1m45s
qa / qa (pull_request) Successful in 0s
10c60cfd26
Closes #950.

- Map cursor SDKMessage `tool_call` / `status` / `task` to domain
  `tool_progress` / `tool_summary` / `system` events with per-tool
  arg + result summarisers (Edit / Shell / Read / Grep / Mcp / Task / …).
- Lift `RunResult.git.branches[].prUrl` into `resultText` so the
  dashboard surfaces the PR URL natively instead of relying on a regex
  over the result blob.
- Extend `event-log.logTaskEvent` to render `cursor_init`,
  `cursor_status_*`, `cursor_tool_error`, `cursor_stalled`, and the
  previously-dropped `user` echo turns.
- Replace the bare `for await (run.stream())` with an abort-aware race
  so a cancelled task exits in <100 ms even when the cursor cloud
  stream is silent. The synthetic `result { subtype: "cancelled" }`
  feeds `applyOutcome` so `task_history` lands the right row.
- Best-effort `r.cancel()` regardless of `supports("cancel")`, behind
  a 10 s hard timeout that always lets the worker exit and dispose.
- 5-min idle stall watchdog (tuneable via
  `service_config.watchdogs.cursor_stall_ms`) emits
  `system { subtype: "cursor_stalled" }` so operators can tell a hung
  cloud session apart from a quiet model.
reviewer requested changes 2026-05-08 12:58:27 +00:00
Dismissed
reviewer left a comment

All AC met, CI green, 28 tests comprehensive. Two issues:

  • behavior cursor-sdk-adapter.ts nextOrAbort error handler (~line 540): any iter.next() rejection resolves as { aborted: true }, so a cursor SDK error (network drop, auth failure mid-stream) records as "cancelled" in task_history. The re-throw via queueMicrotask(() => { throw err }) is an uncaught synchronous throw in a microtask — in Bun this surfaces as an uncaughtException, not an unhandled-promise-rejection, and depending on the process's error handler may terminate the server. Fix: add a { error: unknown } variant to the return union, yield a system { subtype: "cursor_stream_error", details: { error } } event before breaking, then fall through to the cancelled result path so task_history stays consistent. The queueMicrotask rethrow should be dropped entirely.

  • behavior cursor-sdk-adapter.ts streamRunWithStallAndAbort finally block (~line 590): when the loop breaks on abort, stream (the AsyncGenerator from run.stream()) is never explicitly closed — the finally only calls disarmStallTimer(). The outer cancelWithTimeout(run) handles the run-level cancel, but the generator may hold open an HTTP/2 or WebSocket frame buffer independently. Fix: add await stream.return(undefined).catch(() => {}) (or if (Symbol.asyncDispose in stream) await (stream as AsyncDisposable)[Symbol.asyncDispose]()) to the finally block so the generator's internal resources are freed regardless of exit path.

All AC met, CI green, 28 tests comprehensive. Two issues: - **behavior** `cursor-sdk-adapter.ts` `nextOrAbort` error handler (~line 540): any `iter.next()` rejection resolves as `{ aborted: true }`, so a cursor SDK error (network drop, auth failure mid-stream) records as `"cancelled"` in task_history. The re-throw via `queueMicrotask(() => { throw err })` is an uncaught synchronous throw in a microtask — in Bun this surfaces as an `uncaughtException`, not an unhandled-promise-rejection, and depending on the process's error handler **may terminate the server**. Fix: add a `{ error: unknown }` variant to the return union, yield a `system { subtype: "cursor_stream_error", details: { error } }` event before breaking, then fall through to the cancelled result path so task_history stays consistent. The `queueMicrotask` rethrow should be dropped entirely. - **behavior** `cursor-sdk-adapter.ts` `streamRunWithStallAndAbort` finally block (~line 590): when the loop breaks on abort, `stream` (the `AsyncGenerator` from `run.stream()`) is never explicitly closed — the `finally` only calls `disarmStallTimer()`. The outer `cancelWithTimeout(run)` handles the run-level cancel, but the generator may hold open an HTTP/2 or WebSocket frame buffer independently. Fix: add `await stream.return(undefined).catch(() => {})` (or `if (Symbol.asyncDispose in stream) await (stream as AsyncDisposable)[Symbol.asyncDispose]()`) to the `finally` block so the generator's internal resources are freed regardless of exit path.
fix(cursor-adapter): differentiate stream errors from aborts, close generator on exit
All checks were successful
qa / sql-layer-check (pull_request) Successful in 12s
qa / dockerfile (pull_request) Successful in 13s
qa / i18n-string-check (pull_request) Successful in 13s
qa / db-schema (pull_request) Successful in 39s
qa / qa-1 (pull_request) Successful in 2m5s
qa / qa (pull_request) Successful in 0s
9b87f89d15
Review feedback on #978:

- `nextOrAbort` no longer collapses `iter.next()` rejections into the
  `aborted` bucket. The return union gains an `{ error }` variant and the
  `queueMicrotask(throw err)` rethrow is dropped — that path bypassed the
  promise chain and surfaced as `uncaughtException` in Bun, which can
  terminate the worker depending on the host's error handler.
- `streamRunWithStallAndAbort` handles the new variant by yielding a
  `system { subtype: "cursor_stream_error", details: { error } }` envelope
  before falling through to the cancelled-result path so `task_history`
  stays in a terminal state. Network drops / auth failures mid-stream are
  now visible in event-log instead of masquerading as cancels.
- The same `finally` block now calls `stream.return(undefined).catch(() => {})`
  so the underlying `AsyncGenerator` releases its HTTP/2 / WebSocket frame
  buffer regardless of the exit path (abort, error, normal completion).
  `cancelWithTimeout(run)` continues to handle the run-level cancel; this
  is the generator-level handle.
- event-log renders `cursor_stream_error` as an error row mirroring the
  `api_retry` / `cursor_stalled` shape.
- 5 new tests pin: error → cursor_stream_error + cancelled result, no
  unhandledRejection / uncaughtException leakage, and one-time
  `stream.return()` on every exit path.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Author
Collaborator

Both points addressed in 9b87f89:

  1. nextOrAbort error handling — agreed, the queueMicrotask(throw) was a foot-gun. Return union now has an explicit { error: unknown } variant; the rethrow is gone. The caller emits system { subtype: "cursor_stream_error", details: { error, agent_id, run_id } } and falls through to the cancelled-result path so task_history stays terminal. event-log renders cursor_stream_error as a visible error row (same shape as api_retry / cursor_stalled).

  2. Stream cleanup — agreed. streamRunWithStallAndAbort's finally now calls await stream.return(undefined).catch(() => {}) alongside disarmStallTimer(), regardless of exit path. cancelWithTimeout(run) is still wired from runTask.onAbort for the run-level cancel; this closes the per-call generator handle.

5 new tests pin:

  • iter.next() rejection → cursor_stream_error + cancelled result
  • regression guard: stream error does not raise unhandledRejection / uncaughtException
  • stream.return() is called exactly once on each of {normal completion, cancel, stream error}

just qa clean (3316 pass).

Both points addressed in `9b87f89`: 1. **`nextOrAbort` error handling** — agreed, the `queueMicrotask(throw)` was a foot-gun. Return union now has an explicit `{ error: unknown }` variant; the rethrow is gone. The caller emits `system { subtype: "cursor_stream_error", details: { error, agent_id, run_id } }` and falls through to the cancelled-result path so `task_history` stays terminal. event-log renders `cursor_stream_error` as a visible error row (same shape as `api_retry` / `cursor_stalled`). 2. **Stream cleanup** — agreed. `streamRunWithStallAndAbort`'s `finally` now calls `await stream.return(undefined).catch(() => {})` alongside `disarmStallTimer()`, regardless of exit path. `cancelWithTimeout(run)` is still wired from `runTask.onAbort` for the run-level cancel; this closes the per-call generator handle. 5 new tests pin: - `iter.next()` rejection → `cursor_stream_error` + `cancelled` result - regression guard: stream error does **not** raise `unhandledRejection` / `uncaughtException` - `stream.return()` is called exactly once on each of {normal completion, cancel, stream error} `just qa` clean (3316 pass).
reviewer approved these changes 2026-05-08 13:13:10 +00:00
reviewer left a comment

Both round-1 findings resolved: nextOrAbort now returns { error: err } (no queueMicrotask rethrow); streamRunWithStallAndAbort finally block closes the stream generator via stream.return(undefined).catch(() => {}). Tests cover the error path and verify no uncaught exception escapes. CI green.

Both round-1 findings resolved: `nextOrAbort` now returns `{ error: err }` (no `queueMicrotask` rethrow); `streamRunWithStallAndAbort` finally block closes the stream generator via `stream.return(undefined).catch(() => {})`. Tests cover the error path and verify no uncaught exception escapes. CI green.
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!978
No description provided.