cursor-sdk-adapter: visibility parity + cancel-race fix + stall watchdog #950
charles/claude-hooks#950
User story
As an operator running cursor-backed agents, I want visibility parity with claude-code (tool calls, status transitions, subagent dispatches surfaced in the dashboard) and a worker that actually unblocks when I cancel a task, so cursor runs are neither black boxes between dispatch and `result` nor zombie sessions that pin a worker forever.

Context

Two distinct bugs in `apps/server/src/infrastructure/agent/cursor-sdk-adapter.ts` came up the same day; same file, same review surface, single PR:

1. `cursorSdkMessageToTaskEvent` only maps `assistant` / `user` / `thinking` / `system` from the cursor SDK's `SDKMessage` union. The remaining message types (by `SDKMessage.type`) are silently dropped:

   - `tool_call`: `name`, `status` (`running` / `completed` / `error`), `args`, `result` (typed `ToolCall` discriminator: `EditToolCall`, `ShellToolCall`, `ReadToolCall`, `GrepToolCall`, `WriteToolCall`, `GlobToolCall`, `SemSearchToolCall`, `LsToolCall`, `McpToolCall`, `TaskToolCall`, `CreatePlanToolCall`, `UpdateTodosToolCall`, `DeleteToolCall`, `ReadLintsToolCall`)
   - `status`: `RUNNING` / `FINISHED` / `ERROR` / `CANCELLED` / `EXPIRED` + optional message
   - `task`
   - `request`

   Plus secondary gaps in `runResultToResultEvent`:

   - `RunResult.git.branches[*].prUrl` is ignored; PR URL extraction goes through a regex on the `result.result` summary string.
   - `usage`, `total_cost_usd`, `num_turns` are hardcoded to `emptyUsage()` / `0` / `0`, so every observability tile reads zero for cursor runs.
   - `event-log.ts` `logTaskEvent` `system` switch only handles `subtype: "api_retry"`, so the existing `cursor_init` / `cursor_system` events the adapter does emit are dropped on the way to the SSE stream too.

   Real-world reproduction (2026-05-08, task `6dbb2c28-5d17-4cb2-a44f-496b10bfceae` on issue #949): journal showed `acquiring worktree → running agent (provider: cursor, model: composer-2) → resuming session cursor:agent-…` then silence for 40+ minutes while the cursor SDK was streaming `tool_call` events for the work being done.

2. Cancel deadlock: `for await (run.stream())` does not propagate `AbortSignal` into the underlying HTTP/2 fetch. The adapter's `onAbort` calls `r.cancel()` only when `r.supports("cancel") === true`; against an unreachable cloud session this no-ops. The for-await body's `if (req.abort.signal.aborted) break;` only fires after a stream event arrives. Result: `currentAbort.abort()` flips the bit, the loop never re-checks because cursor's stream is silent, and the worker is wedged until process restart.

   Reproduction same day, same task: operator clicked Cancel + dragged tile to triage + unassigned issue. SQLite `task_history.status` flipped to `cancelled` (because `cancelRunningTaskInWorker` ran and `persistAndBroadcastCancellation` updated the row), but `worker.currentTask` stayed pinned and the dashboard kept rendering `RUNNING`. Only `just restart` freed the slot.

Acceptance criteria
Part A: Visibility
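A minimal sketch of the `tool_call` mapping this part asks for, to make the intended event shapes concrete. The `ToolCall`, `ToolCallMessage` and `TaskEvent` types and the `mapToolCall` helper are simplified, hypothetical stand-ins (the real `@cursor/sdk` types and the adapter's event types will differ), and only three tool kinds plus the fallback are shown.

```typescript
// Hypothetical, simplified stand-ins for the SDK's typed ToolCall union;
// the real types in @cursor/sdk are richer and may be shaped differently.
type ToolCall =
  | { type: "edit"; args: { path: string; startLine: number; endLine: number } }
  | { type: "shell"; args: { command: string } }
  | { type: "mcp"; args: { mcpProviderId: string; mcpToolId: string } }
  | { type: "unknown"; args: unknown };

interface ToolCallMessage {
  type: "tool_call";
  name: string;
  status: "running" | "completed" | "error";
  call: ToolCall;
}

type TaskEvent =
  | { type: "tool_progress"; toolName: string; text: string }
  | { type: "tool_summary"; summary: string }
  | { type: "system"; subtype: string };

/** One-line, length-bounded description of what the tool was asked to do. */
function summarizeArgs(msg: ToolCallMessage): string {
  const call = msg.call;
  switch (call.type) {
    case "edit":
      return `${call.args.path}:L${call.args.startLine}-L${call.args.endLine}`;
    case "shell":
      // First line of the command only, truncated to 120 chars.
      return (call.args.command.split("\n")[0] ?? "").slice(0, 120);
    case "mcp":
      return `${call.args.mcpProviderId}/${call.args.mcpToolId}`;
    default:
      // Fallback for tool kinds without a dedicated summary.
      return (JSON.stringify(call.args) ?? "").slice(0, 120);
  }
}

/** Maps one tool_call message to the dashboard events it should produce. */
function mapToolCall(msg: ToolCallMessage): TaskEvent[] {
  switch (msg.status) {
    case "running":
      return [{ type: "tool_progress", toolName: msg.name, text: summarizeArgs(msg) }];
    case "completed":
      // Placeholder: a real summarizeResult would inspect the typed result.
      return [{ type: "tool_summary", summary: `${msg.name}: ${summarizeArgs(msg)}` }];
    case "error":
      return [
        { type: "tool_summary", summary: `error: ${msg.name}` },
        { type: "system", subtype: "cursor_tool_error" },
      ];
  }
}
```

Returning an array keeps the `error` case simple, since it has to emit two events for one SDK message; whether the real adapter yields events one at a time or in batches is not something this sketch assumes.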
cursor-sdk-adapter.ts: message mapping

- `case "tool_call"` added to `cursorSdkMessageToTaskEvent`:
  - `status === "running"` → `tool_progress { toolName: msg.name, text: summarizeArgs(msg) }`
  - `status === "completed"` → `tool_summary { summary: summarizeResult(msg) }`
  - `status === "error"` → `` tool_summary { summary: `error: ${msg.name}` } `` plus a `system { subtype: "cursor_tool_error" }`
- `summarizeArgs(msg)` and `summarizeResult(msg)` switch on the typed `ToolCall` discriminator from `@cursor/sdk` `types/tool-call-types`:
  - Edit: `path:Lstart-Lend` (or `path (N replacements)`)
  - Write: `path (N bytes)`
  - Shell: first line of `args.command`, truncated to 120 chars
  - Read: `path` + `Lstart-Lend` if present
  - Grep / Glob / SemSearch: pattern + cwd
  - Ls: path
  - Mcp: `mcpProviderId/mcpToolId`
  - Task: subagent type + first 80 chars of prompt
  - Anything else: `JSON.stringify(args).slice(0, 120)`
- `case "status"` added → `` system { subtype: `cursor_status_${msg.status.toLowerCase()}`, details: { message } } ``.
- `case "task"` added → `tool_progress { toolName: "subagent", text: msg.text ?? msg.status ?? "" }`.
- `case "request"` either mapped or explicitly silenced with a comment (decide: probably silence, it's noise).

cursor-sdk-adapter.ts: result event

- `runResultToResultEvent` reads `RunResult.git.branches.find(b => b.prUrl)?.prUrl` and prefers it over `r.result` for `resultText`. Alternative if cleaner: extend `ResultEvent` with optional `prUrl?: string` and drop the regex path in `extractProgress` for cursor.
- If the `numTurns` / token accumulator is plumbed (see Phase 2 follow-ups #951–#953), surface it; otherwise leave `0` and document that cursor doesn't expose it on `RunResult` directly.

event-log.ts: `logTaskEvent`

- `system` switch extended to render the new subtypes as visible rows:
  - `cursor_init` → `` tool_progress { tool_name: "session", summary: `cursor session ${agent_id}` } `` (one-line "session started")
  - `cursor_status_*` → `tool_progress { tool_name: "status", summary: <status name> }`
  - `cursor_tool_error` → `error` row
  - `cursor_stalled` (see Part C) → `error` row with `summary: "no stream events in N min — assuming cloud-side hang"`
- `user` case added so cursor's user-echo (and any future claude-code user echoes) actually surface; currently the entire `case` is missing from the switch, silently dropping every `UserTurn`.

Part B: Cancel-race fix
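A minimal sketch of the abort-aware iteration this part describes, assuming only standard `AbortSignal` and async-iterator semantics. `iterateUntilAborted`, the simplified `runTask` signature and the `TaskEvent` shape are hypothetical names for illustration. One deliberate variation: the abort promise here resolves with a sentinel instead of rejecting, which avoids an unhandled rejection when abort fires after the loop has already finished.

```typescript
// Hypothetical event shape, just enough for the sketch.
type TaskEvent =
  | { type: "progress"; text: string }
  | { type: "result"; ok: boolean; subtype: string };

// Sentinel that marks "abort won the race".
const ABORTED = Symbol("aborted");

/**
 * Iterates `source` but stops as soon as `signal` aborts, even if the
 * source never yields again. Returns why the iteration ended.
 */
async function* iterateUntilAborted<T>(
  source: AsyncIterable<T>,
  signal: AbortSignal,
): AsyncGenerator<T, "aborted" | "done"> {
  const it = source[Symbol.asyncIterator]();
  let onAbort = (): void => {};
  const aborted = new Promise<typeof ABORTED>((resolve) => {
    onAbort = () => resolve(ABORTED);
    if (signal.aborted) resolve(ABORTED);
    else signal.addEventListener("abort", onAbort, { once: true });
  });
  try {
    while (true) {
      // Whichever settles first wins: the next stream event or the abort.
      const next = await Promise.race([it.next(), aborted]);
      if (next === ABORTED) return "aborted";
      if (next.done) return "done";
      yield next.value;
    }
  } finally {
    signal.removeEventListener("abort", onAbort);
    // Best-effort cleanup; not awaited because a hung source may never
    // settle return().
    void it.return?.().catch(() => {});
  }
}

/** Simplified runTask: forwards events, then a synthetic result on abort. */
async function* runTask(
  stream: AsyncIterable<TaskEvent>,
  signal: AbortSignal,
): AsyncGenerator<TaskEvent, void> {
  const outcome = yield* iterateUntilAborted(stream, signal);
  if (outcome === "aborted") {
    yield { type: "result", ok: false, subtype: "cancelled" };
  }
}
```

Using `yield*` lets the outer generator read the inner generator's return value, so the "aborted vs. finished" decision needs no shared mutable flag. Agent disposal and prompt-generator draining are left out; in the real adapter they would stay in the outer `finally`.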
cursor-sdk-adapter.ts: `runTask` body

- Replace `for await (const ev of run.stream())` with an abort-aware iteration:
  - Wrap `next()` in a `Promise.race([gen.next(), abortPromise])` where `abortPromise` rejects on `req.abort.signal`.
  - On `AbortError`, exit the loop, run the existing `finally` (which disposes the agent + drains the prompt generator).
- In the `onAbort` handler, attempt `r.cancel()` regardless of `r.supports("cancel")` (catch the unsupported-op error). Add a hard timeout: if the cancel promise hasn't resolved in 10 s, log a warning and proceed with disposal anyway. The worker MUST exit even when cursor cloud is unreachable.
- On abort, yield a synthetic `result { type: "result", ok: false, subtype: "cancelled" }` so downstream `applyOutcome` / `task_history` persistence does the right thing.
- `agent[Symbol.asyncDispose]()` in the outer `finally` already covers cleanup, but verify it's called on the abort path too (it is currently `await`ed inside `finally`; make sure abort doesn't bypass it via an exception bubbling out of the async iterator unexpectedly).

Tests
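A sketch of how the bounded `cancel()` attempt could be structured so the timeout case is testable without waiting 10 s: the timeout is a parameter and the logger is injected. `RunLike`, `CancelOutcome` and `cancelWithTimeout` are hypothetical names; the real SDK run handle has a richer surface, and the adapter may log differently.

```typescript
// Hypothetical minimal surface of a run handle.
interface RunLike {
  supports(op: "cancel"): boolean;
  cancel(): Promise<void>;
}

type CancelOutcome = "cancelled" | "failed" | "timed_out";

/**
 * Attempts run.cancel() regardless of supports("cancel"), but never waits
 * longer than `timeoutMs`, so the caller can always proceed to disposal.
 */
async function cancelWithTimeout(
  run: RunLike,
  timeoutMs: number,
  warn: (message: string) => void,
): Promise<CancelOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<CancelOutcome>((resolve) => {
    timer = setTimeout(() => resolve("timed_out"), timeoutMs);
  });
  // Starting from Promise.resolve() turns a synchronous throw from
  // cancel() into a rejection, so one catch covers both failure modes.
  const attempt = Promise.resolve()
    .then(() => run.cancel())
    .then((): CancelOutcome => "cancelled")
    .catch((): CancelOutcome => "failed");
  try {
    const outcome = await Promise.race([attempt, timeout]);
    if (outcome === "timed_out") {
      warn(`cancel() did not settle within ${timeoutMs} ms; disposing anyway`);
    }
    return outcome;
  } finally {
    // Always clear the timer so tests do not leak handles.
    clearTimeout(timer);
  }
}

// Test doubles matching the cases listed in this section.
const hungRun: RunLike = {
  supports: () => false,
  cancel: () => new Promise<void>(() => {}), // never resolves
};
const healthyRun: RunLike = {
  supports: () => true,
  cancel: async () => {},
};
```

In production the call would be `cancelWithTimeout(r, 10_000, logger.warn)` or similar; a test can pass a few milliseconds instead and assert on the collected warnings.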
- `Run` whose `stream()` yields one event then hangs forever. Fire `abort.abort()`. Assert the loop exits in <100 ms, a synthetic `result` is yielded, the agent is disposed, and the test does not leak timers or open handles.
- `r.supports("cancel") === true`, `r.cancel()` resolves. Assert the adapter calls `cancel()` and exits.
- `r.supports("cancel") === false`. Assert the abort-race path still exits the loop in <100 ms (cancel-attempt is best-effort).
- `r.cancel()` returns a never-resolving promise. Assert the 10 s hard-timeout fires, the warning is logged, disposal still runs.

Part C: Stall watchdog
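One possible shape for the watchdog, sketched as a wrapper around the stream rather than a change to the adapter itself. `withStallWatchdog`, `raceStall` and the `StallEvent` type are hypothetical; `last_event_at` is assumed to be epoch milliseconds, and `agent_id` / `run_id` are omitted for brevity. This version re-arms after firing, so a long silence produces one `cursor_stalled` event per interval; emitting only once per silent gap would be an equally valid reading of the criteria.

```typescript
type StallEvent = {
  type: "system";
  subtype: "cursor_stalled";
  details: { last_event_at: number };
};

/** Races a promise against a timer and always clears the timer afterwards. */
function raceStall<R>(pending: Promise<R>, ms: number): Promise<R | "stalled"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const stalled = new Promise<"stalled">((resolve) => {
    timer = setTimeout(() => resolve("stalled"), ms);
  });
  return Promise.race([pending, stalled]).finally(() => clearTimeout(timer));
}

/**
 * Forwards every event from `source` and interleaves a cursor_stalled
 * event whenever `stallMs` passes without the source yielding.
 */
async function* withStallWatchdog<T>(
  source: AsyncIterable<T>,
  stallMs: number,
  now: () => number = Date.now,
): AsyncGenerator<T | StallEvent, void> {
  const it = source[Symbol.asyncIterator]();
  let lastEventAt = now();
  let pending = it.next();
  try {
    while (true) {
      const winner = await raceStall(pending, stallMs);
      if (winner === "stalled") {
        yield {
          type: "system",
          subtype: "cursor_stalled",
          details: { last_event_at: lastEventAt },
        };
        continue; // keep waiting on the same pending next()
      }
      if (winner.done) return;
      lastEventAt = now(); // a real event resets the stall clock
      yield winner.value;
      pending = it.next();
    }
  } finally {
    // Best-effort cleanup; a hung source may never settle return().
    void it.return?.().catch(() => {});
  }
}
```

With the 5-minute threshold from the criteria this would be called as `withStallWatchdog(run.stream(), 5 * 60_000)`; a test can either pass a small `stallMs` or drive the timer with the test runner's fake timers.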
- Inactivity timer around the `for await` loop. Each yielded event resets it. Timer fire = emit a `system { subtype: "cursor_stalled", details: { last_event_at, agent_id, run_id } }` event (visible in the timeline per Part A, and feeding the operator decision to cancel).
- Timer is cleared in `finally`.
- Test: a silent stream emits a `cursor_stalled` event after ~5 min (use fake timers).

Out of scope (filed as follow-ups, do NOT bundle into this PR)
- `Run.conversation()` replay on crash: #956.
- `shell_output_delta` for claude-code: #957.

References
- `apps/server/src/infrastructure/agent/cursor-sdk-adapter.ts`
- `apps/server/src/infrastructure/agent/claude-port.ts`
- `apps/server/src/infrastructure/event-log.ts`
- `node_modules/.bun/@cursor+sdk@1.0.12/node_modules/@cursor/sdk/dist/cjs/messages.d.ts`
- `node_modules/.bun/@cursor+sdk@1.0.12/node_modules/@cursor/sdk/dist/cjs/types/tool-call-types.d.ts`
- `node_modules/.bun/@cursor+sdk@1.0.12/node_modules/@cursor/sdk/dist/cjs/types/delta-types.d.ts`
- `apps/server/src/domain/dispatch/cancel.ts` (`cancelRunningTaskInWorker`).
- Reproduction: status flipped to `cancelled` but worker stayed wedged until `just restart` (2026-05-08).