fix(agents): unsafe cursor session resume after task SIGKILL — poisons next dispatch with stale state #1106
Reference: charles/claude-hooks#1106
User story
As an operator who restarts the service while a task is running, I want the next dispatch for that same issue to start a fresh cursor session, so that the agent doesn't resume a stale conversation against a half-executed plan and burn 45 min on confused shell commands.
Repro
Observed 2026-05-11 around 16:37 UTC on charles/claude-hooks#1104:
1. dev was assigned to issue #1104 at 14:37:21 UTC.
2. The issue-assigned flow ran task ca432f8f-d29e-4cab-8d42-22db730c9419 with resume=<none> (fresh).
3. The task created session cursor:6ce4f4b4-483a-49d1-a10b-18b9c19f21b4 and persisted it to claude_sdk_sessions at 14:37:25 UTC (key forgejo:dev:charles/claude-hooks:1104).
4. The service was restarted (just restart), which SIGKILL'd the worker process. Task ca432f8f never wrote a task_history row.
5. The next task, 1b8e7ea2-4576-441f-8e64-28d1c95722e0, started at 15:34:49 UTC with resuming session cursor:6ce4f4b4-....
Manual recovery:
DELETE FROM claude_sdk_sessions WHERE key='forgejo:dev:charles/claude-hooks:1104';
then unassign + reassign → fresh session cursor:f521f4c8-..., and task c45ef893-... completed normally.
Root cause hypothesis
The claude_sdk_sessions row is written eagerly (on the first turn) but never invalidated when the corresponding task dies abnormally. The resume logic blindly trusts the stored session id.
Acceptance criteria
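A minimal TypeScript sketch of the first two session-lifecycle criteria, assuming each claude_sdk_sessions row can be linked to the task that last used it. That taskId link, the SessionLifecycle class, and the status names other than success/interrupted/cancelled are hypothetical: the real schema (key + session_id + last_used_at + provider) has no task column today, and in-memory maps stand in for the tables.

```typescript
// Sketch only. Row shape loosely mirrors claude-sdk-sessions.ts, but `taskId`
// is a HYPOTHETICAL column; the current schema has no link to a task.
type Provider = "anthropic" | "cursor";
// HYPOTHETICAL status set; only success/interrupted/cancelled appear in the issue.
type TaskStatus = "running" | "success" | "failed" | "interrupted" | "cancelled";

interface SessionRow {
  key: string; // e.g. "forgejo:dev:charles/claude-hooks:1104"
  sessionId: string; // e.g. "cursor:6ce4f4b4-..."
  provider: Provider;
  taskId: string; // HYPOTHETICAL: task that last used this session
}

// Terminal states that must not leave a resumable session behind.
const ABNORMAL_END: ReadonlySet<TaskStatus> = new Set<TaskStatus>(["interrupted", "cancelled"]);

class SessionLifecycle {
  private readonly sessions: Map<string, SessionRow>;
  private readonly taskStatus: Map<string, TaskStatus>;

  constructor(sessions: Map<string, SessionRow>, taskStatus: Map<string, TaskStatus>) {
    this.sessions = sessions; // stand-in for claude_sdk_sessions
    this.taskStatus = taskStatus; // stand-in for recorded task status
  }

  // Criterion 1: drop the session row when its task ends interrupted or cancelled.
  onTaskTerminated(taskId: string, status: TaskStatus): void {
    this.taskStatus.set(taskId, status);
    if (!ABNORMAL_END.has(status)) return;
    // Deleting the current entry inside Map.forEach is safe in JS.
    this.sessions.forEach((row, key) => {
      if (row.taskId === taskId) this.sessions.delete(key);
    });
  }

  // Criterion 2: resume only when the linked task is known to have succeeded.
  // Returns the session id to resume, or null meaning resume=<none>.
  resolveResume(key: string): string | null {
    const row = this.sessions.get(key);
    if (row === undefined) return null;
    const status = this.taskStatus.get(row.taskId);
    if (status !== "success") {
      console.warn(
        `not resuming ${row.sessionId} for ${key}: linked task ${row.taskId} is ${status ?? "missing"}`,
      );
      return null;
    }
    return row.sessionId;
  }
}
```

Under the #1104 timeline, task ca432f8f never recorded a terminal status, so resolveResume returns null for that key and the next dispatch would start with resume=<none> instead of resuming cursor:6ce4f4b4-....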
Session lifecycle
When a task ends as interrupted (restart-killed) or cancelled, the corresponding claude_sdk_sessions row is deleted (or marked invalid) so the next dispatch starts fresh.
Before resuming, if the task linked to the stored session did not end in success, treat the session as invalid: log a warning and proceed with resume=<none>.
A sweep scans claude_sdk_sessions for rows whose linked task is missing or non-terminal and invalidates them.
Provider scope
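A sketch of the sweep criterion, again assuming a hypothetical taskId column links each row to its task, with in-memory maps standing in for the tables. It never inspects provider, so 'anthropic' and 'cursor' rows go through the same check. Because a task that is legitimately running is also non-terminal, a sweep like this is presumably only safe where nothing is in flight, for example at service start before the worker picks up work.

```typescript
// HYPOTHETICAL shapes; the real claude_sdk_sessions schema has no task link today.
type Provider = "anthropic" | "cursor";
type TaskStatus = "running" | "success" | "failed" | "interrupted" | "cancelled";

interface SessionRow {
  key: string;
  sessionId: string;
  provider: Provider; // deliberately ignored by the sweep
  taskId: string; // HYPOTHETICAL link to the task that last used the session
}

const TERMINAL: ReadonlySet<TaskStatus> = new Set<TaskStatus>([
  "success",
  "failed",
  "interrupted",
  "cancelled",
]);

// Removes every row whose linked task is missing or non-terminal, for any
// provider, and returns the removed keys (e.g. for a warning log line).
function sweepStaleSessions(
  sessions: Map<string, SessionRow>,
  taskStatus: ReadonlyMap<string, TaskStatus>,
): string[] {
  const stale: string[] = [];
  sessions.forEach((row, key) => {
    const status = taskStatus.get(row.taskId);
    if (status === undefined || !TERMINAL.has(status)) stale.push(key);
  });
  // Delete after the scan so the map is not mutated mid-iteration.
  stale.forEach((key) => sessions.delete(key));
  return stale;
}
```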
Applies to both provider='anthropic' (claude-sdk) and provider='cursor' rows in claude_sdk_sessions.
Tests
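A sketch of the regression test, shaped like the #1104 repro: a fresh dispatch, a SIGKILL before any terminal status is written, then a second dispatch that must start with resume=<none>. dispatch, Db and the cursor:fresh-N ids are hypothetical in-memory stand-ins, not the server's runner API; a real test would drive the cursor-cli-adapter and claude-sdk runner paths. A positive control checks that a session whose task succeeded is still resumed.

```typescript
// HYPOTHETICAL in-memory stand-ins; both maps survive a "restart", like the DB.
type TaskStatus = "running" | "success" | "interrupted" | "cancelled";

interface Db {
  sessions: Map<string, { sessionId: string; taskId: string }>; // claude_sdk_sessions
  tasks: Map<string, TaskStatus>; // recorded task status
}

let taskCounter = 0;

// HYPOTHETICAL dispatcher with the proposed guard: resume only if the task
// linked to the stored session succeeded; otherwise start fresh.
function dispatch(db: Db, key: string): { taskId: string; resume: string } {
  const taskId = `task-${++taskCounter}`;
  db.tasks.set(taskId, "running");
  const prev = db.sessions.get(key);
  let resumeFrom: string | undefined;
  if (prev !== undefined && db.tasks.get(prev.taskId) === "success") {
    resumeFrom = prev.sessionId;
  }
  const sessionId = resumeFrom ?? `cursor:fresh-${taskCounter}`;
  // Eager write on the first turn, matching the behaviour described above.
  db.sessions.set(key, { sessionId, taskId });
  return { taskId, resume: resumeFrom ?? "<none>" };
}

// Regression scenario modelled on the #1104 repro.
function killedTaskIsNotResumed(): boolean {
  const db: Db = { sessions: new Map(), tasks: new Map() };
  const key = "forgejo:dev:charles/claude-hooks:1104";
  const first = dispatch(db, key);
  if (first.resume !== "<none>") return false;
  // SIGKILL via `just restart`: no terminal status is written for
  // first.taskId, but the session row it persisted survives.
  const second = dispatch(db, key);
  return second.resume === "<none>";
}

// Positive control: a session whose task succeeded is still resumed.
function resumesAfterSuccess(): boolean {
  const db: Db = { sessions: new Map(), tasks: new Map() };
  const key = "forgejo:dev:charles/claude-hooks:1104";
  const first = dispatch(db, key);
  db.tasks.set(first.taskId, "success");
  const firstSession = db.sessions.get(key)?.sessionId;
  const second = dispatch(db, key);
  return firstSession !== undefined && second.resume === firstSession;
}
```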
The next dispatch after a task killed mid-run starts with resume=<none>.
Manual QA
Run just restart mid-flight, re-assign, and confirm the new task spawns with resume=<none>.
Out of scope
The "[dev] warning: worktree has uncommitted changes from a previous dispatch" message; track it separately if it turns out to leak across tasks.
References
apps/server/src/infrastructure/database/schema/claude-sdk-sessions.ts (key + session_id + last_used_at + provider).
"resuming session" call sites in apps/server/src: currently the cursor-cli-adapter and claude-sdk runner paths.
🦵 @charles kicked the queue: re-running implement on @code-lead.