fix(shutdown): tasks aborted during drain are marked status=success with no output #221
Reference: charles/claude-hooks#221
Bug
When the service receives SIGTERM/SIGINT and runs the four-phase drain, any task that "naturally settles" inside phase 2 (drain budget) has its `task_history.status` set to `success`, even when the SDK bailed because closing the HTTP listener triggered an indirect abort. The worker's promise resolves via the normal `.then()` path, so the task-store writer treats it as a success. The symptom is a row with:
- `status = 'success'`
- `turns = NULL`
- `cost_usd = NULL`
- `finished_at` ≈ the SIGTERM timestamp, to the second
The matching pipeline row is rendered with a green checkmark for work that never produced a commit or a PR. The operator can't tell the difference between "task succeeded in 2 minutes" and "task died silently during a restart."
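To find rows already misclassified by this bug, the symptom above can be expressed as a predicate. This is a minimal sketch, assuming a `TaskHistoryRow` shape that carries only the columns named in this issue and an arbitrary one-second tolerance around the SIGTERM timestamp; neither the type nor `looksLikeFalseSuccess` exists in the codebase.

```typescript
// Hypothetical row shape: only the task_history columns named in this issue.
interface TaskHistoryRow {
  id: string;
  status: string;
  turns: number | null;
  cost_usd: number | null;
  finished_at: Date;
}

// Returns true for rows matching the false-success signature: marked
// 'success', yet with no turns and no cost recorded, and finished within
// `toleranceMs` of the SIGTERM (the 1000 ms default is an assumption).
function looksLikeFalseSuccess(
  row: TaskHistoryRow,
  sigtermAt: Date,
  toleranceMs = 1000,
): boolean {
  const finishedNearSigterm =
    Math.abs(row.finished_at.getTime() - sigtermAt.getTime()) <= toleranceMs;
  return (
    row.status === "success" &&
    row.turns === null &&
    row.cost_usd === null &&
    finishedNearSigterm
  );
}
```

A genuinely successful run records turns and cost, so it is not flagged even if it happened to finish close to a restart.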
Repro (2026-04-21 incident — #209, #210)
1. The #208 PR merged → the propagator auto-assigned #209/#210 to dev at 13:02:10/11.
2. `e5bdebea` (#209) and `f0920b3b` (#210) ran for ~120 s each.
3. At 13:04:10 the operator ran `systemctl --user restart claude-hooks` for an unrelated fix.
4. The shutdown logged `[shutdown] task e5bdebea settled after 252ms`, `[shutdown] task f0920b3b settled after 252ms`, `[shutdown] all 2 task(s) drained cleanly` + `drained=2 force_aborted=0`.
5. `task_history` rows: both `status=success`, `turns=NULL`, `cost_usd=NULL`, `finished_at=13:04:10` on the dot.
6. The next dispatch logged `dev-2: warning: worktree has uncommitted changes from a previous dispatch`, confirming the agent had started editing but never reached commit/push.
Acceptance criteria
Status accuracy
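A minimal sketch of the status split this section asks for, assuming the worker knows a handful of facts about a task at the moment it settles. `SettleInfo` and `classifySettledTask` are hypothetical, and the `queued`/`running`/`failed` members stand in for whatever the real union in `packages/shared/src/task.ts` contains; only `success`, `cancelled` and the new `interrupted` come from this ticket.

```typescript
// Hypothetical: the real union lives in packages/shared/src/task.ts;
// `queued`, `running` and `failed` are assumed members, `interrupted` is new.
type TaskStatus =
  | "queued"
  | "running"
  | "success"
  | "failed"
  | "cancelled"
  | "interrupted";

// What the worker knows at the moment a task settles (assumed shape).
interface SettleInfo {
  rejected: boolean; // promise took the rejection path
  cancelledByOperator: boolean; // operator issued /cancel
  forceAborted: boolean; // phase-3 force-abort during shutdown
  shuttingDown: boolean; // shutdown-in-progress flag was set
  sawResultMessage: boolean; // SDK emitted a `result` message
}

function classifySettledTask(info: SettleInfo): TaskStatus {
  // Operator cancels and phase-3 force-aborts keep today's meaning.
  if (info.cancelledByOperator || info.forceAborted) return "cancelled";
  if (info.rejected) return "failed";
  // Resolved normally, but during shutdown and with no `result` message:
  // the agent never finished producing output, so this is not a success.
  if (info.shuttingDown && !info.sawResultMessage) return "interrupted";
  return "success";
}
```

Checking the `cancelled` cases first keeps their semantics intact, so `interrupted` only ever replaces what would otherwise have been a false `success`.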
- A task that settles during the drain with `turns = 0` / `cost_usd = null` / no recorded `result` SDK message is marked with a distinct status. Proposed: `interrupted` (new value), rather than overloading `cancelled` (which today means `/cancel` from the operator).
- `cancelled` semantics are unchanged: operator-initiated cancels and phase-3 force-aborts stay on `cancelled`.
- `TaskStatus` in `packages/shared/src/task.ts` (or wherever it lives) gets the new value.
- `task-store.ts` persistence switches on the shutdown flag.
Shutdown wiring
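One way to wire the shutdown-in-progress flag described in this section, as a minimal sketch. It assumes the flag is a module-level boolean set as soon as the signal handler starts (before the drain phases), and that the worker can observe SDK messages through a callback; `beginShutdown`, `isShuttingDown`, `SdkMessage`, `runAndFinalize` and the `write` callback are hypothetical names, not the existing `shutdown.ts`/`worker.ts` API.

```typescript
// --- shutdown.ts side (hypothetical) ---
// Set once the signal handler starts, before the drain phases run.
let shuttingDown = false;
function beginShutdown(): void {
  shuttingDown = true;
}
function isShuttingDown(): boolean {
  return shuttingDown;
}

// --- worker.ts side (hypothetical) ---
type SdkMessage = { type: string };
interface TaskRecord {
  id: string;
  status: "success" | "interrupted";
}

// Runs a task, remembers whether the SDK ever emitted a `result` message,
// and writes the final status on the normal (resolved) path. Rejections
// propagate to the caller; the failure and cancel paths are not shown.
async function runAndFinalize(
  id: string,
  run: (onMessage: (m: SdkMessage) => void) => Promise<void>,
  write: (rec: TaskRecord) => void,
): Promise<void> {
  let sawResult = false;
  await run((m) => {
    if (m.type === "result") sawResult = true;
  });
  // A normal resolution during shutdown without a `result` message means
  // the agent bailed on the indirect abort: record it as interrupted.
  const status = isShuttingDown() && !sawResult ? "interrupted" : "success";
  write({ id, status });
}
```

The flag is read at settle time rather than captured when the task starts, which is what lets a task that began well before the restart still be classified correctly.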
- `shutdown.ts` sets a shutdown-in-progress flag observable to `worker.ts`. When a task settles while that flag is set and the SDK didn't emit a `result` message (i.e. the agent didn't finish producing output), the worker writes `interrupted`.
- Phase-3 force-aborts are written as `cancelled` with reason `shutdown`. Keep that, or fold both paths into `interrupted` and drop the shutdown-reason column. Up to the implementer: pick one and document the choice in a code comment.
Pipeline rendering
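A minimal sketch of the stage mapping this section describes, assuming the implement stage is derived from the latest `task_history.status` for the issue. `StageState` values other than `pending` and `stalled` are assumptions (`done` stands for the green checkmark), the `TaskStatus` members besides `success`, `cancelled` and `interrupted` are placeholders, and the mapping for the non-shutdown statuses is a guess.

```typescript
// Placeholder unions; the real ones live in packages/shared.
type TaskStatus =
  | "queued"
  | "running"
  | "success"
  | "failed"
  | "cancelled"
  | "interrupted";
type StageState = "pending" | "running" | "done" | "failed" | "stalled";

// A Record keyed by every TaskStatus makes the compiler reject a new status
// value that has no stage mapping, so `interrupted` cannot silently fall
// through to the green "done" state. Non-shutdown entries are assumed.
const IMPLEMENT_STAGE_FOR: Record<TaskStatus, StageState> = {
  queued: "pending",
  running: "running",
  success: "done",
  failed: "failed",
  cancelled: "pending",
  interrupted: "pending", // same as cancelled: not a green checkmark
};

// Implement-stage state from the issue's most recent task status, if any.
function implementStageState(latest?: TaskStatus): StageState {
  return latest === undefined ? "pending" : IMPLEMENT_STAGE_FOR[latest];
}
```

If a dedicated `interrupted` pill reads clearer, it would be one more `StageState` member plus a change to the single `interrupted` entry; `<StagePill>` could then reuse the `stalled` amber for it.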
- `pipeline.ts` treats `interrupted` the same way it treats `cancelled` for stage rendering: not a green checkmark. The implement stage reverts to `pending` (or a new `interrupted` pill state if that reads clearer).
- `packages/shared/src/pipeline.ts` exports the new stage state if one is introduced; `<StagePill>` in `apps/web/src/components/stage-pill.tsx` gets a matching visual (suggest reusing the `stalled` amber instead of inventing a new color).
Auto-recovery
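A minimal sketch of the recovery scan in this section, assuming the rows are already loaded in memory, that "completed" means a later row with `status='success'` on the same issue, and that `finishedAt` is epoch milliseconds. `HistoryRow`, `findUnrecoveredInterrupts` and `recoveryLogLines` are hypothetical, and the wording of the final next-step line is a placeholder; only the `[recovery]` line format and `just redispatch-interrupted` come from this ticket.

```typescript
// Hypothetical in-memory view of task_history rows.
interface HistoryRow {
  id: string;
  issue: number;
  status: string;
  finishedAt: number; // epoch ms (assumed)
}

// Interrupted rows with no later successful task on the same issue.
function findUnrecoveredInterrupts(rows: HistoryRow[]): HistoryRow[] {
  return rows.filter(
    (r) =>
      r.status === "interrupted" &&
      !rows.some(
        (later) =>
          later.issue === r.issue &&
          later.status === "success" &&
          later.finishedAt > r.finishedAt,
      ),
  );
}

// Builds log lines only; nothing is re-dispatched automatically.
function recoveryLogLines(rows: HistoryRow[]): string[] {
  const pending = findUnrecoveredInterrupts(rows);
  const lines = pending.map(
    (r) =>
      `[recovery] issue #${r.issue} has interrupted task ${r.id} — consider re-dispatch`,
  );
  // Placeholder wording for the "clear next step" line.
  if (pending.length > 0) {
    lines.push("[recovery] next step: just redispatch-interrupted");
  }
  return lines;
}
```

The nested `some` is quadratic, which is harmless for a one-off scan of a small table; in the real store the same condition could equally be a SQL `NOT EXISTS` subquery.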
- Scan `task_history` for rows with `status='interrupted'` that have not been followed by a subsequent completed task on the same issue. Log each as `[recovery] issue #N has interrupted task <id> — consider re-dispatch`.
- Do not auto-dispatch: the operator should decide (some tasks may have produced partial PRs worth reviewing first). Print a `just redispatch-interrupted` one-liner in the log so there's a clear next step.
Verification
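The automated check in this section could look roughly like the following self-contained sketch. It assumes a mocked worker whose task promise resolves normally when aborted (mimicking the SDK bailing on the indirect abort) and an in-memory map standing in for `task_history`; `startMockTask`, `onSigterm` and `sigtermMidTaskMarksInterrupted` are hypothetical. A real `shutdown.test.ts` could invoke the registered handler with `process.emit("SIGTERM")` instead of calling `onSigterm` directly.

```typescript
type Status = "success" | "interrupted";

// In-memory stand-in for task_history (hypothetical).
const taskHistory = new Map<string, Status>();

// Stand-ins for the shutdown flag and the in-flight tasks that closing
// the HTTP listener would abort.
let shuttingDown = false;
const inFlightAborts: Array<() => void> = [];

interface MockTask {
  settled: Promise<void>;
  emitResultAndFinish: () => void; // normal completion path
}

// Mocked worker task. Aborting it resolves the promise normally, as the SDK
// did in the incident, so only the flag and the missing `result` message
// distinguish an interrupted run from a successful one.
function startMockTask(id: string): MockTask {
  let sawResult = false;
  let finish!: () => void;
  const settled = new Promise<void>((resolve) => {
    finish = resolve;
    inFlightAborts.push(resolve);
  }).then(() => {
    taskHistory.set(id, shuttingDown && !sawResult ? "interrupted" : "success");
  });
  return {
    settled,
    emitResultAndFinish: () => {
      sawResult = true;
      finish();
    },
  };
}

// Stand-in for the SIGTERM handler: set the flag first, then abort in-flight
// tasks (resolving an already-settled promise again is a no-op).
function onSigterm(): void {
  shuttingDown = true;
  for (const abort of inFlightAborts.splice(0)) abort();
}

// The Verification assertion: SIGTERM mid-task ends in 'interrupted'.
async function sigtermMidTaskMarksInterrupted(): Promise<boolean> {
  const task = startMockTask("mid-task");
  onSigterm();
  await task.settled;
  return taskHistory.get("mid-task") === "interrupted";
}
```

Because the mocked promise resolves on abort exactly like a normal finish, this check fails against a writer that ignores the shutdown flag, which is the regression the ticket describes.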
- `apps/server/src/shutdown.test.ts` (or `task-store.test.ts`): fires SIGTERM mid-task with a mocked worker and asserts `task_history.status === 'interrupted'`.
- Manual: run `systemctl --user restart claude-hooks` while a task is running, then inspect `task_history`. The row should be `interrupted`, and the pipeline stage should NOT show green.
Out of scope
- The drain budget (`shutdown.drain_ms`): an orthogonal knob.
References
- `apps/server/src/shutdown.ts`: drain machinery.
- `apps/server/src/worker.ts`: where `task_history` rows are written on task finalize.
- `apps/server/src/task-store.ts`: `TaskRecord` shape + persistence.
- `apps/server/src/pipeline.ts`: stage-state derivation from `task_history.status`.
- `CLAUDE.md` §"Graceful shutdown (issue #182)": four-phase drain description (status-writing is not explicit today; this ticket makes it so).
- `e5bdebea-14d6-4358-b266-132e69037fe2` (#209) and `f0920b3b-9cda-4811-920f-76602ad0a947` (#210): evidence of the false-success classification.