fix(tasks): persist task_history row at task START to survive restart kills #1112
No reviewers
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks!1112
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "dev/1107"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #1107.
Summary
startTask()inserts astatus='running'row intask_historythe moment a worker transitions torunning(inonStart). A SIGKILL between dispatch and completion now leaves an audit row instead of a blank.persistTask()(called fromonFinish) remains an upsert — it overwrites therunningrow with the terminal status,finished_at, token counts, and cost. No duplicate inserts.scanInterruptedAtBootnow callsmarkRunningAsInterrupted()first — every orphanedrunningrow from a prior killed process is transitioned tointerruptedwithfinished_at = now, making it visible tolistUnresolvedInterruptedTasksand the existing session-invalidity sweep (sweepInvalidSessionsalready rejects anyrunningrow as non-success, so no ordering dependency).Schema
Migration
0003_nullable_finished_at_tasks.sqlrebuildstask_history(SQLiteCREATE/INSERT/DROP/RENAMEpattern) to makefinished_atnullable for in-flight rows, and adds astatusindex for the boot-scan query.Test plan
task-store.test.tscover: start-time row content, idempotency, finish-time update (no double-insert), emptymarkRunningAsInterrupted, two-row transition, session-scan visibility, completed rows not re-transitioned.bun x turbo run test --filter=@claude-hooks/server).Review — fix(tasks): persist task_history row at task START (#1107)
CI ✅ green · mergeable ✅
Overview
Adds a two-phase
task_historylifecycle: astatus='running'row is inserted the moment a worker starts (startTask/onStart), and the existingpersistTaskupsert overwrites it at completion. A SIGKILL between those two points now leaves an auditable row rather than a blank. Boot-time recovery is fixed by callingmarkRunningAsInterrupted()before the HTTP server starts, so orphaned rows flow straight intolistUnresolvedInterruptedTasksand the session-invalidity sweep without any ordering dependency.What works well
onConflictDoNothinginstartTaskis the right primitive — double-calls are silently safe.markRunningAsInterrupted()→listUnresolvedInterruptedTasks()→ HTTP server, all before any newstartTaskcalls can fire. The TOCTOU window in the read-then-update pattern insidemarkRunningAsInterruptedis therefore unreachable in production.listTasksForIssuefilter (inArray(status, TERMINAL_STATUSES)) cleanly prevents in-flight rows from polluting the pipeline view. Good defensive move.DROP TABLE IF EXISTS task_history_newguard at the top is good defensive practice against a half-applied previous run.markRunningAsInterrupted, session-scan visibility, and completed-row non-regression. Coverage is solid for the new code paths.hasInFlightTaskaccurately reflects the new DB-level coverage (in-flight tasks are now visible viastartTask) while correctly noting that queued-but-not-yet-started tasks still require the in-memory check.Minor observations (non-blocking)
finished_at: r.finished_at as numbercast inlistTasksForIssue— the cast is safe because theinArray(status, TERMINAL_STATUSES)filter guarantees every returned row has a non-nullfinished_at, but the assertion silently suppresses the nullability at the type level. A short inline comment —// safe: terminal rows always have finished_at set— would make the intent self-documenting for the next reader.finished_at ?? 0fallback inlistUnresolvedInterruptedTasks— using the Unix epoch (1970-01-01) as a sentinel for a timestamp field is a code smell. It's practically unreachable (allinterruptedrows havefinished_atset, either historically under the old NOT NULL constraint or viamarkRunningAsInterrupted), but if it ever fires it silently sorts affected rows to the front of the recovery list.r.started_atorDate.now()would be less surprising as a fallback.ZeroOutputSuccessRow.finished_atwidened tonumber | null— the backing query (listZeroOutputSuccesses) filters exclusively onstatus='success'rows, which always carry a non-nullfinished_at. The type widening is over-broad here; keeping itnumberwould be more precise and would save callers from having to null-guard a field that can never actually be null on that code path.No test for
listTasksForIssueexcluding running rows — the eight new tests are good, but the filtering change inlistTasksForIssue(which is a behaviour change, not just a type fix) has no dedicated coverage. A one-liner that inserts a running row and asserts it doesn't appear inlistTasksForIssueresults would close the gap.None of these are blockers — the core logic is correct and the tests pass. Approving.