B14 — TTL stale session IDs (close F4 — wasted resume attempts) #430
Reference
charles/claude-hooks#430
As an orchestrator,
I want to drop a stored session ID after 1 resume failure (or after MAX_SESSION_AGE_MS since last use),
so that the next dispatch goes straight to start fresh without a wasted resume attempt.

Last night 40 resume failures occurred. Each adds ~3–10 s and one failed Anthropic API call. This is pure waste: sessions are TTL'd Anthropic-side and won't come back.
Acceptance criteria

TTL on read
On read, check last_used_at; if older than MAX_SESSION_AGE_MS (default 24 h, configurable in config/agents.json), drop the row and return null (caller starts fresh, no resume attempt).

TTL on failure
On resume failed — No conversation found: delete the session ID from the SQLite agent_sessions table immediately. Do not retry.

Startup sweep
Delete rows with last_used_at older than MAX_SESSION_AGE_MS.

Metric
claude_session_resume_failures_total counter for ops visibility (also surfaced in B15 watchdog tile).

Tests
last_used_at 25 h ago → read returns null + row deleted.
last_used_at 1 h ago → read returns ID.

Out of scope
References
docs/specs/automation-hardening.md §4 B14.
apps/server/src/infrastructure/database/sessions.ts (or wherever the agent_sessions table lives).
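
The acceptance criteria above (TTL on read, TTL on failure, startup sweep, failure counter) could be sketched roughly as follows. This is a minimal illustration, not the actual sessions.ts code: a plain in-memory object stands in for the SQLite agent_sessions table, and the names SessionStore, read, onResumeFailed, sweep, and resumeFailuresTotal are hypothetical. Only MAX_SESSION_AGE_MS, last_used_at, and the 24 h default come from this issue.

```typescript
// Hypothetical sketch: an in-memory record stands in for the SQLite
// agent_sessions table so the TTL rules can be read in isolation.
const DEFAULT_MAX_SESSION_AGE_MS = 24 * 60 * 60 * 1000; // 24 h default from the issue

interface SessionRow {
  sessionId: string;
  lastUsedAt: number; // epoch milliseconds, mirrors last_used_at
}

class SessionStore {
  private rows: Record<string, SessionRow> = {};
  private maxAgeMs: number;
  private now: () => number;
  // Stand-in for the claude_session_resume_failures_total counter.
  resumeFailuresTotal = 0;

  constructor(maxAgeMs?: number, now?: () => number) {
    this.maxAgeMs = maxAgeMs === undefined ? DEFAULT_MAX_SESSION_AGE_MS : maxAgeMs;
    this.now = now === undefined ? () => Date.now() : now;
  }

  save(agentKey: string, sessionId: string, lastUsedAt?: number): void {
    const ts = lastUsedAt === undefined ? this.now() : lastUsedAt;
    this.rows[agentKey] = { sessionId, lastUsedAt: ts };
  }

  has(agentKey: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.rows, agentKey);
  }

  private isStale(row: SessionRow): boolean {
    return this.now() - row.lastUsedAt > this.maxAgeMs;
  }

  // TTL on read: a stale row is dropped and null is returned,
  // so the caller starts fresh without attempting a resume.
  read(agentKey: string): string | null {
    if (!this.has(agentKey)) return null;
    const row = this.rows[agentKey];
    if (this.isStale(row)) {
      delete this.rows[agentKey];
      return null;
    }
    return row.sessionId;
  }

  // TTL on failure: count the failure and delete the ID immediately, no retry.
  onResumeFailed(agentKey: string): void {
    this.resumeFailuresTotal += 1;
    delete this.rows[agentKey];
  }

  // Startup sweep: remove every stale row and report how many were removed.
  sweep(): number {
    let removed = 0;
    for (const key of Object.keys(this.rows)) {
      if (this.isStale(this.rows[key])) {
        delete this.rows[key];
        removed += 1;
      }
    }
    return removed;
  }
}
```

Passing the clock in as a parameter is one way to make the 25 h and 1 h test cases deterministic; a real implementation would presumably express the same checks as DELETE and SELECT statements against agent_sessions.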