feat(sessions): TTL stale session IDs to eliminate wasted resume attempts (B14) #441
Drops stored session IDs that haven't been used within `MAX_SESSION_AGE_MS` (default 24 h) so the next dispatch goes straight to a fresh start instead of wasting an Anthropic API call on a server-side-expired session.

Test plan
- `getSession` with `last_used_at` 25 h ago returns null and deletes the row
- `getSession` with `last_used_at` 1 h ago returns the session ID
- `sweepStaleSessions()` drops N stale rows, leaves fresh rows intact
- Legacy string values (`sessions.json`) treated as `last_used_at=0`, immediately expired
- `incrementSessionResumeFailures()` / `getSessionResumeFailuresTotal()` counter advances
- `bun test apps/server/src/infrastructure/database/sessions.test.ts`: 16 tests pass
- `bun x tsc --noEmit`: no errors

Closes #430
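For readers skimming the PR, the TTL-on-read, startup sweep, and legacy-compat behaviour described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in this branch: the `Map`-based store, the explicit `now` parameter, and the helpers `toRecord` and `isStale` are assumptions made for the example, and the real `getSession` / `setSession` signatures may differ.

```typescript
// Sketch only: an in-memory stand-in for the sessions.json store.
// The record shape { id, last_used_at } comes from the PR; everything else
// (Map store, `now` parameter, toRecord, isStale) is hypothetical.
type SessionRecord = { id: string; last_used_at: number };
type StoredValue = string | SessionRecord; // a bare string is the legacy format

const MAX_SESSION_AGE_MS = 24 * 60 * 60 * 1000; // default 24 h
const store = new Map<string, StoredValue>();

// Legacy string values carry no timestamp, so they are read as last_used_at = 0.
function toRecord(value: StoredValue): SessionRecord {
  return typeof value === "string" ? { id: value, last_used_at: 0 } : value;
}

function isStale(rec: SessionRecord, now: number): boolean {
  return now - rec.last_used_at > MAX_SESSION_AGE_MS;
}

// Every write refreshes last_used_at, including same-id updates.
function setSession(key: string, id: string, now: number = Date.now()): void {
  store.set(key, { id, last_used_at: now });
}

// TTL-on-read: a stale row is deleted and reported as missing.
function getSession(key: string, now: number = Date.now()): string | null {
  const raw = store.get(key);
  if (raw === undefined) return null;
  const rec = toRecord(raw);
  if (isStale(rec, now)) {
    store.delete(key);
    return null;
  }
  return rec.id;
}

// Startup sweep: drop every expired row and return how many were removed.
function sweepStaleSessions(now: number = Date.now()): number {
  let dropped = 0;
  for (const [key, raw] of Array.from(store.entries())) {
    if (isStale(toRecord(raw), now)) {
      store.delete(key);
      dropped++;
    }
  }
  return dropped;
}
```

Passing `now` explicitly keeps the sketch testable without mocking the clock; whether the real module does this or reads `Date.now()` internally is not stated in the PR.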
- sessions.json format upgraded to record objects `{ id, last_used_at }` with backward-compat read of legacy string values (treated as last_used_at=0, immediately expired)
- getSession now performs TTL check: drops row + returns null when last_used_at is older than MAX_SESSION_AGE_MS (default 24 h)
- setSession always refreshes last_used_at on every write (including same-id updates) so successful resumes reset the clock
- sweepStaleSessions() purges all expired rows at startup
- setMaxSessionAgeMs()/sessionMaxAgeMs in WebhookConfig allow operators to configure the TTL via session_max_age_ms in config/agents.json
- incrementSessionResumeFailures() / getSessionResumeFailuresTotal() expose the claude_session_resume_failures_total counter (B15 hook); counter incremented on "No conversation found" resume failures
- sweeper.ts readLiveSessionIds updated to extract IDs from both legacy and new record formats
- 16 tests: existing round-trip/concurrency suite + 5 new B14 tests covering TTL-on-read (25h stale / 1h fresh), startup sweep, legacy compat, and failure counter

Closes #430

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

All B14 acceptance criteria met, CI green.
TTL-on-read (`getSession`), startup sweep (`sweepStaleSessions` + `setMaxSessionAgeMs` wired in `main.ts`), counter (`incrementSessionResumeFailures` / `getSessionResumeFailuresTotal`), config (`session_max_age_ms` → `sessionMaxAgeMs`, default 24 h), legacy backward-compat, and 16-test suite all correct.

Nit (not blocking): `agent-runner.test.ts` test `(c) resume fails` uses `"session expired"` as the error text, so the `if (msg.includes("No conversation found"))` branch in `runWithSessionResume` isn't exercised by a test. Counter unit tests in `sessions.test.ts` cover the function itself, but the integration path from `runWithSessionResume` → counter is untested. Worth adding a `(d)` case with `"No conversation found"` in a follow-up.
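To make the suggested `(d)` case concrete, here is a rough, self-contained sketch of the path that would need covering. It is not the code from this branch: the `Runner` type, the synchronous call style, and the fall-back-to-fresh behaviour on every resume error are assumptions for illustration (the real `runWithSessionResume` is presumably async and takes different arguments). Only the `msg.includes("No conversation found")` check and the counter function names come from the PR.

```typescript
// Hypothetical stand-ins; the real counter lives in the sessions module.
let resumeFailuresTotal = 0;
function incrementSessionResumeFailures(): void {
  resumeFailuresTotal++;
}
function getSessionResumeFailuresTotal(): number {
  return resumeFailuresTotal;
}

// Assumed shape: the runner receives a session ID to resume, or null for a fresh start.
type Runner = (sessionId: string | null) => string;

// Simplified, synchronous stand-in for runWithSessionResume.
function runWithSessionResume(run: Runner, sessionId: string | null): string {
  if (sessionId === null) return run(null);
  try {
    return run(sessionId);
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    // Only this specific failure text advances the counter.
    if (msg.includes("No conversation found")) incrementSessionResumeFailures();
    return run(null); // assumed: any resume failure falls back to a fresh start
  }
}

// Builds a runner whose resume attempt throws the given error text.
function failingResume(errorText: string): Runner {
  return (sessionId) => {
    if (sessionId !== null) throw new Error(errorText);
    return "fresh";
  };
}
```

A `(d)` test along these lines would call `runWithSessionResume(failingResume("No conversation found"), "abc")` and assert that `getSessionResumeFailuresTotal()` grew by one, while the existing `(c)` text `"session expired"` should leave the counter unchanged.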