feat(sessions): TTL stale session IDs to eliminate wasted resume attempts (B14) #441

Merged
code-lead merged 1 commit from dev/430 into main 2026-04-27 09:52:32 +00:00

Drops stored session IDs that haven't been used within MAX_SESSION_AGE_MS (default 24 h) so the next dispatch goes straight to a fresh start instead of wasting an Anthropic API call on a server-side-expired session.
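A minimal sketch of the TTL-on-read behaviour described above, assuming an in-memory `Map` as a stand-in for the real store. The `getSession` signature, the `store` variable, and the `now` parameter are illustrative assumptions, not the code merged in this PR.

```typescript
// Hypothetical stand-in for the session store; the real persistence layer is not shown here.
type SessionRecord = { id: string; last_used_at: number };

// Default TTL of 24 h, as stated in the description.
const MAX_SESSION_AGE_MS = 24 * 60 * 60 * 1000;

const store = new Map<string, SessionRecord>();

// Returns the stored session ID, or null when the row is missing or stale.
// A stale row is deleted so the next dispatch starts fresh without a resume attempt.
function getSession(key: string, now: number = Date.now()): string | null {
  const record = store.get(key);
  if (!record) return null;
  if (now - record.last_used_at > MAX_SESSION_AGE_MS) {
    store.delete(key); // drop the expired row on read
    return null;
  }
  return record.id;
}
```

Passing `now` explicitly keeps the 25 h / 1 h cases in the test plan deterministic without mocking the clock.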

Test plan

  • getSession with last_used_at 25 h ago returns null and deletes the row
  • getSession with last_used_at 1 h ago returns the session ID
  • sweepStaleSessions() drops N stale rows, leaves fresh rows intact
  • Legacy string-value entries (pre-B14 sessions.json) treated as last_used_at=0, immediately expired
  • incrementSessionResumeFailures() / getSessionResumeFailuresTotal() counter advances
  • bun test apps/server/src/infrastructure/database/sessions.test.ts — 16 tests pass
  • bun x tsc --noEmit — no errors
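The sweep and legacy-compat items in the test plan can be sketched as follows. This is an illustration under assumptions: the `StoredValue` type, the `normalize` helper, and passing the sessions object as a parameter are hypothetical, and the real `sweepStaleSessions()` may take no arguments.

```typescript
type SessionRecord = { id: string; last_used_at: number };
// A value in sessions.json is either a legacy bare string or a new-format record.
type StoredValue = string | SessionRecord;

const MAX_SESSION_AGE_MS = 24 * 60 * 60 * 1000; // default 24 h

// Hypothetical helper: legacy strings get last_used_at = 0, so they always look expired.
function normalize(value: StoredValue): SessionRecord {
  return typeof value === "string" ? { id: value, last_used_at: 0 } : value;
}

// Deletes every expired entry in place and returns how many were dropped.
function sweepStaleSessions(
  sessions: Record<string, StoredValue>,
  now: number = Date.now(),
): number {
  let dropped = 0;
  // Object.entries returns a snapshot array, so deleting while looping is safe.
  for (const [key, value] of Object.entries(sessions)) {
    if (now - normalize(value).last_used_at > MAX_SESSION_AGE_MS) {
      delete sessions[key];
      dropped++;
    }
  }
  return dropped;
}
```

Treating legacy entries as `last_used_at = 0` means no migration step is needed: the first sweep or read after upgrade removes them.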

Closes #430

feat(sessions): TTL stale session IDs to eliminate wasted resume attempts (B14)
All checks were successful
qa / qa (pull_request) Successful in 7m52s
qa / dockerfile (pull_request) Successful in 14s
0ac65d2f97
- sessions.json format upgraded to record objects `{ id, last_used_at }`
  with backward-compat read of legacy string values (treated as last_used_at=0,
  immediately expired)
- getSession now performs TTL check: drops row + returns null when
  last_used_at is older than MAX_SESSION_AGE_MS (default 24 h)
- setSession always refreshes last_used_at on every write (including
  same-id updates) so successful resumes reset the clock
- sweepStaleSessions() purges all expired rows at startup
- setMaxSessionAgeMs()/sessionMaxAgeMs in WebhookConfig allow operators to
  configure the TTL via session_max_age_ms in config/agents.json
- incrementSessionResumeFailures() / getSessionResumeFailuresTotal() expose
  the claude_session_resume_failures_total counter (B15 hook); counter
  incremented on "No conversation found" resume failures
- sweeper.ts readLiveSessionIds updated to extract IDs from both legacy and
  new record formats
- 16 tests: existing round-trip/concurrency suite + 5 new B14 tests covering
  TTL-on-read (25h stale / 1h fresh), startup sweep, legacy compat,
  and failure counter
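The write path, the configurable TTL, and the dual-format ID extraction listed above could look roughly like this. All signatures here are assumptions made for illustration (including the `isExpired` helper and the `Set<string>` return type); only the function names `setSession`, `setMaxSessionAgeMs`, and `readLiveSessionIds` come from the commit message.

```typescript
type SessionRecord = { id: string; last_used_at: number };
type StoredValue = string | SessionRecord; // legacy string or new record

// Module-level TTL, overridable by operators; defaults to 24 h.
let maxSessionAgeMs = 24 * 60 * 60 * 1000;

function setMaxSessionAgeMs(ms: number): void {
  maxSessionAgeMs = ms;
}

// Hypothetical helper used to show the effect of the override.
function isExpired(record: SessionRecord, now: number): boolean {
  return now - record.last_used_at > maxSessionAgeMs;
}

// Every write stamps last_used_at, even when the id is unchanged,
// so a successful resume resets the TTL clock.
function setSession(
  sessions: Record<string, StoredValue>,
  key: string,
  id: string,
  now: number = Date.now(),
): void {
  sessions[key] = { id, last_used_at: now };
}

// Collects session IDs from both legacy string values and new records.
function readLiveSessionIds(sessions: Record<string, StoredValue>): Set<string> {
  const ids = new Set<string>();
  for (const value of Object.values(sessions)) {
    ids.add(typeof value === "string" ? value : value.id);
  }
  return ids;
}
```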

Closes #430

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
dev requested review from reviewer 2026-04-27 09:48:24 +00:00
reviewer approved these changes 2026-04-27 09:51:57 +00:00
reviewer left a comment

All B14 acceptance criteria met, CI green.

TTL-on-read (getSession), startup sweep (sweepStaleSessions + setMaxSessionAgeMs wired in main.ts), counter (incrementSessionResumeFailures / getSessionResumeFailuresTotal), config (session_max_age_ms → sessionMaxAgeMs, default 24 h), legacy backward-compat, and 16-test suite all correct.

Nit (not blocking): agent-runner.test.ts test (c) resume fails uses "session expired" as the error text, so the if (msg.includes("No conversation found")) branch in runWithSessionResume isn't exercised by a test. Counter unit tests in sessions.test.ts cover the function itself, but the integration path from runWithSessionResume → counter is untested. Worth adding a (d) case with "No conversation found" in a follow-up.
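For reference, the untested integration path has roughly this shape. This is a sketch under assumptions, not the code in agent-runner: the callback-based signature of `runWithSessionResume`, the fallback to a fresh start on any resume error, and the synchronous form (the real function is presumably async) are all hypothetical. Only the `msg.includes("No conversation found")` check and the counter function names come from this PR.

```typescript
// Stand-in for the claude_session_resume_failures_total counter.
let resumeFailuresTotal = 0;

function incrementSessionResumeFailures(): void {
  resumeFailuresTotal++;
}

function getSessionResumeFailuresTotal(): number {
  return resumeFailuresTotal;
}

// Tries to resume; on failure falls back to a fresh start (assumed behaviour).
// Only "No conversation found" errors advance the counter.
function runWithSessionResume<T>(resume: () => T, startFresh: () => T): T {
  try {
    return resume();
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    if (msg.includes("No conversation found")) {
      incrementSessionResumeFailures();
    }
    return startFresh();
  }
}
```

A (d) case would make the resume step throw an error containing "No conversation found" and assert that `getSessionResumeFailuresTotal()` advanced by one, while the existing (c) case with "session expired" should leave the counter unchanged.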

code-lead deleted branch dev/430 2026-04-27 09:52:33 +00:00
Reference
charles/claude-hooks!441