feat(agents): pluggable CheckpointStore + cursor run-event persistence #987

Merged
charles merged 3 commits from code-lead/958 into main 2026-05-08 18:59:26 +00:00
Collaborator

Lifts session-map helpers behind a CheckpointStore interface, adds agent_checkpoint + agent_run_event tables for cursor's richer surface, and exposes a /agents/sessions admin surface so the operator can force-clear a stuck resume.

Test plan

  • just qa (typecheck + biome + tests, all 3355 server tests green)
  • migration test asserts provider column + satellite tables exist
  • round-trip + concurrency tests for appendRunEvent (PK contention)
  • dropSessionWithArtifacts cascades cleanup across all three tables
  • settings/agents/sessions page renders rows + drop button

Closes #958

feat(agents): pluggable CheckpointStore + cursor run-event persistence
Some checks failed
qa / dockerfile (pull_request) Successful in 15s
qa / i18n-string-check (pull_request) Failing after 15s
qa / sql-layer-check (pull_request) Successful in 15s
qa / db-schema (pull_request) Successful in 19s
qa / qa-1 (pull_request) Has been cancelled
qa / qa (pull_request) Has been cancelled
458efc3e78
Lifts session-map helpers behind a `CheckpointStore` interface used by
both adapters, adds a `provider` column on `claude_sdk_sessions` plus
new `agent_checkpoint` / `agent_run_event` tables for cursor's richer
surface, and wires an operator-facing /agents/sessions admin endpoint
+ settings page so a stuck resume can be force-cleared without a
service restart.

Closes #958

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
fix(ci): qualify sessions filter placeholder with 'e.g.' for i18n guard
Some checks failed
qa / sql-layer-check (pull_request) Successful in 13s
qa / dockerfile (pull_request) Successful in 14s
qa / i18n-string-check (pull_request) Successful in 14s
qa / db-schema (pull_request) Successful in 18s
qa / qa-1 (pull_request) Has been cancelled
qa / qa (pull_request) Has been cancelled
4a5c3b75ef
The diff-based i18n-string-check rejects new placeholder="…" literals
unless they hit a documented exception. Prefix the agent-type filter
example with 'e.g. ' (the format-hint exception) so the guard accepts it
without forcing a Paraglide message for what is a non-translatable
example list of role names.
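
For readers who have not met the guard, the rule this commit message describes can be pictured roughly as follows. This is a minimal sketch under assumptions, not the repository's actual `i18n-string-check`: the function name `findUnqualifiedPlaceholders`, the regular expression, and the idea that `e.g. ` is the only exception are all hypothetical.

```typescript
// Hypothetical sketch of a diff-based placeholder guard; not the real check.
interface Violation {
  line: number; // 1-based line index within the diff text
  literal: string; // the placeholder text that was rejected
}

// Assumed exception: placeholders starting with "e.g. " count as format hints.
const FORMAT_HINT_PREFIX = "e.g. ";

function findUnqualifiedPlaceholders(diff: string): Violation[] {
  const violations: Violation[] = [];
  diff.split("\n").forEach((text, index) => {
    // Only added lines are inspected; the "+++ b/file" header is skipped.
    if (!text.startsWith("+") || text.startsWith("+++")) return;
    const pattern = /placeholder="([^"]*)"/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      const literal = match[1] ?? "";
      if (!literal.startsWith(FORMAT_HINT_PREFIX)) {
        violations.push({ line: index + 1, literal });
      }
    }
  });
  return violations;
}
```

Under this sketch, an added `placeholder="code-lead, reviewer"` would be flagged, while `placeholder="e.g. code-lead, reviewer"` would pass, which matches the shape of the fix in this commit.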
code-lead force-pushed code-lead/958 from 4a5c3b75ef
to 129edc2e73
All checks were successful
qa / sql-layer-check (pull_request) Successful in 13s
qa / dockerfile (pull_request) Successful in 14s
qa / i18n-string-check (pull_request) Successful in 13s
qa / db-schema (pull_request) Successful in 16s
qa / qa-1 (pull_request) Successful in 1m16s
qa / qa (pull_request) Successful in 0s
2026-05-08 15:15:48 +00:00
Collaborator

CI still pending at review time (run #1724, sha 4a5c3b75). Stepping off the review request — will be re-dispatched automatically when CI completes.

reviewer requested changes 2026-05-08 15:19:01 +00:00
Dismissed
reviewer left a comment
  • behavior (apps/server/src/infrastructure/agent/cursor-sdk-adapter.ts): agent_checkpoint table is created and the setCheckpoint/getCheckpoint API works, but the cursor adapter never calls setCheckpoint and Agent.create is not wired with { stores: { checkpointStore, runStore } }. AC explicitly requires: "Pass our CheckpointStore adapter into Agent.create({ stores: { checkpointStore, runStore } }) … On Agent.resume(...), the checkpoint store satisfies cursor's reads from disk." Without the wiring, the agent_checkpoint table is dead storage and cursor won't resume from our on-disk blob. Either wire Agent.create with the adapter (verifying the SDK's actual option names against @cursor/sdk types as the AC directs), or document why the SDK doesn't support it and close that AC item explicitly.

  • doc-gap (apps/server/src/infrastructure/database/drizzle/0011_agent_checkpoint_store.sql, lines 31–33): Header comment says "cleanup is automatic — when the session row is dropped, so are its checkpoint blob and run-event log" via the unique index FK target. No FOREIGN KEY … ON DELETE CASCADE constraints are actually defined. The code is correct (no-cascade is the right design, and sessions-migration.test.ts explicitly tests it), but the comment is wrong and will mislead future readers. Fix the comment to say cleanup is handled explicitly by dropSessionWithArtifacts / the daily sweep.

Everything else is solid: migration is idempotent + partial-DB-safe, concurrency test covers the PK retry path, guardMutating on the DELETE route, provider backfill correct, sweeper integration clean.
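
The "PK retry path" mentioned above can be pictured with an in-memory stand-in. This is only a sketch under assumptions: the `RunEventTable` class, the read-max-then-insert strategy, and the attempt limit are hypothetical, and the real `appendRunEvent` runs against the database and may work differently.

```typescript
// Hypothetical in-memory model of agent_run_event keyed by (session_id, seq).
interface RunEvent {
  sessionId: string;
  seq: number;
  payload: string;
}

class RunEventTable {
  private rows = new Map<string, RunEvent>();

  // Mimics a composite primary key: returns false when (sessionId, seq) is taken.
  insert(event: RunEvent): boolean {
    const key = `${event.sessionId}:${event.seq}`;
    if (this.rows.has(key)) return false;
    this.rows.set(key, event);
    return true;
  }

  // Highest seq stored for the session, or 0 when the session has no events.
  maxSeq(sessionId: string): number {
    let max = 0;
    this.rows.forEach((row) => {
      if (row.sessionId === sessionId && row.seq > max) max = row.seq;
    });
    return max;
  }
}

// Assumed strategy: read the current max seq, try max + 1, re-read on a key collision.
function appendRunEvent(
  table: RunEventTable,
  sessionId: string,
  payload: string,
  maxAttempts = 5,
): number {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const seq = table.maxSeq(sessionId) + 1;
    if (table.insert({ sessionId, seq, payload })) return seq;
    // Another writer claimed this seq between the read and the insert; try again.
  }
  throw new Error(`appendRunEvent: gave up after ${maxAttempts} attempts for ${sessionId}`);
}
```

In this model the key constraint is what turns a racing write into a detectable collision rather than a silent duplicate, which is why the retry loop depends on it.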

reviewer requested changes 2026-05-08 15:20:58 +00:00
Dismissed
reviewer left a comment
  • behavior agent_checkpoint table + setCheckpoint/getCheckpoint are dead code in production — the cursor adapter never calls them, and Agent.create/Agent.resume (cursor-sdk-adapter.ts:1161–1172) don't receive stores: { checkpointStore }. The AC explicitly requires passing the store to Agent.create. Either wire it up, or (if cursor SDK 1.0.12 doesn't expose a stores option) remove agent_checkpoint + the blob methods as out-of-scope and add a comment explaining why.

  • doc-gap 0011_agent_checkpoint_store.sql lines 16–20 state "Both new tables carry a foreign-key … so cleanup is automatic — when the session row is dropped … so are its checkpoint blob and run-event log." The actual DDL has no FK constraints and no ON DELETE CASCADE. agent-checkpoints.test.ts:134–144 explicitly asserts the absence of cascade. The comment needs to accurately describe the application-level cleanup path (sweeper + dropSessionWithArtifacts), not non-existent DB-level cascades.

  • doc-gap checkpoint-store.ts:132 doc comment says dropSessionWithArtifacts is "Used by both the operator force-clear path and the dropAllForIssue cleanup." dropAllForIssue (sessions.ts:210) does a direct DELETE from claude_sdk_sessions only — it never calls dropSessionWithArtifacts. Satellite orphans for closed issues wait until the next sweeper cycle. Fix the comment.

  • schema schema/agent-run-event.ts doesn't declare the composite PRIMARY KEY (session_id, seq) in the Drizzle table definition. The SQL migration is correct, but the Drizzle schema file omits primaryKey({ columns: [agentRunEvent.session_id, agentRunEvent.seq] }). A drizzle-kit push would silently drop the PK.
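
The difference between the two cleanup paths discussed in the doc-gap items can be illustrated with a small in-memory model. It is a sketch under assumptions, not the code under review: the `Store` shape, `createStore`, and `sweepOrphanArtifacts` are hypothetical names, and plain maps stand in for `claude_sdk_sessions` and its two satellite tables.

```typescript
// Hypothetical in-memory stand-ins for the session table and its satellites.
interface Store {
  sessions: Map<string, { issueId: number }>; // session_id -> owning issue
  checkpoints: Map<string, string>; // session_id -> checkpoint blob
  runEvents: Map<string, string[]>; // session_id -> ordered run events
}

function createStore(): Store {
  return { sessions: new Map(), checkpoints: new Map(), runEvents: new Map() };
}

// Explicit cleanup: with no ON DELETE CASCADE, every table is cleared by hand.
function dropSessionWithArtifacts(store: Store, sessionId: string): void {
  store.sessions.delete(sessionId);
  store.checkpoints.delete(sessionId);
  store.runEvents.delete(sessionId);
}

// Mirrors the review's description: only session rows go, satellites stay behind.
function dropAllForIssue(store: Store, issueId: number): void {
  store.sessions.forEach((session, id) => {
    if (session.issueId === issueId) store.sessions.delete(id);
  });
}

// Removes entries whose session no longer exists; returns how many were removed.
function sweepTable<V>(table: Map<string, V>, live: ReadonlyMap<string, unknown>): number {
  let removed = 0;
  table.forEach((_value, sessionId) => {
    if (!live.has(sessionId)) {
      table.delete(sessionId);
      removed++;
    }
  });
  return removed;
}

// Periodic sweep over both satellite tables.
function sweepOrphanArtifacts(store: Store): number {
  return sweepTable(store.checkpoints, store.sessions) + sweepTable(store.runEvents, store.sessions);
}
```

In this model, a session removed through `dropAllForIssue` leaves its run events in place until the next sweep, which is the behavior the corrected comments need to describe.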

fix(agents): address reviewer feedback on PR #987
All checks were successful
qa / sql-layer-check (pull_request) Successful in 12s
qa / dockerfile (pull_request) Successful in 12s
qa / i18n-string-check (pull_request) Successful in 13s
qa / db-schema (pull_request) Successful in 41s
qa / qa-1 (pull_request) Successful in 2m10s
qa / qa (pull_request) Successful in 0s
322d04c339
- Remove `setCheckpoint`/`getCheckpoint` blob methods from `CheckpointStore`
  interface and implementation; Cursor SDK 1.0.12 does not expose a
  `stores` injection point on `Agent.create` — the SDK's `AgentCheckpointStore`
  has a different interface (`loadLatest`/`getBlobStore`) with no supported
  custom-store override. Add explanatory NOTE comments to
  `checkpoint-store.ts` and at the `Agent.create` call site.
- Fix false FK-cascade claim in `0011_agent_checkpoint_store.sql` header
  comment; cleanup is application-level (`dropSessionWithArtifacts` +
  daily sweeper), not via `ON DELETE CASCADE`.
- Fix `dropSessionWithArtifacts` doc comment to remove the incorrect claim
  that `dropAllForIssue` calls it; satellite orphans from that path wait
  for the sweeper.
- Add `primaryKey({ columns: [t.session_id, t.seq] })` to the
  `agentRunEvent` Drizzle table definition so a `drizzle-kit push` would
  not silently drop the composite PK.
- Simplify `sweepOrphanCheckpointArtifacts` to only sweep `agent_run_event`
  (nothing writes to `agent_checkpoint` now).
- Update unit tests accordingly.

Co-authored-by: Cursor <cursoragent@cursor.com>
Collaborator

All reviewer feedback addressed in the latest commit:

behavior: setCheckpoint/getCheckpoint removed from the CheckpointStore interface and implementation. Cursor SDK 1.0.12 does not expose a stores option on Agent.create/Agent.resume; AgentOptions has no such field. The SDK's internal AgentCheckpointStore uses loadLatest(agentId)/getBlobStore(agentId), a completely different interface from ours, with no supported injection point. The agent_checkpoint table is retained in the DB (migration already applied) with a NOTE comment explaining why it is unpopulated and what would be needed to wire it in a future SDK version. appendRunEvent remains wired and active.

doc-gap (migration comment): 0011_agent_checkpoint_store.sql header corrected. The false FK-cascade claim is removed and replaced with an accurate description of application-level cleanup via dropSessionWithArtifacts + sweepOrphanCheckpointArtifacts.

doc-gap (checkpoint-store.ts:132): dropSessionWithArtifacts doc comment corrected. The incorrect dropAllForIssue reference is removed; the note now accurately describes that dropAllForIssue does a direct DELETE and that satellite orphans from that path wait for the daily sweeper.

schema: agent-run-event.ts Drizzle table definition updated with primaryKey({ columns: [t.session_id, t.seq] }) so drizzle-kit push would not silently drop the composite PK.

code-lead force-pushed code-lead/958 from 322d04c339
to f0f33ec83d
All checks were successful
qa / sql-layer-check (pull_request) Successful in 12s
qa / dockerfile (pull_request) Successful in 12s
qa / i18n-string-check (pull_request) Successful in 12s
qa / db-schema (pull_request) Successful in 42s
qa / qa-1 (pull_request) Successful in 2m46s
qa / qa (pull_request) Successful in 0s
2026-05-08 18:48:49 +00:00
code-lead force-pushed code-lead/958 from f0f33ec83d
to 5a2fb01765
Some checks failed
qa / dockerfile (pull_request) Successful in 14s
qa / i18n-string-check (pull_request) Successful in 14s
qa / db-schema (pull_request) Successful in 16s
qa / sql-layer-check (pull_request) Successful in 14s
qa / qa-1 (pull_request) Failing after 12m53s
qa / qa (pull_request) Failing after 0s
2026-05-08 18:52:56 +00:00
reviewer approved these changes 2026-05-08 18:55:20 +00:00
reviewer left a comment

LGTM. Interface, schema, sweep, HTTP surface, and frontend all correct. SDK-limitation deviations (getCheckpoint / stores injection) are thoroughly documented inline — the right call given Cursor SDK 1.0.12.

Nit (non-blocking): handleSessionsDrop doc comment claims it drops agent_checkpoint blobs; the table is always empty in this SDK version, so that line is misleading but harmless.

Collaborator

Merge call returned false — please merge manually.

charles deleted branch code-lead/958 2026-05-08 18:59:27 +00:00
Reference
charles/claude-hooks!987