agents: pluggable checkpoint store — wire cursor AgentCheckpointStore + claude session-map equivalent #958

Closed
opened 2026-05-08 12:06:52 +00:00 by claude-desktop · 1 comment

User story

As an operator, I want session resume to be observable, debuggable, and provider-symmetric (cursor + claude-code), so that a stuck or stale session shows up clearly in the dashboard and can be force-cleared without restarting the service.

Context

Today:

  • Cursor sessions are stored in our SQLite session map keyed by cursor:agent-<uuid> (see cursor-sdk-adapter.ts).
  • Claude Code sessions are stored in the same map keyed by the SDK's session id.
  • @cursor/sdk exposes a richer surface — AgentCheckpointStore, AgentRunStore, RunEventStore — that we don't plug into. These would let us persist intermediate checkpoints and replay run events from disk, not just the latest session id.

Goals:

  • One in-process CheckpointStore interface used by both adapters.
  • Backed by the existing SQLite database (no new infra).
  • Exposed via a small admin surface for operator debugging (list, drop, force-clear).

Acceptance criteria

Shared interface

  • CheckpointStore in apps/server/src/infrastructure/agent/:
    • getSession(key): Promise<string | null> (already exists; lift into the interface)
    • setSession(key, sessionId): Promise<void> (exists)
    • dropSession(key): Promise<void> (exists)
    • listSessions({ agent?, provider? }): Promise<{ key, sessionId, provider, createdAt, lastUsedAt }[]> (new)
    • getCheckpoint(sessionId): Promise<Checkpoint | null> (new; cursor-only data, the Claude implementation returns null)
    • appendRunEvent(sessionId, event): Promise<void> (new; cursor-only persisted run-event log)
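
A minimal sketch of how the interface above might look in TypeScript, with an in-memory implementation that could serve as a test double. The method names come from this issue; everything else is an assumption made for illustration: the `Checkpoint` and `RunEvent` shapes, inferring the provider from a `cursor:` key prefix, matching the `agent` filter as a key substring, and the `putCheckpoint` helper (which is not part of the proposed interface).

```typescript
type Provider = "cursor" | "claude-code";

interface SessionRow {
  key: string;
  sessionId: string;
  provider: Provider;
  createdAt: number; // epoch milliseconds (assumed unit)
  lastUsedAt: number;
}

// Assumed shapes; the real ones would follow the cursor SDK types.
interface Checkpoint { sessionId: string; blob: Uint8Array; updatedAt: number; }
interface RunEvent { payload: string; ts: number; }
interface SessionFilter { agent?: string; provider?: Provider; }

interface CheckpointStore {
  getSession(key: string): Promise<string | null>;
  setSession(key: string, sessionId: string): Promise<void>;
  dropSession(key: string): Promise<void>;
  listSessions(filter?: SessionFilter): Promise<SessionRow[]>;
  getCheckpoint(sessionId: string): Promise<Checkpoint | null>;
  appendRunEvent(sessionId: string, event: RunEvent): Promise<void>;
}

class InMemoryCheckpointStore implements CheckpointStore {
  private sessions = new Map<string, SessionRow>();
  private checkpoints = new Map<string, Checkpoint>();
  private events = new Map<string, RunEvent[]>();

  async getSession(key: string): Promise<string | null> {
    return this.sessions.get(key)?.sessionId ?? null;
  }

  async setSession(key: string, sessionId: string): Promise<void> {
    const now = Date.now();
    const existing = this.sessions.get(key);
    this.sessions.set(key, {
      key,
      sessionId,
      // Assumption: only cursor keys carry the "cursor:" prefix.
      provider: key.startsWith("cursor:") ? "cursor" : "claude-code",
      createdAt: existing?.createdAt ?? now, // keep the original creation time
      lastUsedAt: now,
    });
  }

  async dropSession(key: string): Promise<void> {
    this.sessions.delete(key);
  }

  async listSessions(filter: SessionFilter = {}): Promise<SessionRow[]> {
    return [...this.sessions.values()]
      .filter((r) => !filter.provider || r.provider === filter.provider)
      // Assumption: "agent" is matched as a substring of the key.
      .filter((r) => !filter.agent || r.key.includes(filter.agent))
      .map((r) => ({ ...r })); // return copies so callers cannot mutate state
  }

  async getCheckpoint(sessionId: string): Promise<Checkpoint | null> {
    return this.checkpoints.get(sessionId) ?? null;
  }

  async appendRunEvent(sessionId: string, event: RunEvent): Promise<void> {
    const log = this.events.get(sessionId) ?? [];
    log.push(event);
    this.events.set(sessionId, log);
  }

  // Hypothetical helper for tests; not part of the proposed interface.
  putCheckpoint(cp: Checkpoint): void {
    this.checkpoints.set(cp.sessionId, cp);
  }
}
```

A SQLite-backed class would implement the same `CheckpointStore` interface, so adapters and the admin routes only ever depend on the interface.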

SQLite schema

  • New agent_session table (or extend the existing session map): (key TEXT PK, session_id TEXT NOT NULL, provider TEXT NOT NULL, created_at INTEGER, last_used_at INTEGER).
  • New agent_checkpoint table for cursor: (session_id TEXT PK, checkpoint_blob BLOB, updated_at INTEGER).
  • New agent_run_event table for cursor: (session_id TEXT, seq INTEGER, payload TEXT, ts INTEGER, PRIMARY KEY (session_id, seq)).
  • Drizzle migration + schema entries.
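
For reference, the column lists above can be written out as plain SQLite DDL. The sketch below holds the statements as TypeScript strings and applies them through a caller-supplied function, so it does not assume any particular SQLite driver. The names `AGENT_SCHEMA_DDL` and `applyAgentSchema` are hypothetical, and the real migration would be generated from the Drizzle schema rather than hand-written like this.

```typescript
// DDL derived from the column lists in this issue. Nullability beyond the
// columns the issue marks NOT NULL is left as written there.
const AGENT_SCHEMA_DDL: readonly string[] = [
  `CREATE TABLE IF NOT EXISTS agent_session (
     "key"        TEXT PRIMARY KEY,
     session_id   TEXT NOT NULL,
     provider     TEXT NOT NULL,
     created_at   INTEGER,
     last_used_at INTEGER
   )`,
  `CREATE TABLE IF NOT EXISTS agent_checkpoint (
     session_id      TEXT PRIMARY KEY,
     checkpoint_blob BLOB,
     updated_at      INTEGER
   )`,
  `CREATE TABLE IF NOT EXISTS agent_run_event (
     session_id TEXT,
     seq        INTEGER,
     payload    TEXT,
     ts         INTEGER,
     PRIMARY KEY (session_id, seq)
   )`,
];

// Runs each statement through the supplied executor, in order.
// `exec` is whatever the chosen driver offers for running raw SQL.
function applyAgentSchema(exec: (sql: string) => void): void {
  for (const statement of AGENT_SCHEMA_DDL) {
    exec(statement);
  }
}
```

The composite primary key on `agent_run_event` means each `(session_id, seq)` pair can be stored only once, which is what lets a replay order events deterministically per session.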

Cursor adapter

  • Pass our CheckpointStore adapter into Agent.create({ stores: { checkpointStore, runStore } }) (verify the cursor SDK's actual option names against the currently published types). On appendRunEvent, persist the event to the new agent_run_event table.
  • On Agent.resume(...), the checkpoint store satisfies cursor's reads from disk instead of (or in addition to) the cloud.
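
The issue does not say how `seq` is assigned for `agent_run_event`. One option, sketched below in memory, is a per-session counter so that each session's events are numbered 0, 1, 2, ... and can be replayed in order. The `RunEventLog` class and its method names are hypothetical and do not reflect the cursor SDK's `RunEventStore` API; a SQLite-backed version would presumably compute the next `seq` inside the same transaction as the insert.

```typescript
interface RunEventRow {
  sessionId: string;
  seq: number;     // position within the session's log
  payload: string; // JSON-encoded event, matching the TEXT column
  ts: number;      // epoch milliseconds (assumed unit)
}

class RunEventLog {
  private rows: RunEventRow[] = [];
  private nextSeq = new Map<string, number>();

  // Appends an event and returns the stored row with its assigned seq.
  append(sessionId: string, event: unknown, ts: number = Date.now()): RunEventRow {
    const seq = this.nextSeq.get(sessionId) ?? 0;
    this.nextSeq.set(sessionId, seq + 1);
    const row: RunEventRow = { sessionId, seq, payload: JSON.stringify(event), ts };
    this.rows.push(row);
    return row;
  }

  // Returns the decoded events for one session in seq order.
  replay(sessionId: string): unknown[] {
    return this.rows
      .filter((r) => r.sessionId === sessionId)
      .sort((a, b) => a.seq - b.seq)
      .map((r) => JSON.parse(r.payload));
  }
}
```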

Claude adapter

  • No change in behaviour: the Claude SDK has no equivalent stores. The Claude implementation of the shared interface returns null from getCheckpoint and treats appendRunEvent as a no-op.
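
The Claude side of the interface amounts to a null object for the two cursor-only methods. A minimal sketch follows, where `CheckpointExtras` is a hypothetical name for just that slice of the shared interface and the `Checkpoint` shape is assumed:

```typescript
// Assumed shape; the real type would follow the cursor SDK.
interface Checkpoint { sessionId: string; blob: Uint8Array; updatedAt: number; }

// Hypothetical name for the cursor-only part of the shared interface.
interface CheckpointExtras {
  getCheckpoint(sessionId: string): Promise<Checkpoint | null>;
  appendRunEvent(sessionId: string, event: unknown): Promise<void>;
}

// Claude has no checkpoint or run-event data, so reads yield null
// and writes are silently ignored.
const claudeCheckpointExtras: CheckpointExtras = {
  async getCheckpoint(): Promise<Checkpoint | null> {
    return null;
  },
  async appendRunEvent(): Promise<void> {
    // Intentionally a no-op.
  },
};
```

Callers can then treat both providers uniformly and handle a `null` checkpoint as "nothing to restore" instead of branching on the provider.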

Admin surface

  • GET /agents/sessions?agent=<name>&provider=<...> — list sessions.
  • DELETE /agents/sessions/:key — drop a session (operator escape hatch when a stuck resume keeps failing).
  • Frontend: settings / agents / config-history page or a new session-debug drawer renders the list with a drop button per row.
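
A framework-agnostic sketch of the two routes above, written against the standard Fetch API `Request` and `Response` types that Bun and Node 18+ provide as globals. The handler name `handleSessionAdmin` and the `SessionAdminStore` slice are hypothetical, and the status codes (204 on drop, 400 on a malformed key) are assumptions; the real server would wire this into whatever router it already uses. Keys such as `cursor:agent-<uuid>` contain a colon, so the key segment is percent-decoded before use.

```typescript
interface SessionRow {
  key: string;
  sessionId: string;
  provider: string;
  createdAt: number;
  lastUsedAt: number;
}

// Hypothetical slice of the store that the admin routes need.
interface SessionAdminStore {
  listSessions(filter: { agent?: string; provider?: string }): Promise<SessionRow[]>;
  dropSession(key: string): Promise<void>;
}

const SESSIONS_PATH = "/agents/sessions";

async function handleSessionAdmin(req: Request, store: SessionAdminStore): Promise<Response> {
  const url = new URL(req.url);

  // GET /agents/sessions?agent=<name>&provider=<...>
  if (req.method === "GET" && url.pathname === SESSIONS_PATH) {
    const filter: { agent?: string; provider?: string } = {};
    const agent = url.searchParams.get("agent");
    const provider = url.searchParams.get("provider");
    if (agent) filter.agent = agent;
    if (provider) filter.provider = provider;
    const rows = await store.listSessions(filter);
    return new Response(JSON.stringify(rows), {
      status: 200,
      headers: { "content-type": "application/json" },
    });
  }

  // DELETE /agents/sessions/:key
  if (req.method === "DELETE" && url.pathname.startsWith(SESSIONS_PATH + "/")) {
    let key: string;
    try {
      key = decodeURIComponent(url.pathname.slice(SESSIONS_PATH.length + 1));
    } catch {
      return new Response("malformed key", { status: 400 });
    }
    if (key === "") return new Response("missing key", { status: 400 });
    await store.dropSession(key);
    return new Response(null, { status: 204 });
  }

  return new Response("not found", { status: 404 });
}
```

An operator could then clear a stuck session with a `DELETE` request to `/agents/sessions/cursor%3Aagent-<uuid>`, with no service restart.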

Tests

  • Migration test: clean DB → the new tables exist; pre-existing session map data is migrated losslessly.
  • Round-trip test: write a checkpoint, resume the agent, and assert that the checkpoint was loaded.
  • Concurrency test: two adapters writing to the same session_id (this shouldn't happen, but the schema's primary key guarantees at most one checkpoint row per session_id).

Out of scope

  • Checkpoint compaction / TTL — file a separate issue for retention policy.
  • Cross-host checkpoint replication — single-host service.

References

  • Parent: #950
  • Cursor SDK stores: node_modules/.bun/@cursor+sdk@1.0.12/node_modules/@cursor/sdk/dist/cjs/public-api.d.ts (AgentCheckpointStore, AgentRunStore, RunEventStore)
  • Existing session map helpers: getSession / setSession / dropSession (search the server tree)

🦵 @charles kicked the queue — re-running address-review on @code-lead.

Reference
charles/claude-hooks#958