Stats: GET /stats endpoint — per-agent + per-repo cost/turns/success aggregates #123

New issue

Closed

opened 2026-04-20 10:22:14 +00:00 by claude-desktop · 0 comments

claude-desktop commented

2026-04-20 10:22:14 +00:00

Collaborator

User story

As the operator, I want a GET /stats HTTP endpoint that returns cost / turn / success-rate / throughput aggregates grouped by agent, by repo, and by day so that the dashboard (current and the #70 rework) can show spend trends without every page recomputing from raw /history on the client.

Context

/history currently returns up to 50 most-recent TaskRecords with cost_usd, turns, status, started_at, finished_at. Clients derive everything from that raw list.

Two problems at scale:

50-task cap cuts off older data — no month-long trend.
Every dashboard tab recomputes the same aggregates client-side on every reload.

This ticket adds a server-side aggregation that reads from SQLite (task history beyond the in-memory 50 cap once we persist it) and returns a compact summary. The aggregates are defined so any future dashboard surface can consume them without re-deriving.

Acceptance criteria

Endpoint shape

GET /stats — default window = last 30 days — returns:

{
  "window": { "from": "<iso>", "to": "<iso>", "days": 30 },
  "totals": { "tasks": 123, "cost_usd": 4.56, "turns": 789, "success_rate": 0.92 },
  "by_agent": [
    {
      "name": "boss-default",
      "type": "boss",
      "tasks": 40,
      "cost_usd": 2.10,
      "turns": 320,
      "success_rate": 0.95,
      "avg_turns_per_task": 8.0,
      "avg_cost_per_task": 0.0525
    }
  ],
  "by_repo": [
    { "repo": "charles/claude-hooks", "tasks": 120, "cost_usd": 4.50, "success_rate": 0.93 }
  ],
  "by_day": [
    { "day": "2026-04-20", "tasks": 12, "cost_usd": 0.42, "success_rate": 0.83 }
  ]
}

Query params:
- ?window=7d / 30d / 90d / all — controls the time range. Default 30d.
- ?agent=<name> — filter to one instance.
- ?repo=<owner/name> — filter to one repo.
All aggregates are computed from TaskRecords where status is one of success, failure, cancelled. running / queued are excluded (they have no finished_at and no final cost).
success_rate = successful / (successful + failed) — excludes cancelled (operator chose to abort, not an agent failure). Document this in the endpoint's response docstring.

Data source

If SQLite already persists history rows beyond the 50-task in-memory cap, read from there. If it doesn't, extend src/storage.ts to persist each finalized TaskRecord on onFinish. The in-memory 50-task list remains for fast SSE-driven UI refreshes; SQLite becomes the source of truth for aggregates.
The new SQLite table (if added) owns these columns: id TEXT PRIMARY KEY, repo TEXT, issue_number INT, user TEXT, model TEXT, status TEXT, cost_usd REAL, turns INT, started_at INT, finished_at INT.

Performance

30-day aggregation over 10k+ rows must return in <100ms. SQL is fine; an index on finished_at and a filter by finished_at BETWEEN ? handles this.
No in-memory scan over the full history. Queries must be bounded by the window.

Tests

src/stats.test.ts — seed SQLite with a fixture of ~20 tasks across 3 agents and 2 repos, call /stats with various filters, assert the aggregates.
Edge cases: empty window returns zeros (not NaN for success_rate), single task window, all-cancelled window, filter to a non-existent agent returns empty by_agent without 404.

Docs

Update CLAUDE.md's Modules table with the new storage.ts additions (if applicable).
Brief section in README.md on the /stats endpoint (URL + query params + response shape).

Out of scope

Dashboard UI consuming /stats — that's the monitor rework ticket (#70's follow-up). This ticket just ships the endpoint.
Historical backfill — when SQLite persistence lands, it only captures tasks going forward. No backfill from /history's in-memory list.
Cost forecasting / anomaly detection — raw aggregates only. Any trend math is UI / future-ticket territory.
Auth on /stats — inherits whatever the dashboard has today (none).

References

Current history source: src/main.ts::handleHistory reads taskHistory (in-memory array of TaskRecord).
TaskRecord shape: src/main.ts lines ~60-80.
SQLite helper: src/db.ts (was added in #48).
Storage module: src/storage.ts.

Dependencies

Blocked by: nothing.
Blocks: nothing (consumer UI is in a separate ticket).
Branch off: main.

## User story As the **operator**, I want a `GET /stats` HTTP endpoint that returns cost / turn / success-rate / throughput aggregates grouped by agent, by repo, and by day so that the dashboard (current and the #70 rework) can show spend trends without every page recomputing from raw `/history` on the client. ## Context `/history` currently returns up to 50 most-recent `TaskRecord`s with `cost_usd`, `turns`, `status`, `started_at`, `finished_at`. Clients derive everything from that raw list. Two problems at scale: 1. 50-task cap cuts off older data — no month-long trend. 2. Every dashboard tab recomputes the same aggregates client-side on every reload. This ticket adds a server-side aggregation that reads from SQLite (task history beyond the in-memory 50 cap once we persist it) and returns a compact summary. The aggregates are defined so any future dashboard surface can consume them without re-deriving. ## Acceptance criteria ### Endpoint shape - [ ] `GET /stats` — default window = last 30 days — returns: ```json { "window": { "from": "<iso>", "to": "<iso>", "days": 30 }, "totals": { "tasks": 123, "cost_usd": 4.56, "turns": 789, "success_rate": 0.92 }, "by_agent": [ { "name": "boss-default", "type": "boss", "tasks": 40, "cost_usd": 2.10, "turns": 320, "success_rate": 0.95, "avg_turns_per_task": 8.0, "avg_cost_per_task": 0.0525 } ], "by_repo": [ { "repo": "charles/claude-hooks", "tasks": 120, "cost_usd": 4.50, "success_rate": 0.93 } ], "by_day": [ { "day": "2026-04-20", "tasks": 12, "cost_usd": 0.42, "success_rate": 0.83 } ] } ``` - [ ] Query params: - `?window=7d` / `30d` / `90d` / `all` — controls the time range. Default `30d`. - `?agent=<name>` — filter to one instance. - `?repo=<owner/name>` — filter to one repo. - [ ] All aggregates are computed from `TaskRecord`s where `status` is one of `success`, `failure`, `cancelled`. `running` / `queued` are excluded (they have no `finished_at` and no final cost). - [ ] `success_rate` = `successful / (successful + failed)` — excludes cancelled (operator chose to abort, not an agent failure). Document this in the endpoint's response docstring. ### Data source - [ ] If SQLite already persists history rows beyond the 50-task in-memory cap, read from there. If it doesn't, extend `src/storage.ts` to persist each finalized `TaskRecord` on `onFinish`. The in-memory 50-task list remains for fast SSE-driven UI refreshes; SQLite becomes the source of truth for aggregates. - [ ] The new SQLite table (if added) owns these columns: `id TEXT PRIMARY KEY`, `repo TEXT`, `issue_number INT`, `user TEXT`, `model TEXT`, `status TEXT`, `cost_usd REAL`, `turns INT`, `started_at INT`, `finished_at INT`. ### Performance - [ ] 30-day aggregation over 10k+ rows must return in <100ms. SQL is fine; an index on `finished_at` and a filter by `finished_at BETWEEN ?` handles this. - [ ] No in-memory scan over the full history. Queries must be bounded by the window. ### Tests - [ ] `src/stats.test.ts` — seed SQLite with a fixture of ~20 tasks across 3 agents and 2 repos, call `/stats` with various filters, assert the aggregates. - [ ] Edge cases: empty window returns zeros (not `NaN` for `success_rate`), single task window, all-cancelled window, filter to a non-existent agent returns empty `by_agent` without 404. ### Docs - [ ] Update CLAUDE.md's Modules table with the new `storage.ts` additions (if applicable). - [ ] Brief section in `README.md` on the `/stats` endpoint (URL + query params + response shape). ## Out of scope - **Dashboard UI consuming `/stats`** — that's the monitor rework ticket (#70's follow-up). This ticket just ships the endpoint. - **Historical backfill** — when SQLite persistence lands, it only captures tasks going forward. No backfill from `/history`'s in-memory list. - **Cost forecasting / anomaly detection** — raw aggregates only. Any trend math is UI / future-ticket territory. - **Auth on `/stats`** — inherits whatever the dashboard has today (none). ## References - Current history source: `src/main.ts::handleHistory` reads `taskHistory` (in-memory array of `TaskRecord`). - `TaskRecord` shape: `src/main.ts` lines ~60-80. - SQLite helper: `src/db.ts` (was added in #48). - Storage module: `src/storage.ts`. ## Dependencies - **Blocked by:** nothing. - **Blocks:** nothing (consumer UI is in a separate ticket). - **Branch off:** `main`.

claude-desktop added the

area:dashboard

type:user-story

labels

2026-04-20 10:22:56 +00:00