M26-4: audit + spec out real usage_threshold_tokens enforcement #551

Closed
opened 2026-04-29 12:25:05 +00:00 by claude-desktop · 1 comment
Collaborator

As an operator, I want usage_threshold_tokens to actually halt or fail-over an agent when its token budget is exhausted, so that the dashboard ring (today purely cosmetic) reflects a real safety brake and the failover chain (M26-1) can include token_budget as a trigger.

Background

Audit performed during M26 design (PR #547 review):

  • usage_threshold_tokens in config/service.json is parsed by the loader → webhook-config.ts exposes it as usageThresholdTokens: number | null.
  • apps/server/src/main.ts::handleUsage echoes it back on GET /usage as threshold_tokens.
  • The dashboard's usage ring uses it for 50/80/95 % colour tiers.
  • No code path enforces it. Crossing the threshold has zero effect on dispatch, the worker pool, or container lifecycle.

M26-1's failover.triggers enum deliberately excludes token_budget because the trigger has nothing to detect today. This ticket is the prerequisite that brings it back.

Acceptance criteria

Audit (deliverable: docs/specs/token-budget.md)

  • Confirm exhaustively: every read of usageThresholdTokens in the codebase. Document the count and call sites.
  • Confirm: no enforcement gate currently exists. (If one is found, surprise — write it up.)
  • Catalogue what computeUsage() already tracks: window granularity (week/day/all), per-agent vs. global, token type breakdown (input/output/cache).
  • Decision: scope is global (current shape, single threshold) or per-type (lift threshold onto type config). Recommendation in the spec.
  • Decision: budget reset cadence honours existing usage_reset knob ("week" | "day" | "all") or gets a new one.
  • Decision: action on threshold cross — pause all agents (current dashboard semantics), pause specific types, fail-over to a cheaper tier (ties into M26-1), or warn-only with override.

Implementation spec (in the same doc)

  • Sketch the enforcement gate location: post-task hook, dispatcher pre-flight, or background sweeper. Trade-offs documented.
  • Sketch the integration with M26-1's failover.triggers: when token_budget is enabled and the active window's usage ≥ threshold, fire the same tier-bump path as auth_error so failover state is unified.
  • Sketch the operator UX: how does the ring colour change at 100 %? What does the dashboard say happened? Where does the unpause button live?

Follow-up tickets

  • Open M26-5 / 6 / … for actual implementation, scoped from the audit's recommendation. This ticket only ships the audit + spec.

Out of scope

  • Implementation work — gated behind the audit's conclusion.
  • Per-instance budget (today the threshold is global; M26-2 wizard already excludes per-tier budget knobs).

References

  • PR #547 review thread (decision to defer): #547
  • M26-1 (#TBD) — failover.triggers enum that this story re-enables.
  • Existing call sites:
    • apps/server/src/shared/config/webhook-config.ts:469 (declaration)
    • apps/server/src/main.ts:1390 (handleUsage)
    • apps/server/src/shared/config/service-config-schema.ts (Zod field)
As an **operator**, I want `usage_threshold_tokens` to actually halt or fail-over an agent when its token budget is exhausted, so that the dashboard ring (today purely cosmetic) reflects a real safety brake and the failover chain (M26-1) can include `token_budget` as a trigger. ### Background Audit performed during M26 design (PR #547 review): - `usage_threshold_tokens` in `config/service.json` is parsed by the loader → `webhook-config.ts` exposes it as `usageThresholdTokens: number | null`. - `apps/server/src/main.ts::handleUsage` echoes it back on `GET /usage` as `threshold_tokens`. - The dashboard's usage ring uses it for 50/80/95 % colour tiers. - **No code path enforces it.** Crossing the threshold has zero effect on dispatch, the worker pool, or container lifecycle. M26-1's `failover.triggers` enum deliberately excludes `token_budget` because the trigger has nothing to detect today. This ticket is the prerequisite that brings it back. ### Acceptance criteria #### Audit (deliverable: `docs/specs/token-budget.md`) - [ ] Confirm exhaustively: every read of `usageThresholdTokens` in the codebase. Document the count and call sites. - [ ] Confirm: no enforcement gate currently exists. (If one is found, surprise — write it up.) - [ ] Catalogue what `computeUsage()` already tracks: window granularity (week/day/all), per-agent vs. global, token type breakdown (input/output/cache). - [ ] Decision: scope is **global** (current shape, single threshold) or **per-type** (lift threshold onto type config). Recommendation in the spec. - [ ] Decision: budget reset cadence honours existing `usage_reset` knob ("week" | "day" | "all") or gets a new one. - [ ] Decision: action on threshold cross — pause all agents (current dashboard semantics), pause specific types, fail-over to a cheaper tier (ties into M26-1), or warn-only with override. #### Implementation spec (in the same doc) - [ ] Sketch the enforcement gate location: post-task hook, dispatcher pre-flight, or background sweeper. Trade-offs documented. - [ ] Sketch the integration with M26-1's `failover.triggers`: when `token_budget` is enabled and the active window's usage ≥ threshold, fire the same tier-bump path as `auth_error` so failover state is unified. - [ ] Sketch the operator UX: how does the ring colour change at 100 %? What does the dashboard say happened? Where does the unpause button live? #### Follow-up tickets - [ ] Open M26-5 / 6 / … for actual implementation, scoped from the audit's recommendation. This ticket only ships the audit + spec. ### Out of scope - Implementation work — gated behind the audit's conclusion. - Per-instance budget (today the threshold is global; M26-2 wizard already excludes per-tier budget knobs). ### References - PR #547 review thread (decision to defer): https://forge.jacquin.app/charles/claude-hooks/pulls/547 - M26-1 (#TBD) — `failover.triggers` enum that this story re-enables. - Existing call sites: - `apps/server/src/shared/config/webhook-config.ts:469` (declaration) - `apps/server/src/main.ts:1390` (`handleUsage`) - `apps/server/src/shared/config/service-config-schema.ts` (Zod field)
Author
Collaborator

Audit shipped in PR #552 (docs/specs/token-budget.md). Reference cleanups in PR #552 review fixup commit. M26-5/6/7 implementation also shipped in PR #552 follow-up commit (operator opted to skip filing separate tickets and bundle them):

  • M26-5 — per-type usage_threshold_tokens + post-task hook + recordExternalTrigger glue. Loader rejects negative/non-int, strips token_budget from single-tier chains, accepts on multi-tier. Audit's usage_reset: "all" caveat respected — gate skips when window is "all".
  • M26-6 — token-budget input row in the wizard. (Per-type Stats ring deferred — needs Stats page redesign separate from M26.)
  • M26-7 icon on the tier badge when last_failure_kind === "token_budget". Distinct from auth/rate-limit failures at a glance.

Audit-doc cleanup landed in PR #552 review fixup: corrected file:line refs, fixed UsageResult shape snippet (totals not total_tokens, window is object), called out usage_reset: "all" edge case, dual-cap UX caveat, repointed post-task hook reference from agent-runner.ts:489 (helper) to :981 (M26-1 classifier hook).

Closing — audit + spec + implementation all on main.

Audit shipped in PR #552 (`docs/specs/token-budget.md`). Reference cleanups in PR #552 review fixup commit. M26-5/6/7 implementation also shipped in PR #552 follow-up commit (operator opted to skip filing separate tickets and bundle them): - **M26-5** — per-type `usage_threshold_tokens` + post-task hook + `recordExternalTrigger` glue. Loader rejects negative/non-int, strips `token_budget` from single-tier chains, accepts on multi-tier. Audit's `usage_reset: "all"` caveat respected — gate skips when window is `"all"`. - **M26-6** — token-budget input row in the wizard. (Per-type Stats ring deferred — needs Stats page redesign separate from M26.) - **M26-7** — ⛽ icon on the tier badge when `last_failure_kind === "token_budget"`. Distinct from auth/rate-limit failures at a glance. Audit-doc cleanup landed in PR #552 review fixup: corrected file:line refs, fixed `UsageResult` shape snippet (`totals` not `total_tokens`, `window` is object), called out `usage_reset: "all"` edge case, dual-cap UX caveat, repointed post-task hook reference from `agent-runner.ts:489` (helper) to `:981` (M26-1 classifier hook). Closing — audit + spec + implementation all on main.
Sign in to join this conversation.
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks#551
No description provided.