charles/claude-hooks

feat(agents): token economy — caveman mode, prompt caching, model tiering, cost cap as last resort #231

Closed

opened 2026-04-21 12:48:55 +00:00 by claude-desktop · 0 comments

User story

As an operator paying for the agent fleet, I want the system to spend fewer tokens by default via proven community techniques, with a hard cost cap only as a final safety net, so that routine work (chore tickets, simple reviews) doesn't burn opus-class budget and a runaway loop gets killed before it hurts.

Context: #224 (mid-flight steering impl) cost $15.86 on boss-2 alone. That's normal for an architecture ticket on opus-4.7[1m], but no infrastructure exists today to keep less-ambitious work out of the opus lane, cache expensive system prompts, or abort a genuinely runaway task.

Acceptance criteria

Phase 1 — Investigation + spec

- [ ] Survey community token-economy techniques. At minimum address:
  - **Caveman mode**: terse system-prompt appendix (`Respond in minimal shorthand. No commentary, no emojis, no recap.`) for `type:chore` tickets and routine reviews. Opt-in per instance via `prompt_appendix` or a type-level `caveman: true` flag.
  - **Prompt caching** (Anthropic `cache_control: ephemeral`) on the long static chunks of the system prompt + on large read files. The SDK supports it; we probably don't use it yet.
  - **Model tiering**: route `type:chore` tickets to haiku-4.5 by default (one of our pool already runs it for `reviewer-fast`). Escalate to sonnet/opus only when the skill template or operator override requests it.
  - **Tool result trimming**: cap Bash stdout at N lines (configurable), send a `<truncated>` marker to the SDK instead of the full 100k-line output. Agents re-run with `| head` / `| tail` when they need more.
  - **Context compaction**: lean on the SDK's auto-compact more aggressively; tune the threshold down for `type:chore` sessions.
  - **Read-file dedup**: the SDK already caches re-reads in the same turn, but chained-resume sessions re-fetch. Cache by `(repo, path, commit_sha)` across turns.
- [ ] Write the findings into `docs/token-economy.md` with a table: technique → expected savings → implementation cost → rollout risk.
- [ ] Decide what to ship in phase 2; don't try to land everything at once.
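
The tool-result trimming item above could look roughly like the following sketch. The function name `trimToolOutput`, its parameters, and the wording after the `<truncated>` marker are assumptions made for illustration; the issue only specifies a configurable cap of N lines and a `<truncated>` marker.

```typescript
// Hypothetical helper: the name, signature, and marker wording are
// illustrative assumptions, not code taken from agent-runner.ts.
function trimToolOutput(stdout: string, maxLines: number): string {
  const lines = stdout.split("\n");
  // Ignore the empty element produced by a trailing newline.
  if (lines.length > 0 && lines[lines.length - 1] === "") lines.pop();
  if (lines.length <= maxLines) return stdout;

  const omitted = lines.length - maxLines;
  // Keep the head of the output and report how much was dropped, so the
  // agent can re-run with `| head` / `| tail` when it needs more.
  return [
    ...lines.slice(0, maxLines),
    `<truncated> ${omitted} more lines omitted`,
  ].join("\n");
}

const sample = Array.from({ length: 5 }, (_, i) => `line ${i + 1}`).join("\n");
console.log(trimToolOutput(sample, 2));
```

Keeping the head rather than the tail is a design choice of this sketch; a real implementation might keep both ends, since error messages often appear last.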

Phase 2 — Implementation

- [ ] Ship the top 2–3 techniques from phase 1's rollout plan.
- [ ] Each technique is opt-in per agent type via `config/agents.json` (new fields under `types.<type>.token_economy`).
- [ ] Caveman mode ships as a prompt-appendix string in `skills/` that any type can reference. Bundled default for `type:chore` dispatches.
- [ ] Prompt caching is silent: wire it, measure it, expose the hit rate in the `/usage` endpoint.
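
One way to compute the cache hit rate mentioned above is from the token counters that the Anthropic Messages API reports in each response's `usage` object (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`). The sketch below is an assumption about how that could be aggregated; `cacheHitRate` and the choice of denominator are hypothetical and not taken from the `/usage` endpoint.

```typescript
// Subset of the `usage` object returned by the Anthropic Messages API.
interface CacheUsage {
  input_tokens: number;                // input tokens neither read from nor written to the cache
  cache_creation_input_tokens: number; // input tokens written to the cache
  cache_read_input_tokens: number;     // input tokens served from the cache
}

// Hypothetical aggregation: fraction of all input tokens that were cache reads.
function cacheHitRate(turns: CacheUsage[]): number {
  let read = 0;
  let total = 0;
  for (const u of turns) {
    read += u.cache_read_input_tokens;
    total +=
      u.input_tokens + u.cache_creation_input_tokens + u.cache_read_input_tokens;
  }
  return total === 0 ? 0 : read / total;
}

// Illustrative numbers only: turn 1 writes the system prompt to the cache,
// turn 2 reads it back.
const exampleTurns: CacheUsage[] = [
  { input_tokens: 200, cache_creation_input_tokens: 8000, cache_read_input_tokens: 0 },
  { input_tokens: 300, cache_creation_input_tokens: 0, cache_read_input_tokens: 8000 },
];
console.log(cacheHitRate(exampleTurns).toFixed(3));
```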

Phase 3 — Cost cap (last resort)

- [ ] New field `types.<type>.max_cost_usd_per_task` in `agents.json`. Defaults: `boss: 20`, `dev: 5`, `reviewer: 3`, `designer: 10`, `design-reviewer: 3`, `foreman: 15`.
- [ ] `agent-runner.ts` checks accumulated cost at each turn boundary. At 50% of cap → emit SSE `cost_warning` envelope (UI toast). At 100% → `currentAbort.abort()` + task_history row marked `cost_capped` with reason.
- [ ] The `interrupted` status from #221 is a sibling state; `cost_capped` is its own status.
- [ ] Operator override: a `/raise-cap <issue_num> <new_cap>` slash command on the issue bumps the cap for the current task.
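
The turn-boundary check described above can be sketched as a small pure function plus some wiring. This is a minimal sketch under stated assumptions: `checkCostCap`, `onTurnBoundary`, the `emit` callback, and the `Abortable` type are hypothetical names, and only the thresholds, the default caps, and the `cost_warning` event name come from the issue.

```typescript
type CapDecision = "ok" | "warn" | "abort";

// Default caps in USD per task, copied from the issue text.
const DEFAULT_CAPS: Record<string, number> = {
  boss: 20,
  dev: 5,
  reviewer: 3,
  designer: 10,
  "design-reviewer": 3,
  foreman: 15,
};

// Hypothetical pure check: warn from 50% of the cap, abort from 100%.
function checkCostCap(costUsd: number, capUsd: number): CapDecision {
  if (costUsd >= capUsd) return "abort";
  if (costUsd >= capUsd * 0.5) return "warn";
  return "ok";
}

// Anything with an abort() method, e.g. an AbortController.
interface Abortable {
  abort(): void;
}

// Hypothetical wiring: `emit` stands in for the SSE channel.
function onTurnBoundary(
  agentType: string,
  costUsd: number,
  controller: Abortable,
  emit: (event: string) => void,
): CapDecision {
  const cap = DEFAULT_CAPS[agentType];
  if (cap === undefined) return "ok"; // no cap configured for this type
  const decision = checkCostCap(costUsd, cap);
  if (decision === "warn") emit("cost_warning");
  if (decision === "abort") controller.abort();
  return decision;
}
```

As written, the sketch would emit a warning at every turn boundary past 50%; a real implementation would probably remember that the warning was already sent and record the `cost_capped` reason alongside the abort.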

Verification

- [ ] Measurable: after phase 2 ships, the `/usage` endpoint shows a ≥20% drop in average cost per `type:chore` task over the following week.
- [ ] Unit tests for the cost-cap path (mock SDK, assert abort fires at threshold).
- [ ] Manual smoke: dispatch a `type:chore` with caveman mode on, confirm the assistant text is terse, the turn count drops.
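
The ≥20% target above is a relative drop in average cost per task. A minimal sketch of that arithmetic follows; the helper names and the sample costs are made up for illustration and are not measurements from the fleet.

```typescript
// Hypothetical helpers for comparing two observation windows.
function averageCost(costsUsd: number[]): number {
  if (costsUsd.length === 0) throw new Error("no tasks in window");
  return costsUsd.reduce((sum, c) => sum + c, 0) / costsUsd.length;
}

// Relative drop of the average: (before - after) / before.
function relativeDrop(beforeUsd: number[], afterUsd: number[]): number {
  const before = averageCost(beforeUsd);
  return (before - averageCost(afterUsd)) / before;
}

// Illustrative numbers: average falls from $2.00 to $1.50, a 25% drop.
const drop = relativeDrop([1, 3], [1, 2]);
console.log(drop >= 0.2 ? "target met" : "target missed");
```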

Out of scope

- Switching to a different provider / model family: we stay on Anthropic.
- Rewriting skill prompts for verbosity: that's implicit in caveman mode; major skill rewrites are separate.
- Per-operator daily budget alerts: file a follow-up if the basic cap isn't enough.

References

- `apps/server/src/agent-runner.ts`: where the SDK query is built + where cost accumulates.
- `apps/server/src/task-store.ts`: task-level cost persistence.
- `packages/shared/src/task.ts`: `TaskStatus` (add `cost_capped`).
- `skills/*.md`: existing skill templates, where caveman appendix would plug in.
- Anthropic prompt caching docs: `cache_control: ephemeral` on system/tools/messages.
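
For orientation, this is roughly what `cache_control: ephemeral` looks like on a system prompt at the raw Messages API level, where `system` may be an array of text blocks. The builder function and its parameters are hypothetical, and how `agent-runner.ts` would pass this through the SDK is not shown in the issue.

```typescript
// Shape of a system text block in the Anthropic Messages API.
interface SystemTextBlock {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
}

// Hypothetical builder: the long static prompt comes first and carries the
// cache breakpoint; the per-task text follows and stays uncached.
function buildSystemBlocks(
  staticPrompt: string,
  perTaskPrompt: string,
): SystemTextBlock[] {
  return [
    { type: "text", text: staticPrompt, cache_control: { type: "ephemeral" } },
    { type: "text", text: perTaskPrompt },
  ];
}

const blocks = buildSystemBlocks("<long skill template>", "Ticket-specific notes");
console.log(JSON.stringify(blocks, null, 2));
```

Putting the stable text first matters because the cache matches on the prompt prefix up to the breakpoint, so anything that changes per task should come after it.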


claude-desktop added the area:agents and type:user-story labels

2026-04-21 12:50:56 +00:00

code-lead was assigned by claude-desktop

2026-04-21 12:51:05 +00:00

code-lead referenced this issue from a commit

2026-04-21 13:14:20 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap

code-lead referenced this issue from a pull request that will close it

2026-04-21 13:14:52 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap #244

code-lead referenced this issue from a commit

2026-04-21 13:35:22 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap

charles referenced this issue from a commit

2026-04-21 14:18:54 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap

code-lead referenced this issue from a commit

2026-04-21 14:22:54 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap

charles closed this issue

2026-04-21 14:37:20 +00:00

charles referenced this issue from a commit

2026-04-21 14:37:21 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap (#244)

charles referenced this issue from a commit

2026-04-21 18:13:58 +00:00

chore(agents): make cost cap opt-in (#231 follow-up)

charles referenced this issue from a commit

2026-04-21 18:30:40 +00:00

fix(monitor): task-detail page falls back to SQLite + re-dispatch covers interrupted/cost_capped

architect referenced this issue

2026-04-21 20:23:10 +00:00

docs(specs): add ui-consolidation spec #260

charles referenced this issue from a commit

2026-04-21 20:25:54 +00:00

docs(specs): add ui-consolidation spec (#260)

claude-desktop referenced this issue

2026-04-21 20:29:21 +00:00

chore(skills): terse Forgejo artifacts — shared communication-style rules across every skill #261

code-lead referenced this issue

2026-04-21 20:31:19 +00:00

feat(web): merge Usage route into Stats as tab (UC-3) #264

dev referenced this issue from a commit

2026-04-21 20:36:01 +00:00

chore(skills): add artifact-style fragment and wire to all skill dispatches

dev referenced this issue

2026-04-21 20:36:11 +00:00

chore(skills): terse Forgejo artifacts via shared artifact-style fragment #265

code-lead referenced this issue from a commit

2026-04-21 20:43:38 +00:00

chore(skills): terse Forgejo artifacts via shared artifact-style fragment

dev referenced this issue

2026-04-21 20:50:14 +00:00

feat(web): merge Usage route into Stats as tab (UC-3) #268

code-lead referenced this issue from a commit

2026-04-21 21:02:36 +00:00

feat(web): merge Usage route into Stats as tab (UC-3)

charles referenced this issue from a commit

2026-04-26 11:11:46 +00:00

fix(flows): NF-6 default flow parity — restore prompt_appendix + token_economy

claude-desktop referenced this issue

2026-04-26 19:59:31 +00:00

M22 Tracking — UI consolidation #400

claude-desktop referenced this issue

2026-04-28 20:52:33 +00:00

M22 Tracking — UI consolidation #400
