feat(agents): token economy — caveman mode, prompt caching, model tiering, cost cap as last resort #231
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
1 participant
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks#231
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
User story
As an operator paying for the agent fleet, I want the system to spend less tokens by default via proven community techniques, with a hard cost cap only as a final safety net, so that routine work (chore tickets, simple reviews) doesn't burn opus-class budget and a runaway loop gets killed before it hurts.
Context: #224 (mid-flight steering impl) cost $15.86 on boss-2 alone. That's normal for an architecture ticket on opus-4.7[1m], but no infrastructure exists today to keep less-ambitious work out of the opus lane, cache expensive system prompts, or abort a genuinely runaway task.
Acceptance criteria
Phase 1 — Investigation + spec
Respond in minimal shorthand. No commentary, no emojis, no recap.) fortype:choretickets and routine reviews. Opt-in per instance viaprompt_appendixor a type-levelcaveman: trueflag.cache_control: ephemeral) on the long static chunks of the system prompt + on large read files. The SDK supports it; we probably don't use it yet.type:choretickets to haiku-4.5 by default (one of our pool already runs it forreviewer-fast). Escalate to sonnet/opus only when the skill template or operator override requests it.<truncated>marker to the SDK instead of the full 100k-line output. Agents re-run with| head/| tailwhen they need more.type:choresessions.(repo, path, commit_sha)across turns.docs/token-economy.mdwith a table: technique → expected savings → implementation cost → rollout risk.Phase 2 — Implementation
config/agents.json(new fields undertypes.<type>.token_economy).skills/that any type can reference. Bundled default fortype:choredispatches./usageendpoint.Phase 3 — Cost cap (last resort)
types.<type>.max_cost_usd_per_taskinagents.json. Defaults:boss: 20,dev: 5,reviewer: 3,designer: 10,design-reviewer: 3,foreman: 15.agent-runner.tschecks accumulated cost at each turn boundary. At 50% of cap → emit SSEcost_warningenvelope (UI toast). At 100% →currentAbort.abort()+ task_history row markedcost_cappedwith reason.interruptedstatus from #221 is a sibling state;cost_cappedis its own status./raise-cap <issue_num> <new_cap>slash command on the issue bumps the cap for the current task.Verification
/usageendpoint shows a ≥20% drop in average cost pertype:choretask over the following week.type:chorewith caveman mode on, confirm the assistant text is terse, the turn count drops.Out of scope
References
apps/server/src/agent-runner.ts— where the SDK query is built + where cost accumulates.apps/server/src/task-store.ts— task-level cost persistence.packages/shared/src/task.ts—TaskStatus(addcost_capped).skills/*.md— existing skill templates, where caveman appendix would plug in.cache_control: ephemeralon system/tools/messages.