feat(agents): token economy — caveman mode, cost cap, /raise-cap #244
Reference
charles/claude-hooks!244
Summary
Ships phases 1–3 of issue #231 — the token-economy spec, caveman mode,
and the last-resort cost cap.
Closes #231.
Phase 1 —
`docs/token-economy.md`
Survey of community token-economy techniques with a decision table:
technique → expected savings → implementation cost → rollout risk.
Documents what ships in phases 2 + 3 and what's deferred (read-file
dedup, per-operator daily budgets, SDK-level tool-result trimming
— the upstream hook doesn't exist).
Phase 2 — caveman mode + implicit prompt caching
- `types.<type>.token_economy.caveman` + `caveman_labels` in `config/agents.json`. When either triggers, the dispatcher in `dispatchIssueForAgent` appends the terse `skills/caveman.md` body to the dispatched task. Shipped defaults: `caveman_labels: ["type:chore"]` on boss / dev / reviewer so `type:chore` tickets get the cheap treatment out of the box.
- […] rate already captured on every persisted task via `cache_read_tokens`). No code change needed, just documentation.
- […] `type:chore`) is already supported via per-instance `model` overrides: ops flip the SQLite column, no code change.
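The trigger rule described above (an unconditional `caveman` switch or a label match against `caveman_labels`) could be sketched roughly as follows. Only the names `caveman`, `caveman_labels`, and `shouldApplyCaveman` come from this PR; the config type, the function signature, and the treatment of a missing block are assumptions made for illustration, not the shipped implementation.

```typescript
// Hypothetical shape of the per-type token_economy block. Only `caveman`
// and `caveman_labels` are named in the PR description.
interface CavemanConfig {
  caveman?: boolean;
  caveman_labels?: string[];
}

// Returns true when the terse caveman appendix should be appended.
// Assumed signature; the real function may take different arguments.
function shouldApplyCaveman(
  tokenEconomy: CavemanConfig | undefined,
  issueLabels: readonly string[],
): boolean {
  // Missing block: caveman mode is never applied.
  if (!tokenEconomy) return false;
  // Unconditional switch wins regardless of labels.
  if (tokenEconomy.caveman === true) return true;
  // Otherwise apply only if the issue carries one of the trigger labels.
  const triggers = new Set(tokenEconomy.caveman_labels ?? []);
  return issueLabels.some((label) => triggers.has(label));
}
```

With the shipped default `caveman_labels: ["type:chore"]`, a ticket labelled `type:chore` would trigger the appendix under this sketch, while a `type:bug` ticket would not.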
Phase 3 — cost cap (the hard safety net)
- `types.<type>.token_economy.max_cost_usd_per_task` + `warn_at_pct` (default 0.5). The agent-runner accumulates per-turn cost from the SDK's `assistant.message.usage` block using the per-model rate table in `agent-runner.DEFAULT_MODEL_PRICES` (with `token_economy.pricing` overrides for when Anthropic rotates rates between service rebuilds).
- […] `cost_warning` SSE envelope + task event.
- […] `currentAbort.abort()` + the task settles as `cost_capped` (new `TaskStatus` sibling of `cancelled`). Persisted to `task_history` so `/stats` / `/usage` still account for the run's token consumption, but distinguish it from operator cancels + agent failures.
- […] design-reviewer $3, foreman $15. `null` disables the cap.
- `/raise-cap <new_cap>` slash comment on any issue bumps the cap mid-dispatch. `/raise-cap off` disables it for the current task. Trust-gated like `/breakdown`. In-memory only: the next dispatch resets to the type default.
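As a rough illustration of the per-turn accounting in Phase 3, the sketch below prices one turn from a usage record and a per-model rate table. The names `estimateTurnCost` and `resolveModelPrice` appear in this PR, but their signatures, the field names, the model key, and every price below are hypothetical placeholders, not the contents of `DEFAULT_MODEL_PRICES` and not real rates.

```typescript
// Hypothetical price entry, in USD per million tokens.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
  cacheReadPerMTok: number;
}

// Hypothetical per-turn usage record derived from the SDK usage block.
interface TurnUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens?: number;
}

// Placeholder table; the numbers are illustrative only.
const EXAMPLE_MODEL_PRICES: Record<string, ModelPrice> = {
  "example-model": { inputPerMTok: 3, outputPerMTok: 15, cacheReadPerMTok: 0.3 },
};

// Config overrides take precedence over the built-in table.
function resolveModelPrice(
  model: string,
  overrides: Record<string, ModelPrice> = {},
): ModelPrice | undefined {
  return overrides[model] ?? EXAMPLE_MODEL_PRICES[model];
}

// Cost of a single turn in USD.
function estimateTurnCost(usage: TurnUsage, price: ModelPrice): number {
  const weighted =
    usage.inputTokens * price.inputPerMTok +
    usage.outputTokens * price.outputPerMTok +
    (usage.cacheReadTokens ?? 0) * price.cacheReadPerMTok;
  return weighted / 1_000_000;
}
```

Resolving overrides first mirrors the stated purpose of `token_economy.pricing`: rates can be corrected in config without rebuilding the service.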
Tests
- `apps/server/src/token-economy.test.ts`: 24 tests covering `modelPriceKey`, `resolveModelPrice`, `estimateTurnCost`, `createCostAccumulator` (warn latch, cap trip, setCap-bump, null cap), `parseRaiseCapCommand` (positive + `off` + invalid + case), and `shouldApplyCaveman` (unconditional + label-match + missing block).
- `apps/server/src/webhook-config.test.ts`: 9 new tests for `token_economy` parsing (defaults, explicit override, null cap, negative rejected, out-of-range `warn_at_pct` rejected, malformed `caveman_labels` / `pricing` rejected).
Test plan
- `just qa` locally: lint, typecheck, biome format, tests all green.
- […] `type:chore` ticket to `boss` post-merge and confirm caveman appendix is applied (inspect prompt in /task/:id/event log).
- […] `max_cost_usd_per_task` override and confirm `cost_warning` SSE + `cost_capped` termination.
- `/raise-cap 40` on a running task bumps the cap without aborting.
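For the `/raise-cap` step of the test plan, a parser along these lines would cover the cases the unit tests list (positive number, `off`, invalid input, case-insensitivity). This is a sketch under assumptions: the return shape is invented for illustration, and it assumes the whole comment body is the command, which the PR does not state.

```typescript
// Hypothetical result shape: `cap: null` means "cap disabled for this task".
type RaiseCapCommand = { cap: number | null };

// Returns null for anything that is not a well-formed /raise-cap command,
// so malformed input is ignored rather than acted on.
function parseRaiseCapCommand(commentBody: string): RaiseCapCommand | null {
  const match = /^\s*\/raise-cap\s+(\S+)\s*$/i.exec(commentBody);
  if (!match) return null;
  const arg = (match[1] ?? "").toLowerCase();
  if (arg === "off") return { cap: null };
  // Accept only plain non-negative decimals; rejects signs, units, "1e3".
  if (!/^\d+(\.\d+)?$/.test(arg)) return null;
  const cap = Number(arg);
  // A zero cap would trip immediately, so treat it as invalid here.
  return cap > 0 ? { cap } : null;
}
```

Returning `null` for malformed input matches the silent-ignore behaviour discussed in the review, so a typo cannot switch the cap off by accident.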
🤖 Generated with Claude Code
914340d54040830a4183
CI ✅ green (run #1886, 4m12s). Round 1 full review.
Summary
Implementation is solid overall — the core cost-accumulation loop, caveman config parsing, SSE envelopes, and test suite (24 + 9 tests) are well-structured. Two bugs need fixing before merge.
🔴 Must Fix
1. `setCap` never resets the `capped` latch: `/raise-cap` silently no-ops after first cap fires
File: `apps/server/src/agent-runner.ts`, the `setCap` function in `createCostAccumulator`
`setCap` updates `cap` and conditionally resets `warnedAt`, but never touches the `capped` boolean. So:
- `capped = true` → abort fires.
- `/raise-cap 40` → `setCap(40)` → `warnedAt` resets, but `capped` is still `true`.
- `nextThreshold()` checks `!capped && total >= cap` → `!capped` is `false` → returns `null` forever.
Same bug on the `setCap(null)` → `setCap(40)` path (operator disables cap then re-enables it): `capped` is still `true` from the first trip, so the re-enabled cap never fires.
This directly defeats the `/raise-cap` escape hatch described in the acceptance criteria and `docs/token-economy.md`.
Fix: Reset `capped` in `setCap` when the new cap is non-null and `total < newCap`.
Add a corresponding test: `createCostAccumulator`, after first cap fires, `setCap(higherCap)` → advance past higher cap → should emit `"cap"` again.
2. `handleReviewRequested` skips caveman appendix: missing Phase 2 acceptance criterion
File: `apps/server/src/webhook-handlers.ts`, `handleReviewRequested` dispatch block
`dispatchIssueForAgent` calls `maybeApplyCavemanAppendix` at the dispatch site. `handleReviewRequested` builds the task with `applyAppendix(interpolate(…), agent)` and does not call `maybeApplyCavemanAppendix`. A `type:chore` PR sent for review won't get the terse caveman appendix even when `caveman_labels: ["type:chore"]` is configured on `reviewer`.
The Phase 2 acceptance criterion says caveman applies to "routine reviews"; review dispatches are the primary use case for `reviewer-fast`.
Fix: After building `task` in `handleReviewRequested`, apply caveman. `prLabels` is the label array already computed from `prDetail.labels` a few lines later; hoist it above the `maybeApplyCavemanAppendix` call.
🟡 Should Fix
3. Warn latch uses sign-based sentinel instead of a boolean
File: `apps/server/src/agent-runner.ts`, `createCostAccumulator`
The warn-fired guard is `warnedAt < 0` (where "not fired" is `Number.NEGATIVE_INFINITY`). This is fragile: a refactor that initialises `warnedAt` to any negative number (e.g. `-1`) would permanently hold the warn gate open. A dedicated `let warned = false` boolean would be more explicit and is the standard pattern for latches.
4. Stale comment in `main.ts`: `TaskResult.status` now includes `cost_capped`
File: `apps/server/src/main.ts`, `onFinish` hook, line ~560
Comment says "`TaskResult.status` is `success | failure` only — cancellation fires via `handleCancel`." That's stale: `cost_capped` is also a valid status here (see `worker.ts`). The `pipelineState` mapping silently collapses it to `"failure"`; at minimum the comment should say so.
5. `/raise-cap` ack comment posts under probe token, not the running agent's identity
File: `apps/server/src/webhook-handlers.ts`, `applyRaiseCapCommand`
The ack "✅ cost cap updated …" comment is posted via `probeToken()`, which is whichever agent token happens to be first in the registry. On a `reviewer` task, the comment appears from `boss`. Cosmetically confusing. Fix: pass the token from the matched worker's `ResolvedAgent` to `applyRaiseCapCommand` and fall back to `probeToken()` only when no running task is found.
🔵 Low / Informational
6. `task-store.ts` module comment omits `cost_capped`
File: `apps/server/src/task-store.ts`, module-level JSDoc
States terminal states are `success | failure | cancelled`. Add `cost_capped`. The SQL is already correct.
7. Malformed `/raise-cap` is silently ignored: no operator feedback
File: `apps/server/src/webhook-handlers.ts`, comment block above `handleIssueComment`
Comment implies malformed `/raise-cap` posts a usage hint; the code silently returns `null`. Either implement the hint or update the comment to say "silently ignored."
8. `dispatchBreakdown` doesn't apply caveman: no explanation
File: `apps/server/src/webhook-handlers.ts`, `dispatchBreakdown`
Intentional? Add a comment so future readers don't wonder. If it's not intentional, apply `maybeApplyCavemanAppendix` with the tracking issue's labels.
What's good
- […] `estimateTurnCost`, `resolveModelPrice`, rate table) is clean and well-tested.
- `warn_at_pct` validation (must be 0 < x < 1) is correct and tested.
- `parseRaiseCapCommand` handles `"off"` and case-insensitivity; unit tests cover it.
- `cost_capped` is correctly added to `TaskStatus`, persisted, and included in all SQL `IN` clauses.
- […] `cost_warning`, `cost_capped`) follow existing patterns.
- `caveman_labels` label-intersection logic in `shouldApplyCaveman` is correct.
- `max_cost_usd_per_task: null` correctly disables the cap end-to-end.
Review addressed in `a39b2c7`.
Must fix
1. `setCap` now resets both the `capped` and `warned` latches when the new cap is non-null and the running total is under the new ceiling. Covers `/raise-cap <higher>` after first trip and the `setCap(null) → setCap(number)` re-enable path. Two regression tests added in `token-economy.test.ts`.
2. `handleReviewRequested` now routes through `maybeApplyCavemanAppendix`, so `reviewer-fast` on `type:chore` PRs picks up the terse appendix. Hoisted the label fetch above the task-build as suggested.
Should fix
3. Replaced the `warnedAt < 0` sign-based sentinel with an explicit `let warned = false` boolean latch.
4. Updated the stale `onFinish` comment in `main.ts` to note `cost_capped` is collapsed to `"failure"` in the pipeline SSE envelope while the persisted `TaskRecord.status` keeps the distinct terminal state.
5. `applyRaiseCapCommand` now picks up the matched worker's own Forgejo token for the ack comment (so a reviewer-task cap bump appears from `reviewer`, not whichever agent was first in the registry). Falls back to `probeToken()` only on the no-running-task miss branch.
Low / informational
6. Added `cost_capped` to the `task-store.ts` module-level JSDoc terminal-state list.
7. Updated the `/raise-cap` comment block to document that malformed commands are silently ignored (matches the actual code path; deliberate so a trusted-user typo can't trip the safety net off).
8. Added a note next to `dispatchBreakdown` explaining the deliberate caveman skip: breakdowns are long-form authoring that clashes with terse shorthand; Phase 3 cost cap still bounds runaway breakdowns.
Full QA green: 870 server tests + 187 web tests, biome check + format clean.
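The latch behaviour discussed in this thread (boolean `warned` / `capped` latches, re-armed by `setCap` when the running total is under the new ceiling) might look something like the sketch below. It is an assumption-laden illustration, not the code in `agent-runner.ts`: the `add` method, the `"warn"` / `"cap"` return values as a union type, and the choice to latch `warned` when the cap trips are all invented here, and the real accumulator exposes `nextThreshold()` rather than this shape.

```typescript
type Threshold = "warn" | "cap";

// Hypothetical interface for illustration only.
interface CostAccumulator {
  add(turnCostUsd: number): Threshold | null;
  setCap(newCap: number | null): void;
  totalUsd(): number;
}

function createCostAccumulator(
  initialCap: number | null,
  warnAtPct = 0.5,
): CostAccumulator {
  let cap: number | null = initialCap;
  let total = 0;
  let warned = false; // fires at most once per armed cap
  let capped = false; // fires at most once per armed cap

  return {
    add(turnCostUsd: number): Threshold | null {
      total += turnCostUsd;
      if (cap === null) return null; // cap disabled
      if (!capped && total >= cap) {
        capped = true;
        warned = true; // a cap trip supersedes any pending warning
        return "cap";
      }
      if (!warned && total >= cap * warnAtPct) {
        warned = true;
        return "warn";
      }
      return null;
    },
    setCap(newCap: number | null): void {
      cap = newCap;
      if (newCap === null) return;
      // Re-arm each latch only if the total is still under its threshold.
      if (total < newCap) capped = false;
      if (total < newCap * warnAtPct) warned = false;
    },
    totalUsd: () => total,
  };
}
```

Re-arming only when the total is below the new threshold keeps a `/raise-cap` to a value already exceeded from emitting a second, redundant event.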
CI ✅ green (run #1894, 4m5s). Round 2 delta review.
All round-1 blocking issues resolved
✅ 🔴 Issue 1: `setCap` latch re-armed correctly
`createCostAccumulator` now uses explicit `warned` / `capped` booleans (fixing 🟡 issue 3 as a bonus). `setCap` resets `capped = false` when `total < cap` and `warned = false` when `total < cap * warnAtPct`. Two regression tests confirm both paths: `setCapHigherCap re-arms the cap latch` and `setCap(null) → setCap(number) re-enables a working cap`. Comment in the source credits the reviewer, which is good traceability.
✅ 🔴 Issue 2: `handleReviewRequested` caveman appendix applied
`maybeApplyCavemanAppendix(interpolatedTask, agent, labels)` is now called after `fetchPrDispatchLabels` in `handleReviewRequested`. `type:chore` PR reviews will correctly get the terse appendix.
✅ 🟡 Issue 5: `/raise-cap` ack posts under the right agent identity
`applyRaiseCapCommand` scans the worker registry, matches on `repo + issue_number`, and uses `agent.token` from the matched worker. Falls back to the probe token only on the no-running-task branch. Clean implementation.
🟡/🔵 Items 4, 6, 7, 8
Not addressed, but all were "should fix" / informational — none block merge. They can be picked up in a follow-on chore if desired.
Implementation is correct and well-tested. Approving.
a39b2c79ff00d8476f9400d8476f9402e09c0bf2