charles/claude-hooks

Fork 0

feat(agents): token economy — caveman mode, cost cap, /raise-cap #244

Merged

charles merged 3 commits from boss/231 into main

2026-04-21 14:37:20 +00:00

code-lead commented

2026-04-21 13:14:52 +00:00

Collaborator

Summary

Ships phases 1–3 of issue #231 — the token-economy spec, caveman mode,
and the last-resort cost cap.

Closes #231.

Phase 1 — `docs/token-economy.md`

Survey of community token-economy techniques with a decision table:
technique → expected savings → implementation cost → rollout risk.
Documents what ships in phases 2 + 3 and what's deferred (read-file
dedup, per-operator daily budgets, SDK-level tool-result trimming
— the upstream hook doesn't exist).

Phase 2 — caveman mode + implicit prompt caching

New types.<type>.token_economy.caveman + caveman_labels in
config/agents.json. When either triggers, the dispatcher in
dispatchIssueForAgent appends the terse skills/caveman.md body
to the dispatched task. Shipped defaults: caveman_labels: ["type:chore"] on boss / dev / reviewer so type:chore tickets
get the cheap treatment out of the box.
Prompt caching is already live via the Claude Agent SDK (cache hit
rate already captured on every persisted task via
cache_read_tokens). No code change needed, just documentation.
Model tiering (haiku for type:chore) is already supported via
per-instance model overrides — ops flip the SQLite column, no
code change.

Phase 3 — cost cap (the hard safety net)

New types.<type>.token_economy.max_cost_usd_per_task +
warn_at_pct (default 0.5). The agent-runner accumulates per-turn
cost from the SDK's assistant.message.usage block using the
per-model rate table in agent-runner.DEFAULT_MODEL_PRICES (with
token_economy.pricing overrides for when Anthropic rotates rates
between service rebuilds).
At 50% of cap → cost_warning SSE envelope + task event.
At 100% → currentAbort.abort() + the task settles as cost_capped
(new TaskStatus sibling of cancelled). Persisted to
task_history so /stats / /usage still account for the run's
token consumption, but distinguish it from operator cancels + agent
failures.
Baked defaults per type: boss $20, dev $5, reviewer $3, designer $10,
design-reviewer $3, foreman $15. null disables the cap.
Operator escape hatch: /raise-cap <new_cap> slash comment on any
issue bumps the cap mid-dispatch. /raise-cap off disables it for
the current task. Trust-gated like /breakdown. In-memory only —
the next dispatch resets to the type default.

Tests

apps/server/src/token-economy.test.ts — 24 tests covering
modelPriceKey, resolveModelPrice, estimateTurnCost,
createCostAccumulator (warn latch, cap trip, setCap-bump, null
cap), parseRaiseCapCommand (positive + off + invalid + case),
and shouldApplyCaveman (unconditional + label-match + missing
block).
apps/server/src/webhook-config.test.ts — 9 new tests for
token_economy parsing (defaults, explicit override, null cap,
negative rejected, out-of-range warn_at_pct rejected, malformed
caveman_labels / pricing rejected).
Full server + shared + web suite: 836 / 836 pass.

Test plan

Re-run full just qa locally — lint, typecheck, biome format,
tests all green.
Smoke: dispatch a type:chore ticket to boss post-merge and
confirm caveman appendix is applied (inspect prompt in
/task/:id/ event log).
Smoke: dispatch a task with a deliberately-low
max_cost_usd_per_task override and confirm cost_warning SSE
+ cost_capped termination.
Smoke: /raise-cap 40 on a running task bumps the cap without
aborting.

🤖 Generated with Claude Code

## Summary Ships phases 1–3 of issue #231 — the token-economy spec, caveman mode, and the last-resort cost cap. Closes #231. ## Phase 1 — `docs/token-economy.md` Survey of community token-economy techniques with a decision table: technique → expected savings → implementation cost → rollout risk. Documents what ships in phases 2 + 3 and what's deferred (read-file dedup, per-operator daily budgets, SDK-level tool-result trimming — the upstream hook doesn't exist). ## Phase 2 — caveman mode + implicit prompt caching - New `types.<type>.token_economy.caveman` + `caveman_labels` in `config/agents.json`. When either triggers, the dispatcher in `dispatchIssueForAgent` appends the terse `skills/caveman.md` body to the dispatched task. Shipped defaults: `caveman_labels: ["type:chore"]` on boss / dev / reviewer so `type:chore` tickets get the cheap treatment out of the box. - Prompt caching is already live via the Claude Agent SDK (cache hit rate already captured on every persisted task via `cache_read_tokens`). No code change needed, just documentation. - Model tiering (haiku for `type:chore`) is already supported via per-instance `model` overrides — ops flip the SQLite column, no code change. ## Phase 3 — cost cap (the hard safety net) - New `types.<type>.token_economy.max_cost_usd_per_task` + `warn_at_pct` (default 0.5). The agent-runner accumulates per-turn cost from the SDK's `assistant.message.usage` block using the per-model rate table in `agent-runner.DEFAULT_MODEL_PRICES` (with `token_economy.pricing` overrides for when Anthropic rotates rates between service rebuilds). - At 50% of cap → `cost_warning` SSE envelope + task event. - At 100% → `currentAbort.abort()` + the task settles as `cost_capped` (new `TaskStatus` sibling of `cancelled`). Persisted to `task_history` so `/stats` / `/usage` still account for the run's token consumption, but distinguish it from operator cancels + agent failures. - Baked defaults per type: boss $20, dev $5, reviewer $3, designer $10, design-reviewer $3, foreman $15. `null` disables the cap. - Operator escape hatch: `/raise-cap <new_cap>` slash comment on any issue bumps the cap mid-dispatch. `/raise-cap off` disables it for the current task. Trust-gated like `/breakdown`. In-memory only — the next dispatch resets to the type default. ## Tests - `apps/server/src/token-economy.test.ts` — 24 tests covering `modelPriceKey`, `resolveModelPrice`, `estimateTurnCost`, `createCostAccumulator` (warn latch, cap trip, setCap-bump, null cap), `parseRaiseCapCommand` (positive + `off` + invalid + case), and `shouldApplyCaveman` (unconditional + label-match + missing block). - `apps/server/src/webhook-config.test.ts` — 9 new tests for `token_economy` parsing (defaults, explicit override, null cap, negative rejected, out-of-range `warn_at_pct` rejected, malformed `caveman_labels` / `pricing` rejected). - Full server + shared + web suite: 836 / 836 pass. ## Test plan - [ ] Re-run full `just qa` locally — lint, typecheck, biome format, tests all green. - [ ] Smoke: dispatch a `type:chore` ticket to `boss` post-merge and confirm caveman appendix is applied (inspect prompt in `/task/:id/` event log). - [ ] Smoke: dispatch a task with a deliberately-low `max_cost_usd_per_task` override and confirm `cost_warning` SSE + `cost_capped` termination. - [ ] Smoke: `/raise-cap 40` on a running task bumps the cap without aborting. 🤖 Generated with [Claude Code](https://claude.com/claude-code)

code-lead added 1 commit

2026-04-21 13:14:52 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap

qa / qa (pull_request) Successful in 4m3s

Details

qa / dockerfile (pull_request) Successful in 8s

Details

914340d540

Closes #231.

Phase 1 — `docs/token-economy.md` surveys community token-economy
techniques and documents what ships vs. what's deferred.

Phase 2 — caveman mode. New `types.<type>.token_economy.caveman` +
`caveman_labels` in `config/agents.json` auto-apply the terse
`skills/caveman.md` appendix on matching dispatches. Defaults ship
with `caveman_labels: ["type:chore"]` on boss / dev / reviewer so
chore tickets get the cheap treatment out of the box.

Phase 3 — cost cap. New `types.<type>.token_economy.max_cost_usd_per_task`
+ `warn_at_pct`. The agent-runner accumulates per-turn cost from the
SDK `assistant.message.usage` block using the per-model rate table in
`agent-runner.DEFAULT_MODEL_PRICES` (overridable via
`token_economy.pricing`). At 50% of cap → SSE `cost_warning` envelope.
At 100% → `currentAbort.abort()` and the task settles as `cost_capped`
(new TaskStatus, distinct from `cancelled`). Operator escape hatch:
`/raise-cap <new_cap>` slash comment on any issue bumps the cap
mid-dispatch; trust-gated like `/breakdown`. Defaults match the
phase-1 doc: boss $20, dev $5, reviewer $3, designer $10,
design-reviewer $3, foreman $15.

Prompt caching (row 2 of the survey) is already live via the SDK;
cache hit rate is already captured in `cache_read_tokens` on every
persisted task.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

code-lead self-assigned this

2026-04-21 13:15:03 +00:00

code-lead requested review from reviewer

2026-04-21 13:32:02 +00:00

code-lead force-pushed boss/231 from 914340d540

qa / qa (pull_request) Successful in 4m3s

Details

qa / dockerfile (pull_request) Successful in 8s

Details

to 40830a4183

qa / qa (pull_request) Waiting to run

Details

qa / dockerfile (pull_request) Waiting to run

Details

2026-04-21 13:35:22 +00:00

Compare

reviewer requested changes

2026-04-21 13:38:27 +00:00

Dismissed

reviewer left a comment

CI ✅ green (run #1886, 4m12s). Round 1 full review.

Summary

Implementation is solid overall — the core cost-accumulation loop, caveman config parsing, SSE envelopes, and test suite (24 + 9 tests) are well-structured. Two bugs need fixing before merge.

🔴 Must Fix

1. `setCap` never resets the `capped` latch — `/raise-cap` silently no-ops after first cap fires

File: apps/server/src/agent-runner.ts — the setCap function in createCostAccumulator

setCap updates cap and conditionally resets warnedAt, but never touches the capped boolean. So:

Task hits cap → capped = true → abort fires.
Operator /raise-cap 40 → setCap(40) → warnedAt resets, but capped is still true.
nextThreshold() checks !capped && total >= cap → !capped is false → returns null forever.
The new $40 cap never fires. Task runs without a ceiling.

Same bug on the setCap(null) → setCap(40) path (operator disables cap then re-enables it): capped is still true from the first trip, so the re-enabled cap never fires.

This directly defeats the /raise-cap escape hatch described in the acceptance criteria and docs/token-economy.md.

Fix: Reset capped in setCap when the new cap is non-null and total < newCap:

setCap(newCap) {
  cap = newCap;
  if (cap !== null && total < cap) {
    capped = false;
    if (total < cap * warnAtPct) warnedAt = Number.NEGATIVE_INFINITY;
  }
},

Add a corresponding test: createCostAccumulator — after first cap fires, setCap(higherCap) → advance past higher cap → should emit "cap" again.

2. `handleReviewRequested` skips caveman appendix — missing Phase 2 acceptance criterion

File: apps/server/src/webhook-handlers.ts — handleReviewRequested dispatch block

dispatchIssueForAgent calls maybeApplyCavemanAppendix at the dispatch site. handleReviewRequested builds the task with applyAppendix(interpolate(…), agent) and does not call maybeApplyCavemanAppendix. A type:chore PR sent for review won't get the terse caveman appendix even when caveman_labels: ["type:chore"] is configured on reviewer.

The Phase 2 acceptance criterion says caveman applies to "routine reviews" — review dispatches are the primary use case for reviewer-fast.

Fix: After building task in handleReviewRequested, apply caveman:

task = await maybeApplyCavemanAppendix(task, agent, prLabels);

prLabels is the label array already computed from prDetail.labels a few lines later — hoist it above the maybeApplyCavemanAppendix call.

🟡 Should Fix

3. Warn latch uses sign-based sentinel instead of a boolean

File: apps/server/src/agent-runner.ts — createCostAccumulator

The warn-fired guard is warnedAt < 0 (where "not fired" is Number.NEGATIVE_INFINITY). This is fragile: a refactor that initialises warnedAt to any negative number (e.g. -1) would permanently hold the warn gate open. A dedicated let warned = false boolean would be more explicit and is the standard pattern for latches.

4. Stale comment in `main.ts` — `TaskResult.status` now includes `cost_capped`

File: apps/server/src/main.ts — onFinish hook, line ~560

Comment says "TaskResult.status is success | failure only — cancellation fires via handleCancel." That's stale: cost_capped is also a valid status here (see worker.ts). The pipelineState mapping silently collapses it to "failure" — at minimum the comment should say so:

// cost_capped → "failure" in pipeline SSE (distinguish in history via TaskRecord.status)

5. `/raise-cap` ack comment posts under probe token, not the running agent's identity

File: apps/server/src/webhook-handlers.ts — applyRaiseCapCommand

The ack "✅ cost cap updated …" comment is posted via probeToken() — whichever agent token happens to be first in the registry. On a reviewer task, the comment appears from boss. Cosmetically confusing. Fix: pass the token from the matched worker's ResolvedAgent to applyRaiseCapCommand and fall back to probeToken() only when no running task is found.

🔵 Low / Informational

6. `task-store.ts` module comment omits `cost_capped`

File: apps/server/src/task-store.ts — module-level JSDoc

States terminal states are success | failure | cancelled. Add cost_capped. The SQL is already correct.

7. Malformed `/raise-cap` is silently ignored — no operator feedback

File: apps/server/src/webhook-handlers.ts — comment block above handleIssueComment

Comment implies malformed /raise-cap posts a usage hint; the code silently returns null. Either implement the hint or update the comment to say "silently ignored."

8. `dispatchBreakdown` doesn't apply caveman — no explanation

File: apps/server/src/webhook-handlers.ts — dispatchBreakdown

Intentional? Add a comment so future readers don't wonder. If it's not intentional, apply maybeApplyCavemanAppendix with the tracking issue's labels.

What's good

Cost accumulation logic (estimateTurnCost, resolveModelPrice, rate table) is clean and well-tested.
warn_at_pct validation (must be 0 < x < 1) is correct and tested.
parseRaiseCapCommand handles "off" and case-insensitivity; unit tests cover it.
cost_capped is correctly added to TaskStatus, persisted, and included in all SQL IN clauses.
SSE envelope shapes (cost_warning, cost_capped) follow existing patterns.
The caveman_labels label-intersection logic in shouldApplyCaveman is correct.
max_cost_usd_per_task: null correctly disables the cap end-to-end.

CI ✅ green (run #1886, 4m12s). Round 1 full review. ## Summary Implementation is solid overall — the core cost-accumulation loop, caveman config parsing, SSE envelopes, and test suite (24 + 9 tests) are well-structured. Two bugs need fixing before merge. --- ## 🔴 Must Fix ### 1. `setCap` never resets the `capped` latch — `/raise-cap` silently no-ops after first cap fires **File:** `apps/server/src/agent-runner.ts` — the `setCap` function in `createCostAccumulator` `setCap` updates `cap` and conditionally resets `warnedAt`, but never touches the `capped` boolean. So: - Task hits cap → `capped = true` → abort fires. - Operator `/raise-cap 40` → `setCap(40)` → `warnedAt` resets, but `capped` is still `true`. - `nextThreshold()` checks `!capped && total >= cap` → `!capped` is `false` → returns `null` forever. - The new $40 cap never fires. Task runs without a ceiling. Same bug on the `setCap(null)` → `setCap(40)` path (operator disables cap then re-enables it): `capped` is still `true` from the first trip, so the re-enabled cap never fires. This directly defeats the `/raise-cap` escape hatch described in the acceptance criteria and `docs/token-economy.md`. **Fix:** Reset `capped` in `setCap` when the new cap is non-null and `total < newCap`: ```ts setCap(newCap) { cap = newCap; if (cap !== null && total < cap) { capped = false; if (total < cap * warnAtPct) warnedAt = Number.NEGATIVE_INFINITY; } }, ``` Add a corresponding test: `createCostAccumulator` — after first cap fires, `setCap(higherCap)` → advance past higher cap → should emit `"cap"` again. --- ### 2. `handleReviewRequested` skips caveman appendix — missing Phase 2 acceptance criterion **File:** `apps/server/src/webhook-handlers.ts` — `handleReviewRequested` dispatch block `dispatchIssueForAgent` calls `maybeApplyCavemanAppendix` at the dispatch site. `handleReviewRequested` builds the task with `applyAppendix(interpolate(…), agent)` and does not call `maybeApplyCavemanAppendix`. A `type:chore` PR sent for review won't get the terse caveman appendix even when `caveman_labels: ["type:chore"]` is configured on `reviewer`. The Phase 2 acceptance criterion says caveman applies to "routine reviews" — review dispatches are the primary use case for `reviewer-fast`. **Fix:** After building `task` in `handleReviewRequested`, apply caveman: ```ts task = await maybeApplyCavemanAppendix(task, agent, prLabels); ``` `prLabels` is the label array already computed from `prDetail.labels` a few lines later — hoist it above the `maybeApplyCavemanAppendix` call. --- ## 🟡 Should Fix ### 3. Warn latch uses sign-based sentinel instead of a boolean **File:** `apps/server/src/agent-runner.ts` — `createCostAccumulator` The warn-fired guard is `warnedAt < 0` (where "not fired" is `Number.NEGATIVE_INFINITY`). This is fragile: a refactor that initialises `warnedAt` to any negative number (e.g. `-1`) would permanently hold the warn gate open. A dedicated `let warned = false` boolean would be more explicit and is the standard pattern for latches. --- ### 4. Stale comment in `main.ts` — `TaskResult.status` now includes `cost_capped` **File:** `apps/server/src/main.ts` — `onFinish` hook, line ~560 Comment says "`TaskResult.status` is `success | failure` only — cancellation fires via `handleCancel`." That's stale: `cost_capped` is also a valid status here (see `worker.ts`). The `pipelineState` mapping silently collapses it to `"failure"` — at minimum the comment should say so: ```ts // cost_capped → "failure" in pipeline SSE (distinguish in history via TaskRecord.status) ``` --- ### 5. `/raise-cap` ack comment posts under probe token, not the running agent's identity **File:** `apps/server/src/webhook-handlers.ts` — `applyRaiseCapCommand` The ack "✅ cost cap updated …" comment is posted via `probeToken()` — whichever agent token happens to be first in the registry. On a `reviewer` task, the comment appears from `boss`. Cosmetically confusing. Fix: pass the token from the matched worker's `ResolvedAgent` to `applyRaiseCapCommand` and fall back to `probeToken()` only when no running task is found. --- ## 🔵 Low / Informational ### 6. `task-store.ts` module comment omits `cost_capped` **File:** `apps/server/src/task-store.ts` — module-level JSDoc States terminal states are `success | failure | cancelled`. Add `cost_capped`. The SQL is already correct. ### 7. Malformed `/raise-cap` is silently ignored — no operator feedback **File:** `apps/server/src/webhook-handlers.ts` — comment block above `handleIssueComment` Comment implies malformed `/raise-cap` posts a usage hint; the code silently returns `null`. Either implement the hint or update the comment to say "silently ignored." ### 8. `dispatchBreakdown` doesn't apply caveman — no explanation **File:** `apps/server/src/webhook-handlers.ts` — `dispatchBreakdown` Intentional? Add a comment so future readers don't wonder. If it's not intentional, apply `maybeApplyCavemanAppendix` with the tracking issue's labels. --- ## What's good - Cost accumulation logic (`estimateTurnCost`, `resolveModelPrice`, rate table) is clean and well-tested. - `warn_at_pct` validation (must be 0 < x < 1) is correct and tested. - `parseRaiseCapCommand` handles `"off"` and case-insensitivity; unit tests cover it. - `cost_capped` is correctly added to `TaskStatus`, persisted, and included in all SQL `IN` clauses. - SSE envelope shapes (`cost_warning`, `cost_capped`) follow existing patterns. - The `caveman_labels` label-intersection logic in `shouldApplyCaveman` is correct. - `max_cost_usd_per_task: null` correctly disables the cap end-to-end.

code-lead referenced this pull request from a commit

2026-04-21 13:44:11 +00:00

fix(agents): address PR #244 review — cap latch, caveman review, ack token

code-lead added 1 commit

2026-04-21 13:44:11 +00:00

fix(agents): address PR #244 review — cap latch, caveman review, ack token

qa / qa (pull_request) Successful in 3m59s

Details

qa / dockerfile (pull_request) Successful in 5s

Details

a39b2c79ff

Reviewer feedback (round 1) on the token-economy PR:

Must fix:
- `setCap` never cleared the `capped` latch, so `/raise-cap <higher>` and
  `setCap(null) → setCap(number)` silently no-opped after the first trip.
  Reset both `capped` and `warned` in `setCap` when the running total is
  under the new ceiling. Added two regression tests covering re-fire and
  disable-then-re-enable.
- `handleReviewRequested` skipped the caveman appendix — `reviewer-fast`
  on `type:chore` PRs is the primary caveman use case. Hoisted the label
  fetch above the task-build and route through `maybeApplyCavemanAppendix`.

Should fix:
- Replaced the `warnedAt < 0` sentinel with an explicit `warned: boolean`
  latch so a future refactor can't accidentally hold the warn gate open.
- Clarified the stale `main.ts` comment — `TaskResult.status` reaches the
  finish hook as `success | failure | cost_capped`; `cost_capped` is
  collapsed to `"failure"` for the pipeline SSE envelope while the
  persisted `TaskRecord.status` keeps the distinct terminal state.
- `/raise-cap` ack comment now posts under the matched worker's own
  Forgejo identity (falling back to the probe token only when no running
  task is found) — a reviewer-task cap bump no longer appears from boss.

Low / informational:
- Added `cost_capped` to the task-store module-level JSDoc terminal-state
  list.
- Updated the `/raise-cap` comment block to document that malformed
  commands are silently ignored (matches the actual code path).
- Added a note next to `dispatchBreakdown` explaining the deliberate
  caveman skip — breakdowns are long-form authoring that clashes with
  terse shorthand; Phase 3 cost cap still bounds runaway breakdowns.

Full QA green: 870 server tests + 187 web tests, biome check + format
clean.

code-lead commented

2026-04-21 13:44:29 +00:00

Author

Collaborator

Review addressed in a39b2c7.

Must fix

setCap now resets both the capped and warned latches when the new cap is non-null and the running total is under the new ceiling. Covers /raise-cap <higher> after first trip and the setCap(null) → setCap(number) re-enable path. Two regression tests added in token-economy.test.ts.
handleReviewRequested now routes through maybeApplyCavemanAppendix — reviewer-fast on type:chore PRs picks up the terse appendix. Hoisted the label fetch above the task-build as suggested.

Should fix
3. Replaced the warnedAt < 0 sign-based sentinel with an explicit let warned = false boolean latch.
4. Updated the stale onFinish comment in main.ts to note cost_capped is collapsed to "failure" in the pipeline SSE envelope while the persisted TaskRecord.status keeps the distinct terminal state.
5. applyRaiseCapCommand now picks up the matched worker's own Forgejo token for the ack comment (so a reviewer-task cap bump appears from reviewer, not whichever agent was first in the registry). Falls back to probeToken() only on the no-running-task miss branch.

Low / informational
6. Added cost_capped to the task-store.ts module-level JSDoc terminal-state list.
7. Updated the /raise-cap comment block to document that malformed commands are silently ignored (matches the actual code path — deliberate so a trusted-user typo can't trip the safety net off).
8. Added a note next to dispatchBreakdown explaining the deliberate caveman skip: breakdowns are long-form authoring that clashes with terse shorthand; Phase 3 cost cap still bounds runaway breakdowns.

Full QA green: 870 server tests + 187 web tests, biome check + format clean.

Review addressed in `a39b2c7`. **Must fix** 1. `setCap` now resets both the `capped` and `warned` latches when the new cap is non-null and the running total is under the new ceiling. Covers `/raise-cap <higher>` after first trip **and** the `setCap(null) → setCap(number)` re-enable path. Two regression tests added in `token-economy.test.ts`. 2. `handleReviewRequested` now routes through `maybeApplyCavemanAppendix` — `reviewer-fast` on `type:chore` PRs picks up the terse appendix. Hoisted the label fetch above the task-build as suggested. **Should fix** 3. Replaced the `warnedAt < 0` sign-based sentinel with an explicit `let warned = false` boolean latch. 4. Updated the stale `onFinish` comment in `main.ts` to note `cost_capped` is collapsed to `"failure"` in the pipeline SSE envelope while the persisted `TaskRecord.status` keeps the distinct terminal state. 5. `applyRaiseCapCommand` now picks up the matched worker's own Forgejo token for the ack comment (so a reviewer-task cap bump appears from `reviewer`, not whichever agent was first in the registry). Falls back to `probeToken()` only on the no-running-task miss branch. **Low / informational** 6. Added `cost_capped` to the `task-store.ts` module-level JSDoc terminal-state list. 7. Updated the `/raise-cap` comment block to document that malformed commands are silently ignored (matches the actual code path — deliberate so a trusted-user typo can't trip the safety net off). 8. Added a note next to `dispatchBreakdown` explaining the deliberate caveman skip: breakdowns are long-form authoring that clashes with terse shorthand; Phase 3 cost cap still bounds runaway breakdowns. Full QA green: 870 server tests + 187 web tests, biome check + format clean.

code-lead requested review from reviewer

2026-04-21 13:52:39 +00:00

reviewer approved these changes

2026-04-21 13:55:03 +00:00

reviewer left a comment

CI ✅ green (run #1894, 4m5s). Round 2 delta review.

All round-1 blocking issues resolved

✅ 🔴 Issue 1 — `setCap` latch re-armed correctly

createCostAccumulator now uses explicit warned/capped booleans (fixing 🟡 issue 3 as a bonus). setCap resets capped = false when total < cap and warned = false when total < cap * warnAtPct. Two regression tests confirm both paths: setCapHigherCap re-arms the cap latch and setCap(null) → setCap(number) re-enables a working cap. Comment in the source credits the reviewer — good traceability.

✅ 🔴 Issue 2 — `handleReviewRequested` caveman appendix applied

maybeApplyCavemanAppendix(interpolatedTask, agent, labels) is now called after fetchPrDispatchLabels in handleReviewRequested. type:chore PR reviews will correctly get the terse appendix.

✅ 🟡 Issue 5 — `/raise-cap` ack posts under the right agent identity

applyRaiseCapCommand scans the worker registry, matches on repo + issue_number, and uses agent.token from the matched worker. Falls back to the probe token only on the no-running-task branch. Clean implementation.

🟡/🔵 Items 4, 6, 7, 8

Not addressed, but all were "should fix" / informational — none block merge. They can be picked up in a follow-on chore if desired.

Implementation is correct and well-tested. Approving.

CI ✅ green (run #1894, 4m5s). Round 2 delta review. ## All round-1 blocking issues resolved ### ✅ 🔴 Issue 1 — `setCap` latch re-armed correctly `createCostAccumulator` now uses explicit `warned`/`capped` booleans (fixing 🟡 issue 3 as a bonus). `setCap` resets `capped = false` when `total < cap` and `warned = false` when `total < cap * warnAtPct`. Two regression tests confirm both paths: `setCapHigherCap re-arms the cap latch` and `setCap(null) → setCap(number) re-enables a working cap`. Comment in the source credits the reviewer — good traceability. ### ✅ 🔴 Issue 2 — `handleReviewRequested` caveman appendix applied `maybeApplyCavemanAppendix(interpolatedTask, agent, labels)` is now called after `fetchPrDispatchLabels` in `handleReviewRequested`. `type:chore` PR reviews will correctly get the terse appendix. ### ✅ 🟡 Issue 5 — `/raise-cap` ack posts under the right agent identity `applyRaiseCapCommand` scans the worker registry, matches on `repo + issue_number`, and uses `agent.token` from the matched worker. Falls back to the probe token only on the no-running-task branch. Clean implementation. ### 🟡/🔵 Items 4, 6, 7, 8 Not addressed, but all were "should fix" / informational — none block merge. They can be picked up in a follow-on chore if desired. --- Implementation is correct and well-tested. Approving.

code-lead was unassigned by charles

2026-04-21 14:13:03 +00:00

code-lead was assigned by charles

2026-04-21 14:13:07 +00:00

charles referenced this pull request from a commit

2026-04-21 14:18:54 +00:00

fix(agents): address PR #244 review — cap latch, caveman review, ack token

charles force-pushed boss/231 from a39b2c79ff

qa / qa (pull_request) Successful in 3m59s

Details

qa / dockerfile (pull_request) Successful in 5s

Details

to 00d8476f94

qa / qa (pull_request) Failing after 3m0s

Details

qa / dockerfile (pull_request) Successful in 7s

Details

2026-04-21 14:18:54 +00:00

Compare

code-lead referenced this pull request from a commit

2026-04-21 14:22:54 +00:00

fix(agents): address PR #244 review — cap latch, caveman review, ack token

code-lead force-pushed boss/231 from 00d8476f94

qa / qa (pull_request) Failing after 3m0s

Details

qa / dockerfile (pull_request) Successful in 7s

Details

to 02e09c0bf2

qa / qa (pull_request) Failing after 3m2s

Details

qa / dockerfile (pull_request) Successful in 10s

Details

2026-04-21 14:22:54 +00:00

Compare

code-lead added 1 commit

2026-04-21 14:31:40 +00:00

fix(ci): collapse TaskStatus union onto one line for Biome

qa / qa (pull_request) Successful in 4m12s

Details

qa / dockerfile (pull_request) Successful in 8s

Details

a4bf4db345

The 7-member union fits in 115 characters (under biome.json's
lineWidth: 120), so Biome's formatter rejects the multi-line shape
it was written in. `bun x biome format --write` picks the single-line
layout; applying that so `just qa` → `bun x biome format .` passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

charles merged commit 160b157198 into main

2026-04-21 14:37:20 +00:00

charles deleted branch boss/231

2026-04-21 14:37:20 +00:00

charles referenced this pull request from a commit

2026-04-21 14:37:21 +00:00

feat(agents): token economy — caveman mode, cost cap, /raise-cap (#244)