Tracking: agent pool + customization #47

Closed
opened 2026-04-18 14:58:51 +00:00 by claude-desktop · 1 comment
Collaborator

Purpose

Replace the hardcoded boss/dev/reviewer trio with a pool-based architecture so the operator can:

  • Run multiple agents of the same type in parallel (unblock serial dev/boss/reviewer queues).
  • Specialize instances by model, prompt appendix, and target-label (e.g. reviewer-security on opus 4.7 for PRs labeled security, reviewer-default on sonnet 4.6 for everything else).
  • CRUD instances from the dashboard (SQLite-backed) without a service restart.
  • Auto-generate issues from specs/*.md via a new breakdown skill that can apply the same routing labels the reviewer pool consumes.

Non-goals for this milestone

  • Dynamic Forgejo user / token / GPG provisioning. Agents of the same type share one Forgejo account, one token file, and one GPG identity. No new security surface.
  • Editing skills through the UI. Skill files stay in skills/*.md, reviewable via PR. The UI displays them read-only; per-agent customisation is an append-only prompt_appendix stored in SQLite.
  • Cross-repo breakdown orchestration. The breakdown skill operates on a single repo at a time.
  • Auto-classification beyond label matching. The reviewer-routing signal is the issue's Forgejo labels; we don't re-implement a diff classifier.
  • UI-managed routing rules. Label matching is configured via the CRUD form's match_labels field on the agent instance itself. No separate rule table.
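
The label-matching rule in the last bullet could be sketched roughly as below. This is an illustration only: the `Agent` dataclass, the `select_instance` helper, and the tie-breaking policy (most overlapping labels wins, an instance with no `match_labels` acts as the catch-all) are assumptions, not the logic that A3 necessarily implements.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Agent:
    name: str
    type: str
    match_labels: Optional[str] = None  # JSON array string, as stored in SQLite


def select_instance(agents, agent_type, labels):
    """Pick an instance of `agent_type` for an issue/PR carrying `labels`.

    Assumed policy: the instance with the most overlapping labels wins;
    the first instance without match_labels is the fallback.
    """
    wanted = set(labels)
    best, best_score, fallback = None, 0, None
    for agent in agents:
        if agent.type != agent_type:
            continue
        own_labels = json.loads(agent.match_labels) if agent.match_labels else []
        if not own_labels:
            fallback = fallback or agent
            continue
        score = len(wanted & set(own_labels))
        if score > best_score:
            best, best_score = agent, score
    return best or fallback


pool = [
    Agent("reviewer-default", "reviewer"),
    Agent("reviewer-security", "reviewer", '["security","audit"]'),
    Agent("dev-default", "dev"),
]
print(select_instance(pool, "reviewer", ["security"]).name)
print(select_instance(pool, "reviewer", ["docs"]).name)
```

Keeping the rule on the instance row itself, as the bullet describes, means the selector needs nothing beyond the agents list and the labels from the webhook payload.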

Dependency graph

Layer 0 — foundation
  #A1  SQLite + type-defaults config refactor

Layer 1 — routing
  #A2  Pool scheduler (webhook dispatches by type)    <-- A1
  #A3  Label-aware instance selection                  <-- A1, A2

Layer 2 — per-instance state
  #A4  Prompt appendix per agent                       <-- A1
  #A5  Per-instance container reconciliation           <-- A1

Layer 3 — UX
  #A6  Dashboard agents CRUD                           <-- A1, A2, A4

Layer 4 — capability
  #A7  Breakdown skill — generate issues from specs/   <-- A3

Critical path: A1 → A2 → A3 → A7, with A4 / A5 / A6 branching off A1 and A2.
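
As a sanity check on the graph above, the dependencies can be fed to a topological sorter to see which stories could start together. The sketch below uses Python's standard `graphlib`; the `build_waves` helper is hypothetical, and the waves it prints are the earliest possible groupings, not the execution order chosen in the next section.

```python
from graphlib import TopologicalSorter

# Story -> stories it depends on, copied from the graph above.
DEPENDS_ON = {
    "A1": [],
    "A2": ["A1"],
    "A3": ["A1", "A2"],
    "A4": ["A1"],
    "A5": ["A1"],
    "A6": ["A1", "A2", "A4"],
    "A7": ["A3"],
}


def build_waves(depends_on):
    """Group stories into waves; each story's dependencies sit in earlier waves."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = build_waves(DEPENDS_ON)
print(waves)
# [['A1'], ['A2', 'A4', 'A5'], ['A3', 'A6'], ['A7']]
```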

Execution order

  1. A1 — foundation; unblocks everything. Seed SQLite with one default instance per type so first boot is behaviour-identical to today.
  2. A2 + A4 — in parallel, both depend only on A1.
  3. A3 — sits on top of A2; enables label-based reviewer specialisation.
  4. A5 — container lifecycle from SQLite diff; v1 can be just agents-sync manually invoked, service-managed later.
  5. A6 — dashboard CRUD over the foundation + scheduler state.
  6. A7 — stretch story for the milestone close; applies labels consumed by A3.
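
Step 4 describes container lifecycle as a diff against SQLite. A minimal sketch of that diff, assuming the `claude-hooks-<instance-name>` naming from the data model section, might look like this. The `plan_reconciliation` helper is hypothetical, and how running containers are listed or created is deliberately left out.

```python
PREFIX = "claude-hooks-"


def plan_reconciliation(instance_names, running_containers):
    """Return (to_create, to_remove) container names.

    instance_names: names from the SQLite agents table.
    running_containers: container names currently present on the host
    (how they are listed is outside this sketch).
    """
    desired = {PREFIX + name for name in instance_names}
    # Only containers carrying our prefix are ours to manage.
    managed = {c for c in running_containers if c.startswith(PREFIX)}
    to_create = sorted(desired - managed)
    to_remove = sorted(managed - desired)
    return to_create, to_remove


to_create, to_remove = plan_reconciliation(
    ["dev-default", "dev-frontend", "reviewer-default"],
    ["claude-hooks-dev-default", "claude-hooks-dev-legacy", "postgres"],
)
print(to_create)  # ['claude-hooks-dev-frontend', 'claude-hooks-reviewer-default']
print(to_remove)  # ['claude-hooks-dev-legacy']
```

Because the function only computes a plan, the same code would serve a manually invoked sync and a later service-managed loop.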

Data model preview

  • config/agents.json becomes type defaults (Forgejo user, token path, git identity, GPG key, default model, default container image), with one entry per type.
  • SQLite agents table — per-instance overrides:
    name             TEXT PRIMARY KEY     -- e.g. "reviewer-security"
    type             TEXT                 -- "boss" | "dev" | "reviewer"
    model            TEXT NULL            -- overrides type default
    prompt_appendix  TEXT NULL            -- concat'd to base skill at dispatch time
    match_labels     TEXT NULL            -- JSON array, e.g. '["security","audit"]'
    notes            TEXT NULL
    created_at       INTEGER
    updated_at       INTEGER
    
  • Session key becomes <type>:<repo>:<issueOrPr> so any pool member can resume a prior session.
  • Container naming becomes claude-hooks-<instance-name> (e.g. claude-hooks-dev-default, claude-hooks-dev-frontend).
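
To make the split between type defaults and per-instance overrides concrete, here is a rough sketch using Python's built-in `sqlite3`. Several things are assumptions for illustration: the `TYPE_DEFAULTS` dict stands in for `config/agents.json`, the model names are placeholders, the `NOT NULL` and `CHECK` constraints are additions to the column list above, and `open_db`, `effective_model`, and `build_prompt` are hypothetical helpers.

```python
import sqlite3
import time

# Placeholder type defaults; in the plan these live in config/agents.json.
TYPE_DEFAULTS = {
    "boss": {"model": "default-boss-model"},
    "dev": {"model": "default-dev-model"},
    "reviewer": {"model": "default-reviewer-model"},
}

SCHEMA = """
CREATE TABLE IF NOT EXISTS agents (
    name            TEXT PRIMARY KEY,
    type            TEXT NOT NULL CHECK (type IN ('boss', 'dev', 'reviewer')),
    model           TEXT,
    prompt_appendix TEXT,
    match_labels    TEXT,
    notes           TEXT,
    created_at      INTEGER NOT NULL,
    updated_at      INTEGER NOT NULL
)
"""


def open_db(path=":memory:"):
    """Create the table and seed one '<type>-default' instance per type."""
    db = sqlite3.connect(path)
    db.row_factory = sqlite3.Row
    db.execute(SCHEMA)
    now = int(time.time())
    for agent_type in TYPE_DEFAULTS:
        # INSERT OR IGNORE keeps re-runs from touching existing rows.
        db.execute(
            "INSERT OR IGNORE INTO agents (name, type, created_at, updated_at) "
            "VALUES (?, ?, ?, ?)",
            (f"{agent_type}-default", agent_type, now, now),
        )
    db.commit()
    return db


def effective_model(db, name):
    """Instance model if set, otherwise the type default."""
    row = db.execute(
        "SELECT type, model FROM agents WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return row["model"] or TYPE_DEFAULTS[row["type"]]["model"]


def build_prompt(base_skill, appendix):
    """Append the per-instance appendix to the base skill text, if any."""
    return base_skill if not appendix else f"{base_skill.rstrip()}\n\n{appendix}"


db = open_db()
print(effective_model(db, "dev-default"))
```

Seeding with `INSERT OR IGNORE` is one way to keep first boot equivalent to the old hardcoded trio while leaving later dashboard edits untouched.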

Out of scope for this milestone

  • Per-instance disk-usage breakdown on the dashboard.
  • Auth / RBAC on the admin CRUD endpoints (inherits whatever the dashboard has today).
  • Container hot-reload — at worst we recreate a container on CRUD mutation.
  • Migrating historical tasks/sessions under the new <type>:<repo>:<issueOrPr> key (new key takes effect for new dispatches; old sessions stay keyed on the old scheme and expire via sweeper).
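
The reason the new key lets any pool member resume a session is that it contains the agent type but not the instance name. A tiny sketch, where `session_key`, the in-memory `sessions` dict, and the `owner/repo` value are all hypothetical:

```python
def session_key(agent_type, repo, issue_or_pr):
    """Build the pool-wide key "<type>:<repo>:<issueOrPr>"."""
    return f"{agent_type}:{repo}:{issue_or_pr}"


# Stand-in for whatever store holds session state.
sessions = {}

# One dev instance starts work on issue 47 ...
sessions[session_key("dev", "owner/repo", 47)] = "started-by-dev-default"

# ... and another dev instance later computes the same key and resumes it.
resumed = sessions.get(session_key("dev", "owner/repo", 47))
print(resumed)  # started-by-dev-default
```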

References

  • Ideated 2026-04-18 after the post-CI routing and auto-rebase work.
  • Existing hardcoded config: config/agents.json.
  • Existing skills: skills/{implement,address-review,review,rebase,merge,fix-ci}.md — all stay, used as-is.

Milestone

Agent pool + customization (#16).

Author
Collaborator

Milestone 16 complete — closing tracker

All seven stories landed on 2026-04-20. Summary:

Story  Title                                               PR
A1     SQLite + type-defaults config refactor              #48
A2     Pool scheduler — webhook dispatches by type         #49
A3     Label-aware instance selection                      #50
A4     Per-instance prompt appendix                        merged
A5     Per-instance container reconciliation               #52
A6     Dashboard agents CRUD                               #53 / #116
A7     Breakdown skill — generate issues from specs/*.md   #147

Beyond the planned scope, we also shipped during the same push:

  • Pool scaling in practice: 2× dev, 2× boss, 2× reviewer pool members exercised against real traffic.
  • Force-merge terminator on MAX_ROUNDS (#137): the review-loop deadline story.
  • Force-merge dashboard badge + surfacing (#141/#143).
  • Session-history persistence fix (#125) + sweeper for old JSONLs (#131).
  • SSE heartbeat (#128) + dashboard /stats panel (#127/#133).
  • Container watchdog for silent-disappearance recovery (#132 → #134).
  • Reviewer pending-CI carve-out (#148) — the fix after watching #147's force-merge fire on a CI-waiting-not-code-issue loop.

The only item in the milestone that did not ship was the upstream Penpot mockup story #55. A6 was implemented ad hoc without it, so I'm closing that one too as retroactively obsolete.

Closing the milestone.

Reference
charles/claude-hooks#47