refactor(config): split agents.json into agents / service / auth files (+ env for secrets) #540

Closed
opened 2026-04-28 19:12:46 +00:00 by claude-desktop · 0 comments
Collaborator

User story

As an operator, I want config/agents.json to hold only agent-fleet configuration (types + per-instance overrides), so that operator-editable fields can't be mistakenly mixed with deploy-time infra and secrets, and the dashboard's /config/agents PUT surface has a single, narrow blast radius.

Why

Audit on 2026-04-28 showed config/agents.json is doing four unrelated jobs:

  1. Agent fleet: types, per-instance overrides (operator-editable, dashboard surface)
  2. Service infra: forge_mcp_command, container_image, container_image_default, forgejo_url, webhook_secret_file, node_flows, watchdogs, penpot, ui_version
  3. Auth / secrets: auth.operator_user, auth.trust_proxy, auth.authelia_logout_url, forgejo_oauth_client_id/_secret, github_oauth_client_id/_secret, public_base_url
  4. Repo bindings: repos (already partly migrated to the watched_repos SQLite table, F4)

A single editable file mixing all four creates real risk: an operator hand-editing a type definition can fat-finger an OAuth secret, accidentally widen trust_proxy, or break the gate. The dashboard's PUT handler already has guards, but the file shape itself is the problem.

Acceptance criteria

Loader

  • New file config/service.json carries the service-infra group (item 2 above). Same JSON schema for those fields as today's agents.json
  • Operator-only secret fields (item 3) move to environment variables preferentially:
    • FORGEJO_OAUTH_CLIENT_ID / FORGEJO_OAUTH_CLIENT_SECRET (env override already exists, make it the only path)
    • GITHUB_OAUTH_CLIENT_ID / GITHUB_OAUTH_CLIENT_SECRET
    • GITLAB_OAUTH_CLIENT_ID / GITLAB_OAUTH_CLIENT_SECRET
    • PUBLIC_BASE_URL
    • WEBHOOK_SECRET (or keep WEBHOOK_SECRET_FILE env pointing at a file, like the rest of the secret-file pattern)
  • config/agents.json retains only: types + their per-instance overrides
  • auth block contents: keep operator_user + trust_proxy only if a follow-up audit confirms either is still doing useful work after the Authelia rip-out; otherwise drop the block entirely. authelia_logout_url is already dead (today's logout rewiring), so drop it unconditionally
  • repos field: assess whether the watched_repos SQLite table fully replaces it; if yes, drop it. If the migration is incomplete, leave it for a follow-up issue
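
The env-only path for secrets could look roughly like the sketch below. It is a minimal illustration, not the actual loader: the `Env`, `OAuthClient`, and `ResolvedSecrets` types and the `readOAuthClient` / `resolveSecrets` functions are hypothetical names, and the rule that a half-configured OAuth pair counts as "not configured" is an assumption. Only the environment variable names come from the list above.

```typescript
// Hypothetical types; the real loader's shapes may differ.
type Env = Record<string, string | undefined>;

interface OAuthClient {
  clientId: string;
  clientSecret: string;
}

interface ResolvedSecrets {
  forgejo?: OAuthClient;
  github?: OAuthClient;
  gitlab?: OAuthClient;
  publicBaseUrl?: string;
  webhookSecret?: string;
  webhookSecretFile?: string;
}

// Reads <PREFIX>_OAUTH_CLIENT_ID / <PREFIX>_OAUTH_CLIENT_SECRET from env.
// Assumption: a partial pair is treated the same as a missing provider.
function readOAuthClient(env: Env, prefix: string): OAuthClient | undefined {
  const clientId = env[`${prefix}_OAUTH_CLIENT_ID`];
  const clientSecret = env[`${prefix}_OAUTH_CLIENT_SECRET`];
  return clientId && clientSecret ? { clientId, clientSecret } : undefined;
}

// Secrets come from the environment only; agents.json is never consulted here.
function resolveSecrets(env: Env): ResolvedSecrets {
  return {
    forgejo: readOAuthClient(env, "FORGEJO"),
    github: readOAuthClient(env, "GITHUB"),
    gitlab: readOAuthClient(env, "GITLAB"),
    publicBaseUrl: env["PUBLIC_BASE_URL"],
    webhookSecret: env["WEBHOOK_SECRET"],
    // Passed through as a path; reading the file is left to the caller.
    webhookSecretFile: env["WEBHOOK_SECRET_FILE"],
  };
}
```

Taking `env` as a parameter rather than reading `process.env` directly keeps the function pure, which would make the "missing OAuth client env vars" loader test straightforward to write.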

Migration mode

  • If a deployed agents.json still contains service / auth / repo fields, loader copies them into the resolved config (legacy fallback) AND prints a deprecation warning per field at boot
  • One-shot helper just config-split reads the legacy agents.json and writes config/service.json + a .env.split.example for the secret env vars; backs up the original to agents.json.pre-split.bak
  • After one stable release with the warning live, the fallback is removed in a follow-up
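
The legacy fallback described above might be sketched as follows, restricted to the service-infra fields for brevity. `resolveServiceConfig` and `LEGACY_SERVICE_FIELDS` are hypothetical names, and the precedence rule (service.json wins when both files define a field) is an assumption the issue does not state.

```typescript
type Json = Record<string, unknown>;

// Service-infra fields from the audit (item 2 in "Why").
const LEGACY_SERVICE_FIELDS = [
  "forge_mcp_command",
  "container_image",
  "container_image_default",
  "forgejo_url",
  "webhook_secret_file",
  "node_flows",
  "watchdogs",
  "penpot",
  "ui_version",
] as const;

// Merges legacy service fields found in agents.json into the resolved
// service config and emits one deprecation warning per legacy field.
function resolveServiceConfig(
  agents: Json,
  service: Json,
  warn: (message: string) => void,
): Json {
  const resolved: Json = { ...service };
  for (const field of LEGACY_SERVICE_FIELDS) {
    if (!(field in agents)) continue;
    warn(`[deprecated] "${field}" in agents.json should move to config/service.json`);
    // Assumption: service.json takes precedence; agents.json is only a fallback.
    if (!(field in resolved)) resolved[field] = agents[field];
  }
  return resolved;
}
```

Passing `warn` in as a callback lets the loader test capture the warnings instead of spying on the console.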

Tests

  • webhook-config.test.ts: load a fresh-style split set (agents.json types-only + service.json + env vars) and assert resolved config matches the legacy single-file equivalent
  • Loader test: legacy agents.json with service fields → warning printed, config still resolves
  • Loader test: missing OAUTH_ENCRYPTION_KEY and missing OAuth client env vars → service still boots, OAuth init routes return 503 (current behaviour preserved)

Dashboard

  • PUT /config/agents schema validation rejects any non-types field, returns 400 with a pointer to the new home (env var or service.json)
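
A rough sketch of that rejection rule, written without Zod so it stands alone. `validateAgentsPut`, `ValidationResult`, and the `NEW_HOME` table are hypothetical, the table is deliberately partial, and the sketch assumes per-instance overrides are nested under `types` (if they live under a separate top-level key, that key would need to be allowed too).

```typescript
// Partial, illustrative map from a legacy field to its new home.
const NEW_HOME = new Map<string, string>([
  ["container_image", "config/service.json"],
  ["forgejo_url", "config/service.json"],
  ["forgejo_oauth_client_secret", "env FORGEJO_OAUTH_CLIENT_SECRET"],
  ["public_base_url", "env PUBLIC_BASE_URL"],
]);

interface ValidationResult {
  status: 200 | 400;
  errors: string[];
}

// Rejects every top-level key other than "types" and, where known,
// points the caller at the field's new home.
function validateAgentsPut(body: Record<string, unknown>): ValidationResult {
  const errors = Object.keys(body)
    .filter((key) => key !== "types")
    .map((key) => {
      const home = NEW_HOME.get(key);
      return home !== undefined
        ? `"${key}" is not accepted here; it now lives in ${home}`
        : `"${key}" is not an accepted field`;
    });
  return { status: errors.length > 0 ? 400 : 200, errors };
}
```

For example, `validateAgentsPut({ types: {}, forgejo_url: "x" })` would yield status 400 with a message pointing at `config/service.json`.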

Docs

  • docs/configuration.md (new or existing) lists every field, its file/env home, and the operator vs ops ownership

Out of scope

  • Removing the auth block entirely (depends on the post-Authelia audit, separate issue)
  • Migrating the repos field out of agents.json (depends on F4 watched_repos rollout)
  • A web UI for editing service.json — these are deploy-time, file-edit-only by design

References

  • Audit findings (2026-04-28, this conversation)
  • Today's PR #539 (/api/* boundary refactor) — same spirit, different layer
  • F4 watched_repos table (already replaces repos: for new bindings)
  • apps/server/src/shared/config/webhook-config.ts — the single loader that needs the surgery
  • apps/server/src/shared/config/agents-config-schema.ts — Zod schema split target
Reference
charles/claude-hooks#540