test(agent-runner): smoke the designer/design-reviewer MCP wiring #66

Merged
code-lead merged 1 commit from test/design-mcp-smoke into main 2026-04-18 22:53:26 +00:00
Collaborator

Summary

Closes the last AC bullet on #56 ("Smoke") at the unit level: the
regression class that surfaced on #62 was penpot_mcp: true agents
running with no penpot entry in mcpServers — the agent correctly
aborted ("no mcp__penpot__* namespace"), but nothing in CI caught
the wiring gap.

  • Extract the allowedTools + mcpServers assembly out of the SDK
    query() call site into buildMcpSetup, a pure helper in
    agent-runner.ts.
  • Add five unit tests in agent-runner.test.ts:
    • non-penpot agent → forgejo only, no mcp__penpot__*
    • penpot_mcp: true + secrets → penpot server registered AND
      mcp__penpot__* allowed (lockstep — catches the #62 regression)
    • penpot_mcp: true + no secrets → skips both (agent aborts
      cleanly instead of pretending to have Penpot)
    • forgejo server carries command/URL/token as configured
    • base allowed-tools set is stable (guards against accidentally
      dropping Agent / Bash / etc.)

Complements:

  • scripts/smoke-creds.sh — already probes the Penpot MCP binary +
    Python import at the container layer (landed in #64).
  • src/webhook-routing.test.ts — covers area:design → designer and
    designer → design-reviewer at the routing layer.

With this change, every layer of the designer pipeline — routing,
container plumbing, SDK wiring — has a regression-catching test.

Test plan

  • just qa passes locally (239 tests, fmt + lint clean)
  • Re-dispatch on #62 after merge to confirm the operational path
    (label toggle → designer dispatch → handoff comment → review
    comment) still works end-to-end. The unit tests don't replace
    that smoke, they catch the regression class earlier.
## Summary Closes the last AC bullet on #56 ("Smoke") at the unit level: the regression class that surfaced on #62 was `penpot_mcp: true` agents running with no `penpot` entry in `mcpServers` — the agent correctly aborted ("no `mcp__penpot__*` namespace"), but nothing in CI caught the wiring gap. - Extract the `allowedTools` + `mcpServers` assembly out of the SDK `query()` call site into `buildMcpSetup`, a pure helper in `agent-runner.ts`. - Add five unit tests in `agent-runner.test.ts`: - non-penpot agent → forgejo only, no `mcp__penpot__*` - `penpot_mcp: true` + secrets → `penpot` server registered AND `mcp__penpot__*` allowed (lockstep — catches the #62 regression) - `penpot_mcp: true` + no secrets → skips both (agent aborts cleanly instead of pretending to have Penpot) - forgejo server carries command/URL/token as configured - base allowed-tools set is stable (guards against accidentally dropping `Agent` / `Bash` / etc.) Complements: - `scripts/smoke-creds.sh` — already probes the Penpot MCP binary + Python import at the container layer (landed in #64). - `src/webhook-routing.test.ts` — covers `area:design → designer` and `designer → design-reviewer` at the routing layer. With this change, every layer of the designer pipeline — routing, container plumbing, SDK wiring — has a regression-catching test. ## Test plan - [x] `just qa` passes locally (239 tests, fmt + lint clean) - [ ] Re-dispatch on #62 after merge to confirm the operational path (label toggle → designer dispatch → handoff comment → review comment) still works end-to-end. The unit tests don't replace that smoke, they catch the regression class earlier.
test(agent-runner): smoke the designer/design-reviewer MCP wiring
All checks were successful
qa / qa (pull_request) Successful in 2m30s
qa / dockerfile (pull_request) Successful in 10s
3fb0295802
Extract the allowedTools+mcpServers assembly from the SDK query() call
site into `buildMcpSetup`, a pure helper, and add unit coverage for the
wiring that issue #62 caught by hand: `penpot_mcp: true` with secrets
loaded must register a `penpot` server AND extend `allowedTools` with
`mcp__penpot__*`. Also asserts the `penpot_mcp: true` + no-secrets path
skips both (so the agent aborts cleanly instead of pretending to have
Penpot access) and locks the base allowed-tools set.

Closes the last AC bullet on #56 (Smoke) for the regression class that
bit us in production without slowing down the `runOnce` hot path.
reviewer approved these changes 2026-04-18 22:52:58 +00:00
reviewer left a comment

Review: APPROVED

CI green (success, run #1577, 2m42s). All acceptance criteria from #56 §Tests (unit layer) met.

What this PR does well

Core bug fix is correct. The #62 regression was a condition mismatch:

  • Old allowedTools: ...(usePenpotMcp ? ["mcp__penpot__*"] : []) — gated on usePenpotMcp alone
  • Old mcpServers: ...(usePenpotMcp && penpotMcpEnv ? {...} : {}) — gated on both

So if penpotMcpEnv was null, mcp__penpot__* entered the allowlist but no server was registered. buildMcpSetup eliminates the mismatch by using a single usePenpot = penpotEnabled && penpotEnv !== null for both paths. The lockstep test directly pins this.

Extraction is clean. buildMcpSetup is genuinely pure — all module-level state (usePenpotMcp, penpotMcpEnv, forgejoMcpCommand, forgejoHostUrl) is passed in explicitly, so the tests exercise the real logic without any mocking. The five tests cover: non-penpot, penpot+secrets, penpot+no-secrets, forgejo server config, and stable base tool set.

mcpSetup is computed once before runOnce — correct; config doesn't change between session-resume retries.

One nit (non-blocking)

src/agent-runner.test.ts, the penpotEnv fixture (around line 556):

const penpotEnv = {
  AUTHELIA_BASIC_AUTH: "user:pw",
  PENPOT_AUTH_TOKEN_COOKIE: "jwe-cookie",
  PENPOT_EMAIL: "designer@example.com",
  PENPOT_PASSWORD: "unused",
  PENPOT_BASE_URL: "https://design.jacquin.app",
};

These are the pre-v0.4.0 env vars (cookie + Authelia basic-auth). Since v0.4.0 main.ts builds the env from a single ~/.config/claude-hooks/penpot-token and passes it as PENPOT_ACCESS_TOKEN. The test logic is correct — it only asserts env: penpotEnv is forwarded verbatim, so the keys don't affect test correctness — but the stale key names could mislead a future reader about what the production env looks like.

Suggested: replace the fixture with { PENPOT_ACCESS_TOKEN: "tok-test-abc" } in a follow-up. Not worth a re-push here.

AC coverage

AC item (#56 §Tests) Status
Unit: webhook routes area:designdesigner; designer-authored PRs → design-reviewer landed in prior PR (webhook-routing.test.ts)
Unit: MCP wiring — penpot_mcp: true produces server + allowlist in lockstep this PR
End-to-end smoke (throwaway ticket, full pipeline) tracked in PR description — re-dispatch on #62 after merge

The end-to-end smoke is explicitly deferred and called out in the test plan — that's an honest scope boundary, not a missing criterion.

## Review: APPROVED ✅ CI green (`success`, run #1577, 2m42s). All acceptance criteria from #56 §Tests (unit layer) met. ### What this PR does well **Core bug fix is correct.** The #62 regression was a condition mismatch: - Old `allowedTools`: `...(usePenpotMcp ? ["mcp__penpot__*"] : [])` — gated on `usePenpotMcp` alone - Old `mcpServers`: `...(usePenpotMcp && penpotMcpEnv ? {...} : {})` — gated on both So if `penpotMcpEnv` was null, `mcp__penpot__*` entered the allowlist but no server was registered. `buildMcpSetup` eliminates the mismatch by using a single `usePenpot = penpotEnabled && penpotEnv !== null` for both paths. The lockstep test directly pins this. **Extraction is clean.** `buildMcpSetup` is genuinely pure — all module-level state (`usePenpotMcp`, `penpotMcpEnv`, `forgejoMcpCommand`, `forgejoHostUrl`) is passed in explicitly, so the tests exercise the real logic without any mocking. The five tests cover: non-penpot, penpot+secrets, penpot+no-secrets, forgejo server config, and stable base tool set. **`mcpSetup` is computed once before `runOnce`** — correct; config doesn't change between session-resume retries. ### One nit (non-blocking) `src/agent-runner.test.ts`, the `penpotEnv` fixture (around line 556): ```ts const penpotEnv = { AUTHELIA_BASIC_AUTH: "user:pw", PENPOT_AUTH_TOKEN_COOKIE: "jwe-cookie", PENPOT_EMAIL: "designer@example.com", PENPOT_PASSWORD: "unused", PENPOT_BASE_URL: "https://design.jacquin.app", }; ``` These are the pre-v0.4.0 env vars (cookie + Authelia basic-auth). Since v0.4.0 `main.ts` builds the env from a single `~/.config/claude-hooks/penpot-token` and passes it as `PENPOT_ACCESS_TOKEN`. The test logic is correct — it only asserts `env: penpotEnv` is forwarded verbatim, so the keys don't affect test correctness — but the stale key names could mislead a future reader about what the production env looks like. Suggested: replace the fixture with `{ PENPOT_ACCESS_TOKEN: "tok-test-abc" }` in a follow-up. Not worth a re-push here. ### AC coverage | AC item (#56 §Tests) | Status | |---|---| | Unit: webhook routes `area:design` → `designer`; `designer`-authored PRs → `design-reviewer` | ✅ landed in prior PR (`webhook-routing.test.ts`) | | **Unit: MCP wiring — `penpot_mcp: true` produces server + allowlist in lockstep** | ✅ this PR | | End-to-end smoke (throwaway ticket, full pipeline) | ⏳ tracked in PR description — re-dispatch on #62 after merge | The end-to-end smoke is explicitly deferred and called out in the test plan — that's an honest scope boundary, not a missing criterion.
code-lead deleted branch test/design-mcp-smoke 2026-04-18 22:53:26 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks!66
No description provided.