charles/claude-hooks

Fork 0

test(agent-runner): smoke the designer/design-reviewer MCP wiring #66

Merged

code-lead merged 1 commit from test/design-mcp-smoke into main

2026-04-18 22:53:26 +00:00

claude-desktop commented

2026-04-18 20:59:17 +00:00

Collaborator

Summary

Closes the last AC bullet on #56 ("Smoke") at the unit level: the
regression class that surfaced on #62 was penpot_mcp: true agents
running with no penpot entry in mcpServers — the agent correctly
aborted ("no mcp__penpot__* namespace"), but nothing in CI caught
the wiring gap.

Extract the allowedTools + mcpServers assembly out of the SDK
query() call site into buildMcpSetup, a pure helper in
agent-runner.ts.
Add five unit tests in agent-runner.test.ts:
- non-penpot agent → forgejo only, no mcp__penpot__*
- penpot_mcp: true + secrets → penpot server registered AND
  mcp__penpot__* allowed (lockstep — catches the #62 regression)
- penpot_mcp: true + no secrets → skips both (agent aborts
  cleanly instead of pretending to have Penpot)
- forgejo server carries command/URL/token as configured
- base allowed-tools set is stable (guards against accidentally
  dropping Agent / Bash / etc.)

Complements:

scripts/smoke-creds.sh — already probes the Penpot MCP binary +
Python import at the container layer (landed in #64).
src/webhook-routing.test.ts — covers area:design → designer and
designer → design-reviewer at the routing layer.

With this change, every layer of the designer pipeline — routing,
container plumbing, SDK wiring — has a regression-catching test.

Test plan

just qa passes locally (239 tests, fmt + lint clean)
Re-dispatch on #62 after merge to confirm the operational path
(label toggle → designer dispatch → handoff comment → review
comment) still works end-to-end. The unit tests don't replace
that smoke, they catch the regression class earlier.

## Summary Closes the last AC bullet on #56 ("Smoke") at the unit level: the regression class that surfaced on #62 was `penpot_mcp: true` agents running with no `penpot` entry in `mcpServers` — the agent correctly aborted ("no `mcp__penpot__*` namespace"), but nothing in CI caught the wiring gap. - Extract the `allowedTools` + `mcpServers` assembly out of the SDK `query()` call site into `buildMcpSetup`, a pure helper in `agent-runner.ts`. - Add five unit tests in `agent-runner.test.ts`: - non-penpot agent → forgejo only, no `mcp__penpot__*` - `penpot_mcp: true` + secrets → `penpot` server registered AND `mcp__penpot__*` allowed (lockstep — catches the #62 regression) - `penpot_mcp: true` + no secrets → skips both (agent aborts cleanly instead of pretending to have Penpot) - forgejo server carries command/URL/token as configured - base allowed-tools set is stable (guards against accidentally dropping `Agent` / `Bash` / etc.) Complements: - `scripts/smoke-creds.sh` — already probes the Penpot MCP binary + Python import at the container layer (landed in #64). - `src/webhook-routing.test.ts` — covers `area:design → designer` and `designer → design-reviewer` at the routing layer. With this change, every layer of the designer pipeline — routing, container plumbing, SDK wiring — has a regression-catching test. ## Test plan - [x] `just qa` passes locally (239 tests, fmt + lint clean) - [ ] Re-dispatch on #62 after merge to confirm the operational path (label toggle → designer dispatch → handoff comment → review comment) still works end-to-end. The unit tests don't replace that smoke, they catch the regression class earlier.

claude-desktop added 1 commit

2026-04-18 20:59:17 +00:00

test(agent-runner): smoke the designer/design-reviewer MCP wiring

qa / qa (pull_request) Successful in 2m30s

Details

qa / dockerfile (pull_request) Successful in 10s

Details

3fb0295802

Extract the allowedTools+mcpServers assembly from the SDK query() call
site into `buildMcpSetup`, a pure helper, and add unit coverage for the
wiring that issue #62 caught by hand: `penpot_mcp: true` with secrets
loaded must register a `penpot` server AND extend `allowedTools` with
`mcp__penpot__*`. Also asserts the `penpot_mcp: true` + no-secrets path
skips both (so the agent aborts cleanly instead of pretending to have
Penpot access) and locks the base allowed-tools set.

Closes the last AC bullet on #56 (Smoke) for the regression class that
bit us in production without slowing down the `runOnce` hot path.

claude-desktop referenced this pull request

2026-04-18 21:16:59 +00:00

fix(agents): enable container mode for designer + design-reviewer #67

claude-desktop referenced this pull request

2026-04-18 22:44:39 +00:00

Patch Penpot MCP: add canvas primitive tools (create_file / create_page / create_frame / create_text / export_frame_png) #69

charles requested review from reviewer

2026-04-18 22:47:37 +00:00

code-lead referenced this pull request from a commit

2026-04-18 22:51:49 +00:00

fix(agents): enable container mode for designer + design-reviewer (#67)

reviewer approved these changes

2026-04-18 22:52:58 +00:00

reviewer left a comment

Review: APPROVED ✅

CI green (success, run #1577, 2m42s). All acceptance criteria from #56 §Tests (unit layer) met.

What this PR does well

Core bug fix is correct. The #62 regression was a condition mismatch:

Old allowedTools: ...(usePenpotMcp ? ["mcp__penpot__*"] : []) — gated on usePenpotMcp alone
Old mcpServers: ...(usePenpotMcp && penpotMcpEnv ? {...} : {}) — gated on both

So if penpotMcpEnv was null, mcp__penpot__* entered the allowlist but no server was registered. buildMcpSetup eliminates the mismatch by using a single usePenpot = penpotEnabled && penpotEnv !== null for both paths. The lockstep test directly pins this.

Extraction is clean. buildMcpSetup is genuinely pure — all module-level state (usePenpotMcp, penpotMcpEnv, forgejoMcpCommand, forgejoHostUrl) is passed in explicitly, so the tests exercise the real logic without any mocking. The five tests cover: non-penpot, penpot+secrets, penpot+no-secrets, forgejo server config, and stable base tool set.

mcpSetup is computed once before runOnce — correct; config doesn't change between session-resume retries.

One nit (non-blocking)

src/agent-runner.test.ts, the penpotEnv fixture (around line 556):

const penpotEnv = {
  AUTHELIA_BASIC_AUTH: "user:pw",
  PENPOT_AUTH_TOKEN_COOKIE: "jwe-cookie",
  PENPOT_EMAIL: "designer@example.com",
  PENPOT_PASSWORD: "unused",
  PENPOT_BASE_URL: "https://design.jacquin.app",
};

These are the pre-v0.4.0 env vars (cookie + Authelia basic-auth). Since v0.4.0 main.ts builds the env from a single ~/.config/claude-hooks/penpot-token and passes it as PENPOT_ACCESS_TOKEN. The test logic is correct — it only asserts env: penpotEnv is forwarded verbatim, so the keys don't affect test correctness — but the stale key names could mislead a future reader about what the production env looks like.

Suggested: replace the fixture with { PENPOT_ACCESS_TOKEN: "tok-test-abc" } in a follow-up. Not worth a re-push here.

AC coverage

AC item (#56 §Tests)	Status
Unit: webhook routes `area:design` → `designer`; `designer`-authored PRs → `design-reviewer`	✅ landed in prior PR (`webhook-routing.test.ts`)
Unit: MCP wiring — `penpot_mcp: true` produces server + allowlist in lockstep	✅ this PR
End-to-end smoke (throwaway ticket, full pipeline)	⏳ tracked in PR description — re-dispatch on #62 after merge

The end-to-end smoke is explicitly deferred and called out in the test plan — that's an honest scope boundary, not a missing criterion.

## Review: APPROVED ✅ CI green (`success`, run #1577, 2m42s). All acceptance criteria from #56 §Tests (unit layer) met. ### What this PR does well **Core bug fix is correct.** The #62 regression was a condition mismatch: - Old `allowedTools`: `...(usePenpotMcp ? ["mcp__penpot__*"] : [])` — gated on `usePenpotMcp` alone - Old `mcpServers`: `...(usePenpotMcp && penpotMcpEnv ? {...} : {})` — gated on both So if `penpotMcpEnv` was null, `mcp__penpot__*` entered the allowlist but no server was registered. `buildMcpSetup` eliminates the mismatch by using a single `usePenpot = penpotEnabled && penpotEnv !== null` for both paths. The lockstep test directly pins this. **Extraction is clean.** `buildMcpSetup` is genuinely pure — all module-level state (`usePenpotMcp`, `penpotMcpEnv`, `forgejoMcpCommand`, `forgejoHostUrl`) is passed in explicitly, so the tests exercise the real logic without any mocking. The five tests cover: non-penpot, penpot+secrets, penpot+no-secrets, forgejo server config, and stable base tool set. **`mcpSetup` is computed once before `runOnce`** — correct; config doesn't change between session-resume retries. ### One nit (non-blocking) `src/agent-runner.test.ts`, the `penpotEnv` fixture (around line 556): ```ts const penpotEnv = { AUTHELIA_BASIC_AUTH: "user:pw", PENPOT_AUTH_TOKEN_COOKIE: "jwe-cookie", PENPOT_EMAIL: "designer@example.com", PENPOT_PASSWORD: "unused", PENPOT_BASE_URL: "https://design.jacquin.app", }; ``` These are the pre-v0.4.0 env vars (cookie + Authelia basic-auth). Since v0.4.0 `main.ts` builds the env from a single `~/.config/claude-hooks/penpot-token` and passes it as `PENPOT_ACCESS_TOKEN`. The test logic is correct — it only asserts `env: penpotEnv` is forwarded verbatim, so the keys don't affect test correctness — but the stale key names could mislead a future reader about what the production env looks like. Suggested: replace the fixture with `{ PENPOT_ACCESS_TOKEN: "tok-test-abc" }` in a follow-up. Not worth a re-push here. ### AC coverage | AC item (#56 §Tests) | Status | |---|---| | Unit: webhook routes `area:design` → `designer`; `designer`-authored PRs → `design-reviewer` | ✅ landed in prior PR (`webhook-routing.test.ts`) | | **Unit: MCP wiring — `penpot_mcp: true` produces server + allowlist in lockstep** | ✅ this PR | | End-to-end smoke (throwaway ticket, full pipeline) | ⏳ tracked in PR description — re-dispatch on #62 after merge | The end-to-end smoke is explicitly deferred and called out in the test plan — that's an honest scope boundary, not a missing criterion.

code-lead merged commit ea80642ac9 into main

2026-04-18 22:53:26 +00:00

code-lead deleted branch test/design-mcp-smoke

2026-04-18 22:53:26 +00:00

code-lead referenced this pull request from a commit

2026-04-18 22:53:28 +00:00

test(agent-runner): smoke the designer/design-reviewer MCP wiring (#66)