M17-4: Design-reviewer end-to-end loop #155

Closed
opened 2026-04-20 14:50:22 +00:00 by code-lead · 1 comment
Collaborator

As an operator, I want the design-reviewer agent to actually review design outputs (Penpot frames + dashboard PRs) with the same diff-scoped rigor as code-reviewer, so that we close the designer → design-reviewer loop instead of letting the agent sit idle (observation: /health shows no task in 24h).

Acceptance criteria

Triggers

  • design-reviewer runs on PRs labeled area:dashboard OR PRs authored by designer (latter already wired in webhook-routing)
  • Add design-reviewer to the pool for area:dashboard PRs via match_labels: ["area:dashboard"]

Skill behaviour

  • Document the handoff in skills/design-review.md: read the PR's dashboard.html diff, compare to the Penpot frame linked in the PR body, report layout discrepancies + token-taxonomy violations
  • Low-tech visual regression check: skill verifies every colour hex in the PR diff is one of the documented design/tokens.json values; flag any raw hex

End-to-end

  • Dispatch design-reviewer on a live dashboard PR, verify it submits APPROVED or REQUEST_CHANGES via create_pull_review — NOT just a comment

Docs

  • README + CLAUDE.md — "Design review flow" section

Out of scope

  • GPU-heavy screenshot diffing (can't do locally — Penpot MCP export is lossy)
  • Binary image comparison
  • Auto-closing design PRs

Dependencies

  • Independent of #M17-5 / #M17-3 / #M17-2. Can parallel with #M17-1.

References

  • Spec: specs/m17-multi-repo-and-observability.md §Story M17-4
  • Design-reviewer idle observation: config/agents.json + service /health shows no task in 24h
  • Design tokens source of truth: design/tokens.json (DTCG, Tokyo Night)
As an operator, I want the `design-reviewer` agent to actually review design outputs (Penpot frames + dashboard PRs) with the same diff-scoped rigor as code-reviewer, so that we close the designer → design-reviewer loop instead of letting the agent sit idle (observation: `/health` shows no task in 24h). ## Acceptance criteria ### Triggers - [ ] `design-reviewer` runs on PRs labeled `area:dashboard` OR PRs authored by `designer` (latter already wired in webhook-routing) - [ ] Add `design-reviewer` to the pool for `area:dashboard` PRs via `match_labels: ["area:dashboard"]` ### Skill behaviour - [ ] Document the handoff in `skills/design-review.md`: read the PR's `dashboard.html` diff, compare to the Penpot frame linked in the PR body, report layout discrepancies + token-taxonomy violations - [ ] Low-tech visual regression check: skill verifies every colour hex in the PR diff is one of the documented `design/tokens.json` values; flag any raw hex ### End-to-end - [ ] Dispatch `design-reviewer` on a live dashboard PR, verify it submits APPROVED or REQUEST_CHANGES via `create_pull_review` — NOT just a comment ### Docs - [ ] README + CLAUDE.md — "Design review flow" section ## Out of scope - GPU-heavy screenshot diffing (can't do locally — Penpot MCP export is lossy) - Binary image comparison - Auto-closing design PRs ## Dependencies - Independent of #M17-5 / #M17-3 / #M17-2. Can parallel with #M17-1. ## References - Spec: `specs/m17-multi-repo-and-observability.md` §Story M17-4 - Design-reviewer idle observation: `config/agents.json` + service `/health` shows no task in 24h - Design tokens source of truth: `design/tokens.json` (DTCG, Tokyo Night)
Collaborator

🔍 Design Review — issue #155

Reviewed by: design-reviewer
Date: 2026-04-20
Penpot file: claude-hooks — dashboard (file id 689d7fa4-f94b-81d4-8007-e39c5c82f66c, revn 1500, last modified 2026-04-20T08:48:26Z)


⚠️ Blocked: two preconditions for visual review are not met

1. No designer handoff comment

Issue #155 has zero comments. The expected workflow requires the designer agent to post a handoff comment with a Penpot deep-link before design-reviewer is dispatched. Without that comment this reviewer has no canonical frame list to inspect and no link to confirm it is reviewing the right revision.

Fix: when designer attaches area:design-review, it must also post a handoff comment in the format:

🎨 Design ready for review
Penpot: https://design.jacquin.app/view/…?page-id=…&frame-id=…
Frames: Dashboard / AgentCard, Dashboard / QueuePanel, …

This is itself a gap in the end-to-end loop described by this story's acceptance criteria.

2. export_frame_png not available in current Penpot MCP

The Penpot MCP installed in this container (penpot-mcp-server v0.4.0 fork) does not expose an export tool. Tools available: list_teams, list_projects, list_files, get_file_info, get_design_tokens, and shape/token creation tools. No export_frame_png or equivalent.

Without PNG export the multimodal visual inspection step cannot execute.

Fix (infra): the fork at penpot-mcp-server/ needs an export_frame_png tool wired to Penpot's export-file-object RPC (returns a PNG byte stream). This is a prerequisite for design-reviewer to do any real work.


Findings from what was inspectable

typography · claude-hooks — dashboard · file-level

Finding: design tokens are empty (get_design_tokens{}). The file contains no declared colour, dimension, or typography tokens. According to the M17-4 acceptance criteria, design-reviewer must flag any raw hex in the PR diff that isn't in design/tokens.json. If the Penpot file itself carries no tokens, designers are working without a token reference and raw-hex violations will proliferate undetected.

Fix: import the design/tokens.json DTCG token set into the Penpot file via import_tokens_dtcg and ensure the designer agent does this on first file creation. Add a check in the design-review.md skill: if get_design_tokens returns {}, flag it as a blocker before proceeding.


Summary

Category Findings
overflow — (could not inspect)
contrast — (could not inspect)
alignment — (could not inspect)
typography ⚠️ Empty token set in Penpot file
UX ⚠️ No handoff comment → reviewer has no frame list
suggestion Add export_frame_png to penpot-mcp-server fork; enforce handoff comment in designer skill

Verdict: cannot issue APPROVED / REQUEST_CHANGES — visual inspection blocked. Two infrastructure gaps must be closed first (MCP export tool, designer handoff comment). The empty-token finding stands independently and should be addressed before the next design iteration.

## 🔍 Design Review — issue #155 **Reviewed by:** design-reviewer **Date:** 2026-04-20 **Penpot file:** `claude-hooks — dashboard` (file id `689d7fa4-f94b-81d4-8007-e39c5c82f66c`, revn 1500, last modified 2026-04-20T08:48:26Z) --- ### ⚠️ Blocked: two preconditions for visual review are not met #### 1. No designer handoff comment Issue #155 has **zero comments**. The expected workflow requires the `designer` agent to post a handoff comment with a Penpot deep-link before `design-reviewer` is dispatched. Without that comment this reviewer has no canonical frame list to inspect and no link to confirm it is reviewing the right revision. **Fix:** when `designer` attaches `area:design-review`, it must also post a handoff comment in the format: ``` 🎨 Design ready for review Penpot: https://design.jacquin.app/view/…?page-id=…&frame-id=… Frames: Dashboard / AgentCard, Dashboard / QueuePanel, … ``` This is itself a gap in the end-to-end loop described by this story's acceptance criteria. #### 2. `export_frame_png` not available in current Penpot MCP The Penpot MCP installed in this container (`penpot-mcp-server` v0.4.0 fork) does **not** expose an export tool. Tools available: `list_teams`, `list_projects`, `list_files`, `get_file_info`, `get_design_tokens`, and shape/token creation tools. No `export_frame_png` or equivalent. Without PNG export the multimodal visual inspection step cannot execute. **Fix (infra):** the fork at `penpot-mcp-server/` needs an `export_frame_png` tool wired to Penpot's `export-file-object` RPC (returns a PNG byte stream). This is a prerequisite for `design-reviewer` to do any real work. --- ### Findings from what _was_ inspectable #### `typography` · `claude-hooks — dashboard` · file-level **Finding:** design tokens are **empty** (`get_design_tokens` → `{}`). The file contains no declared colour, dimension, or typography tokens. According to the M17-4 acceptance criteria, `design-reviewer` must flag any raw hex in the PR diff that isn't in `design/tokens.json`. If the Penpot file itself carries no tokens, designers are working without a token reference and raw-hex violations will proliferate undetected. **Fix:** import the `design/tokens.json` DTCG token set into the Penpot file via `import_tokens_dtcg` and ensure the `designer` agent does this on first file creation. Add a check in the `design-review.md` skill: if `get_design_tokens` returns `{}`, flag it as a blocker before proceeding. --- ### Summary | Category | Findings | |---|---| | overflow | — (could not inspect) | | contrast | — (could not inspect) | | alignment | — (could not inspect) | | typography | ⚠️ Empty token set in Penpot file | | UX | ⚠️ No handoff comment → reviewer has no frame list | | suggestion | Add `export_frame_png` to penpot-mcp-server fork; enforce handoff comment in designer skill | **Verdict:** cannot issue APPROVED / REQUEST_CHANGES — visual inspection blocked. Two infrastructure gaps must be closed first (MCP export tool, designer handoff comment). The empty-token finding stands independently and should be addressed before the next design iteration.
Sign in to join this conversation.
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
charles/claude-hooks#155
No description provided.