feat(release): release-only runner label + post-push rollback #115
No reviewers
Labels
No labels
area:agents
area:dashboard
area:database
area:design
area:design-review
area:flows
area:infra
area:meta
area:security
area:sessions
area:webhook
area:workdir
security
type:bug
type:chore
type:meta
type:user-story
No milestone
No project
No assignees
4 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
charles/claude-hooks!115
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "boss/26"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Closes #26.
Summary
publish-imagenow gates onruns-on: [docker, release]. PR CI(
qa.yml) stays on plaindocker, so it can never land on asocket-mounted runner by accident. Only the dedicated release runner
advertises the
releaselabel.Rollback on smoke-test failurestep onpublish-image. Firesvia
if: failure() && steps.push.outcome == 'success', deletes the:${VERSION}and:latestcontainer package versions from theForgejo registry, and PATCHes the release to
prerelease=truewith a"DO NOT USE" banner. Both API calls are idempotent (404 = already gone
= ok).
docs/runner-setup.mdcovers runner registration, thecontainer.options: -v /var/run/docker.sock:/var/run/docker.sockconfig.yml snippet, the security rationale (socket ≡ root-on-host →
repo-scoped token + release-label gating), and a release-candidate
dry-run procedure (
v0.0.0-rc1) for smoking the pipeline end-to-endbefore cutting a real release.
README.md— brief "Release pipeline" section that points atdocs/runner-setup.md.Acceptance-criteria map (issue #26)
container.options: -v /var/run/docker.sock:...). Seedocs/runner-setup.md.release.ymljobs pick it up(
runs-on: [docker, release]onpublish-image; every otherjob stays on plain
docker).docs/runner-setup.md+ README pointer.from the earlier pass, preserved here:
bun --version/git --version/forgejo-mcp --help/claude --versionagainst the pushed image.release to prerelease with DO NOT USE banner (new).
qa.yml's staticdockerfilejob untouched.v0.0.0-rc1documented indocs/runner-setup.mdso the first real release doesn't landblind.
Test plan
docs/runner-setup.md(labelsdocker,release, socket bind-mount inconfig.yml).git tag v0.0.0-rc1 && git push origin v0.0.0-rc1; watchrelease.ymlexercisepublish-imageend-to-end on the releaserunner.
:v0.0.0-rc1container package appears in the Forgejoregistry and the release notes get a "Container image" section
with a sha256 digest.
temporarily renaming
claudein the Dockerfile) and confirm therollback step deletes
:v0.0.0-rc2+:latestand flips therelease to prerelease with the DO NOT USE banner.
publish-image now gates on `runs-on: [docker, release]` so PR CI (which stays on plain `docker`) can never land on a socket-mounted runner. Only the dedicated release runner — registered with both labels and `container.options: -v /var/run/docker.sock:/var/run/docker.sock` in its config.yml — advertises `release`. Add a `Rollback on smoke-test failure` step that fires via `if: failure() && steps.push.outcome == 'success'`: deletes the `:${VERSION}` and `:latest` container package versions from the Forgejo registry and PATCHes the release to prerelease=true with a DO NOT USE banner. Both calls are best-effort — a 404 on package delete is treated as success so re-runs are idempotent. docs/runner-setup.md covers registration, config.yml, security posture (socket = root-on-host, therefore repo-scoped + release-label-gated), and the rc-tag dry-run procedure for exercising the pipeline before cutting a real release. Closes #26. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>Review of PR #115
See detailed findings below.
CI: ✅ Green (run #1671)
❌ Rollback fires on release-notes annotation failure
File:
.forgejo/workflows/release.yml—Rollback on smoke-test failurestepThe condition
if: failure() && steps.push.outcome == "success"fires whenever any prior step fails, includingUpdate release notes with image digest. That step usescurl -fsSLwhich exits non-zero on any 4xx/5xx API response (transient outage, rate-limit).Result: a temporary Forgejo API error in the annotation step triggers the rollback, deleting a perfectly valid image and marking the release DO NOT USE — even though smoke tests and the credential audit both passed.
Fix: add
continue-on-error: truetoUpdate release notes with image digest:⚠️ Minor —
forgejo-mcp --help 2>&1 || truemasks missing binaryFile:
.forgejo/workflows/release.yml—Smoke test imagestep|| trueswallows exit 127 (command not found). In practice low-risk (missing binary would be caught at build time), but the forgejo-mcp check is weaker than the other three smoke-test lines. Considerdocker run --rm "$IMAGE" sh -c "command -v forgejo-mcp && forgejo-mcp --help; :".✅ Everything else
runs-on: [docker, release]onpublish-imageonly; all other jobs stay on plaindocker. ✓steps.push.outcome == "success"guard is correct. ✓set -u+ case-based 404 tolerance in rollback script is appropriate. ✓|| echo ""fallback + empty-check withexit 0is correct. ✓:latestdeletion trade-off acknowledged and documented indocs/runner-setup.md. ✓docs/runner-setup.mdis comprehensive: registration,config.ymlsnippet, security rationale, dry-run procedure, troubleshooting. ✓Both findings addressed in
fd1d624.Rollback fires on release-notes annotation failure
Agreed — the annotation is cosmetic, the image is already pushed, and the rollback should not be on the hook for a Forgejo API blip on
PATCH /releases/{id}. Addedcontinue-on-error: trueto theUpdate release notes with image digeststep with an inline comment explaining the rationale.forgejo-mcp --help 2>&1 || truemasks missing binaryAgreed — applied your suggestion verbatim:
command -vprints the binary's PATH location to the log on success, so a missing binary leaves a visible gap rather than being silently swallowed by|| true. The trailing; :keeps the step green if--helpitself returns non-zero (some help implementations do).Review — PR #115: release-only runner label + post-push rollback
CI
Run #1673 (
fd1d624) — ✅ success (2m38s).Acceptance-criteria coverage
All items from issue #26 are met in principle — runner label isolation, runtime smoke tests, credential audit, rollback, documentation. Good work overall. One correctness issue blocks approval.
❌
forgejo-mcpbinary-absent case is silently swallowedFile:
.forgejo/workflows/release.yml—Smoke test imagestepIn POSIX sh the trailing
; :is a separate command that always exits 0. The exit status of the whole shell string is therefore always 0, regardless of whethercommand -v forgejo-mcpfails. Concretely:forgejo-mcppresent →command -vprints path,--helpruns (exit code ignored by; :), step green. ✓forgejo-mcpabsent →command -vexits 1,&&short-circuits,; :runs, step exits 0 and the release proceeds with a broken image. ✗The in-code comment says "a missing binary still leaves a visible 'no path printed' trail in the log instead of being silently swallowed by
|| true" — that is true for observability, but the job does not fail. This contradicts issue #26 AC: "Fail the release if any command exits non-zero."Fix — separate the "binary exists" check (must be fatal) from the "--help exit code" tolerance:
Or keep it as a single
docker runbut scope the; :to the--helpinvocation only:Either form lets
command -vfailure propagate and fail the step, while still tolerating a non-zero exit from--help.Everything else ✅
runs-on: [docker, release]isolation is correct;qa.ymlstays on plaindockerand is not changed in this PR.continue-on-error: trueon the annotation step correctly prevents a transient Forgejo API blip from triggering the rollback (the rollback gates onfailure(), whichcontinue-on-errorsuppresses for that step). Logic is sound.:lateston rollback is documented as a conscious trade-off indocs/runner-setup.md— acceptable.docs/runner-setup.mdis thorough: labels table,config.ymlsnippet, security rationale, dry-run procedure, troubleshooting.FORGEJO_TOKENandREGISTRY_TOKENscopes documented as repo-scoped — good security posture.Addressed in
36623ad.forgejo-mcpbinary-absent case silently swallowedAgreed — you're right,
; :was sibling tocommand -v, not subordinate to--help. Scoped it to a subshell around--helponly, as you suggested:Now
command -vfailure (exit 1) propagates out ofsh -cand fails the step, while a non-zero--helpexit stays tolerated inside the subshell. Updated the inline comment to match the new semantics.Review: feat(release): release-only runner label + post-push rollback
CI: ✅ Green (
fix(release): isolate annotation failures from rollback; sharpen smoke test, run #1673, 2m38s, success)Acceptance criteria coverage: All criteria from #26 are addressed architecturally, but one smoke-test check has a logic bug that causes it to never fail even when the binary is absent — violating the AC requirement to "fail the release if any command exits non-zero."
🔴 Bug:
forgejo-mcpsmoke test never fails — binary absence is undetectableFile:
.forgejo/workflows/release.yml,Smoke test imagestepThe problem:
The shell grammar here is:
The
;has lower precedence than&&, so:(a no-op that always exits 0) runs unconditionally as a separate statement regardless of what the&&chain returned. Whenforgejo-mcpis absent from the image:command -v forgejo-mcp→ exits 1&&short-circuits —forgejo-mcp --helpskipped;separates next statement:runs → exits 0docker runexits 0 → step is greenA Docker image with no
forgejo-mcpbinary passes the smoke test. The rollback is never triggered. The acceptance criterion says: "Fail the release if any command exits non-zero." A missing binary is clearly the highest-severity failure mode.The comment says "a missing binary still leaves a visible trail in the log" — but the trail is just a blank line from
command -vprinting nothing. That does not constitute failure.Fix: wrap the
--helpinvocation in a subshell so&&short-circuits the entire right-hand side when the binary is absent:With this fix:
command -vexits 1 →&&skips the subshell →shexits 1 → step FAILS → rollback triggered ✓--helpexits non-zero:command -vprints path → subshell runs →; :exits 0 → step SUCCEEDS ✓The other three checks (
bun --version,git --version,claude --version) correctly fail when their binary is missing because they invoke the binary directly without a; :guard. Theforgejo-mcpcheck should behave consistently.Everything else looks solid
runs-on: [docker, release]correctly gatespublish-imageto the dedicated runner;qa.ymlstays on plaindocker✓continue-on-error: trueon the annotation step is correct — a transient Forgejo API blip on the cosmetic digest update must not trigger the rollback ✓if: failure() && steps.push.outcome == 'success'is the right guard ✓set -uin rollback catches undefined variable expansions ✓RELEASE_IDfetch uses|| echo ''+ empty-string guard before the PATCH — correct ✓:latestrollback tradeoff (could delete a good:latestfrom a prior real release) is explicitly documented indocs/runner-setup.md— accepted ✓docs/runner-setup.mdis thorough: registration steps,config.ymlsnippet, security rationale (socket = root on host), repo-scoped token guidance, dry-run procedure ✓Looks like this review raced the fix — commit
36623ad(tagged on the review itself) already contains the subshell form you're asking for. The quoted buggy line is fromfd1d624(the previous commit).Current state of
.forgejo/workflows/release.ymlline 197 on36623ad:Which matches your recommended fix verbatim:
command -vexits 1 →&&short-circuits the subshell →sh -cexits 1 → step fails → rollback triggers ✓--helpnon-zero:command -vprints path → subshell runsforgejo-mcp --help; :→ exits 0 → step passes ✓Inline comment above the line was also rewritten to describe the new semantics ("must fail the step if the binary is missing …
; :is scoped inside a subshell around--helponly").Nothing to push — could you re-review
36623addirectly?36623adbae17e621d12bReview — REQUEST_CHANGES (CI pending)
CI was still running at review time — push any trivial change (or wait) and I will re-review when it completes.
Code review findings (independent of CI)
The implementation is well-structured and I found no correctness bugs. All acceptance criteria from issue #26 are met.
.forgejo/workflows/release.ymlpublish-imagerunner label —runs-on: [docker, release]is the right mechanism. Every other job keeps plaindocker, so PR CI can never land on the socket-mounted runner even if they share the same host. ✅Rollback trigger condition —
if: failure() && steps.push.outcome == 'success'is precise: it only fires when there is something to roll back (push succeeded) and something went wrong after the push. ✅continue-on-error: trueon digest annotation — Correct design. A transient API blip on the cosmetic digest annotation must not trigger rollback of a good image. The comment explains the reasoning clearly. ✅Package DELETE loop — Avoids
-fon curl, captures HTTP status, treats 2xx and 404 as success. Non-fatal for other codes. The original failure is never masked. ✅Empty RELEASE_ID guard in rollback —
set -uactive, but|| echo ''fallback +[ -z "${RELEASE_ID}" ]early-exit handle the missing-release case cleanly. ✅Python JSON body encoding —
json.dumps(sys.stdin.read())properly escapes the PATCH body, no injection risk from release body content. ✅docs/runner-setup.mdClear, complete, operator-friendly. Covers registration,
config.ymlsnippet, security rationale (socket ≡ root on host → repo-scoped token + label gating), and the rc dry-run procedure. The:latestrollback caveat is called out explicitly. ✅README.mdBrief "Release pipeline" section with a pointer to
docs/runner-setup.md. Appropriate level of detail. ✅Acceptance-criteria map
releaselabel gatespublish-image; PR CI stays on plaindockerdocs/runner-setup.md+ README pointerbun/git/forgejo-mcp/claudesmoke tests after pushqa.ymlstaticdockerfilejob untouchedv0.0.0-rc1documentedReady to approve once CI goes green.
Review: REQUEST_CHANGES
CI is green (run #1678, 2m39s ✅). All acceptance criteria from issue #26 are met. One correctness bug in the rollback step needs fixing before merge.
🐛 Bug — rollback package DELETE will silently fail with a 403
File:
.forgejo/workflows/release.yml—Rollback on smoke-test failurestepThe rollback deletes the bad image from the registry using
FORGEJO_TOKEN:But
docs/runner-setup.mdsaysFORGEJO_TOKENonly needswrite:repositoryscope. The Forgejo packages DELETE endpoint (/api/v1/packages/…) requireswrite:packagescope — which is whatREGISTRY_TOKENhas per the same doc. A token with onlywrite:repositorywill get a 403, which thecasestatement treats as*(non-fatal): it logs "non-fatal" and continues to patch the release. The image stays in the registry. The rollback's primary safety measure — preventing operators from pulling a busted image — silently fails.Fix: add
REGISTRY_TOKENto the rollback step'senv:and use it for the DELETE calls:Keep
FORGEJO_TOKENfor the release PATCH call (which needswrite:repository). This preserves least-privilege for both tokens.📝 Minor typo (non-blocking, fix while you're in there)
File:
.forgejo/workflows/release.yml— comment in theBuild and pushstep:Should be
immediately.✅ Acceptance criteria checklist
publish-imagegates on[docker, release]dockerdocs/runner-setup.md— runner registration,container.optionssnippet, security rationalev0.0.0-rc1documenteddocs/runner-setup.mdbun,git,forgejo-mcp,claudesmoke tests after pushqa.ymluntouchedReview: REQUEST_CHANGES
CI is green (run #1678, 2m39s ✅). All acceptance criteria from issue #26 are met. One correctness bug in the rollback step needs fixing before merge.
🐛 Bug — rollback package DELETE will silently fail with a 403
File:
.forgejo/workflows/release.yml—Rollback on smoke-test failurestepThe rollback deletes the bad image from the registry using
FORGEJO_TOKEN:But
docs/runner-setup.mdsaysFORGEJO_TOKENonly needswrite:repositoryscope. The Forgejo packages DELETE endpoint (/api/v1/packages/…) requireswrite:packagescope — which is whatREGISTRY_TOKENhas per the same doc. A token with onlywrite:repositorywill get a 403, which thecasestatement treats as*(non-fatal): it logs "non-fatal" and continues to patch the release. The image stays in the registry. The rollback's primary safety measure — preventing operators from pulling a busted image — silently fails.Fix: add
REGISTRY_TOKENto the rollback step'senv:and use it for the DELETE calls:Keep
FORGEJO_TOKENfor the release PATCH calls (which needwrite:repository). This preserves least-privilege for both tokens.📝 Minor typo (non-blocking, fix while you're in there)
File:
.forgejo/workflows/release.yml— comment in theBuild and pushstep:Should be
immediately.✅ Acceptance criteria checklist
publish-imagegates on[docker, release]dockerdocs/runner-setup.md— runner registration,container.optionssnippet, security rationalev0.0.0-rc1documenteddocs/runner-setup.mdbun,git,forgejo-mcp,claudesmoke tests after pushqa.ymluntouchedManual push on
boss/26(d7f91db) applying R5's token-scope fix — boss was stuck in address-review retries on a stale worktree and the fix is small enough that pushing directly was faster than unsticking the loop.REGISTRY_TOKENadded to the rollback step's env; package-DELETE swapped fromFORGEJO_TOKEN(write:repository) toREGISTRY_TOKEN(write:package). Release PATCH calls keepFORGEJO_TOKEN. Least-privilege preserved.immedialtelytypo was already fixed on the branch before this commit (R5 must have been reading a stale view).Re-review when CI goes green.
Review — PR #115:
feat(release): release-only runner label + post-push rollbackCI: ✅ green on head commit
d7f91db(run #1679, 2m54s).Acceptance criteria (issue #26) — all met
docs/runner-setup.md§3 —container.options: -v /var/run/docker.sock:/var/run/docker.sockpublish-imagegated on[docker, release]; all other jobs stay on plaindockerdocs/runner-setup.md+ README "Release pipeline" sectionbun --version,git --version,forgejo-mcp --help,claude --version(forgejo-mcp --help; :)pattern to tolerate non-zero--helpexit/home/claude/.claude.json,.credentials.json,.config/claude-hooks,/root/.claude.json,/root/.credentials.json:${VERSION}+:latestfrom registry; flips release toprerelease=truewith DO NOT USE bannerqa.ymlstaticdockerfilejob untoucheddocs/runner-setup.md§5What looks good
publish-imagerequires bothdockerANDrelease;qa.ymlonly ever requestsdocker, so PR CI can never land on the privileged runner even if the same host runs both.REGISTRY_TOKEN(write:package) deletes the container images;FORGEJO_TOKEN(write:repository) patches the release. The comment in the rollbackenv:block explains exactly why — Packages API silently 403s onwrite:repository. The final commit fixing this is the right call.continue-on-error: trueon "Update release notes" correctly keeps the job green for a transient API blip so the rollback step stays dormant. Well-commented.if: failure() && steps.push.outcome == 'success'correctly limits rollback to the case where there is actually something to clean up.set -uin the rollback script and|| echo '000'on the DELETE curl call guard against the most likely failure modes.:latestrollback trade-off is honestly documented indocs/runner-setup.md(the "⚠️ :latest and rollback" callout).docs/runner-setup.mdis clear and actionable (socket = root on host, repo-scoped token, don't register org-wide).One non-blocking observation
release.yml, rollback step —CURRENT_BODYfetch has no error fallback.Actions runs with
-eo pipefailby default. If the Forgejo API is unreachable at this point (afterRELEASE_IDwas fetched successfully),curl -fexits non-zero,-ekills the script before the PATCH, and the release doesn't get the DO NOT USE banner. The primary safety measure — image deletion — runs earlier and IS protected with|| echo '000', so the acceptance criterion is met. Adding|| echo ''as a fallback here would make the failure mode consistent with the rest of the script. Suggestion only, not a blocker.Verdict: APPROVED. All acceptance criteria met, code is correct and well-commented, CI is green.