feat(agents): synthesize shell_output_delta for claude-code via container log stream #994

Merged
charles merged 1 commit from code-lead/957 into main 2026-05-08 19:39:01 +00:00

Summary

Synthesises shell_output_delta events for the claude-code SDK so the dashboard's live shell pane streams stdout/stderr from Bash tool calls in real time — parity with the cursor adapter's SDK-native delta channel. Closes #957.

Approach — Option A (bash wrapper via PreToolUse hook)

Considered three options from the issue:

| Option | Why I picked / dropped it |
|---|---|
| A: bash wrapper | Reuses the existing PreToolUse hook surface (rtk already lives there). Zero new infra: no sidecar container, no host unix socket, no extra mount. The wrapper still pipes original stdout/stderr through tee so the model's tool_result is unchanged. Picked. |
| B: container shim binary | Heavier: would need a new in-container binary plus a host socket listener. Same end-state but more moving parts than (A). |
| C: docker-exec sidecar | One sidecar per call_id is per-task overhead; sidecar lifecycle race with worker shutdown is the kind of bug we don't need. |

Pipeline

PreToolUse(Bash)                  cl-shell-stream-cap                       host
  shell-stream-hook   ─wraps─►   stdout/stderr → base64 frames → /tmp/shell-stream/stream.log
                                                                                │
                                                                                ▼
                                                       docker exec ... tail -F  │
                                                       ShellStreamTailer parses │
                                                       ShellOutputDeltaEvent ───┘
                                                       onTaskEvent(...)
  1. Image ships /usr/local/bin/shell-stream-hook + cl-shell-stream-cap.
  2. renderForInstance() injects shell-stream-hook into settings.json's PreToolUse Bash matcher, ordered after rtk so rtk's compression rewrite runs first; the wrapper rewrites the rtk-rewritten command.
  3. cl-shell-stream-cap appends <call_id>\t<stream>\t<base64>\n frames inside the container, splitting at 2 KiB so each append stays atomic under PIPE_BUF.
  4. Host-side ShellStreamTailer runs docker exec … tail -n0 -F …, decodes frames, and emits ShellOutputDeltaEvents through the existing onTaskEvent pipe.
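
Step 2 can be sketched roughly as follows. This is a minimal illustration, not the PR's renderForInstance(): the function name `injectShellStreamHook`, the `Settings` types, and the `rtk-hook` command string in the usage note are assumptions, and the settings layout (`hooks.PreToolUse[].matcher` / `hooks[]`) is assumed to follow Claude Code's documented hook configuration.

```typescript
// Assumed shape of the relevant slice of settings.json (hypothetical types).
interface HookCommand { type: "command"; command: string; }
interface HookMatcher { matcher: string; hooks: HookCommand[]; }
interface Settings { hooks?: { PreToolUse?: HookMatcher[] }; }

// Path taken from the PR description.
const SHELL_STREAM_HOOK = "/usr/local/bin/shell-stream-hook";

// Returns a copy of `settings` in which every Bash matcher lists the
// shell-stream hook exactly once, as the last entry (so after rtk).
function injectShellStreamHook(settings: Settings): Settings {
  const existing: HookMatcher[] = settings.hooks?.PreToolUse ?? [];
  // Create a Bash matcher if the settings do not have one yet.
  const withBash: HookMatcher[] = existing.some((m) => m.matcher === "Bash")
    ? existing
    : [...existing, { matcher: "Bash", hooks: [] }];
  const preToolUse = withBash.map((m): HookMatcher => {
    if (m.matcher !== "Bash") return m;
    // Drop any earlier copy first so repeated renders do not duplicate it.
    const others = m.hooks.filter((h) => h.command !== SHELL_STREAM_HOOK);
    return { ...m, hooks: [...others, { type: "command", command: SHELL_STREAM_HOOK }] };
  });
  return { ...settings, hooks: { ...settings.hooks, PreToolUse: preToolUse } };
}
```

Calling it twice on a settings object whose Bash matcher already holds a (hypothetical) `rtk-hook` command yields the same result both times, with the shell-stream hook listed after `rtk-hook`.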

Wire format

Reuses the existing ShellOutputDeltaEvent shape from claude-port.ts. UI doesn't know which provider produced the event.
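
For illustration, decoding one `<call_id>\t<stream>\t<base64>\n` frame into a delta-like object might look like the sketch below. The `ShellDelta` type and the `parseFrame` name are hypothetical stand-ins: the real event shape lives in claude-port.ts and is not reproduced here, and the EOF marker format is not modelled because the description does not spell it out.

```typescript
type ShellStream = "stdout" | "stderr";

// Hypothetical stand-in for the real ShellOutputDeltaEvent shape.
interface ShellDelta {
  callId: string;
  stream: ShellStream;
  bytes: Uint8Array; // raw payload; may end mid-character when a chunk splits UTF-8
}

// Parses one log line; returns null for anything malformed so a bad
// line can be dropped without affecting later frames.
function parseFrame(line: string): ShellDelta | null {
  const trimmed = line.replace(/\r?\n$/, ""); // tolerate LF and CRLF endings
  const parts = trimmed.split("\t");
  if (parts.length !== 3) return null;
  const [callId, stream, b64] = parts;
  if (!callId || b64 === undefined) return null;
  if (stream !== "stdout" && stream !== "stderr") return null;
  let binary: string;
  try {
    binary = atob(b64); // throws on characters outside the base64 alphabet
  } catch {
    return null;
  }
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return { callId, stream, bytes };
}
```

Returning bytes rather than a string leaves UTF-8 decoding to the consumer, which matters when a 2 KiB chunk boundary falls inside a multi-byte character.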

Edge cases (all in tests)

  • Long lines: cap script chunks at 2 KiB before base64; tailer forwards each chunk verbatim with the same callId. UI re-stitches.
  • Empty tool calls: per-stream EOF marker emits a synthetic zero-byte delta so the widget renders an empty pane, not a perpetual spinner.
  • Aborts: tailer.stop() runs in the runner's finally block (and on AbortSignal) so the docker-exec subprocess is killed cleanly; start()'s promise is awaited so we never leak a zombie tailer.
  • Multi-task isolation: one container per agent instance ⇒ one log file per container ⇒ one tailer per task; streams cannot cross-contaminate.
  • Cursor tasks: skipped (SDK-native delta channel already covers them).
  • Kill switch: CLAUDE_HOOKS_SHELL_STREAM=off disables the host tailer without touching the in-container hook (hot, no rebuild).
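
The long-line case can be pictured with the sketch below. It is written in TypeScript purely for illustration (the actual cap script runs inside the container and is not shown in this PR description), and `frameOutput` and `restitch` are hypothetical names. With a 2048-byte chunk, the base64 payload is 2732 characters, so a frame with a short call id stays below the 4096-byte PIPE_BUF on Linux.

```typescript
const CHUNK_BYTES = 2048; // 2 KiB of raw output per frame, as in the description

function toBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Splits a payload into frames that all carry the same call id.
function frameOutput(callId: string, stream: "stdout" | "stderr", payload: Uint8Array): string[] {
  const frames: string[] = [];
  for (let off = 0; off < payload.length; off += CHUNK_BYTES) {
    const chunk = payload.subarray(off, off + CHUNK_BYTES);
    frames.push(`${callId}\t${stream}\t${toBase64(chunk)}\n`);
  }
  return frames;
}

// Consumer side: concatenates the decoded chunks of one call id in arrival order.
function restitch(frames: string[], callId: string): Uint8Array {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (const frame of frames) {
    const parts = frame.replace(/\n$/, "").split("\t");
    if (parts.length !== 3 || parts[0] !== callId) continue;
    const binary = atob(parts[2] ?? "");
    const bytes = new Uint8Array(binary.length);
    for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
    chunks.push(bytes);
    total += bytes.length;
  }
  const out = new Uint8Array(total);
  let off = 0;
  for (const chunk of chunks) {
    out.set(chunk, off);
    off += chunk.length;
  }
  return out;
}
```

Because chunks are cut on byte boundaries, text decoding is safest after re-stitching, or with a streaming decoder.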

Test plan

  • just qa clean (typecheck + lint + format + tests + sql/paraglide checks)
  • parseShellLogLine: valid / malformed / EOF / utf-8 / CRLF.
  • LineBuffer: chunked stream, partial trailing line, split UTF-8.
  • ShellStreamTailer:
    • 1 000 stdout lines arrive in order, all carry the right callId.
    • Abort closes the tail subprocess (kill counter increments).
    • Two tailers on different containers see disjoint events.
    • EOF before any data emits a single synthetic zero-byte delta.
    • Long lines split into multiple frames forward verbatim with the same callId.
    • Malformed lines are dropped without poisoning the stream.
  • renderForInstance: shell-stream hook is injected idempotently and ordered after rtk.
  • Manual smoke: dispatch a task that runs bun test (~30 s); confirm dashboard pane streams output as it lands rather than waiting for completion.
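
The LineBuffer cases in the plan (chunked stream, partial trailing line, split UTF-8) could be covered by something like the sketch below. This is an assumed shape, not the class from the PR: the `push` method name and its return type are illustrative, and it relies on the standard `TextDecoder` streaming mode to hold back incomplete multi-byte sequences.

```typescript
// Accumulates raw chunks from the tail subprocess and yields complete lines.
class LineBuffer {
  private readonly decoder = new TextDecoder("utf-8");
  private pending = "";

  // Feeds one chunk; returns every line completed by it, without line endings.
  push(chunk: Uint8Array): string[] {
    // stream: true keeps a trailing partial UTF-8 sequence inside the decoder
    // until the next chunk supplies the remaining bytes.
    this.pending += this.decoder.decode(chunk, { stream: true });
    const lines = this.pending.split("\n");
    // The last element is the (possibly empty) unterminated tail.
    this.pending = lines.pop() ?? "";
    return lines.map((l) => (l.endsWith("\r") ? l.slice(0, -1) : l));
  }
}
```

For example, feeding the bytes of `aé\nb` as two chunks that split the two-byte `é` returns no lines for the first chunk and `["aé"]` for the second, while `b` stays buffered until its newline arrives.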

Closes #957

feat(agents): synthesize shell_output_delta for claude-code via container log stream
Some checks failed
qa / sql-layer-check (pull_request) Successful in 17s
qa / dockerfile (pull_request) Successful in 20s
qa / i18n-string-check (pull_request) Successful in 20s
qa / db-schema (pull_request) Successful in 45s
qa / qa-1 (pull_request) Failing after 1m41s
qa / qa (pull_request) Failing after 0s
6b46ef2986
Closes #957.

Approach: Option A — bash wrapper via PreToolUse hook.

Pipeline:
  1. Image ships /usr/local/bin/shell-stream-hook + cl-shell-stream-cap.
  2. renderForInstance() injects shell-stream-hook into settings.json's
     PreToolUse Bash matcher, ordered after the rtk hook so rtk's
     compression rewrite happens first; the wrapper rewrites the rtk-
     rewritten command to fan stdout/stderr through cl-shell-stream-cap.
  3. cl-shell-stream-cap appends framed lines (call_id\tstream\tbase64)
     to /tmp/shell-stream/stream.log inside the container, splitting at
     2 KiB so each append stays atomic under PIPE_BUF.
  4. Host-side ShellStreamTailer spawns `docker exec ... tail -F` on
     that file, decodes frames, and emits ShellOutputDeltaEvents through
     onTaskEvent so the dashboard's live shell pane streams output as
     it lands — parity with cursor's ShellOutputDeltaUpdate.

Why Option A: simplest hook surface (no sidecar container, no host
unix socket plumbing), reuses the existing PreToolUse infrastructure
(rtk already lives there), and the model's tool_result is unchanged
because the wrapper still pipes original stdout/stderr through tee.

Edge cases:
  - Long lines: cap script chunks at 2 KiB before base64; tailer
    forwards each chunk verbatim with the same callId — UI re-stitches.
  - Empty tool calls: per-stream EOF marker emits a synthetic zero-byte
    delta so the widget doesn't render a perpetual spinner.
  - Aborts: tailer.stop() runs in the runner's finally block (and also
    on AbortSignal) so the docker-exec subprocess is killed cleanly;
    the start() promise is awaited so we never leak a zombie tailer.
  - Multi-task isolation: one container per agent instance => one log
    file per container => one tailer per task; streams cannot cross.
  - Cursor tasks skip the host tailer (SDK-native delta channel
    already covers them); CLAUDE_HOOKS_SHELL_STREAM=off disables the
    host tailer without touching the in-container hook.

Tests:
  - parseShellLogLine: valid / malformed / EOF / utf-8 / CRLF.
  - LineBuffer: chunked stream, partial trailing line, split UTF-8.
  - ShellStreamTailer: 1 000 lines arrive in order with right callId,
    abort closes the tail subprocess, two tailers don't cross-
    contaminate, EOF-before-data emits zero-byte delta, long lines
    split into multiple frames carry the same callId, malformed lines
    are dropped without poisoning the stream.
  - renderForInstance: shell-stream hook is injected idempotently and
    ordered after rtk.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
charles deleted branch code-lead/957 2026-05-08 19:39:01 +00:00
reviewer left a comment
  • ci: qa + qa-1 red on run #1742 (sha 6b46ef2). Failures are snapshot mismatches in apps/web/src/components/agent/tool-card.test.tsx (duration value 3s vs 5s in snapshot), unrelated to this PR's server-only changes. All new server tests pass locally (16 tailer + 29 render-for-instance). Rebase on main to pick up any snapshot updates from concurrent PRs; if the failure persists on main independently, update the snapshots there first.

Reference
charles/claude-hooks!994