agents: synthesize shell_output_delta for claude-code via container log stream #957

Closed
opened 2026-05-08 12:06:30 +00:00 by claude-desktop · 6 comments
Collaborator

User story

As an operator watching a long-running shell command (e.g. bun test, just qa) inside a claude-code agent, I want to see live stdout/stderr just like I would for cursor's ShellToolCall, so the dashboard doesn't pretend the agent is idle while a 5-minute test suite is actually running.

Context

The Claude Code SDK does not expose stdout deltas for Bash invocations — the dashboard sees one event at start and one at completion. Cursor's SDK exposes ShellOutputDeltaUpdate for the same situation. We own the docker container the agent runs in, so we can synthesize equivalent delta events for claude-code by tailing the container's process tree or piping the shell wrapper's output through a sidecar.

This is invasive and should land after the cursor delta and tool-kind-taxonomy work, because the render path needs to exist first.

Acceptance criteria

Approach (decide during design phase, document the choice in the PR)

  • Option A: bash wrapper. The PreToolUse hook intercepts every Bash invocation and replaces the command with bash -c '<cmd>' > >(tee >(logger -t claude-hooks-shell-${call_id})) 2> >(tee >(logger -t claude-hooks-shell-${call_id}) >&2), so each stream is copied to the journal while still reaching the tool's own stdout/stderr (or simpler: redirect to a per-call FIFO). The server follows journalctl --user -f -t claude-hooks-shell-${call_id} (or reads the FIFO) and emits shell_output_delta events with the right callId.
  • Option B: container shim binary. Replace the in-container shell wrapper with a small Bun script that streams stdout/stderr lines to a unix socket on the host, indexed by call_id. The server reads the socket.
  • Option C: docker exec sidecar. A per-task sidecar process, docker exec claude-hooks-${agent} sh -c 'tail -f /tmp/shell-${call_id}.log', is spawned and tailed by the runner (without -it, since the runner has no TTY to attach).

The PR proposes one, motivates the choice on simplicity / robustness / hook compatibility, and benchmarks at least one realistic workload (e.g. bun test --watch for 30 s).
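A minimal sketch of the "simpler" variant of Option A, assuming the hook rewrites the command to tee each stream into a per-call file instead of the journal. The helper names (`shellQuote`, `wrapBashCommand`) and the `/tmp/shell-deltas` directory are hypothetical, not taken from the codebase, and the wrapper assumes the command is run by bash, since it relies on process substitution.

```typescript
// Hypothetical log directory; the issue only says "per-call FIFO" or a log file.
const LOG_DIR = "/tmp/shell-deltas";

// Quote a string so that a POSIX shell treats it as one literal word.
function shellQuote(s: string): string {
  return `'${s.replace(/'/g, `'\\''`)}'`;
}

// Build the replacement command a PreToolUse hook could hand back.
// stdout and stderr are each copied to a per-call file while still being
// forwarded to the original stream, so the tool result is unchanged.
function wrapBashCommand(cmd: string, callId: string): string {
  // Reject call ids that could escape the log directory or inject shell syntax.
  if (!/^[A-Za-z0-9_-]+$/.test(callId)) {
    throw new Error(`unsafe call_id: ${callId}`);
  }
  const out = shellQuote(`${LOG_DIR}/${callId}.stdout`);
  const err = shellQuote(`${LOG_DIR}/${callId}.stderr`);
  return `bash -c ${shellQuote(cmd)} > >(tee ${out}) 2> >(tee ${err} >&2)`;
}

console.log(wrapBashCommand("bun test", "call_1"));
```

One caveat of this design: bash does not wait for process substitutions by default, so the wrapper can return before `tee` has flushed the last bytes. A tailer should therefore keep reading briefly after the completion event arrives.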

Wire format

  • Reuses the same ShellOutputDeltaEvent shape from the cursor delta-streaming issue. UI doesn't know which provider produced it.
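For illustration only, here is a sketch of what a provider-agnostic delta event could look like. The real `ShellOutputDeltaEvent` shape is defined in the cursor delta-streaming issue and is not reproduced here; only the `shell_output_delta` event name and the `callId` field come from this issue, while `stream`, `seq`, `data`, and `createDeltaFactory` are assumptions.

```typescript
type ShellStream = "stdout" | "stderr";

// Assumed shape; the authoritative definition lives with the cursor delta work.
interface ShellOutputDeltaEvent {
  type: "shell_output_delta";
  callId: string;       // ties the delta to one Bash tool call
  stream: ShellStream;  // assumed field: which stream produced the bytes
  seq: number;          // assumed field: per-call ordering counter
  data: string;         // assumed field: the output fragment
}

// Returns a function that stamps fragments with a monotonically increasing seq.
function createDeltaFactory(callId: string) {
  let seq = 0;
  return (stream: ShellStream, data: string): ShellOutputDeltaEvent => ({
    type: "shell_output_delta",
    callId,
    stream,
    seq: seq++,
    data,
  });
}

const toDelta = createDeltaFactory("call_42");
const first = toDelta("stdout", "running 12 tests\n");
const second = toDelta("stderr", "warning: slow test\n");
console.log(JSON.stringify([first, second]));
```

Nothing in the event names the provider, which is the property the acceptance criterion asks for: the UI can render cursor and claude-code deltas through the same path.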

Edge cases

  • Lines longer than the safe buffer size are split — UI re-stitches by call_id.
  • Tool calls that exit before any stdout is captured emit a single zero-byte synthetic delta + the existing completion event so the widget renders an empty pane, not a perpetually-loading spinner.
  • Aborted tasks must close the stream cleanly — no zombie tailers per cancelled call_id.
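The first two edge cases can be sketched as follows, assuming a character-count limit stands in for the "safe buffer size" and that deltas carry a per-call sequence number. The `Delta` shape and the function names are hypothetical.

```typescript
interface Delta {
  callId: string;
  seq: number;
  data: string;
}

// Split one line into chunks of at most maxChars code points, so that a
// surrogate pair (e.g. an emoji) is never cut in half.
function splitForWire(line: string, maxChars: number): string[] {
  if (maxChars <= 0) throw new RangeError("maxChars must be positive");
  const codePoints = Array.from(line);
  const parts: string[] = [];
  for (let i = 0; i < codePoints.length; i += maxChars) {
    parts.push(codePoints.slice(i, i + maxChars).join(""));
  }
  return parts;
}

// Turn captured lines into deltas. If nothing was captured, emit a single
// zero-byte delta so the UI can render an empty pane instead of a spinner.
function toDeltas(callId: string, lines: string[], maxChars: number): Delta[] {
  const deltas: Delta[] = [];
  for (const line of lines) {
    for (const part of splitForWire(line, maxChars)) {
      deltas.push({ callId, seq: deltas.length, data: part });
    }
  }
  if (deltas.length === 0) deltas.push({ callId, seq: 0, data: "" });
  return deltas;
}

// UI side: re-stitch the fragments of one call in sequence order.
function stitch(deltas: Delta[], callId: string): string {
  return deltas
    .filter((d) => d.callId === callId)
    .sort((a, b) => a.seq - b.seq)
    .map((d) => d.data)
    .join("");
}
```

Because the split is lossless and ordered, the UI does not need to know whether a fragment boundary was a real newline or a buffer limit; concatenating by `seq` recovers the original output.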

Tests

  • Synthetic shell command emitting 1k lines, asserting all lines arrive in order with the right call_id.
  • Abort test: cancel mid-stream, assert the tailer process (or async task) exits.
  • Multi-task isolation: two tasks running concurrent shells must not cross-contaminate streams.
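The three tests above could be exercised against an in-memory stand-in for the tailer. The sketch below is one possible shape, assuming subscriptions are keyed by call_id and cancelled through an `AbortSignal`. `InMemoryDeltaRouter` is a hypothetical name and is not the test double used in the repository.

```typescript
type RoutedDelta = { callId: string; seq: number; data: string };
type Listener = (d: RoutedDelta) => void;

class InMemoryDeltaRouter {
  private listeners = new Map<string, Set<Listener>>();
  private nextSeq = new Map<string, number>();

  // Register a listener for one call; aborting the signal removes it again.
  subscribe(callId: string, onDelta: Listener, signal: AbortSignal): void {
    if (signal.aborted) return;
    let set = this.listeners.get(callId);
    if (!set) {
      set = new Set();
      this.listeners.set(callId, set);
    }
    set.add(onDelta);
    signal.addEventListener(
      "abort",
      () => {
        const current = this.listeners.get(callId);
        current?.delete(onDelta);
        // Drop the empty set so no per-call state outlives the call.
        if (current && current.size === 0) this.listeners.delete(callId);
      },
      { once: true },
    );
  }

  // Deliver a fragment only to listeners of the matching call.
  push(callId: string, data: string): void {
    const seq = this.nextSeq.get(callId) ?? 0;
    this.nextSeq.set(callId, seq + 1);
    for (const fn of this.listeners.get(callId) ?? []) {
      fn({ callId, seq, data });
    }
  }

  // Number of calls that still have at least one live listener.
  activeCalls(): number {
    return this.listeners.size;
  }
}
```

A test can then push 1,000 numbered lines for two interleaved call ids, check that each listener saw only its own lines in order, abort both controllers, and assert that `activeCalls()` returns 0.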

Out of scope

  • Streaming structured tool inputs (Edit, Write, etc.) for claude-code. Only Bash benefits from live output.
  • Replacing the existing extractProgress short text — that stays as the "single line of progress" complement to the live pane.

References

  • Parent: #950
  • Cursor analog: ShellOutputDeltaUpdate in delta-types.d.ts
  • Existing PreToolUse hook context: docs/plugins.md + apps/server/src/... (search for "rtk" and "PreToolUse").
Collaborator

🤖 Auto-assigned to code-lead (heuristic: area:agents → code-lead (architecture-touching)). Reply /unassign to reroute.

Collaborator

🧹 janitor: this ticket has been idle-assigned since 2026-05-08T15:36:12.000Z. Re-dispatching.

Collaborator

🦵 @charles kicked the queue — re-running implement on @code-lead.

Collaborator

🦵 @charles kicked the queue — re-running implement on @code-lead.

Collaborator

🦵 @charles kicked the queue — re-running implement on @code-lead.

Collaborator

⚠️ duplicate dispatch — issue was closed by PR #994 (merged at 2026-05-08T19:39:01Z) while a parallel code-lead/957 implementation was in progress.

A fully tested implementation landed locally on branch code-lead/957 (b904196 feat(agents): synthesise shell_output_delta for claude-code via container tail) covering all acceptance criteria — Option A bash wrapper via PreToolUse, ShellDeltaSource abstraction with MemoryShellDeltaSource test double + DockerExecShellDeltaSource production, per-agent shell-tee-hook.py rendered by agent-env-sync/render-for-instance.ts, integration into SdkClaudeAgent.runTask, 50 passing tests covering 1k-line ordering, abort cleanup, multi-call isolation, and the python hook's safety guards.

Not pushing the branch — PR #994 already shipped the same surface (different layout: single /tmp/shell-stream/stream.log + base64 frames + ShellStreamTailer vs. per-call /state/shell-deltas/<callId>.{stdout,stderr} + python hook + ShellDeltaSource). Leaving the local commit intact for reference; operator can git diff between branches if a comparative review of the two designs is useful.

Reporting and stopping.

Reference
charles/claude-hooks#957